Mobile app version of vmapp.org

Is using HTML entities (for language-specific characters) in UTF-8 necessary?

@YK1175434

Posted in: #ContentEncoding #Html

As in the subject line. I saw this the other day on a page, and it felt weird to me. Apart from markup-delimiting characters such as angle brackets or the ampersand, escaping, say, German umlauts shouldn't be necessary, should it?

I checked the encoding server-side, in the page itself, and in the HTTP headers; everything looks like UTF-8 to me.

What's your take on this? Do you reckon it could adversely affect the page's SEO or SERP placement?
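
A quick way to convince yourself that the umlauts don't need escaping is to resolve the entities with Python's standard html module and compare the result with the literal text. This is a minimal sketch; the sample strings are made up for illustration.

```python
import html

# The same greeting written two ways: literal characters vs. named entities.
literal = "Grüße aus Köln"
escaped = "Gr&uuml;&szlig;e aus K&ouml;ln"

# After entity resolution both are the identical string, so on a page that is
# UTF-8 end to end the entities change nothing about the content.
decoded = html.unescape(escaped)
print(decoded == literal)  # True

# The entity form is also longer: "ü" is 2 bytes in UTF-8, "&uuml;" is 6.
print(len("ü".encode("utf-8")), len("&uuml;".encode("utf-8")))  # 2 6
```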

@BetL925

That's why those headers and attributes are there: to specify which character set the page uses. XML/XHTML documents should declare it in the XML declaration (<?xml version="1.0" encoding="UTF-8"?>), and HTML documents in a meta tag (<meta charset="utf-8">); the HTTP Content-Type header can carry it as well. If the page declares the proper encoding (and the file's actual encoding matches the declaration), search engines should be smart enough to figure it out. After all, they purport to reward standards compliance and good design, and UTF-8 is broadly accepted.
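
A rough sketch of how you might check both halves of that (the declared charset and the actual bytes) with only the Python standard library. The function names and sample page are hypothetical. The regular expressions are a simplification of the real HTML/XML encoding-sniffing rules (they ignore, for instance, a byte order mark before the XML declaration). The 1024-byte window follows the HTML5 requirement that the meta charset appear within the first 1024 bytes.

```python
import re
from email.message import Message
from typing import Optional

# Simplified patterns (assumption: well-formed, ASCII-compatible markup).
# XML declaration at the very start, e.g. <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')
# Either <meta charset="utf-8"> or the older http-equiv form with "charset=" in content
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of an HTTP Content-Type value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def declared_charset(doc: bytes) -> Optional[str]:
    """Return the charset declared inside the document itself, lowercased, or None."""
    head = doc[:1024]  # only look where HTML5 allows the meta charset to appear
    for pattern in (XML_DECL, META_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


def is_valid_utf8(doc: bytes) -> bool:
    """True if the raw bytes decode as UTF-8 without errors."""
    try:
        doc.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample page, saved once as real UTF-8 and once as Latin-1.
source = '<!DOCTYPE html><html><head><meta charset="utf-8"><title>Grüße</title></head></html>'
page = source.encode("utf-8")
latin1_page = source.encode("latin-1")

print(charset_from_content_type("text/html; charset=UTF-8"))        # utf-8
print(declared_charset(page), is_valid_utf8(page))                # utf-8 True
print(declared_charset(latin1_page), is_valid_utf8(latin1_page))  # utf-8 False
```

The second sample is the problem case: it claims UTF-8 but isn't. A page that passes both checks has no need to entity-encode its umlauts.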

@Angie530

You're right: as long as you can ensure that you're using UTF-8 through and through, you shouldn't need to escape anything except the markup-significant characters you mentioned (<, >, &), plus quotation marks inside attribute values.
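
For reference, Python's standard html.escape does exactly this minimal escaping. It touches &, < and > (and, by default, both quote characters) and leaves non-ASCII letters alone. The last line shows the blanket alternative for contrast: every non-ASCII character turned into a numeric character reference, which you would only need if the output could not be UTF-8. A short illustrative sketch; the sample text is made up.

```python
import html

text = 'Grüße & "Küsse" <3'

# Default: escapes &, <, > and both quote characters; umlauts pass through.
print(html.escape(text))               # Grüße &amp; &quot;Küsse&quot; &lt;3

# quote=False is enough for element text (quotes only matter in attribute values).
print(html.escape(text, quote=False))  # Grüße &amp; "Küsse" &lt;3

# Blanket escaping for comparison: every non-ASCII character becomes &#NNN;.
print("Grüße".encode("ascii", "xmlcharrefreplace"))  # b'Gr&#252;&#223;e'
```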

I think the reason you see people escaping other characters is that they've become a little "shell-shocked" by UTF-8 text being converted into another encoding and having all of their beautiful characters turned into tall rectangles or diamonds with a question mark in them, which looks about as unprofessional as it gets.
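
To see where broken characters like these come from, here is a small Python sketch of two classic failure modes. It assumes the wrong decoder is Windows-1252 or Latin-1, which are common culprits but not the only ones. The diamond with a question mark is U+FFFD, the Unicode replacement character a decoder substitutes for byte sequences it can't interpret.

```python
word = "Grüße"

# UTF-8 bytes read by something that assumes Windows-1252: classic mojibake.
utf8_bytes = word.encode("utf-8")      # b'Gr\xc3\xbc\xc3\x9fe'
print(utf8_bytes.decode("cp1252"))     # GrÃ¼ÃŸe

# Latin-1 bytes read as UTF-8: 0xFC and 0xDF are not valid UTF-8 here, so with
# errors="replace" each becomes U+FFFD, giving "Gr" + two U+FFFD + "e".
latin1_bytes = word.encode("latin-1")  # b'Gr\xfc\xdfe'
print(latin1_bytes.decode("utf-8", errors="replace"))
```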

That only has to happen once or twice in a production environment to make you start reflexively changing everything to HTML entities.

Combine that with the fact that text editors, scripting languages, and database engines can all change the text encoding on you, and I can't say I blame them too much.
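
Since any stage in that chain can silently re-decode text, one cheap guard is to check data at each boundary for the most common corruption: UTF-8 bytes that were decoded as Windows-1252. The helper below is a hypothetical heuristic, not a complete detector (the ftfy library handles many more cases). It only checks whether re-encoding the text as cp1252 yields valid UTF-8 that differs from the input, and it can misfire on text that legitimately contains sequences like "Ã©".

```python
def looks_like_mojibake(text: str) -> bool:
    """Heuristic: True if text looks like UTF-8 bytes mis-decoded as cp1252."""
    try:
        # Undo the suspected wrong decode, then try the right one.
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either not representable in cp1252 or not valid UTF-8 underneath.
        return False
    # Pure ASCII round-trips unchanged, so only flag text that actually changed.
    return repaired != text


print(looks_like_mojibake("GrÃ¼ÃŸe"))  # True
print(looks_like_mojibake("Grüße"))    # False
```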

But, long story short: if you can guarantee that you'll have nothing but UTF-8 from source to served page, there's no technical reason to escape anything beyond those markup characters.
