How to tell the browser the character encoding of an HTML website regardless of the server's Content-Type header?

@Sent6035632

Posted in: #Browsers #ContentEncoding #Html #HttpHeaders

I have an HTML page that correctly announces its Content-Type (the declared encoding matches the encoding of the file on disk):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">
<title> ...


Opening the file from disk in a browser (Google Chrome, Firefox) works fine.

Requesting it via HTTP, the webserver sends a different Content-Type header:

$ curl -I example.com/file.html
HTTP/1.1 200 OK
Date: Fri, 19 Oct 2012 10:57:13 GMT
...
Content-Type: text/html; charset=ISO-8859-1


(See the last line.) The browser then decodes the page as ISO-8859-1, which is not what I want.
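As a small illustration of the effect (not part of the original report), the Python sketch below encodes a sample string as UTF-8 and then decodes the same bytes as ISO-8859-1, which is roughly what a browser does when it trusts the header. The sample word is an assumption chosen for the demonstration.

```python
# Hypothetical sample text; any non-ASCII character shows the effect.
text = "café"

# What is stored on disk and sent over the wire: UTF-8 bytes.
raw = text.encode("utf-8")

# Decoding as the header says (charset=ISO-8859-1) gives mojibake.
wrong = raw.decode("iso-8859-1")

# Decoding as the meta tag says (charset=utf-8) restores the text.
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

Each non-ASCII character occupies two bytes in UTF-8, so it shows up as two unrelated Latin-1 characters when the wrong charset is applied.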

Is there a common way to override the server headers sent to the browser from within the HTML document?


@Si4351233

In addition to what was said here, I'd try to use the same charset in all pages, preferably UTF-8 (but if nearly everything is already ISO-8859-1, use that).

To quickly check the charset of a file, you can try:

file --mime-type --mime-encoding {filename}


To check the charset of all files in the tree, you can try:

find . -type f -exec file --mime-type --mime-encoding '{}' \;


or (calling the file command only once):

find . -type f -print | file --mime-type --mime-encoding -f-


To get a summary, use the -b option to the file command (to omit the filenames) and pipe the result to sort | uniq -c.
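Where the file utility is not available, a rough stand-in for such a summary can be sketched in Python. This is not what file does internally: it only distinguishes pure ASCII, valid UTF-8, and everything else, and the names classify and summarize_tree are made up for this example.

```python
import os
from collections import Counter


def classify(data: bytes) -> str:
    """Return a rough label: 'ascii', 'utf-8', or 'other'."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: could be ISO-8859-1, another 8-bit charset, or binary.
        return "other"


def summarize_tree(root: str) -> Counter:
    """Count the rough labels of all regular files below root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                counts[classify(fh.read())] += 1
    return counts


if __name__ == "__main__":
    for label, count in summarize_tree(".").most_common():
        print(f"{count:7d} {label}")
```

Files that land in the 'other' bucket are the ones worth inspecting by hand before settling on a single charset for the site.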


@Gretchen104

No, it's not possible from within the HTML. The server's response header takes precedence over the document's meta tag, as specified in 5.2.2 Specifying the character encoding - HTML 4.01 Specification:


To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):


1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
3. The charset attribute set on an element that designates an external resource.



So this requires configuration on the server side. However, the section continues:


User agents may provide a mechanism that allows users to override incorrect "charset" information. However, if a user agent offers such a mechanism, it should only offer it for browsing and not for editing, to avoid the creation of Web pages marked with an incorrect "charset" parameter.


In my case the server's Content-Type header contains the right MIME type but the wrong charset.

As it turned out, my Apache httpd configuration had AddDefaultCharset turned on, which was adding the ; charset=ISO-8859-1 part. After placing the following line in the .htaccess file in the website's root directory:

AddDefaultCharset Off


the charset information was removed:

$ curl -I example.com/file.html
HTTP/1.1 200 OK
Date: Fri, 19 Oct 2012 15:07:52 GMT
...
Content-Type: text/html


(See the last line: there is no ; charset=... part.) With no charset in the header, the browsers fall back to the charset declared in the HTML meta tag, and the website is decoded properly.
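The behaviour described here can be modelled with a short Python sketch. It is a simplified reading of the HTML 4.01 priority list quoted above, not how any particular browser is implemented; the function name effective_charset, the regular expression, and the 1024-byte scan window are assumptions made for illustration.

```python
import re
from email.message import Message

# Loose pattern for a charset declared inside a <meta ...> tag (illustrative only).
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def effective_charset(content_type_header, body):
    """Pick the charset: the HTTP header wins, then the meta tag, else None."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    from_header = msg.get_content_charset()  # lower-cased, or None if absent
    if from_header:
        return from_header
    match = META_CHARSET.search(body[:1024])  # only scan the start of the page
    if match:
        return match.group(1).decode("ascii").lower()
    return None  # the browser would fall back to its own guess


page = (b'<html><head><meta http-equiv="Content-Type" content=\n'
        b'"text/html; charset=utf-8"><title>x</title>')

# Original configuration: the header charset overrides the meta tag.
print(effective_charset("text/html; charset=ISO-8859-1", page))
# After AddDefaultCharset Off: the meta tag decides.
print(effective_charset("text/html", page))
```

In this model the only way for the meta tag to take effect is for the server to stop sending a charset parameter, which matches the fix described in this answer.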

Tested with:


Google Chrome v. 22.0.1229.94
Firefox v. 16.0.1
Lynx Version 2.8.7rel.1 (05 Jul 2009)


These three browsers had problems with the original configuration and work now (all on Fedora 17).


Opera 12.02
Internet Explorer 6 (Win XP SP3)


These two didn't have the problem in the first place: both preferred UTF-8 from the meta tag over the ISO-8859-1 setting from the server.


Netscape 2.01 Gold


Does not support UTF-8, so it always chooses Western (Latin1) regardless of the server setting and the meta tag.


@Yeniel560

You should set something like this in your root .htaccess:

<FilesMatch "\.(htm|html|xhtml|xml|php)$">
AddDefaultCharset utf-8
</FilesMatch>


@Correia994

"Is there a common way to override the server headers sent to the browser from within the HTML document?"

AFAIK no; you are already doing what you can. The charset defined in the HTTP header trumps the one defined in your META tag.

If you have access to the server configuration (e.g. Apache), the charset is controlled by this directive (see the comment lines):

# Read the documentation before enabling AddDefaultCharset.
# In general, it is only a good idea if you know that all your files
# have this encoding. It will override any encoding given in the files
# in meta http-equiv or xml encoding tags.
#AddDefaultCharset UTF-8


[Update]

To second w3d's comment: here you'll find some ways to change the charset via .htaccess directives for the Apache server.
