How to tell the browser the character encoding of an HTML website regardless of the server's Content-Type header?
I have an HTML page that correctly announces its Content-Type (the encoding of the physical file on disk matches it):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">
<title> ...
Opening the file from disk in browser (Google Chrome, Firefox) works fine.
Requesting it via HTTP, the webserver sends a different Content-Type header:
$ curl -I example.com/file.html
HTTP/1.1 200 OK
Date: Fri, 19 Oct 2012 10:57:13 GMT
...
Content-Type: text/html; charset=ISO-8859-1
(see last line). The browser then uses ISO-8859-1 to decode the page, which is not what I want.
Is there a common way to override the server headers sent to the browser from within the HTML document?
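To see why the page comes out garbled, here is a minimal Python sketch (standard library only) that reads the charset parameter out of a Content-Type value like the one above and decodes UTF-8 bytes with it. The sample word `café` and the helper name `header_charset` are illustrative assumptions, not part of the page in question.

```python
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value (lower-cased), or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Bytes as stored on disk: the page is UTF-8 encoded.
page_bytes = "café".encode("utf-8")

server_charset = header_charset("text/html; charset=ISO-8859-1")
print(server_charset)                     # iso-8859-1
print(page_bytes.decode(server_charset))  # cafÃ©  (UTF-8 bytes read as Latin-1)
print(page_bytes.decode("utf-8"))         # café
```

Each non-ASCII character that UTF-8 stores as two bytes turns into two Latin-1 characters, which is the typical symptom of this mismatch.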
In addition to what was said here, I'd try to use the same charset in all pages, preferably UTF-8 (but if nearly everything is ISO-8859-1, use that).
To quickly check the charset of a file, you can try:
file --mime-type --mime-encoding {filename}
To check the charset of all files in the tree, you can try:
find . -type f -exec file --mime-type --mime-encoding '{}' \;
or (calling the file command only once):
find . -type f -print | file --mime-type --mime-encoding -f-
To get a summary, use the -b option to the file command (to omit the filenames) and pipe the result to sort | uniq -c.
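If the `file` command is not available, a rough stand-in can be written in Python. This is a minimal sketch, not equivalent to `file`: it only distinguishes pure ASCII, valid UTF-8 and "anything else", and the helper names `guess_encoding` and `summarize` as well as the `.htm`/`.html` filter are hypothetical choices for this example.

```python
import os
from collections import Counter


def guess_encoding(data: bytes) -> str:
    """Very rough classification of a file's bytes (much cruder than file(1))."""
    if data.isascii():
        return "us-ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "non-utf-8 (possibly iso-8859-1)"
    return "utf-8"


def summarize(root: str, suffixes=(".htm", ".html")) -> Counter:
    """Count the guessed encodings of all matching files below root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffixes):
                with open(os.path.join(dirpath, name), "rb") as fh:
                    counts[guess_encoding(fh.read())] += 1
    return counts


if __name__ == "__main__":
    # Similar in spirit to piping `file -b ...` into `sort | uniq -c`.
    for encoding, n in summarize(".").most_common():
        print(f"{n:7d} {encoding}")
```

Valid UTF-8 is a strong signal, while "not UTF-8" says little about which 8-bit charset a file actually uses.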
No, it's not possible from within the HTML. The server's response header takes precedence over the document's meta tag, as specified in section 5.2.2, Specifying the character encoding, of the HTML 4.01 Specification:
To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
3. The charset attribute set on an element that designates an external resource.
So this requires configuration on the server side. However, as the section continues:
User agents may provide a mechanism that allows users to override incorrect "charset" information. However, if a user agent offers such a mechanism, it should only offer it for browsing and not for editing, to avoid the creation of Web pages marked with an incorrect "charset" parameter.
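The priority order quoted above can be sketched in Python as a small function. This is an illustration under simplifying assumptions, not how any browser is implemented: it takes already-decoded HTML text, only recognizes the HTML 4.01 `<meta http-equiv="Content-Type" ...>` form, and the names `charset_param`, `MetaCharsetFinder` and `effective_charset` are hypothetical.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import Optional


def charset_param(content_type: Optional[str]) -> Optional[str]:
    """Extract the lower-cased charset parameter from a Content-Type value."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Remembers the charset of the first <meta http-equiv="Content-Type" content="...">."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() == "content-type":
            self.charset = charset_param(attributes.get("content"))


def effective_charset(http_content_type, html_text, external_charset=None):
    """Pick the charset using the HTML 4.01 priorities, highest first."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    candidates = (charset_param(http_content_type), finder.charset, external_charset)
    for candidate in candidates:
        if candidate:
            return candidate.lower()
    return None
```

With a header of `text/html; charset=ISO-8859-1` the function returns `iso-8859-1` even though the page declares UTF-8; with a plain `text/html` header it falls through to the meta tag and returns `utf-8`.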
In my case the server's Content-Type header contains the right MIME type but the wrong charset.
As it turned out, my Apache httpd configuration had AddDefaultCharset turned on, which was adding the ; charset=ISO-8859-1 part. After placing the following line into a .htaccess file in the website's root directory:
AddDefaultCharset Off
the charset information was removed:
$ curl -I example.com/file.html
HTTP/1.1 200 OK
Date: Fri, 19 Oct 2012 15:07:52 GMT
...
Content-Type: text/html
(see last line: there is no ; charset=... part any more). Combined with the charset declared in the HTML meta tag, this lets the browsers take the encoding from the meta tag, and the website is decoded properly.
Tested with:
Google Chrome v. 22.0.1229.94
Firefox v. 16.0.1
Lynx Version 2.8.7rel.1 (05 Jul 2009)
These three browsers had problems with the original configuration and work now (all on Fedora 17).
Opera 12.02
Internet Explorer 6 (Win XP SP3)
Did not have the problem in the first place: both preferred UTF-8 from the meta tag over the ISO-8859-1 setting from the server.
Netscape 2.01 Gold
Does not support UTF-8, so it always chooses Western (Latin-1), regardless of the server setting and the meta tag.
You should set something like this in your root .htaccess:
<FilesMatch "\.(htm|html|xhtml|xml|php)$">
AddDefaultCharset utf-8
</FilesMatch>
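In a FilesMatch pattern, an unescaped `.` is a regex metacharacter that matches any character, so `\.` is needed to match only a literal dot before the extension. The Python sketch below compares both versions of the pattern; it assumes Apache's PCRE treats this simple pattern the same way as Python's `re`, and the sample filenames are made up.

```python
import re

# Dot escaped: only names ending in ".htm", ".html", ".xhtml", ".xml" or ".php".
ESCAPED = re.compile(r"\.(htm|html|xhtml|xml|php)$")
# Dot unescaped: any single character may precede the extension.
UNESCAPED = re.compile(r".(htm|html|xhtml|xml|php)$")

for name in ["index.html", "feed.xml", "style.css", "notes_html"]:
    print(name, bool(ESCAPED.search(name)), bool(UNESCAPED.search(name)))
# index.html True True
# feed.xml True True
# style.css False False
# notes_html False True
```

Only the escaped form avoids matching names such as `notes_html` that merely end in the extension letters.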
"Is there a common way to override the server headers send to the browser from within the HTML document?"
AFAIK, no; you are already doing what you can. The charset defined in the HTTP header trumps your definition in the META tag.
If you have access to the server (e.g. Apache), this is controlled by the following directive; note the comment lines:
# Read the documentation before enabling AddDefaultCharset.
# In general, it is only a good idea if you know that all your files
# have this encoding. It will override any encoding given in the files
# in meta http-equiv or xml encoding tags.
#AddDefaultCharset UTF-8
[Update]
To second w3d's comment: there are also ways to change the charset via .htaccess directives for the Apache server.