Charset UTF-8 still shows special signs as question mark

@Kaufman445

Posted in: #CharacterSet

For some reason, I can't show special signs on my website.

This is my DOCTYPE line:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">


And this is the charset line:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


Is this correct? What might be the problem?


2 Comments


@Nickens628

I'm not sure what your website is, but the other answer says the data on your site is stored in a character set other than UTF-8. If that is the case, I have a suggestion that may help.

One way to handle it is a small server-side script that serves the page with an HTTP header naming the character set the file was actually saved in. What you have is this:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


That tag is fine as far as it goes, but a real HTTP header is more reliable. The following PHP script sends one:

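<?php
// Accept only known charsets and plain file names; raw $_GET values must never
// reach header() or the filesystem unchecked. The list below is just an example.
$allowed = ['UTF-8', 'ISO-8859-1', 'Windows-1252'];
$charset = $_GET['charset'] ?? 'UTF-8';
$file    = __DIR__ . '/' . basename($_GET['html'] ?? '');
if (!in_array($charset, $allowed, true) || !preg_match('/\.html?$/', $file) || !is_file($file)) {
    http_response_code(400);
    exit('Bad request');
}
// Note the semicolon between the media type and the charset parameter.
header('Content-Type: text/html; charset=' . $charset);
readfile($file);
?>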


Save the above file under any name with a .php extension. Let's call it decode.php. Then put the page that shows question marks instead of special characters in the same folder. Let's call it marks.htm.

Then in your address bar, you go to something like:
example.com/decode.php?html=marks.htm&charset=ISO-8859-1

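The server then sends a Content-Type header declaring ISO-8859-1, and the browser decodes the HTML with that character set. If the file really was saved as ISO-8859-1, the special characters should display correctly.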

I wrote this quickly, so treat it as a starting point rather than production code. Either way, you're better off specifying the encoding in the HTTP headers than in an HTML meta tag, because the browser has to start decoding the document before it can even read the meta tag.


@BetL925

There are more ways to mess up characters and character sets than can be enumerated in an answer. The important thing is that the character set you declare matches the character set that was used when creating, storing, and reading the data. There are many character sets that can display international characters. UTF-8 is universal and very popular, but it isn't the only one.

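You may have saved your HTML file as ISO-8859-1, which covers Western European languages like Spanish and German. ISO-8859-1 can represent special characters including ßñçüöäé, each as a single byte. If you save the HTML file as ISO-8859-1 but then declare UTF-8, the browser tries to decode those single bytes as UTF-8, finds that they are not valid UTF-8 sequences, and shows a replacement character (often drawn as a question mark) in their place.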
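A minimal sketch of that mismatch, assuming a Unix shell with the standard iconv tool available. The file name sample.htm and its contents are made up for illustration, and iconv is used here only as a strict UTF-8 decoder.

```shell
#!/bin/sh
# Hypothetical sample: "señor" saved as ISO-8859-1, where ñ is the single
# byte 0xF1 (octal 361). In UTF-8 the same letter would be two bytes.
printf 'se\361or\n' > sample.htm

# is_valid_utf8 FILE: succeeds only if FILE decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

if is_valid_utf8 sample.htm; then
    echo "sample.htm is valid UTF-8"
else
    echo "sample.htm is NOT valid UTF-8"
fi
```

A browser is more forgiving than iconv: instead of stopping at the bad byte, it typically substitutes a replacement character and carries on, which is where the question marks come from.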

If your website has a database, that can be another source of problems. The database itself has to be set to store UTF-8 text. The tables and fields need to be created with a UTF-8 character set and the proper "collation" for it. The database connection may need to be opened with special flags that tell the driver to expect UTF-8 data. If any of those is missing, you will see character corruption.

Any time text is serialized or deserialized, there is the potential for corruption. I've had to track down and fix these errors in:


Databases
Backend servers
Templating software
Internal representation of strings in memory for programming languages




I downloaded your about-us.htm file from your web page. It is in ISO-8859-1 format. The Unix file command tells me this:

$ file about-us.htm
about-us.htm: HTML document, ISO-8859 text, with very long lines


You should either declare your character set as ISO-8859-1 (instead of UTF-8) or convert the text so that it really is UTF-8. You might find this question on Stack Overflow useful: "Best way to convert text files between character sets?"
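The conversion option can be sketched with iconv, assuming the file really is ISO-8859-1 as the file command reported. The names sample-latin1.htm and sample-utf8.htm are hypothetical, and the first command only creates a stand-in input so the snippet runs on its own.

```shell
#!/bin/sh
# Stand-in for the real page: "é" stored as the single ISO-8859-1 byte
# 0xE9 (octal 351).
printf '<p>caf\351</p>\n' > sample-latin1.htm

# to_utf8 SRC DST: re-encode SRC from ISO-8859-1 to UTF-8 and write it to DST.
# DST must be a different file; redirecting onto SRC would truncate it
# before iconv gets a chance to read it.
to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2"
}

to_utf8 sample-latin1.htm sample-utf8.htm

# The converted file should now pass a strict UTF-8 decode.
iconv -f UTF-8 -t UTF-8 sample-utf8.htm > /dev/null &&
    echo "sample-utf8.htm is valid UTF-8"
```

For the real page, the equivalent call would be to_utf8 about-us.htm about-us.utf8.htm. Check the result in a browser before replacing the original, and keep the UTF-8 declaration once the file has been converted.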
