Strange characters appearing on websites - ASCII? - UNICODE?
I have created many very simple pure HTML websites over the years. Most of them appear to work fine most of the time. But there is one recurring problem which I have never quite sorted out involving strange characters.
The scenario goes like this: I create the site. I look at it in my browser and everything appears fine. I may look at it a great many times over the coming weeks or months as I make additions here and there, perhaps on a variety of browsers on a variety of PCs. Then one day I look at the page and see a random sprinkling of white question marks against dark diamond shapes. These might appear where I had expected to see hyphens or quotes or apostrophes. My immediate thought is that my browser got into some strange state because I was looking at some foreign website with strange characters, but I'm never quite sure. I'm left with that nagging feeling that perhaps half the planet is seeing my website with funny question marks all over it.
So my question is: what's going on? What should I do to ensure that as many people as possible around the world can view my text as I originally intended? Should I be using those special HTML sequences like &pound; for all non-alphanumeric characters? Should I worry at all?
Edit: Right now I have the problem occurring on this page: www.fullreservebanking.com/papers.htm ... part of it looks like this:
I am using Firefox 5 and the character encoding currently appears to be "UNICODE (UTF-8)". I do not remember manually setting the character encoding to anything since installation. I do occasionally look at Japanese websites for work-related reasons, though when I do so, I do not manually make any changes to Firefox settings.
Edit: Now fixed. Web page altered accordingly.
3 Comments
In other words, if you are using WordPress and HTML5, copy only properly UTF-8-encoded characters into your editor and the problem is solved. Just Google "utf-8 list of characters" and copy straight from your browser to your editor (in visual mode). The character will then display as intended instead of appearing as �.
dmsnell's answer about using HTML entities is fine, but this issue can usually be fixed by making sure you use UTF-8 consistently throughout page generation and when serving the page to users.
For example, if your data is stored in a database, make sure all the text fields use UTF-8 encoding. You should also set the charset when connecting to the database (if using PDO in PHP) or run a query SET NAMES utf8 after connecting, before you start fetching data.
PHP handles UTF-8 fine if you are not modifying strings. If you are, you will need to look into using its multi-byte mb_* functions.
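To illustrate why byte-oriented string functions are risky with UTF-8, here is a minimal Python sketch (Python is used only for illustration; the sample word and the variable names are assumptions, not part of the original answer). It shows that the character count and the byte count of a string differ once a non-ASCII character is present, and that cutting at a byte boundary can split a character in two. The difference is analogous to PHP's strlen versus mb_strlen.

```python
# A word containing one non-ASCII character (e with acute accent, U+00E9).
word = "caf\u00e9"

# Encode to UTF-8: the accented letter takes two bytes, the others one each.
utf8_bytes = word.encode("utf-8")

char_count = len(word)        # counts characters
byte_count = len(utf8_bytes)  # counts bytes

# Truncating by bytes can cut a multi-byte character in half.
# Decoding the damaged bytes with errors="replace" yields U+FFFD,
# the "question mark in a diamond" replacement character.
cut_by_bytes = utf8_bytes[:4].decode("utf-8", errors="replace")

# Truncating by characters keeps every character intact.
cut_by_chars = word[:3]

print(char_count, byte_count)
print(repr(cut_by_bytes), repr(cut_by_chars))
```

A byte-wise truncation of a page summary or title is one way a stray � can end up in otherwise correct UTF-8 output.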
On the page itself, add the content type meta tag. It should go right after the <head> tag.
<meta charset="utf-8">
You could also set this as an HTTP header instead, for example Content-Type: text/html; charset=utf-8.
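As a rough sketch of declaring the charset in the HTTP header, the following Python WSGI application sends UTF-8 bytes together with a matching Content-Type header. The page content and the names PAGE and app are hypothetical and chosen only for this example; the original answer does not prescribe any particular server setup.

```python
# Hypothetical page used for illustration; the escapes are a right single
# quotation mark (U+2019) and a pound sign (U+00A3).
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Example</title>
</head>
<body><p>It\u2019s \u00a35</p></body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app: the bytes sent and the declared charset agree."""
    body = PAGE.encode("utf-8")  # encode exactly as declared below
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The important point is that the encoding used to produce the bytes, the header, and the meta tag all name the same charset. For local experimentation, an app like this could be served with the standard library's wsgiref.simple_server.make_server.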
These are called gremlins, and they usually appear because whichever program is putting the quotes in is using the actual pretty / curly / smart quote characters instead of the proper HTML entities. When the page is saved in one encoding and the browser decodes it as another, those characters cannot be decoded correctly and the funny symbol is shown instead.
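One common way the diamond symbol can arise is an encoding mismatch, which the following Python sketch reproduces. It assumes, purely for illustration, that the text was saved as Windows-1252 and then read as UTF-8 (and the reverse case); the sample text and variable names are not from the original post.

```python
# Sample text containing a curly apostrophe (U+2019).
text = "It\u2019s"

# Case 1: saved as Windows-1252, decoded as UTF-8.
# In Windows-1252 the curly apostrophe is the single byte 0x92, which is
# not a valid UTF-8 sequence on its own, so the decoder substitutes
# U+FFFD, the replacement character drawn as a question mark in a diamond.
saved_as_cp1252 = text.encode("cp1252")
seen_as_utf8 = saved_as_cp1252.decode("utf-8", errors="replace")

# Case 2: saved as UTF-8, decoded as Windows-1252.
# The three UTF-8 bytes of the apostrophe each map to a separate
# Windows-1252 character, producing a different kind of garbage.
saved_as_utf8 = text.encode("utf-8")
seen_as_cp1252 = saved_as_utf8.decode("cp1252")

print(repr(seen_as_utf8))
print(repr(seen_as_cp1252))
```

Either declaring the encoding the file is really saved in, or replacing such characters with HTML entities so the file is pure ASCII, avoids both outcomes.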
See this great article from A List Apart on non-quote related typographical oddities in HTML.
The best thing to do would be to create a text processor that runs over the content of your webpage before it is sent to the user (actually, it's best to run this after the content is generated and before it's saved on the server). This processor will do a simple text replacement for those special characters and provide the appropriate HTML entity in its place.
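A text processor along these lines might look like the following Python sketch. It is not the code from the PHP manual page mentioned in this answer; the function name to_entities and the choice of which characters get named entities are assumptions made for the example. Common typographic characters are mapped to named entities, and any other non-ASCII character falls back to a numeric character reference.

```python
# Named entities for a few common typographic characters.
ENTITIES = {
    "\u2018": "&lsquo;",  # left single quotation mark
    "\u2019": "&rsquo;",  # right single quotation mark / apostrophe
    "\u201c": "&ldquo;",  # left double quotation mark
    "\u201d": "&rdquo;",  # right double quotation mark
    "\u2013": "&ndash;",  # en dash
    "\u2014": "&mdash;",  # em dash
    "\u00a3": "&pound;",  # pound sign
}

# Translation table built once and reused for every call.
_TABLE = str.maketrans(ENTITIES)


def to_entities(text):
    """Return text as pure ASCII, with special characters as HTML entities."""
    # Step 1: replace the characters we have named entities for.
    named = text.translate(_TABLE)
    # Step 2: turn any remaining non-ASCII character into a numeric
    # character reference such as &#233;.
    return named.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_entities("It\u2019s \u00a35 \u2014 caf\u00e9"))
```

Because the result contains only ASCII, it displays the same way whichever of the common encodings the browser assumes. Note that this sketch does not escape &, < or >, so it is meant for text that is already valid HTML.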
This page from the PHP manual gives excellent code examples in PHP.