Should I escape the Apostrophe ( ' ) character with its HTML entity (&#39;)?
What characters should be escaped with their HTML entities? For example, & is escaped with &amp;.
Should ' be escaped with &#39;?
Where is that string going?
Your answer depends on the context:
If you are writing a paragraph in HTML with this data, it might be enough to escape <, > and &:
<p>{string}</p>
If you are writing into an HTML attribute, though, like
<a href='/some/path/{string}'>...</a>
Then you should absolutely escape the apostrophe. This can be an attack vector if an attacker put this in for string:
string = "' onmouseover='alert(\"nasty script here!\")' data-ignore='"
Same thing goes for double quotes. I've even read that the backtick ` is vulnerable, since that could be used for HTML attributes too. If you don't have an automatic HTML syntax checking script as part of your deploy routines, assume that any of these three could be used, and must be escaped for HTML attributes.
At the extreme, even unquoted attributes are valid, so the space character would also need escaping. So would !, @, $, %, (, ), =, +, {, }, [, and ], all of which can break out of an unquoted attribute and allow an attacker to insert a new one.
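For that extreme case, one conservative option is to encode every character that is not an ASCII letter or digit as a numeric character reference. The following is only a minimal sketch: the function name escapeAttributeStrict is hypothetical, it assumes the value goes into a plain (non-URL, non-script) attribute, and it makes no special provision for control characters.

```javascript
// Hypothetical helper (not from any library): encode everything except
// ASCII letters and digits as a hexadecimal character reference (&#xHH;).
function escapeAttributeStrict(value) {
  let out = "";
  for (const ch of String(value)) {
    if (/^[A-Za-z0-9]$/.test(ch)) {
      out += ch; // letters and digits cannot end an attribute value
    } else {
      out += "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";";
    }
  }
  return out;
}

// The attack string from above, with a shortened script body.
const payload = "' onmouseover='alert(1)' data-ignore='";
const safe = escapeAttributeStrict(payload);

// Even in an unquoted attribute, no quote, space, or = sign survives.
console.log("<a title=" + safe + ">...</a>");
console.log(escapeAttributeStrict("a b'")); // a&#x20;b&#x27;
```

The output is larger than the input, but it no longer matters whether the attribute is single-quoted, double-quoted, or unquoted.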
What I do
To do escaping in JavaScript, I use jQuery's $(element).text(string) or $(element).attr(attrname, string) to do the escaping for me. Be very careful with $(element).html(unsafe), which does not escape your HTML!
On server-side code, I have to carefully evaluate the risk for each case and read the documentation carefully. This will depend on the particular language and libraries you are using, like Rails, Django, raw PHP, Drupal, etc.
Databases
If you are considering stopping the problem as early as possible, before it even gets into your database, hold your horses. HTML-escaping the text stored in your DB can take you on a hellish ride. What if you later want to allow certain HTML tags, but not others, like italics, bold, colors and tables? What if you missed something in your first pass, but your escaper already escaped & as &amp; and " as &quot;? Will it turn those into &amp;amp; and &amp;quot;?
My approach is to only perform SQL escaping for the database, but leave all HTML special chars in for later processing. This way, I can debug and fine-tune my HTML escapes easily. Mind, that also means I can't trust my own SQL tables if they have user-provided strings.
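The double-escaping problem is easy to reproduce. In this small sketch, escapeAmpAndQuote is a hypothetical, deliberately incomplete escaper that handles only the two characters mentioned above; running it a second time over already-escaped text mangles the entities.

```javascript
// Hypothetical, deliberately incomplete escaper: handles only & and ".
function escapeAmpAndQuote(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // must run first so later entities stay intact
    .replace(/"/g, "&quot;");
}

const raw = 'Tom & "Jerry"';

// Escaped once, at output time: what the browser should receive.
const once = escapeAmpAndQuote(raw);
console.log(once);  // Tom &amp; &quot;Jerry&quot;

// Escaped again, e.g. because the stored value was already escaped.
const twice = escapeAmpAndQuote(once);
console.log(twice); // Tom &amp;amp; &amp;quot;Jerry&amp;quot;
```

Keeping the raw string in the database and escaping exactly once, at the point of output, avoids having to guess how many times a value has already been escaped.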
Moral
Never trust user-controlled input, and always quote your HTML attributes!
Based on: There's more to HTML escaping than &, <, >, and " by Ryan Grove
So let's see if StackExchange itself encodes an apostrophe using an HTML entity.
Here are some examples from the source code of this page.
(1) Question title: Encoded.
Should I escape the Apostrophe ( ' ) character with its HTML entity (&#39;)?
(2) drew's answer: Not encoded.
But I don't believe it is, in general, necessary.
(3) Tom's comment on nitro2k01's answer: Encoded.
I&#39;ve got two contradicting answers now. One recommends escaping &#39; and the other does not. What should I believe?
So it goes both ways.
However, this page's source code never uses &apos;. All the encodings are of the form &#39;. This is consistent with nitro2k01 and drew's advice not to use &apos;.
If your apostrophe belongs to content, escape it. Escape any other content characters that can be confused with code, too.
It depends on your use case, but we should probably be discouraged from using ' in natural language generally, so the issue shouldn’t arise unless you have computer code in your XML.
Where we have strings translated, we find that some translators replace the closing quotes with the Unicode curly quotes, but leave the straight quotes as the opening quotes, leaving them visually unbalanced and looking unprofessional.
The Unicode characters ‘ and ’ should replace ' where possible, much as “ and ” should replace ". This is useful because computers don’t recognise curly punctuation as special. (Although I'm amused to see that Stack Overflow/Chrome considers ‘don’t’ to be a spelling error, whereas it's happy with ‘don't’.)
It doesn't help that we have the very enticing ' and " characters right on the keyboard.
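As a rough illustration of replacing the straight apostrophe with its curly counterparts, here is a heuristic sketch. The function name curlApostrophes is hypothetical, and the rules are simplistic: a leading-apostrophe contraction such as 'tis would wrongly get an opening quote.

```javascript
// Hypothetical heuristic: swap straight apostrophes for curly ones.
function curlApostrophes(text) {
  return String(text)
    // Apostrophe between two letters (don't) becomes a right quote.
    .replace(/(\p{L})'(?=\p{L})/gu, "$1\u2019")
    // At the start of the text, or after whitespace or an opening bracket,
    // treat it as an opening (left) quote.
    .replace(/(^|[\s(\[{])'/g, "$1\u2018")
    // Whatever is left is treated as a closing (right) quote.
    .replace(/'/g, "\u2019");
}

console.log(curlApostrophes("don't say 'hello'")); // don’t say ‘hello’
```

Since ‘ and ’ have no special meaning in HTML or XML, text converted this way needs no apostrophe escaping at all.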
The easiest way to do the job without writing the entity by hand is to use PHP's htmlentities() or htmlspecialchars() functions. With ENT_QUOTES, both single and double quotes are escaped:
<?php
// ENT_QUOTES escapes " as &quot; and ' as &#039;.
$val = htmlspecialchars("Don't", ENT_QUOTES, 'UTF-8');
if (isset($_POST['val'])) {
    // Escape the submitted value before echoing it back into the attribute.
    $val = htmlspecialchars(trim($_POST['val']), ENT_QUOTES, 'UTF-8');
}
echo "<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en' class='njs'>
<head>
<meta http-equiv='Content-type' content='text/html;charset=utf-8' />
<title>Special Characters</title>
<style type='text/css'>
@import 'special.css';
</style>
</head>
<body>
<form method='post' action='' id='fm' name='fm'>
<input type='text' value='$val' name='val' id='val' />
<input type='submit' value='submit' name='sub' id='sub' />
</form>
<script type='text/javascript' src='special.js'></script>
</body>
</html>";
I don't have comment privileges, or I would have left this as a comment on an earlier answer.
DO NOT, I repeat, DO NOT escape an apostrophe in HTML using
&apos;
This is not a valid character entity reference in HTML 4. It is an XML character entity reference (HTML5 later added it). While Firefox and Chrome, at least, will render the above as an apostrophe in an HTML document, older versions of Internet Explorer will not, and they are following the HTML 4 standard when they refuse to do so.
You may escape an apostrophe in HTML using
&#39;
But I don't believe it is, in general, necessary.
fishbowl.pastiche.org/2003/07/01/the_curse_of_apos/
en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
I don't agree with Nate. You should ideally use as little escaping as possible and use UTF-8 to express characters natively. To do this you need an editor that can handle UTF-8 as well as a correct charset declaration, such as:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
However, you should make it a habit to escape the characters that have a special meaning in (X)HTML, namely:
< &lt;
> &gt;
" &quot;
& &amp;
' &apos;
This will make sure you're not accidentally writing markup when you want to write these characters. This is especially important for user input, to maintain security. It's less obvious, but it's also important to escape ". If a string ever ends up in an HTML attribute (title="something" etc.), the user could end the attribute and insert their own markup. Imagine what happens if the user enters " onclick="alert('hello'); and you insert that into title="..."
If you're using PHP, you can use the htmlspecialchars function to do this. Other languages may have other similar functions.
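For comparison, a similar function is short to write in JavaScript. This is only a sketch, not a port of PHP's implementation: the names HTML_ESCAPES and escapeHtml are made up for this example, and the apostrophe is written as the numeric reference &#39; rather than &apos;.

```javascript
// Hypothetical lookup table for the five characters discussed above.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;", // numeric reference, understood by old and new browsers alike
};

// Replace each special character in a single pass.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// The attribute-breaking input from the example above.
const userInput = "\" onclick=\"alert('hello');";
const html = '<span title="' + escapeHtml(userInput) + '">hover me</span>';

console.log(html);
// <span title="&quot; onclick=&quot;alert(&#39;hello&#39;);">hover me</span>
```

Because the quotes arrive as entities, the browser treats the whole input as the value of title and never sees an onclick attribute.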
Update: I stand corrected on the apos issue. Damned pesky IE.