Vandalay111

Should I transliterate my URL paths?

@Vandalay111

Posted in: #Internationalization #Path #Translation #Url

I looked around and can't find any info on this. I must be searching for the wrong terms, because it must be a common question.

If you have non-ASCII characters in your URLs, Firefox and Chrome show them nicely in the address bar (like en.wikipedia.org/wiki/Cliché), but IE (including the IE10 consumer preview) shows the percent-encoded character codes instead, like this: en.wikipedia.org/wiki/Clich%C3%A9.

Is one of these approaches more correct than the other? Is it likely that IE will one day start doing what FF and Chrome do?

Basically I'm trying to decide whether on a new site I should transliterate or not. My preference is to not transliterate, because it "seems right" to use the correct characters and it looks better in FF/Chrome. However, it looks horrible in IE, and since the majority of people use IE, that argues for transliteration.

Once you put a policy in place, you're probably never going to change it. So if I know that a future version of IE will start acting like Firefox et al., then I'm happy to lay the foundation with raw URLs and let current users suffer. But if not, I think I'd prefer transliteration. Any recommendations?


@Gloria169

The short answer is "it depends", mostly on what you're going to do with it.

Looking at RFC 3987 (Internationalized Resource Identifiers), IE is well within its rights to encode your URLs, especially if you've got a US/UK keyboard assigned, where entering an é might not be the simplest of actions for the user...
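To make the encoded form less mysterious, here is a minimal sketch of how a path containing é ends up as %C3%A9: the character is encoded as UTF-8 and each resulting byte is percent-encoded. It uses Python's standard `urllib.parse`; the variable names are just for illustration, and this is not a full implementation of the RFC 3987 mapping (it ignores the host part entirely).

```python
from urllib.parse import quote, unquote

# Illustrative path only; "\u00e9" is the precomposed character é.
iri_path = "/wiki/Clich\u00e9"

# quote() encodes the string as UTF-8 and percent-encodes each byte
# that is not in the safe set; "/" is kept so the path structure survives.
uri_path = quote(iri_path, safe="/")
print(uri_path)            # /wiki/Clich%C3%A9

# unquote() reverses the mapping, decoding the bytes as UTF-8.
print(unquote(uri_path))   # /wiki/Cliché
```

So the two forms in the question are the same resource written two ways; the difference is only in what the browser chooses to display.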

On top of that, I've seen servers get very upset when they are expecting one format and the browser sends something slightly different (e.g. " ", "+" or "%20"); see also Handling Character Encoding in a URI in Tomcat for another example.
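The space case is easy to reproduce. The following sketch, again using only Python's standard `urllib.parse`, shows that "+" and "%20" only mean the same thing if the decoder applies form-style rules, which is roughly the kind of mismatch that trips servers up. The sample strings are made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Two common ways of encoding a space.
path_style = quote("a b")        # percent-encoding, as used in paths
form_style = quote_plus("a b")   # form encoding, as used in query strings
print(path_style)                # a%20b
print(form_style)                # a+b

# A path-style decoder leaves "+" alone, so the two forms diverge.
plain_decoded = unquote(form_style)
form_decoded = unquote_plus(form_style)
print(plain_decoded)             # a+b
print(form_decoded)              # a b
```

If one component encodes with the form rules and another decodes with the path rules, the "+" survives as a literal plus sign.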

To be fair to IE9, if I actually type Ctrl+Alt+e+' into the address bar, it does display the character correctly; it's only if I copy/paste it in that it changes.



The Wikipedia source actually percent-encodes the links that contain the character and leaves the display up to the browser:


<a href="/wiki/Clich%C3%A9_(disambiguation)"


And as w3d points out in the comments, requesting en.wikipedia.org/wiki/Cliché in Chrome actually results in a request being made for en.wikipedia.org/wiki/Clich%C3%A9 [1].

So my recommendation would be to ensure that your site can handle either form, because some browsers will send you encoded strings if you use IRIs instead of URIs.
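One way to handle either form is to reduce every incoming path to a single canonical form before routing or lookup. This is only a rough sketch under some assumptions: `canonical_path` is a hypothetical helper, paths are assumed to be UTF-8, and parentheses are kept unencoded to match the Wikipedia link above. Your framework may already do something like this for you.

```python
from urllib.parse import quote, unquote

def canonical_path(path: str) -> str:
    """Return one percent-encoded form for a raw or already-encoded path.

    Hypothetical helper. Caveat: an encoded slash (%2F) is decoded and
    then treated as a path separator, which may not be what you want.
    """
    decoded = unquote(path)            # "%C3%A9" -> "é"; a raw "é" is unchanged
    return quote(decoded, safe="/()")  # re-encode as UTF-8 percent-escapes

# Both spellings of the same page collapse to one key.
raw_form = canonical_path("/wiki/Clich\u00e9")
encoded_form = canonical_path("/wiki/Clich%C3%A9")
print(raw_form)       # /wiki/Clich%C3%A9
print(encoded_form)   # /wiki/Clich%C3%A9
```

With that in place it matters much less which form a given browser decides to send.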



Footnotes:

[1] I can't actually get this editor to honour an IRI: using the é symbol cuts the auto-recogniser off at the "h", adding it as a link turns it into the encoded version, and correcting the encoded version removes the link altogether.
