
How to handle URLs with diacritic characters

@Welton855

Posted in: #Redirects #SearchEngines #Seo #Url #UrlEncoding

I am wondering how to handle URLs which correspond to strings containing diacritics (á, ǚ, ´...). I believe what we're seeing mostly are URLs where diacritic characters were converted to their closest ASCII equivalent, for instance Rånades på Skyttis i Ö-vik converted to ranades-pa-skyttis-i-o-vik.

However, depending on the language, such a conversion might be incorrect. For instance in German, ü should be converted to ue and not just u, as seen in the URL below, which represents the string Bayern München as bayern-muenchen:
www.bundesliga.de/en/liga/clubs/fc-bayern-muenchen/index.php
What I've also noticed, however, is that browsers can render non-ASCII characters when they are percent-encoded in the URL. This is the approach Wikipedia has chosen: for instance, de.wikipedia.org/wiki/FC_Bayern_M%C3%BCnchen is displayed in the address bar as de.wikipedia.org/wiki/FC_Bayern_München.



Therefore I'm considering the following approach for creating URL slugs:

-(1) convert strings by replacing non-ASCII characters with their recommended ASCII representation: Bayern München -> bayern-muenchen
-(2) also convert strings to percent encoding: Bayern München -> bayern_m%C3%BCnchen
-create a 301 redirect from version (1) to version (2)

Version (1) URLs could be used for marketing purposes (e.g. mywebsite.com/bayern-muenchen), but the URLs that would end up being displayed in the browser bar would be version (2) URLs (e.g. mywebsite.com/bayern-münchen).
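To make the idea concrete, here is a minimal Python sketch of the two versions and the redirect table between them. The names `GERMAN`, `ascii_slug`, `encoded_slug` and `build_redirects` are made up for illustration, the replacement table only covers German, and the sketch uses a hyphen as separator in both versions, which is an assumption on my part:

```python
import re
from urllib.parse import quote

# Hypothetical replacement table; only German is covered in this sketch.
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def ascii_slug(title):
    """Version (1): language-specific ASCII replacements, hyphen-separated."""
    text = title.lower()
    for char, replacement in GERMAN.items():
        text = text.replace(char, replacement)
    # Every run of characters outside a-z / 0-9 becomes one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

def encoded_slug(title):
    """Version (2): keep the original letters, percent-encode them as UTF-8."""
    text = re.sub(r"\s+", "-", title.strip().lower())
    return quote(text, safe="-")  # quote() encodes as UTF-8 by default

def build_redirects(titles):
    """Map each version (1) path to the version (2) path it should 301 to."""
    redirects = {}
    for title in titles:
        source, target = "/" + ascii_slug(title), "/" + encoded_slug(title)
        if source != target:  # pure-ASCII titles need no redirect
            redirects[source] = target
    return redirects

redirects = build_redirects(["Bayern München", "Hamburg"])
```

The web server or framework would then answer a request for a key of `redirects` with a 301 pointing at the corresponding value; how that is wired up depends on the stack in use.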

Can you foresee particular problems with this approach? (Wikipedia is not doing it and I wonder why, apart from the fact that they don't need to market their URLs)




3 Comments


 

@Heady270

When using international characters in your URLs there are a few issues to be aware of:


-Percent-encoding requires a character set. For the URL to display correctly in the web browser, use the UTF-8 character set when percent-encoding your slug. See: What is the proper way to URL encode Unicode characters?
-If there are many encoded characters in your URL, it can become significantly longer. Here is an example with many Japanese characters, shown first as international text and then percent-encoded:

www.dmoz.org/World/Japanese/オンラインショップ/地域別・エスニック/アジア/日本/
http://www.dmoz.org/World/Japanese/%E3%82%AA%E3%83%B3%E3%83%A9%E3%82%A4%E3%83%B3%E3%82%B7%E3%83%A7%E3%83%83%E3%83%97/%E5%9C%B0%E5%9F%9F%E5%88%A5%E3%83%BB%E3%82%A8%E3%82%B9%E3%83%8B%E3%83%83%E3%82%AF/%E3%82%A2%E3%82%B8%E3%82%A2/%E6%97%A5%E6%9C%AC/

-Your URL may not always be shown with international characters. In some cases it will be displayed encoded, with lots of % signs:
  -Older browsers
  -When the URL is copied and pasted (for example into forums)





Removing diacritics in URLs may be OK, but it is not possible for all international characters (like Chinese and Japanese). It may also be language dependent: ü may be replaced by ue in German, but by just a plain u in some other language.
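
The character set and length points can be checked with a few lines of Python using the standard `urllib.parse` module. This is only an illustration; the variable names are arbitrary:

```python
from urllib.parse import quote, unquote

segment = "München"

# The same string percent-encodes differently depending on the character set.
as_utf8 = quote(segment, encoding="utf-8")      # ü becomes two bytes: %C3%BC
as_latin1 = quote(segment, encoding="latin-1")  # ü becomes one byte:  %FC

# Decoding as UTF-8 (what browsers expect) only round-trips the UTF-8 form;
# the Latin-1 form comes back with a replacement character instead of ü.
round_trip_utf8 = unquote(as_utf8)
round_trip_latin1 = unquote(as_latin1)

# Each Japanese character is three bytes in UTF-8, so nine URL characters.
japanese = "日本"
encoded_japanese = quote(japanese)
growth = len(encoded_japanese) / len(japanese)
```

Here `as_utf8` is `M%C3%BCnchen`, `as_latin1` is `M%FCnchen`, and the two-character segment 日本 grows to the 18-character `%E6%97%A5%E6%9C%AC` seen at the end of the URL above.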



 

@Dunderdale272

I would go for the first approach, namely:


-replace all diacritics by their understandable counterparts in the given language (for instance, München to Muenchen in German),
-then remove all the remaining diacritics by replacing them with the corresponding unaccented Latin letters,
-then replace the spaces by dashes (and multiple dashes by single ones).


Then your URLs will be readable by your visitors and packed with Search Engine food.
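
A rough Python sketch of these three steps, assuming a hypothetical `LANGUAGE_RULES` table that you would fill in per language (only German is shown, and the function name `slugify` is my own):

```python
import re
import unicodedata

# Hypothetical per-language rules; extend with whatever languages you serve.
LANGUAGE_RULES = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
}

def slugify(text, lang):
    text = text.lower()

    # Step 1: language-specific replacements (ü -> ue in German, and so on).
    for char, replacement in LANGUAGE_RULES.get(lang, {}).items():
        text = text.replace(char, replacement)

    # Step 2: strip the remaining diacritics by decomposing each character
    # and dropping the combining marks (å -> a, é -> e).
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in decomposed if not unicodedata.combining(c))

    # Step 3: spaces, punctuation and any leftover non-ASCII characters
    # collapse into a single dash; leading and trailing dashes are trimmed.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

swedish = slugify("Rånades på Skyttis i Ö-vik", "sv")
german = slugify("Bayern München", "de")
```

With no rules defined for Swedish, the first call falls through to step 2 and gives `ranades-pa-skyttis-i-o-vik`, while the German call gives `bayern-muenchen`. Note that step 3 silently drops characters with no Latin base letter (Chinese, Japanese), so this only suits Latin-script titles.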



 

@Dunderdale272

Just make your preferred choice the canonical page; then there will be no ambiguity about which URL counts. The loss of link juice passed through the redirect is minimal. As a rule of thumb, a user who types the URL with the diacritic will want to end up on the URL with the diacritic, so if you follow that you will definitely be maximizing your potential :-)


