Google and 301/302
We have to solve a problem with Google not correctly indexing our multilanguage sites.
We'll redesign our URLs in the future, but until then we need some way to help Google find the different language versions while preserving convenience for users, especially those who use already existing links.
First I'll explain our site/url structure a bit:
Assume we use the domain foo.com. Our application detects the browser language and displays either the English or the German version of the page. The user can then change the language by clicking a link, which appends the query parameter language=xx.
The currently selected language is stored in the session, so if the user doesn't change the language, every page is displayed in the language that was selected last.
Here's a short example for a user with browser language DE:
foo.com (German) -> foo.com/bar (German) -> foo.com/bar?language=en (English) -> foo.com (English now)
If the browser language is not supported or not provided, we assume English as a default.
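The detection with an English fallback could be sketched roughly like this, assuming the browser language is read from the Accept-Language request header. The names pick_language, SUPPORTED and DEFAULT are made up for illustration and are not part of our application.

```python
# Hypothetical sketch: choose "de" or "en" from an Accept-Language header,
# falling back to English when the header is missing or unsupported.
SUPPORTED = ("en", "de")
DEFAULT = "en"


def pick_language(accept_language):
    """Return the best supported language for an Accept-Language value."""
    if not accept_language:
        return DEFAULT
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by position in the header.
        candidates.append((-quality, position, tag.split("-")[0]))
    for negative_quality, _, primary in sorted(candidates):
        if negative_quality < 0 and primary in SUPPORTED:
            return primary
    return DEFAULT
```

With this sketch a header such as "de-DE,de;q=0.9,en;q=0.8" resolves to German, while a missing header or an unsupported language such as French resolves to English.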
So now, GoogleBot will have problems with this, since foo.com could return either English or German content. Initially, GoogleBot should get the English version, but if the language change link is followed (or the German TLD is used, since foo.de redirects to foo.com?language=de), the German version would be delivered.
Another problem is that our page-internal links don't carry the language parameter. Although we could change that (with some effort in some cases), those links (without the parameter) are already present on external German sites and thus must be supported somehow.
To overcome this, we thought of the following redirection strategy:
foo.com -> no session yet: detect browser language
             -> if DE, then 302 to foo.com?language=de
             -> if EN, then deliver the English content
        -> we have a session, so get the language from the session
             -> if DE, then 302 to foo.com?language=de
             -> if EN, then deliver the English content
foo.com/bar -> the same as foo.com
foo.de -> 301 to foo.com?language=de
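To make the decision tree above concrete, here is a minimal sketch of it as a pure function. The function name decide and its parameters are hypothetical. It assumes the browser language has already been resolved to "de" or "en", that a URL which already carries the language parameter is served directly, and that foo.de keeps the requested path when redirecting (the last two are assumptions, not something we have decided).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def decide(url, session_language=None, browser_language=None):
    """Return (status, location); location is None when content is served."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))

    # foo.de -> permanent redirect to the .com domain with language=de.
    if parts.netloc.lower() in ("foo.de", "www.foo.de"):
        query["language"] = "de"
        target = parts._replace(netloc="foo.com", query=urlencode(query))
        return (301, urlunsplit(target))

    # Assumption: an explicit language parameter is served as-is.
    if "language" in query:
        return (200, None)

    # The session wins over the browser language; English is the default.
    language = session_language or browser_language or "en"
    if language == "de":
        query["language"] = "de"
        return (302, urlunsplit(parts._replace(query=urlencode(query))))
    return (200, None)
```

For example, decide("http://foo.com", browser_language="de") yields a 302 to http://foo.com?language=de, while the same URL with an English session yields English content without a redirect.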
Each page would then additionally have the alternate relation set, e.g.
<link rel="alternate" hreflang="en" href="http://foo.com" />
<link rel="alternate" hreflang="de" href="http://foo.com?language=de" />
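The two tags could be generated per page along these lines. This is only a sketch: alternate_links is a made-up helper, and it assumes the English version lives at the bare URL and the German one at ?language=de, as in the example above.

```python
from html import escape


def alternate_links(path=""):
    """Build the hreflang link tags for one page path (hypothetical helper)."""
    base = "http://foo.com" + path
    # Assumption: English is the bare URL, German carries language=de.
    versions = {"en": base, "de": base + "?language=de"}
    return [
        '<link rel="alternate" hreflang="{}" href="{}" />'.format(
            language, escape(href, quote=True)
        )
        for language, href in versions.items()
    ]
```

Calling alternate_links("/bar") would give the same pair of tags for foo.com/bar and foo.com/bar?language=de.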
So, from our point of view, the following should happen:
The user opens foo.com and either gets the English version or is redirected to the version matching the browser language (if supported).
The user clicks foo.com/bar and, depending on the language in the session (or the browser language if the URL is opened directly), either the English version is delivered or he's redirected to foo.com/bar?language=xx.
GoogleBot crawls foo.com and, since no language is provided, it sees the English version, even when following the internal links.
The real question here is:
As far as we know, there's no guarantee that GoogleBot either uses a session id or doesn't use it, thus we don't know whether there is a session being reused during crawling.
Thus there are two possibilities:
If GoogleBot uses a session, it might get either English content or a 302 redirect for any URL that lacks the language parameter (e.g. foo.com/bar). How would Google handle that?
If GoogleBot does not use a session, it would get the German version for foo.de, due to the redirect. If it then follows an internal link to foo.com/bar, it should get the English content, since no language is provided and a new session is created. In that case, would GoogleBot store the alternate link foo.com/bar?language=de for the German version?
Thanks so far to all who endured reading all that and even more so to all who have an answer to share.
Please note that we're working on redesigning our urls to always have the language in the path (e.g. foo.com/en/bar) but that'll take a while and we also need to handle already existing links without the language. So please don't just suggest to restructure the urls.
Edit:
As requested, here's the original problem that we're trying to tackle.
Our customer generally uses their .com domain and they want the result pages to display a German description when using google.de and an English description in any other case.
However, the results are always displayed in English.
Our current (seemingly insufficient) approach is this:
foo.com -> page in English or German, depending on the request language (and currently on the session as well)
        -> alternate link with hreflang=en : foo.com?language=en
        -> alternate link with hreflang=de : foo.com?language=de
We assumed that Google would be able to use the alternate language versions and display the best fitting version in a localized result page.
This, however, hasn't worked out so well yet, and we assume one reason is that foo.com itself isn't an alternate link to anything. Thus we thought about leaving the language parameter out for the English version (see the question above) and using foo.com as the alternate link for hreflang=en.
We're no SEO experts though, so this could be wrong. If so please correct me with some hints or explanations. :)
With this train of thought we arrived at the point where foo.com should represent the English version of the page only. However, if we change that, all German users (who are about 50% of the visitors) would first see the English page, because of the many links in the wild that don't contain any language parameter. And this is something our customer clearly doesn't want.
This again led us to the approach using redirects to foo.com?language=xx and calculating xx from the browser language (and the session as well, if it contains language information).
More posts by @Cooney921
2 Comments
This is a pretty challenging problem from an SEO standpoint, and I'm not sure there is a pure best-practices SEO way to handle it in its current form.
The way you are running things, I don't believe you can get Google to show German results. Google will only show the German results if it has the German content and, as you've discovered (based on what you described above), Googlebot isn't keeping the session ID and therefore isn't indexing the German content.
What you'll need to do to get this to work while using the same URL for both versions is to actually load both versions of the content for everyone and use either JavaScript or CSS display: none to hide the version you don't want the user to see.
This has several problems with it:
Google does not like you to hide content from users. In this case you'd be serving the same content to everyone (Googlebot and people), so it's not technically cloaking, but it is a gray area, so it comes with risks.
You'll have to leave the title tags and meta descriptions blank. The title tag is almost always used by Google (they reserve the right to use whatever they want, but they rarely do), so if you write an English title you'll get an English title for German keywords. The title is a fairly big on-page factor, so not having one will hurt your ability to rank for both English and German keywords.
This relies on Google to pick the right content. Since you show both versions to Google and give no other direction with titles and meta descriptions, you are completely relying on Google to pick the right content to show in each instance. While Google is generally good at this, you can bet there will be at least a few instances that don't render in the SERPs the way you would like. (I always try my best to make sure Google doesn't have to guess at anything.)
The best option would be to restructure the URLs so each version is on its own URL and then run a manual "link reclamation" campaign to get all those links in the wild updated to the new URLs. But if you're looking for something in the meantime, I'd say this or a variation of it is your best bet. The plan you laid out above should result in the English version being indexed and shown in the SERPs even for the German phrases (i.e. no change from the results you are currently getting). Hope this helps, good luck.
Googlebot may have issues using session ID cookies, so you must not rely on them. You need distinct pages for each language, and I recommend that you switch to the foo.com/en/bar format. If you must use the querystring, add code that alters every anchor <a> tag on the page to include the language parameter whenever the current request's querystring contains it.
Another thing you can do is generate a sitemap.xml containing both languages and submit it to Google Webmaster Tools to be crawled. In other words, a sitemap with both foo.com/bar?language=de and foo.com/bar?language=en URLs. Example:
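As a rough sketch of that link rewriting (not your actual templating code), the helper below appends the language parameter to an internal href. The name with_language and the site_host default are assumptions for illustration; the caller would pass the language taken from the current request's querystring.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_language(href, language, site_host="foo.com"):
    """Append language=<language> to internal links that lack it."""
    parts = urlsplit(href)
    # Leave mailto:, javascript: and similar schemes untouched.
    if parts.scheme not in ("", "http", "https"):
        return href
    # Leave external links untouched.
    if parts.netloc and parts.netloc.lower() != site_host:
        return href
    # Leave pure fragment links such as "#top" untouched.
    if not parts.netloc and not parts.path:
        return href
    query = parse_qsl(parts.query, keep_blank_values=True)
    if any(name == "language" for name, _ in query):
        return href  # an explicit language on the link wins
    query.append(("language", language))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

So on a page requested with ?language=de, a link to /bar would be emitted as /bar?language=de, while links that already name a language (such as the language switcher) stay as they are.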
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
</url>
<url>
<loc>http://www.example.com/?language=de</loc>
</url>
<url>
<loc>http://www.example.com/about-us</loc>
</url>
<url>
<loc>http://www.example.com/about-us?language=de</loc>
</url>
</urlset>
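If the page list is available in code, a sitemap like the one above could be generated instead of written by hand. This is a minimal sketch using the standard library; build_sitemap is a made-up name, and it mirrors the example by listing the bare URL plus one ?language=xx variant per extra language.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base, paths, languages=("de",)):
    """Return sitemap XML listing each path bare and once per language."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        suffixes = [""] + ["?language=" + language for language in languages]
        for suffix in suffixes:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = base + path + suffix
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap("http://www.example.com", ["/", "/about-us"]))
```

Passing languages=("en", "de") would list explicit ?language=en URLs as well, if you prefer every language version to carry the parameter.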
Then, in Google Webmaster Tools' URL Parameter Configuration screen, add (or modify) the language parameter to tell Google that it (1) changes page content and (2) that its effect is that it Translates.
If features such as JavaScript, cookies, session IDs, frames, DHTML, or Macromedia Flash keep you from seeing your entire site in a text browser, then spiders may have trouble crawling it. support.google.com/webmasters/bin/answer.py?hl=en&answer=40349
see also stackoverflow.com/questions/7958971/does-google-bot-keep-session-when-crawling-asp-net
As far as we know, there's no guarantee that GoogleBot either uses a session id or doesn't use it, thus we don't know whether there is a session being reused during crawling
Correct, there is no session being saved or carried over as it crawls, and it does not crawl the site in one sitting. It will stop and start at its own leisure. Stop depending on the session.
As for your redirection strategy to deal with the links already existing in the wild: it will help visitors who follow links to the DE pages that lack the language parameter. Google will follow the 302 redirects correctly, but because they are temporary redirects it won't associate those links with the proper language and will likely always land on the English version.
About the 302s:
A 302 redirect will not get Google to index the new URLs that you're redirecting to. Google sees it as temporary and will index the original link. If you want to fix link juice you'll need a 301. Here's a handy infographic: www.seomoz.org/blog/an-seos-guide-to-http-status-codes
From a blog post comment by Matt Cutts on 302 redirects in 2006 (I know that's a long time ago, but I think the advice still holds true):
an on-domain 302 shouldn’t hurt a site, but if you’ve moved everything to a new location for good, I’d try using a 301 (permanent) redirect instead.