Does Googlebot use hreflang for link discovery?
Does anyone know whether Googlebot will follow and index web pages that are listed only in the head of an HTML page, in a <link> element with an hreflang attribute?
<link hreflang="fr" rel="alternate" href="http://example.io/fr/page/webpage" />
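For reference, this is roughly what a parser can read from that tag. The Python sketch below (standard library only) collects the hreflang alternates from a page's markup. It only illustrates the markup, not how Googlebot schedules or indexes those URLs; the names HreflangCollector and hreflang_alternates are hypothetical.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collect (hreflang, href) pairs from <link rel="alternate" hreflang=...> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # A self-closing "<link ... />" reaches this method through
        # HTMLParser's default handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("hreflang") and attributes.get("href"):
            self.alternates.append((attributes["hreflang"], attributes["href"]))


def hreflang_alternates(html):
    """Return every hreflang alternate declared in the given HTML."""
    collector = HreflangCollector()
    collector.feed(html)
    collector.close()
    return collector.alternates


page = """<html><head>
<link hreflang="fr" rel="alternate" href="http://example.io/fr/page/webpage" />
</head><body></body></html>"""

print(hreflang_alternates(page))
# [('fr', 'http://example.io/fr/page/webpage')]
```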
I added the hreflang attributes to the site, and seven days later none of the new language pages have been indexed, even though Google has crawled hundreds of thousands of pages since then.
Yes, Googlebot will index the hreflang pages and count them as unique pages on your website.
As closetnoc states, Google's main business is to find and crawl links in the hope of reaching valuable pages, which it then indexes.
Therefore, unless you explicitly tell Google not to index a certain page, it will crawl everything. Here are a few ways to stop Google from indexing pages you don't want indexed.
Let's assume the page you don't want indexed by Google is dontindexme.php in the document root.
The simplest method, though one that also helps attackers learn which files exist on your site (robots.txt is itself publicly readable), is to create a robots.txt file in the document root with the following contents. Strictly speaking, this blocks crawling rather than indexing: Google can still list a disallowed URL, without its content, if other pages link to it.
User-agent: googlebot
Disallow: /dontindexme.php
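The effect of those two lines can be checked with Python's standard urllib.robotparser module. This is a sketch: the module implements the generic robots.txt rules rather than Google's own matcher, though for a simple path prefix like this one they agree. The /index.php URL and the Bingbot agent are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from above, fed to the parser line by line.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /dontindexme.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent, url in [
    ("Googlebot", "http://example.io/dontindexme.php"),  # blocked by the rule
    ("Googlebot", "http://example.io/index.php"),        # hypothetical page, allowed
    ("Bingbot", "http://example.io/dontindexme.php"),    # no group for Bingbot, allowed
]:
    print(agent, url, parser.can_fetch(agent, url))

# Googlebot http://example.io/dontindexme.php False
# Googlebot http://example.io/index.php True
# Bingbot http://example.io/dontindexme.php True
```

Because the group names only googlebot, other crawlers are unaffected; a "User-agent: *" group would apply the rule to every crawler.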
Another method is to modify the Apache configuration to add an HTTP header, or to modify the script itself to send it. The HTTP header you want sent when the file is requested is:
X-Robots-Tag: noindex
This method doesn't prevent Google from fetching the page, but it instructs Google not to show the page in its search results. Because Google has to fetch the page to see the header, don't also block the same URL in robots.txt.
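As a stand-in for the Apache or script change, here is a minimal Python WSGI sketch that sends X-Robots-Tag: noindex only for dontindexme.php. The names app, NOINDEX_PATHS and response_headers are hypothetical, and the in-process call replaces a real HTTP request so the behavior can be checked without running a server.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical set of paths that should carry the noindex header.
NOINDEX_PATHS = {"/dontindexme.php"}


def app(environ, start_response):
    """Minimal WSGI app: serve every page, but mark NOINDEX_PATHS as noindex."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in NOINDEX_PATHS:
        # The page is still served (and fetched by Google); the header only
        # asks search engines to keep it out of their results.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<html><body>Hello</body></html>"]


def response_headers(path):
    """Call the app in-process and return the headers it sends for path."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    app(environ, start_response)
    return captured["headers"]


print(response_headers("/dontindexme.php").get("X-Robots-Tag"))  # noindex
print(response_headers("/index.php").get("X-Robots-Tag"))        # None
```

To serve it over HTTP for a quick test, wsgiref.simple_server.make_server("", 8000, app).serve_forever() is enough; on Apache the same header is usually set with mod_headers.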
Now if you really want a particular URL kept out of Google, you can configure your script or Apache so that the response returned to Google isn't 200 OK; for example, you can return a Not Found page. But if you choose this route, return a 410 status instead, which tells Google the page is gone for good, so Google drops it from its index.
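A similar sketch for the 410 route, again as a hypothetical Python WSGI app rather than an Apache configuration: paths in the assumed GONE_PATHS set get 410 Gone, everything else gets 200 OK.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical paths that have been removed for good.
GONE_PATHS = {"/dontindexme.php"}


def app(environ, start_response):
    """Return 410 Gone for removed pages and 200 OK for everything else."""
    path = environ.get("PATH_INFO", "/")
    if path in GONE_PATHS:
        start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"This page has been permanently removed."]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>Hello</body></html>"]


def response_status(path):
    """Call the app in-process and return the status line it sends for path."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    app(environ, start_response)
    return captured["status"]


print(response_status("/dontindexme.php"))  # 410 Gone
print(response_status("/index.php"))        # 200 OK
```

Choosing 410 over 404 signals that the removal is deliberate and permanent, which is the distinction drawn above.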
Google also won't crawl URLs that are reachable only via the POST request method, that is, URLs you reach by submitting a form. This is because Google doesn't want to interact with your website too much (such as logging in or shopping).
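The difference between links and POST forms can be illustrated with a toy link extractor in Python. This is a simplified model of a polite crawler, not Googlebot's actual logic: it follows a href links and records POST form actions without ever submitting them. The class name LinkExtractor and the /cart/add URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Toy crawler model: follow <a href> links, never submit POST forms."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.followed = []      # URLs reachable with a plain GET request
        self.not_followed = []  # POST form targets, reachable only by submitting the form

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.followed.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "form" and (attributes.get("method") or "").lower() == "post":
            action = attributes.get("action") or ""
            self.not_followed.append(urljoin(self.base_url, action))


page = """
<a href="/fr/page/webpage">French version</a>
<form method="post" action="/cart/add"><button>Add to cart</button></form>
"""

extractor = LinkExtractor("http://example.io/")
extractor.feed(page)
extractor.close()
print(extractor.followed)      # ['http://example.io/fr/page/webpage']
print(extractor.not_followed)  # ['http://example.io/cart/add']
```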
Unless you take one of the above actions on a page that isn't meant to be indexed, assume Google will crawl and index it.
© vmapp.org 2024. All rights reserved.