
Does Googlebot use hreflang for link discovery?

@Megan663

Posted in: #Googlebot #Hreflang #Indexing

Does anyone know if Googlebot will follow and index web pages that are only listed in the head of an HTML page via a link element with an hreflang attribute?

<link hreflang="fr" rel="alternate" href="http://example.io/fr/page/webpage" />


3 Comments


 

@Odierno851

I added the hreflang attributes to the site, and seven days later none of the new language pages have been indexed, even though Google has crawled hundreds of thousands of pages since then.



 

@Turnbaugh106

Yes, Googlebot will crawl and index the pages referenced by hreflang annotations and count them as unique pages on your website.
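
For reference, Google expects hreflang annotations to be reciprocal: each language version should list itself and all of its alternates in its own head. A rough example (the /en/ URL here is only a placeholder for whatever page carries the annotation):

<!-- every alternate, including the page itself, is listed on each version -->
<link rel="alternate" hreflang="en" href="http://example.io/en/page/webpage" />
<link rel="alternate" hreflang="fr" href="http://example.io/fr/page/webpage" />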



 

@Harper822

As closetnoc states, Google's main business is to seek out and crawl links in the hope that valuable pages are found; if they are, those pages are then indexed.

Therefore, unless you explicitly tell Google not to index a certain page, it will crawl everything it can find. Here are a few ways to stop Google from indexing pages you don't want indexed.

Let's assume the page you don't want indexed by Google is dontindexme.php in the document root.

The simplest method of keeping Googlebot away from a file, though one that also lets anyone who reads it learn which files you are trying to hide (robots.txt is itself publicly accessible), is to create a robots.txt file in the document root with the following contents:

User-agent: googlebot
Disallow: /dontindexme.php


Another method is to modify the Apache configuration to add an HTTP header; alternatively, you can modify the script to send the header itself. The header you want sent when the file is requested is:

X-Robots-Tag: noindex


While this method doesn't prevent Google from fetching the page, it instructs Google not to show the page in its search results.
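
If you are on Apache, a minimal sketch (assuming mod_headers is enabled and you can edit the vhost configuration or an .htaccess file) looks like this:

# send the noindex header only when dontindexme.php is requested
<Files "dontindexme.php">
    Header set X-Robots-Tag "noindex"
</Files>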

Now if you really want to stop Google from crawling a particular URL, you can configure your script or Apache so that the response returned to Googlebot isn't 200 OK. This means you can return a Not Found page, but if you choose this route, prefer a 410 status to indicate the page is gone for good, so Google treats it as permanently removed and drops it from the index.
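
With Apache's mod_alias, for example, a single directive is enough (a sketch, assuming the module is enabled):

# answer requests for the page with 410 Gone so crawlers treat it as permanently removed
Redirect gone /dontindexme.php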

Another thing Google won't crawl is links that are only reachable via the POST request method, that is, URLs accessed as a result of submitting a form. This is because Google does not want to get too interactive with your website (such as logging in or shopping).

Unless you take one of the above actions on a page that isn't meant to be indexed, just assume Google will crawl and index the page.


