
How to remove ajax URLs (with specific hash tags) from Google Index

@Odierno851

Posted in: #GoogleIndex #Hash #Seo

Google is indexing ajax pages with hashtags since 2015: webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html
However, is it possible to exclude specific URLs with a specific hash tag (because of duplicate content, e.g. sorting parameters)?



Example:


example.com/#!explore/world (is OK to be indexed)
example.com/#!explore/world:sortby=date (should not be indexed)




Since the page does not get reloaded when the hash changes to a new ajax state, it does not make sense to use the <meta name="robots" content="noindex"> tag: it would apply to ALL ajax hash URLs...
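
To make the problem concrete, here is a small sketch (TypeScript, standard URL API; the https:// scheme is only added so the constructor accepts the strings) showing that the two example addresses are one and the same document as far as the server is concerned, which is why a single static tag in the shared <head> covers every hash state:

// The fragment after '#' is never sent to the server, so every #! state
// is served from the same HTML document and shares the same <head>.
const plain = new URL("https://example.com/#!explore/world");
const sorted = new URL("https://example.com/#!explore/world:sortby=date");

console.log(plain.pathname, sorted.pathname); // "/" "/"  -> same resource
console.log(plain.hash);                      // "#!explore/world"
console.log(sorted.hash);                     // "#!explore/world:sortby=date"
// Only the client-side fragment differs, so a static
// <meta name="robots" content="noindex"> would apply to all states at once.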


2 Comments


@Goswami781

Update your robots.txt and disallow bots from crawling these dynamic pages. Your robots.txt could look something like this:

User-agent: *
Disallow: /*sortby=date*


Also, if you have connected your website to Google Webmaster Tools, make sure to run the robots.txt Tester on the dashboard to verify the rule.
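
For a quick offline sanity check of what such a wildcard rule would match, a rough TypeScript sketch can help (this is a hypothetical helper, not Google's actual matcher; the robots.txt Tester remains the authoritative check):

// Approximate robots.txt wildcard matching: '*' matches any sequence,
// and a Disallow rule matches from the start of the URL path.
// ('$' end-anchors and other subtleties are ignored here.)
function disallowMatches(rule: string, url: string): boolean {
  const pattern = rule
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex chars
    .join(".*");
  return new RegExp("^" + pattern).test(url);
}

// The URLs are passed here exactly as they appear in the question.
console.log(disallowMatches("/*sortby=date*", "/#!explore/world:sortby=date")); // true
console.log(disallowMatches("/*sortby=date*", "/#!explore/world"));             // false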

And yes, the canonical tag for the dynamic pages can be used too.


@Berumen354

The best thing you can do is to set the canonical tag on all pages with filtered views (sort by, ascending, descending, price range, etc.), to let bots know which page is the original one that should be indexed.

So when the URL is:

example.com/#!explore/world:sortby=date


The canonical tag should be set to:

<link rel="canonical" href="example.com/#!explore/world">


After implementing the canonical, wait some time, maybe a week, to make sure the bots have seen the canonical tag, and then block the crawlers via robots.txt:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /*sortby=


Note 1: /*sortby= will match any URL containing the string sortby=. Do not use !, as it has a specific meaning in regular expressions.

Note 2: it might take more or less than a week; check the SERPs after a while to see if the hash-filtered URLs have been removed.

Note 3: the order is important: implement the canonical, wait, then block via robots.txt. You need to allow web crawlers to read the canonical tags first; once access is blocked via robots.txt, they won't be able to see them.
