
Preventing Google from crawling URLs with URL parameters when a friendly URL exists for the same content

@Rambettina238

Posted in: #Googlebot #GoogleSearchConsole #UrlParameters

In e-commerce sites it is common to have multiple parameters to filter, narrow, and sort data. Hence Google provides the URL Parameters section in Webmaster Tools.

In our sample site we have the following 2 URLs generated which link to the same content:

/dresses/women/prada-size32-kneelength.html


and the link with URL parameters:

/dresses/women.html?ajaxcatalog=true&size=32&manufacturer=prada&length=kneelength


We have left the parameter options as "Let Google Decide". However, the logs show that Google is crawling both of the above links.

Why is Google crawling 2 similar links? Is it simply because it finds both and hence crawls them (which seems logical)? But then what is the use of "Let Google Decide"? Crawling 2 similar links wastes crawl budget and system resources.
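
To put a number on how much crawling goes to each kind of URL, the access log can be tallied. The following is a minimal sketch, assuming a combined-format access log and assuming that a simple "Googlebot" substring check on the user agent is good enough for a rough count (user agents can be spoofed). The function name `count_googlebot_hits` is hypothetical.

```python
import re
from collections import Counter

# Matches the request part of a combined-format log line, e.g. "GET /path HTTP/1.1".
# The log format is an assumption; adjust the pattern to your server's format.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def count_googlebot_hits(log_lines):
    """Count Googlebot requests, split into friendly and parameter URLs."""
    counts = Counter()
    for line in log_lines:
        # Naive user-agent check (assumption): does not verify the client is really Google.
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if match is None:
            continue
        # Any URL carrying a query string is counted as a parameter URL.
        kind = "parameter" if "?" in match.group(1) else "friendly"
        counts[kind] += 1
    return counts
```

Calling `count_googlebot_hits(open("access.log"))` (the file name is a placeholder) returns a `Counter` with the keys `"friendly"` and `"parameter"`.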

To avoid the above we have 2 options:


Add Disallow rules for the size, manufacturer, and length parameters in robots.txt, OR
set each of the URL parameters to no crawl in Google Webmaster Tools.
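
Before deploying the first option, it can help to check that the Disallow patterns block the parameter URL but not the friendly one. The sketch below assumes Google-style pattern matching, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The patterns in `DISALLOW` and the helper names are hypothetical examples, not rules taken from the original site.

```python
import re

# Hypothetical Disallow patterns for the three filter parameters (assumption).
DISALLOW = ["/*?*size=", "/*?*manufacturer=", "/*?*length="]


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and join them with '.*' in place of each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path_and_query):
    """Return True if any Disallow pattern matches the given path plus query string."""
    return any(robots_pattern_to_regex(p).search(path_and_query) for p in DISALLOW)
```

With these patterns, `is_blocked` is true for `/dresses/women.html?ajaxcatalog=true&size=32&manufacturer=prada&length=kneelength` and false for `/dresses/women/prada-size32-kneelength.html`, because the friendly URL contains no `?`.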


Would there be any downside to either of the options above? Is it general practice for e-commerce stores to block all parameter-related URLs (carefully, of course), since most of them are duplicate content?


3 Comments


 

@Gail5422790

You can solve this with a canonical tag in the <head> of your pages.
For example, set this canonical tag:

<link rel="canonical" href="https://www.example.com/dresses/women/prada-size32-kneelength.html" />


on both of the URLs above:
example.com/dresses/women/prada-size32-kneelength.html
www.example.com/dresses/women.html?ajaxcatalog=true&size=32&manufacturer=prada&length=kneelength
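
As a rough illustration of how both URL forms could emit the same canonical tag, here is a minimal sketch. It assumes the friendly slug is always built as `manufacturer-size<size>-length`, which is inferred from the single example in the question, and it assumes `https://www.example.com` as the canonical host. The function name `canonical_tag` is hypothetical.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

BASE = "https://www.example.com"  # assumed canonical scheme and host
FILTER_KEYS = {"manufacturer", "size", "length"}


def canonical_tag(url):
    """Build the canonical <link> tag for a friendly or parameterised listing URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    path = parts.path
    if FILTER_KEYS <= query.keys():
        # Assumed slug layout: <category>/<manufacturer>-size<size>-<length>.html
        category = path[: -len(".html")] if path.endswith(".html") else path
        slug = "{}-size{}-{}".format(
            query["manufacturer"][0], query["size"][0], query["length"][0]
        )
        path = "{}/{}.html".format(category, slug)
    # Any other query parameters (such as ajaxcatalog) are dropped from the canonical URL.
    return '<link rel="canonical" href="{}" />'.format(escape(BASE + path, quote=True))
```

Both `/dresses/women/prada-size32-kneelength.html` and the parameter URL from the question produce the same tag, which is the point of the canonical approach.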



 

@Speyer207

I had that happen to me. Google will try to crawl everything on your site, and I've even had Google bug out on me and ignore my robots.txt once. It took a month for Google to correct itself!

Also, I've had Google moan at me under HTML Improvements about duplicate content where it had crawled random pages with URL parameters. Once I had gone over each of my URL parameters and manually configured each entry, the duplicate content warnings stopped appearing over a few weeks. The only downside here is if you pick the wrong URL parameter to be ignored.



 

@Cooney921

Googlebot tries to crawl everything mentioned or linked on your site. I set up a test case and the bot even crawled URLs like this:

<script>
// Even a url in a JS comment is crawled by google: stackoverflow.com
console.log("test..");
</script>


And I think the setting in WMT is more about "let Google decide which URL it serves to the user" than "let Google decide which URLs it will crawl".

With faceted navigation you have to be careful about what you want indexed. In general, it is best practice to set all filter options to "noindex, follow". "Follow" because you still want Googlebot to crawl your detail pages.
samplesite.com/dresses/women.html = Index, Follow
samplesite.com/dresses/women.html?size=10 = NoIndex, Follow
samplesite.com/dresses/women.html?color=red = NoIndex, Follow
samplesite.com/dresses/women.html?page=2 = NoIndex, Follow
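
The rule in the list above could be applied in a page template roughly as follows. This is a minimal sketch, assuming that any query string marks a filtered or paginated view and that the directive is delivered as a robots meta tag. The function name `robots_meta` is hypothetical.

```python
from urllib.parse import urlsplit


def robots_meta(url):
    """Return the robots meta tag for a category URL.

    Assumption: the bare category page is indexable, while any URL with a
    query string (filters, pagination) gets "noindex, follow".
    """
    has_params = bool(urlsplit(url).query)
    content = "noindex, follow" if has_params else "index, follow"
    return '<meta name="robots" content="{}">'.format(content)
```

To open up a single option for long-tail terms, as suggested below, the check would need an allow-list of parameters instead of treating every query string the same way.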


If you have 5 categories and 50 products but 5k pages in the Google index, your site will most likely not perform well.
On the other hand, if you think your site is strong enough, you can try opening up one option so that some long-tail keywords like "red women dresses" can rank.


