
Using robots.txt to block sessionID URLs

@Megan663

Posted in: #RobotsTxt #Url #UrlParameters #WebCrawlers

I've got a problem with Googlebot literally crawling millions of URLs that are the same, because ";jsessionid" keeps getting inserted into the URL (something I cannot change due to the work environment).

An example URL is:

catalog/product-category/product/;jsessionid=Mf87s+Xw2P8ByQYz2CyQjEJh.prod-14?f=p%3A100-200


Can I update my robots.txt to say:

Disallow: /;jsessionid=*


Does anyone see an issue with doing that? I could also canonicalize the pages, but I feel that robots.txt is the better solution, so Googlebot won't have to waste any resources crawling the URLs in the first place.
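
For reference, a minimal sketch of how such patterns behave against the example URL above (robots.txt rules are matched from the beginning of the URL path, so a leading * is needed to catch a session ID that appears after other path segments):

User-agent: *
# Matches only paths that start with ";jsessionid=", so it would NOT match
# /catalog/product-category/product/;jsessionid=...
Disallow: /;jsessionid=*
# Matches ";jsessionid=" anywhere in the path, including the example URL above
Disallow: /*;jsessionid=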


1 Comment


@Eichhorn148

You should exclude such URLs from crawling, because you are spending your whole crawl budget on those useless URLs. Read this topic at the Google Product Forum.

Set the robots.txt rule like this:

Disallow: /*jsessionid*
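
To be valid, that directive needs to sit inside a User-agent group. A minimal sketch of the complete file, assuming the rule should apply to all crawlers, would be:

User-agent: *
# Block any URL whose path contains "jsessionid"
Disallow: /*jsessionid

A trailing *, as in the suggested rule, is redundant (a pattern already matches anything that follows it), but it does no harm. Before deploying, you can check sample URLs against the rule with the robots.txt Tester in Google Search Console.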
