Using robots.txt to block sessionID URLs
I've got a problem with Googlebot crawling literally millions of URLs that are effectively the same page, because ";jsessionid" keeps getting inserted into the URL (something I cannot change due to the work environment).
An example URL is:
catalog/product-category/product/;jsessionid=Mf87s+Xw2P8ByQYz2CyQjEJh.prod-14?f=p%3A100-200
Can I update my robots.txt to say:
Disallow: /;jsessionid=*
Does anyone see an issue with doing that? I could also canonicalize the pages, but I feel that robots.txt is the better solution here, so Googlebot won't waste any resources crawling these URLs in the first place.
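For reference, this is roughly how the rule would sit in my robots.txt (the User-agent line is just illustrative; the real file has other rules as well):

User-agent: *
Disallow: /;jsessionid=*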
1 Comment
You should exclude such URLs from crawling, because those useless URLs eat up your entire crawl budget. Read this topic at the Google Product Forum.
Set the robots.txt rule like this:
Disallow: /*jsessionid*
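For example, a minimal robots.txt could look like the sketch below (the User-agent line is only a placeholder; keep whatever groups you already have):

User-agent: *
Disallow: /*jsessionid*

The leading * matters because robots.txt rules are matched from the start of the URL path: a rule beginning with /;jsessionid= would only block URLs whose path starts with ;jsessionid, whereas /*jsessionid* also matches deeper paths like /catalog/product-category/product/;jsessionid=... from your example.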