How to test robots.txt in Googlebot to find out what is being indexed
This question is a continuation of this answer: https://stackoverflow.com/questions/2788528/how-to-check-if-googlebot-will-index-a-given-url/2788735#2788735 As suggested there, I went to Google Webmaster Tools and tested the contents of my robots.txt file. However, that only tells me whether the file itself is valid. For my scenario I need to test whether URLs matching a disallowed pattern are actually kept out of the index. For example, I have something like this in my robots.txt:
Disallow: /pattern*
My understanding is that URLs containing the word "pattern" should not be crawled, but how do I test that this rule is actually enforced when the site is indexed?
There are a couple of things that might help.
One is to look in Google Webmaster Tools under Diagnostics > Crawl Errors and click the "Restricted by robots.txt" link. This helps you confirm that what you enter in robots.txt is actually blocking what you expect.
The other thing to check is Google's index. I'll do a search query of the form:
site:yourdomain.com inurl:url text to check
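For the example rule in your question, that would be something like:
site:yourdomain.com inurl:pattern
If the rule is doing what you expect, URLs matching it should never appear in (or gradually drop out of) those results.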
When I've made robots.txt changes with wildcards I tend to worry about matching more than what I want, so I use the above checks to make sure only what I want excluded is actually excluded from the index.
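If you want to sanity-check a wildcard rule before deploying it, a small local script can also help. The sketch below is only a simplified approximation of Google's documented matching rules (prefix matching, * as a wildcard, $ as an end anchor), not Googlebot's actual implementation, and the rule list and paths are just illustrative:

import re

def disallow_pattern_to_regex(pattern):
    """Convert a robots.txt path pattern like '/pattern*' into a regex.
    '*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything, then restore '*' as '.*'
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored else ""))

def is_blocked(path, disallow_patterns):
    """Return True if the URL path matches any Disallow pattern."""
    return any(disallow_pattern_to_regex(p).match(path) for p in disallow_patterns)

if __name__ == "__main__":
    rules = ["/pattern*"]  # the rule from the question
    for path in ["/pattern", "/pattern-page.html", "/other/page", "/patterns/list"]:
        print(path, "blocked" if is_blocked(path, rules) else "allowed")

Running it over a list of real URLs from your site quickly shows when a wildcard matches more than you intended (for example, /patterns/list is also blocked by /pattern*).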
I'm not sure if this answers your question, but hopefully it is close enough :-)