
How to test robots.txt in Googlebot to find out what is being indexed

@Gloria169

Posted in: #Google #Indexing #Seo

This question is a continuation of this answer: stackoverflow.com/questions/2788528/how-to-check-if-googlebot-will-index-a-given-url/2788735#2788735

As suggested there, I went to Webmaster Tools and tested the contents of my robots.txt file. However, that only tells me whether the content is acceptable. For my scenario, I need to test whether URLs matching a disallowed pattern are being indexed or not. For example, I have something like this in my robots.txt:

disallow:/pattern*


My understanding is that URLs with the word pattern should not be crawled, but how do I test that this pattern is enforced while the website is being indexed?


@Becky754

There are a couple of things that might help.

One is to look in Google's Webmaster Tools under Diagnostics > Crawl Errors and click the "Restricted by robots.txt" link. The URLs listed there are the ones Googlebot was blocked from crawling, which helps confirm that what you enter in robots.txt is actually blocking what you expect.

The other thing to check is Google's index. I run a search query of the form:

site:yourdomain.com inurl:url text to check


When I've made robots.txt changes with wildcards I tend to worry about matching more than what I want, so I use the above checks to make sure only what I want excluded is actually excluded from the index.
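
If it helps, the wildcard worry can also be checked offline before changing the live file. Below is a minimal Python sketch, assuming Google-style matching rules (a pattern matches from the start of the URL path, `*` matches any run of characters, and a trailing `$` anchors the end). The helper names `robots_pattern_to_regex` and `is_blocked` and the sample paths are made up for illustration, and the sketch ignores `Allow` rules and per-user-agent groups, so treat it as a rough check rather than a replica of what Googlebot does.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: Google-style matching, where '*' matches any run of
    characters and a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with '.*' for every '*'.
    body = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches from the start of the path."""
    # re.match only matches at the beginning of the string, which mirrors
    # the prefix behaviour assumed above.
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)

# Hypothetical sample paths; replace them with paths from your own site.
sample_paths = [
    "/pattern",
    "/pattern/page.html",
    "/patterns-list",
    "/docs/pattern",
    "/other",
]
blocked = [p for p in sample_paths if is_blocked(p, ["/pattern*"])]
print(blocked)
```

Under these assumptions, `/patterns-list` is caught by `/pattern*` (probably more than intended), while `/docs/pattern` is not, because the rule only matches paths that begin with `/pattern`. Anything surprising in the `blocked` list is worth confirming with the two checks above.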

I'm not sure if this answers your question, but hopefully it is close enough :-)
