Does Google cache robots.txt?

@Harper822

Posted in: #Googlebot #GoogleCache #RobotsTxt

I added a robots.txt file to one of my sites a week ago, which should have prevented Googlebot from attempting to fetch certain URLs. However, this weekend I can see Googlebot loading those exact URLs.

Does Google cache robots.txt and, if so, should it?


7 Comments

Sorted by latest first

 

@Hamaas447

You can request removal of the already-indexed URLs using Google's URL removal tool.



 

@Frith620

Persevere. I switched from blocking via robots.txt to using a meta noindex,nofollow tag. For the meta tag to work, the addresses blocked in robots.txt first had to be unblocked so Googlebot could crawl the pages and see the tag.

I did this bluntly by deleting the robots.txt altogether (and declaring the change in Google's Webmaster Tools).

The robots.txt removal process, as reported in the Webmaster Tools count of blocked pages, took 10 weeks to complete, and the bulk of the pages were only removed by Google during the last 2 weeks.
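If you go this route, it can help to check what a page actually serves once it is unblocked. Here is a rough sketch of such a check in Python (my own illustration, not part of the original process; the URL is a placeholder), looking for both the robots meta tag and the equivalent X-Robots-Tag response header:

```python
# Sketch: confirm a page serves a "noindex" directive once robots.txt no longer blocks it.
# The URL below is a hypothetical example.
from html.parser import HTMLParser
from urllib.request import urlopen

class RobotsMetaFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", ""))

url = "https://example.com/old-page.html"  # placeholder: a page you want deindexed
with urlopen(url) as resp:
    body = resp.read().decode("utf-8", errors="replace")
    header = resp.headers.get("X-Robots-Tag")  # the same directive can be sent as a header

parser = RobotsMetaFinder()
parser.feed(body)
print("meta robots:", parser.directives or "none found")
print("X-Robots-Tag header:", header or "none found")
```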



 

@Bethany197

From what I can see in the user-accessible cache, they do. Type the URL of your robots.txt file into a Google search, click the little green dropdown arrow next to the result, and choose 'Cached'. This will show you the latest version of that page held on Google's servers.



 

@Megan663

Google's documentation states that they will usually cache robots.txt for a day, but might use it for longer if they get errors when trying to refresh it.


A robots.txt request is generally cached for up to one day, but may be cached longer in situations where refreshing the cached version is not possible (for example, due to timeouts or 5xx errors). The cached response may be shared by different crawlers. Google may increase or decrease the cache lifetime based on max-age Cache-Control HTTP headers.
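Since the cache lifetime can be influenced by the max-age Cache-Control header, it may be worth checking what your server actually sends with robots.txt. A minimal sketch in Python (my own addition; example.com is a placeholder for your site):

```python
# Sketch: inspect the caching headers your server returns with robots.txt.
# Replace example.com with your own domain.
from urllib.request import Request, urlopen

req = Request("https://example.com/robots.txt", headers={"User-Agent": "header-check"})
with urlopen(req) as resp:
    print("Status:", resp.status)
    print("Cache-Control:", resp.headers.get("Cache-Control", "not set"))
    print("Expires:", resp.headers.get("Expires", "not set"))
```

A long max-age or Expires value here is a hint that crawlers may hold on to their cached copy for longer than the usual day.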



 

@Rivera981

Yes. They say they typically update it once a day, but some have suggested they may also check it after a certain number of page hits (100?), so busier sites are checked more often.

See webmasters.stackexchange.com/a/29946 and the video that @DisgruntedGoat shared above: youtube.com/watch?v=I2giR-WKUfY.



 

@Yeniel560

Yes, Google will obviously cache robots.txt to an extent; it won't download it every time it wants to look at a page. How long it caches it for, I don't know. However, if you have a long Expires header set, Googlebot may wait much longer before re-checking the file.

Another problem could be a misconfigured file. In the Webmaster Tools that danivovich suggests, there is a robots.txt checker. It will tell you which types of pages are blocked and which are fine.
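If you want a quick sanity check outside of Webmaster Tools, Python's standard urllib.robotparser can report whether a given URL is blocked for Googlebot. A rough sketch (my own addition; the domain and paths are placeholders):

```python
# Sketch: parse the live robots.txt and test whether specific URLs are blocked
# for Googlebot. The domain and paths below are placeholders.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

for url in ("https://example.com/private/page.html",
            "https://example.com/public/page.html"):
    status = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", status)
```

This only tells you what your current file says; it says nothing about which version Google has cached.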



 

@Michele947

I would strongly recommend registering your site with Google Search Console (previously Google Webmaster Tools). There is a crawler access section under site configuration that will tell you when your robots.txt was last downloaded. The tool also provides a lot of detail as to how the crawlers are seeing your site, what is blocked or not working, and where you are appearing in queries on Google.

From what I can tell, Google downloads the robots.txt often. The Google Search Console site will also let you specifically remove URLs from the index, so you can remove the ones you are now blocking.


