Does Google cache robots.txt?
I added a robots.txt file to one of my sites a week ago, which should have prevented Googlebot from attempting to fetch certain URLs. However, this weekend I can see Googlebot loading those exact URLs.
Does Google cache robots.txt and, if so, should it?
7 Comments
Persevere. I switched from robots.txt to a meta noindex,nofollow tag. For the meta tag to work, the addresses blocked in robots.txt first had to be unblocked.
I did this bluntly by deleting the robots.txt altogether (and declaring the removal in Google's Webmaster Tools).
The robots.txt removal process, as shown in Webmaster Tools (number of pages blocked), took 10 weeks to complete, and the bulk of the pages were only removed by Google during the last 2 weeks.
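If it helps, here is a minimal sketch of how you can verify both conditions yourself before waiting on Google: that the page is no longer blocked by robots.txt (so Googlebot can actually fetch it) and that it carries a noindex directive. The site and page URLs are placeholders, not anything from the question.

```python
# Minimal sketch, assuming a hypothetical site and page URL:
# 1) check robots.txt no longer blocks the page, 2) check the page sends noindex.
from html.parser import HTMLParser
from urllib import robotparser
from urllib.request import urlopen

SITE = "https://example.com"            # placeholder site
PAGE = SITE + "/private/listing.html"   # placeholder previously blocked URL

class RobotsMetaFinder(HTMLParser):
    """Collects the content of any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", ""))

# Googlebot will only see the meta tag if robots.txt lets it crawl the page.
rp = robotparser.RobotFileParser(SITE + "/robots.txt")
rp.read()
print("Crawlable by Googlebot:", rp.can_fetch("Googlebot", PAGE))

finder = RobotsMetaFinder()
finder.feed(urlopen(PAGE).read().decode("utf-8", errors="replace"))
print("robots meta directives:", finder.directives)  # expect something like ['noindex,nofollow']
```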
From what I can see in the user-accessible cache, they do. To check, type the URL of your robots.txt file into a Google search, then click the little green dropdown arrow and click 'Cached' (see image below). That will give you the latest version of the page from Google's servers.
Google's documentation states that they will usually cache robots.txt for a day, but might use it for longer if they get errors when trying to refresh it.
A robots.txt request is generally cached for up to one day, but may be cached longer in situations where refreshing the cached version is not possible (for example, due to timeouts or 5xx errors). The cached response may be shared by different crawlers. Google may increase or decrease the cache lifetime based on max-age Cache-Control HTTP headers.
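Since the cache lifetime can follow max-age, it is worth checking what caching hints your own server sends with robots.txt. A minimal sketch, assuming a placeholder hostname:

```python
# Minimal sketch: inspect the Cache-Control and Expires headers your server
# returns for robots.txt. The hostname is a placeholder.
from urllib.request import urlopen

with urlopen("https://example.com/robots.txt") as resp:
    cache_control = resp.headers.get("Cache-Control", "(none)")
    expires = resp.headers.get("Expires", "(none)")

print("Cache-Control:", cache_control)  # e.g. "max-age=3600"
print("Expires:      ", expires)
# A max-age much longer than a day suggests Google could keep a stale copy
# longer than the usual ~24 hours before re-fetching it.
```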
Yes. They say they typically update it once a day, but some have suggested they may also recheck it after a certain number of page hits (100?), so busier sites may be rechecked more often.
See webmasters.stackexchange.com/a/29946 and the video that @DisgruntedGoat shared above: youtube.com/watch?v=I2giR-WKUfY.
Yes, Google will obviously cache robots.txt to an extent: it won't download it every time it wants to look at a page. How long it caches it for, I don't know. However, if you have a long Expires header set, Googlebot may wait much longer before checking the file again.
Another problem could be a misconfigured file. In the Webmaster Tools that danivovich suggests, there is a robots.txt checker. It will tell you which types of pages are blocked and which are fine.
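You can also do a rough version of that check locally. A minimal sketch, assuming a placeholder site and some hypothetical paths, testing which URLs the live robots.txt blocks for Googlebot:

```python
# Minimal sketch: a local stand-in for the Webmaster Tools robots.txt checker,
# testing sample paths against the live file. Site and paths are placeholders.
from urllib import robotparser

rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()

sample_paths = [
    "https://example.com/",
    "https://example.com/private/",        # hypothetical blocked section
    "https://example.com/blog/post.html",  # hypothetical allowed page
]

for url in sample_paths:
    status = "allowed" if rp.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{status:7}  {url}")
```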
I would strongly recommend registering your site with Google Search Console (previously Google Webmaster Tools). There is a crawler access section under site configuration that will tell you when your robots.txt was last downloaded. The tool also provides a lot of detail as to how the crawlers are seeing your site, what is blocked or not working, and where you are appearing in queries on Google.
From what I can tell, Google downloads robots.txt often. Google Search Console will also let you remove specific URLs from the index, so you can remove the ones you are now blocking.