Google-Image bot not respecting robots.txt
I have the following in my robots.txt file:
# Block Googlebot-Image from cache
User-agent: Googlebot-Image
Disallow: /media/catalog/product/cache/
# Crawlers Setup
User-agent: *
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
...
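For what it's worth, the rules as written do appear to block that URL. One quick way to sanity-check a robots.txt file locally is Python's standard-library `urllib.robotparser` (a sketch using an abbreviated copy of the file from the question):

```python
from urllib import robotparser

# Abbreviated copy of the robots.txt from the question.
ROBOTS_TXT = """\
# Block Googlebot-Image from cache
User-agent: Googlebot-Image
Disallow: /media/catalog/product/cache/

# Crawlers Setup
User-agent: *
Disallow: /404/
Disallow: /app/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

cached = ("/media/catalog/product/cache/1/image/1800x/"
          "040ec09b1e35df139433887a97daa66f/n/d/nd01120a10g.jpg")

# The Googlebot-Image group disallows the cache directory:
print(rp.can_fetch("Googlebot-Image", cached))   # False

# Caveat: a bot that matches a specific User-agent group ignores the
# generic "*" group entirely, so /404/ is NOT blocked for Googlebot-Image,
# while other bots fall through to the "*" group and are blocked.
print(rp.can_fetch("Googlebot-Image", "/404/"))  # True
print(rp.can_fetch("SomeOtherBot", "/404/"))     # False
```

This also shows a gotcha with the file's layout: because Googlebot-Image gets its own group, none of the directory rules under `User-agent: *` apply to it.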
But I am still seeing this in my logs:
66.249.66.189 - - [23/Jul/2016:06:30:12 -0700] "GET /media/catalog/product/cache/1/image/1800x/040ec09b1e35df139433887a97daa66f/n/d/nd01120a10g.jpg HTTP/1.1" 200 106745 "-" "Googlebot-Image/1.0"
Am I doing something wrong?
The thing is, I don't want those cached images indexed because they get deleted and I don't want 404 penalties.
I've noticed Google doing the same thing on some of my sites, but it doesn't actually index the content it crawls this way.
Read here: support.google.com/webmasters/answer/35308?hl=en
According to Google, a Disallow rule in robots.txt controls crawling, not indexing: a blocked URL can still appear in search results (without its content) if other pages link to it. It also isn't an instant switch — Google caches robots.txt, so it may keep fetching URLs for a while after you add a rule.
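If the goal is specifically to keep those cached images out of the index, the usual approach is the opposite of blocking: let Googlebot fetch them and serve a `noindex` signal via the `X-Robots-Tag` response header (robots.txt itself can't express noindex, and Google only sees the header if crawling is allowed). A sketch assuming Apache with mod_headers enabled — the server setup is an assumption, not something stated in the question:

```apache
# Assumes Apache with mod_headers. Serve "X-Robots-Tag: noindex" for
# cached catalog images so Google can crawl them but drops them from
# the index.
<LocationMatch "^/media/catalog/product/cache/">
    Header set X-Robots-Tag "noindex"
</LocationMatch>
```

With this in place you'd remove the Disallow for the cache directory, since Google has to be able to fetch the files to see the header.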