Google-Image bot not respecting robots.txt

@Odierno851

Posted in: #Googlebot #RobotsTxt

I have the following in my robots.txt file:

# Block Googlebot-Image from cache
User-agent: Googlebot-Image
Disallow: /media/catalog/product/cache/

# Crawlers Setup
User-agent: *

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
...
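
As a sanity check, the rule itself does appear to match the image path. Here is a minimal sketch using Python's urllib.robotparser (the robots.txt text is abbreviated, and example.com stands in for the real host):

import urllib.robotparser

# Abbreviated copy of the rules above; example.com is a placeholder host.
robots_txt = """\
User-agent: Googlebot-Image
Disallow: /media/catalog/product/cache/

User-agent: *
Disallow: /404/
Disallow: /app/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

image_url = ("https://example.com/media/catalog/product/cache/"
             "1/image/1800x/040ec09b1e35df139433887a97daa66f/n/d/nd01120a10g.jpg")

# Prints False: the Googlebot-Image group disallows this path.
print(rp.can_fetch("Googlebot-Image", image_url))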


But I am still seeing this in my logs:

66.249.66.189 - - [23/Jul/2016:06:30:12 -0700] "GET /media/catalog/product/cache/1/image/1800x/040ec09b1e35df139433887a97daa66f/n/d/nd01120a10g.jpg HTTP/1.1" 200 106745 "-" "Googlebot-Image/1.0"


Am I doing something wrong?

The thing is, I don't want those cached images indexed, because they get deleted later and I don't want 404 penalties.
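
For scale, a quick way to count these hits in the access log (a minimal sketch; access.log is a placeholder path, and it assumes the combined log format shown above):

import re

LOG_FILE = "access.log"  # placeholder; point this at the real access log

# Match Googlebot-Image requests for anything under the cache directory.
pattern = re.compile(
    r'"(?:GET|HEAD) (/media/catalog/product/cache/\S*) HTTP/[^"]*".*"Googlebot-Image'
)

hits = []
with open(LOG_FILE) as f:
    for line in f:
        m = pattern.search(line)
        if m:
            hits.append(m.group(1))

print(f"{len(hits)} Googlebot-Image requests to the cache directory")
for path in hits[:10]:
    print(path)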


1 Comment


@Debbie626

I've noticed that Google does the same thing to some of my sites, but they don't actually index the content they've crawled.

Read here: support.google.com/webmasters/answer/35308?hl=en
According to Google, using that in robots.txt prevents them from indexing the content; it doesn't say it will stop crawling it entirely.
