How to tell search engines not to index an entire image domain without letting them waste server bandwidth or making Google complain
From what I learned, there's one way to keep every URL on a domain that strictly serves images from being indexed, and that is the X-Robots-Tag HTTP header. Now I check my logs and find that Google and even Baidu are downloading the entire contents of the image URLs. I was hoping they'd stop downloading when they came across this line:
X-Robots-Tag: noindex, noimageindex
Either I formatted that line wrong (used the wrong casing or the wrong order of values or something), or search engines are just plain dumb and decide to download everything anyway, wasting their customers' money.
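For reference, the header is applied to every response on the image domain at the web server level. As a rough illustration (assuming Apache with mod_headers; other servers have an equivalent directive), the relevant config amounts to:
<IfModule mod_headers.c>
    # send the noindex header with every response from this image domain
    Header set X-Robots-Tag "noindex, noimageindex"
</IfModule>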
I looked into robots.txt and thought of using the noindex line, but when I did, Google complained about having no access to what it calls an "important URL" when it isn't important.
I don't want to block their IPs because I have text-based content on another domain running on the same server that I want them to index.
I'm tempted to offer search engines the equivalent of what users get if they requested the URL via the HEAD method (full headers but no actual content), but I might get penalized for content cloaking.
Is there something I can do to rectify this?
2 Comments
Google supports Noindex: in robots.txt. See How does “Noindex:” in robots.txt work? It is a beta feature, though, and they may remove support for it. Because of that I would use this robots.txt file:
# All other bots: not allowed to crawl anything
User-agent: *
Disallow: /

# Googlebot: the (beta) Noindex: directive, so it neither crawls nor indexes
User-agent: Googlebot
Noindex: /

# bingbot, Yahoo! Slurp, and Yandex: allowed to crawl everything
User-agent: bingbot
Disallow:

User-agent: Yahoo! Slurp
Disallow:

User-agent: Yandex
Disallow:
Along with the header you mention in your question:
X-Robots-Tag: noindex, noimageindex
In that case, only three spiders will crawl your content to find out they can't index it. Googlebot won't crawl or index. Non-search-engine bots won't even be allowed to crawl at all.
If Google does stop supporting Noindex:, Googlebot will start crawling and find out that it can't index.
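Once both pieces are in place, you can confirm the header is actually being served with a HEAD-style request against any image URL (hypothetical hostname shown):
curl -I https://images.example.com/photo.jpg
The response headers should include the X-Robots-Tag: noindex, noimageindex line.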
The most effective way to do this is to use a robots.txt file with Disallow: / as the only directive and place it in the web root of the images domain. Once that is in place, search engines won't crawl the images. The error you got from Google was only a computer-based evaluation that deemed the images might need to be crawled, but that is at your discretion. Since you don't want the images indexed, you can safely ignore the error: it indicates that the images are not going to be crawled, which is exactly what you want.
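A minimal robots.txt along those lines, served from the root of the image domain, would be:
User-agent: *
Disallow: /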