
How to tell search engines not to index an entire image domain without making them waste server bandwidth or making Google complain

@Annie201

Posted in: #Baidu #Download #Google #Images #Indexing

From what I have learned, there is one way to keep every URL on a domain that serves nothing but images from being indexed, and that is the X-Robots-Tag HTTP header. But when I check my logs, I see that Google and even Baidu are still downloading the entire contents of the image URLs. I was hoping they would stop downloading once they came across this line:

X-Robots-Tag: noindex, noimageindex


Either I formatted that line wrong (wrong casing, wrong order of values, or something), or search engines are just plain dumb and download everything anyway, wasting their customers' money.

I also looked into robots.txt and tried adding a noindex line, but when I did, Google complained about having no access to what it calls an "important URL", even though it isn't important.

I don't want to block their IPs, because I have text-based content on another domain running on the same server that I do want them to index.

I'm tempted to serve search engines the equivalent of what users get when they request a URL via the HEAD method (full headers but no body), but I might get penalized for cloaking.
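For reference, this is roughly what a HEAD request against one of the image URLs returns (hostname, path, and sizes here are placeholders, not my real setup):

HEAD /photos/example.jpg HTTP/1.1
Host: images.example.com

HTTP/1.1 200 OK
Content-Type: image/jpeg
Content-Length: 51234
X-Robots-Tag: noindex, noimageindex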

Is there something I can do to rectify this?


2 Comments


@Heady270

Google supports Noindex: in robots.txt (see "How does "Noindex:" in robots.txt work?"). It is a beta feature, though, and Google may remove support for it at any time. Because of that, I would use a robots.txt file like this:

User-agent: *
Disallow: /

User-agent: Googlebot
Noindex: /

User-agent: bingbot
Disallow:

User-agent: Yahoo! Slurp
Disallow:

User-agent: Yandex
Disallow:


Along with the header you mention in your question:

X-Robots-Tag: noindex, noimageindex


With that file, each crawler obeys only the most specific record that matches it: bingbot, Yahoo! Slurp, and Yandex match their own records with an empty Disallow, so those three spiders will still crawl your content and learn from the header that they can't index it. Googlebot matches its own record and won't crawl or index at all, and every other bot falls back to the * record and isn't allowed to crawl.

If Google does drop support for Noindex:, Googlebot will start crawling and then find out from the X-Robots-Tag header that it can't index.
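To actually send that header for everything on the image domain, a server-level rule is the usual approach. A minimal sketch, assuming Apache with mod_headers enabled (nginx has an equivalent add_header directive):

<IfModule mod_headers.c>
    # Attach the header to every response served by this (virtual) host
    Header set X-Robots-Tag "noindex, noimageindex"
</IfModule>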



 

@Si4351233

The most effective way to do this is to use a robots.txt file with Disallow: / as the only directive and place it in the web root of the images domain. Once that is in place, search engines won't crawl the images at all. The error you got from Google was just an automated evaluation that judged the images as possibly needing to be crawled, but that call is at your discretion. Since you don't want the images indexed, you can safely ignore the error: it confirms that the images are not going to be crawled, which is what you want.
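For clarity, the entire robots.txt would be just these two lines, served from the root of the image host (e.g. images.example.com/robots.txt, where the hostname is a placeholder):

User-agent: *
Disallow: /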


