Does Google Preview obey robots.txt?
Because it sure looks like it does. On my sites we disallow the images directory, and the previews are all missing images, which makes the site look wonky.
Is this the case and is there a way to allow just the preview bot to access images using robots.txt?
EDIT: It looks like the previews are generated both by the normal Googlebot and by an on-the-fly bot, Google Web Preview, as mentioned (briefly) on the Webmaster Central Blog.
Using a site: search and my monitoring software, I could see when the on-the-fly bot hit my site, and when it did, the images showed up just fine in the preview. So my guess is that the normal crawler skips the images per robots.txt, but the preview crawler fetches them anyway.
This implementation seems kind of crummy because my options seem to be:
Allow Googlebot to crawl my images (which I don't want to do).
Use the nosnippet tag, which blocks the preview but ALSO the snippets (which I don't want to do).
Let the wonky previews appear, which may adversely affect click-throughs.
More posts by @Murray432
3 Comments
I think John Mueller had it right in the comments.
If it's just a matter of not having the images indexed, you could allow crawling but serve the images with an X-Robots-Tag HTTP header set to "noindex".
I didn't know that you could allow Google to crawl content without indexing it. I put his technique in place and am just waiting to get crawled to see if it worked.
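For anyone wanting to try the same thing, here is a minimal sketch of the idea in Python, not the setup I actually run. It assumes the images live under /images/ and uses the standard library's http.server purely for illustration; the names IMAGE_PREFIX, robots_header_for, NoIndexImagesHandler and serve are made up for this example. On a real site you would more likely set the header in your web server configuration.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Assumption: images are served from this URL prefix.
IMAGE_PREFIX = "/images/"


def robots_header_for(path):
    """Return the X-Robots-Tag value for a request path, or None if no header is needed."""
    if path.startswith(IMAGE_PREFIX):
        return "noindex"
    return None


class NoIndexImagesHandler(SimpleHTTPRequestHandler):
    """Static file handler that marks image responses as crawlable but not indexable."""

    def end_headers(self):
        # self.path may be missing if the request line could not be parsed.
        value = robots_header_for(getattr(self, "path", ""))
        if value is not None:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()


def serve(port=8000):
    """Serve the current directory on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), NoIndexImagesHandler).serve_forever()
```

Calling serve() from the site root and requesting anything under /images/ should show the extra header in the response, while other pages are left untouched. Note that robots.txt must not disallow /images/ for this to work, since a crawler that is blocked never sees the header.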
I'll accept this as the answer in a few days unless John wants to add his comments to the answer section so he can earn the rep.
The following is a technical solution that may or may not be simple to apply to your site.
It is possible (even likely) that Google will come out with a way to do this with just a few hints in metadata or robots.txt, but until then...
Step 1.
Create a redirection service/servlet for front page images.
I.e. a URL like
/frontpageimages/[image name]
that does a server side redirect to
/images/[image name]
Step 2.
Have all image links on your front page (and only the front page) rewritten to go through the redirection service from step 1 rather than linking directly to the image.
Step 3.
Make sure that robots.txt allows Googlebot to crawl /frontpageimages/.
This should ensure that Google can crawl any images it encounters on your front page while leaving any images on other pages alone.
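Step 1 could be sketched roughly as follows. This is only an illustration under a few assumptions: it reads "server side redirect" as an internal forward (the client keeps seeing /frontpageimages/..., and no HTTP 3xx is sent), it uses Python's built-in http.server as a stand-in for whatever servlet or framework you actually run, and the names forward_path and FrontPageImageHandler are invented here.

```python
from http.server import SimpleHTTPRequestHandler

# URL prefixes taken from the steps above.
FRONT_PREFIX = "/frontpageimages/"
IMAGE_PREFIX = "/images/"


def forward_path(path):
    """Map /frontpageimages/<name> to /images/<name>; leave every other path unchanged."""
    if path.startswith(FRONT_PREFIX):
        return IMAGE_PREFIX + path[len(FRONT_PREFIX):]
    return path


class FrontPageImageHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves front page images from the images directory."""

    def translate_path(self, path):
        # Internal forward: the requested URL stays /frontpageimages/...,
        # but the file is read from the images directory on disk.
        return super().translate_path(forward_path(path))
```

The forward is internal on purpose: a crawler that follows an HTTP redirect to /images/ would be expected to check robots.txt for the target URL too, and would be blocked there.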
While the redirection service could (in theory) be used to crawl all your images without technically violating your robots.txt, that is not something well-behaved robots (like Googlebot) are going to do. And ill-behaved robots aren't going to worry about robots.txt anyway.
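The robots.txt rules from step 3 can be sanity-checked before deploying them. The sketch below uses Python's standard urllib.robotparser; the rule set in ROBOTS_TXT and the helper name is_allowed are assumptions made up for this example, not your actual file. Python's parser applies the first matching rule, while Google documents a most-specific-match rule, so results can differ when rules overlap; the two prefixes here do not overlap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: front page images crawlable, the rest of the images blocked.
ROBOTS_TXT = """\
User-agent: *
Allow: /frontpageimages/
Disallow: /images/
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

With these rules, is_allowed("Googlebot", "/frontpageimages/logo.png") is true and is_allowed("Googlebot", "/images/logo.png") is false, which is the split the three steps rely on.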
Since most of the preview is generated by the Googlebot crawler, blocking it from crawling part of your site will affect the preview.
Why don't you want to allow Googlebot to crawl your images?