Allowing Google to index images on S3

@Becky754

Posted in: #AmazonS3 #RobotsTxt #Sitemap

We host our sites' static content on S3. We also have a very open robots.txt:

User-agent: *
Allow: *


This is because in Webmaster Tools I get thousands of warnings: "Sitemap contains URLs which are blocked by robots.txt."

The images are listed in my sitemap along with a content item and use the correct path: mybucket.s3.amazon.com/image/path.jpg.
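For reference, the entries look roughly like this, using Google's image sitemap extension (example.com and the exact paths here are just placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <!-- the content item on our own domain -->
    <loc>http://www.example.com/some-content-item</loc>
    <!-- the image it uses, served from S3 -->
    <image:image>
      <image:loc>http://mybucket.s3.amazon.com/image/path.jpg</image:loc>
    </image:image>
  </url>
</urlset>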

Can I add a remote URL in my robots.txt? Given how liberal the robots.txt on S3 is, I'm assuming the restriction must be coming from my site's own robots.txt.

Has anybody else stored images on S3 and put them in a sitemap?


1 Comment


@Holmes151

Allow: * should actually be Allow: /.

That could be the issue, because Allow: is somewhat meaningless by itself -- its purpose is to permit a sub-path within an otherwise disallowed path. As it stands, it seems possible that your file is being misinterpreted.
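To illustrate, Allow: is meant for cases like this, where you open up one sub-path inside a path you have otherwise blocked (the paths are just examples):

User-agent: *
Disallow: /images/
Allow: /images/public/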

Anything not denied is supposed to be implicitly allowed.

If you want to allow everything, you should instead use Disallow: with nothing after it... or of course you could just delete your /robots.txt file entirely, since a 4xx error should be interpreted by a crawler as "no restrictions here -- have fun!"
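So if you do keep the file, a minimal "allow everything" robots.txt would be either:

User-agent: *
Disallow:

or, equivalently, for crawlers that support Allow::

User-agent: *
Allow: /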
