Allowing Google to index images on S3
We host our sites' static content on S3. We also have a very open robots.txt:
User-agent: *
Allow: *
This is because in Webmaster Tools I get thousands of warnings: "Sitemap contains urls which are blocked by robots.txt."
The images are listed in my sitemap along with a content item and use the correct path; mybucket.s3.amazon.com/image/path.jpg.
Can I reference a remote URL in my robots.txt? Given how liberal the robots.txt on S3 is, I'm assuming the restriction comes from my site's own robots.txt.
Has anybody else stored images on s3 and put them in a sitemap?
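For reference, the sitemap entries look roughly like this (a sketch only; this assumes the standard image sitemap extension, and the content-item URL is a placeholder):

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.example.com/some-content-item</loc>
    <image:image>
      <image:loc>http://mybucket.s3.amazon.com/image/path.jpg</image:loc>
    </image:image>
  </url>
</urlset>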
1 Comment
Allow: * should actually be Allow: /.
That could be the issue... because, really, using Allow: is somewhat meaningless by itself -- its purpose is for allowing a sub-path within a denied path. As it stands, it seems possible that your file is being misinterpreted.
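For example, Allow: is normally used like this, to open up a sub-path inside a blocked directory (the paths here are purely illustrative):

User-agent: *
Disallow: /private/
Allow: /private/images/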
Anything not denied is supposed to be implicitly allowed.
If you want to allow everything, you should instead use Disallow: with nothing after it... or of course you could just delete your /robots.txt file entirely, since a 4xx error should be interpreted by a crawler as "no restrictions here -- have fun!"
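To illustrate, an "allow everything" robots.txt, if you want to keep a file in place at all, would simply be:

User-agent: *
Disallow: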