Allowing Google to index images on S3
We host our sites' static content on S3. We also have a very open robots.txt:
User-agent: *
Allow: *
This is because in Webmaster Tools I get thousands of warnings: "Sitemap contains urls which are blocked by robots.txt."
The images are listed in my sitemap along with a content item and use the correct path; mybucket.s3.amazon.com/image/path.jpg.
Can I reference a remote URL in my robots.txt? Given how liberal the robots.txt on S3 is, I'm assuming the restriction comes from my site's own robots.txt.
Has anybody else stored images on s3 and put them in a sitemap?
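For reference, the sitemap entries look roughly like this (a sketch only; this assumes the standard image sitemap extension, and the content-item URL is a placeholder):

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.example.com/some-content-item</loc>
    <image:image>
      <image:loc>http://mybucket.s3.amazon.com/image/path.jpg</image:loc>
    </image:image>
  </url>
</urlset>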
1 Comment
Allow: * should actually be Allow: /.
That could be the issue... because, really, using Allow: is somewhat meaningless by itself -- its purpose is for allowing a sub-path within a denied path. As it stands, it seems possible that your file is being misinterpreted.
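For example, Allow: is normally used like this, to open up a sub-path inside a blocked directory (the paths here are purely illustrative):

User-agent: *
Disallow: /private/
Allow: /private/images/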
Anything not denied is supposed to be implicitly allowed.
If you want to allow everything, you should instead use Disallow: with nothing after it... or of course you could just delete your /robots.txt file entirely, since a 4xx error should be interpreted by a crawler as "no restrictions here -- have fun!"
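To illustrate, an "allow everything" robots.txt, if you want to keep a file in place at all, would simply be:

User-agent: *
Disallow: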