Getting CDN images indexed with Google

@Reiling115

Posted in: #AmazonCloudfront #Cdn #Cname #GoogleIndex #Images

I have somewhere close to 500,000 user-uploaded images hosted on a CloudFront CDN, separate from our main host (exampledomain.com). Until now, few of them have been getting indexed at the default distribution URLs, for example:
d7oxxxxxxx.cloudfront.net/images/example_directory/subdirectory/LG_example_filename.jpg
So I added a CNAME (alternate domain name), and the URLs now look like this:
media.exampledomain.com/images/example_directory/subdirectory/LG_example_filename.jpg
I also added "media.exampledomain.com" as a verified property in Google Search Console.

I also have a dynamic sitemap hosted on exampledomain.com that lists all of the images I want indexed, one image per page (probably close to 240,000 pages altogether). Example:

<url>
<loc>http://www.exampledomain.com/directory/pagename</loc>
<changefreq>daily</changefreq>
<image:image>
<image:loc>http://media.exampledomain.com/images/exampledirectory/subdirectory/LG_filename.jpg</image:loc>
<image:title>Example Image Title</image:title>
<image:caption>Example Image Caption</image:caption>
</image:image>
</url>
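The entry above is a fragment. A complete image sitemap also needs a `<urlset>` root that declares the image namespace, `<image:loc>` should be an absolute URL with no surrounding whitespace, and the sitemaps.org protocol caps each file at 50,000 URLs (and 50 MB uncompressed), so ~240,000 pages need several files plus a sitemap index. The sketch below, using only the standard library, shows one way to generate that; `PageImage`, `build_urlset`, `split_entries` and `build_index` are hypothetical names, and the dynamic sitemap in the question may already handle all of this.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, NamedTuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_URLS_PER_FILE = 50_000  # per-file cap from the sitemaps.org protocol

# Serialize as xmlns="..." and xmlns:image="..." instead of ns0/ns1 prefixes.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


class PageImage(NamedTuple):
    page_url: str   # absolute URL of the HTML page that displays the image
    image_url: str  # absolute URL of the image on the CDN hostname


def build_urlset(entries: Iterable[PageImage]) -> bytes:
    """One sitemap file: a <url> per page, each with one <image:image>."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry.page_url.strip()
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        # Strip stray whitespace such as spaces around the image URL.
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = entry.image_url.strip()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def split_entries(entries: list[PageImage], size: int = MAX_URLS_PER_FILE) -> list[list[PageImage]]:
    """Split the full list into chunks that each fit in one sitemap file."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def build_index(sitemap_urls: Iterable[str]) -> bytes:
    """A <sitemapindex> pointing at each generated sitemap file."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)
```

With 240,000 pages, `split_entries` yields five chunks (four of 50,000 and one of 40,000). Each chunk becomes one `build_urlset` file, and the index built from their URLs is then the single sitemap URL to submit.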


According to what I've read, this should get Google to start indexing all of the images. However, I don't want to wait a whole week only to find out that I've missed a step or that something else is blocking the images from being indexed. As far as I can tell, the CloudFront URLs are all fully public, and there are no robots.txt restrictions on the CDN. I only have one CloudFront distribution active, so I don't believe duplicate content should be an issue. Is there anything else I need to account for, or some way to check in advance whether this is going to work?
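One rough way to check in advance is to request a sample of the image URLs on media.exampledomain.com and confirm that each returns HTTP 200, an `image/*` Content-Type, and no `X-Robots-Tag` header containing `noindex` or `noimageindex`. The sketch below assumes the CDN answers HEAD requests; `indexing_problems` and `check_image_url` are hypothetical helpers. A clean result only rules out obvious blockers; it cannot predict whether Google will choose to index an image.

```python
from typing import Mapping
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def indexing_problems(status: int, headers: Mapping[str, str]) -> list[str]:
    """List obvious reasons an image response could be skipped by Google."""
    # Header names are case-insensitive, so normalise them before lookups.
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []
    if status != 200:
        problems.append(f"status {status}")
    content_type = lowered.get("content-type", "")
    if not content_type.startswith("image/"):
        problems.append(f"Content-Type {content_type!r}")
    robots = lowered.get("x-robots-tag", "").lower()
    if "noindex" in robots or "noimageindex" in robots:
        problems.append(f"X-Robots-Tag {robots!r}")
    return problems


def check_image_url(url: str, timeout: float = 10.0) -> list[str]:
    """Send a HEAD request and report any problems found in the response."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return indexing_problems(response.status, response.headers)
    except HTTPError as err:
        # 4xx/5xx responses raise, but still carry a status code and headers.
        return indexing_problems(err.code, err.headers)
```

Running `check_image_url` over a few hundred URLs taken from the sitemap, and printing any non-empty result, is usually enough to spot a systematic problem such as a wrong Content-Type on every object.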

Thanks for any help you can provide.

UPDATE:

I've been tracking this for a few days now. Googlebot has been crawling and indexing our site's pages at a nice, swift rate (over 50,000 pages in a day!). The images are another story, though: the sitemap lists over 160,000 submitted images, Google has crawled roughly 15,000 of them, but only 50 have actually been indexed. Does anyone have an idea why Google may be having difficulty with these?

Here is the format of one of the URLs; every filename has a 12-14 digit timestamp appended to the end:
media.exampledomain.com/images/category/id/LG_keywords_1442182082.5437.jpg


@Lengel546

That is pretty much what I have done:


Put the images on the CDN behind a CNAME record.
Verified the CDN domain in Google Webmaster Tools.
Used the CDN URLs in the sitemap.
Added the sitemap to robots.txt as well as to Google Webmaster Tools.
No robots.txt restrictions on the CDN domain.


And Google is indexing my images just fine. If I search Google for site:mysitedomain.com, all the images from the CDN show up as well :)

Update:

In the robots.txt file for my website I have:

User-agent: *
Disallow: /harming/humans
Sitemap: http://www.website.net/sitemap.xml

This makes sure that other search engines (besides Google) also find the sitemap. More info here: www.sitemaps.org/protocol.html#submit_robots
The robots.txt on my CDN domain simply allows crawling and looks like this:

User-agent: *
Disallow:
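To double-check that both files say what was intended, they can be run through Python's standard `urllib.robotparser` without any network access. This is an approximation: the module uses simple prefix matching and does not implement every Google extension (such as `*` wildcards inside paths). The `http://` scheme on the Sitemap line below is an assumption, since the protocol requires an absolute URL there.

```python
from urllib.robotparser import RobotFileParser

# Copies of the two robots.txt files above; the "http://" scheme is assumed.
SITE_ROBOTS = """\
User-agent: *
Disallow: /harming/humans
Sitemap: http://www.website.net/sitemap.xml
"""

CDN_ROBOTS = """\
User-agent: *
Disallow:
"""


def load(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


site = load(SITE_ROBOTS)
cdn = load(CDN_ROBOTS)

# An empty "Disallow:" blocks nothing, so any image path is allowed.
print(cdn.can_fetch("Googlebot-Image", "/images/category/id/LG_keywords_1442182082.5437.jpg"))  # True
# Only the one disallowed path is blocked on the main site.
print(site.can_fetch("Googlebot", "/harming/humans"))      # False
print(site.can_fetch("Googlebot", "/directory/pagename"))  # True
# site_maps() (Python 3.8+) returns the Sitemap: URLs, or None if there are none.
print(site.site_maps(), cdn.site_maps())  # ['http://www.website.net/sitemap.xml'] None
```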
