The Sitemap Paradox

Posted in: #Google #GoogleSearch #GoogleSearchConsole #Sitemap #XmlSitemap

We use a sitemap on Stack Overflow, but I have mixed feelings about it.

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
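For readers who have not seen one, here is a minimal sketch of what "URLs plus associated metadata" looks like in practice. It builds a small Sitemap document with Python's standard library. The example.com URLs, the dates, and the `build_sitemap` helper are made up for illustration; this is not how Stack Overflow generates its sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org), version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a Sitemap XML string.

    entries: iterable of (loc, lastmod) pairs; lastmod may be None.
    This helper is hypothetical, written only for this example.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, "loc").text = loc
        # <lastmod> is optional metadata, a hint rather than a command.
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs and dates, assumed for the example.
sitemap_xml = build_sitemap([
    ("https://example.com/questions/1", "2010-01-15"),
    ("https://example.com/questions/2", None),
])
```

Everything in the file is advisory: a crawler that supports Sitemaps may read it, but nothing in the protocol obliges it to crawl or index what it lists.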

Based on our two years' experience with sitemaps, there's something fundamentally paradoxical about the sitemap:

Sitemaps are intended for sites that are hard to crawl properly.
If Google can't successfully crawl your site to find a link, but is able to find it in the sitemap, it gives the sitemap link no weight and will not index it!

That's the sitemap paradox -- if your site isn't being properly crawled (for whatever reason), using a sitemap will not help you!

Google goes out of their way to make no sitemap guarantees:

"We cannot make any predictions or guarantees about when or if your URLs will be crawled or added to our index" citation

"We don't guarantee that we'll crawl or index all of your URLs. For example, we won't crawl or index image URLs contained in your Sitemap." citation

"submitting a Sitemap doesn't guarantee that all pages of your site will be crawled or included in our search results" citation

Given that links found in sitemaps are merely recommendations, whereas links found on your own website proper are considered canonical ... it seems the only logical thing to do is avoid having a sitemap and make damn sure that Google and any other search engine can properly spider your site using the plain old standard web pages everyone else sees.

By the time you have done that, and are getting spidered nice and thoroughly so Google can see that your own site links to these pages, and would be willing to crawl the links -- uh, why do we need a sitemap, again? The sitemap can be actively harmful, because it distracts you from ensuring that search engine spiders are able to successfully crawl your whole site. "Oh, it doesn't matter if the crawler can see it, we'll just slap those links in the sitemap!" Reality is quite the opposite in our experience.

That seems more than a little ironic considering sitemaps were intended for sites that have a very deep collection of links or complex UI that may be hard to spider. In our experience, the sitemap does not help, because if Google can't find the link on your site proper, it won't index it from the sitemap anyway. We've seen this proven time and time again with Stack Overflow questions.
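One way to check for this situation on your own site is to look for URLs that appear in the sitemap but cannot be reached by following links from the home page. The sketch below does that with a breadth-first walk over a link graph. The graph here is a small hand-written dictionary and the function names are hypothetical; a real check would have to build the graph by crawling the site, and it says nothing about how Google itself weighs such URLs.

```python
from collections import deque


def reachable_by_links(start, links):
    """Return the set of pages reachable from start by following links.

    links: dict mapping each page to the pages it links to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def sitemap_only_urls(start, links, sitemap_urls):
    """Return sitemap URLs that no chain of on-site links leads to."""
    return sorted(set(sitemap_urls) - reachable_by_links(start, links))


# Hypothetical link graph: "/questions/3" is never linked from any page.
links = {
    "/": ["/questions"],
    "/questions": ["/questions/1", "/questions/2"],
    "/questions/1": ["/"],
}
sitemap_urls = ["/questions/1", "/questions/2", "/questions/3"]

orphans = sitemap_only_urls("/", links, sitemap_urls)
```

In this made-up example `orphans` contains only "/questions/3". Those are the pages worth fixing with real links rather than leaving to the sitemap.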

Am I wrong? Do sitemaps make sense, and we're somehow just using them incorrectly?


@Voss4911412


16 Comments


@Ann8826881

I recently restructured a site that I am still working on. Because I could see no good way to link 500,000 pages that would help users, I decided to use an XML sitemap, submit it to Google, and rely on site search instead. Google had no problem indexing my site before, but since I added the sitemap, Google has been very aggressive in spidering my site and has indexed the pages extremely fast. Google has used the sitemap to find new pages (about 3300 per week) and revisit updated pages. It has been a real win in my book. I still want to figure out a new way to link my pages and use AJAX for look-up, but that is a project for another day. So far, so good! It has been a good solution for me. All in all, I have gained and not lost, which is interesting, since I have always felt that sitemaps could be more useful but are limited by their design.
