
Is cloaking of sitemap.xml permitted?

@Pierce454

Posted in: #Bing #Google #Sitemap

I have a scenario where my sitemaps are generated on the fly when /sitemap.xml is visited, because their contents are dynamic. For that reason I'd like to avoid calling the services that build my sitemap when an ordinary user visits, and only generate the contents when Googlebot or Bingbot requests the URL. I have code to detect the bots. I'm concerned whether Google or Bing would consider this bad practice.


2 Comments


 

@BetL925

I wouldn't worry about "cloaking" on files that are meant only for the consumption of robots. Showing a 403 Forbidden status when the user-agent isn't Googlebot should be fine on a sitemap file. Google cares about cloaking when users see different results than Googlebot. In this case, Google is never going to refer users to the sitemap at all.
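
For illustration, here is a minimal sketch of that approach, assuming Apache with mod_rewrite enabled and the rules placed in an .htaccess file at the document root (the user-agent patterns are only examples):

<IfModule mod_rewrite.c>
    RewriteEngine On
    # Return 403 Forbidden for the sitemap unless the visitor
    # claims to be Googlebot or Bingbot.
    RewriteCond %{HTTP_USER_AGENT} !(Googlebot|bingbot) [NC]
    RewriteRule ^sitemap\.xml$ - [F,L]
</IfModule>

Keep in mind that user-agent strings can be spoofed, so this only keeps ordinary visitors and generic crawlers away from the dynamically generated file.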

I often serve different robots.txt files to different crawlers based on user-agent. That is another case in which I just don't worry about cloaking.
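
As a rough example of that, again assuming Apache with mod_rewrite and an .htaccess file at the document root, with hypothetical file names robots-googlebot.txt and robots-default.txt:

<IfModule mod_rewrite.c>
    RewriteEngine On
    # Googlebot gets its own robots file ...
    RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
    RewriteRule ^robots\.txt$ robots-googlebot.txt [L]
    # ... everyone else gets the default one.
    RewriteRule ^robots\.txt$ robots-default.txt [L]
</IfModule>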



Edit: I have occasionally had problems with my sitemap file itself showing up in Google's search results.

If you are going to cloak your sitemap file, you should tell Google never to index it. You can do so by sending an X-Robots-Tag header for your sitemap files.

# Send "noindex" for any sitemap file (optionally gzipped)
<Files ~ "sitemap.*\.xml(\.gz)?$">
    Header append X-Robots-Tag "noindex"
</Files>



 

@Nimeshi995

If I read your question right, you basically do not want users, or any bots other than Bing and Google, to be able to visit sitemap.xml, because the contents of the sitemap are generated in real time, which could cause additional server load that you want to avoid.

If that's correct, you are approaching this the wrong way. There should be no reason to detect user agents and serve different results, as Google may consider this cloaking if they audited your site using a standard user agent (doubtful, but possible).

A Possible Solution

Google and Bing Webmaster Tools both give you the option to add a sitemap, and this sitemap can be named anything you want. So if you name it top-secret-sitemap.xml, then bots that just check whether /sitemap.xml exists will receive a good old 404 error. The remaining problem is that Google is known to index sitemaps; depending on your server software, this can be prevented using response headers.

For example, in Apache you can use Header set X-Robots-Tag to inject noindex into the header response of the sitemap. This means Google will still crawl the sitemap but won't index the sitemap itself. Many people confuse noindex with blocking crawling of the page, which isn't the case; only a robots.txt disallow will do that.

For Apache:

# Send "noindex" for the renamed sitemap files
<IfModule mod_headers.c>
    <Files ~ "^(top-secret-sitemap1|top-secret-sitemap2|top-secret-sitemap3)\.xml$">
        Header set X-Robots-Tag "noindex"
    </Files>
</IfModule>


