Si4351233

Scheduled Sitemap Generation v Live Sitemap Generation

@Si4351233

Posted in: #Sitemap

I am working on a website that, once in production, is likely to have thousands of new pages added each day, and each page will need to be added to the sitemap. I am not asking how to add pages to the sitemap, as the code for that is already well established. What I am asking is whether it is better (or whether there is any difference at all) to generate the sitemap file on a schedule (say, regenerating the file every 2 hours) or simply to generate it on the fly each time it is requested.
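
To make the trade-off concrete, here is a minimal Python sketch of the two strategies. It assumes a hypothetical `fetch_urls` callable standing in for the real database queries; the class names and the XML rendering are illustrative, not the site's actual code. `LiveSitemap` rebuilds the XML on every request, while `ScheduledSitemap` serves a cached copy that something external (for example a cron job) refreshes by calling `regenerate()`.

```python
import time
from typing import Callable, Iterable
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
UrlFetcher = Callable[[], Iterable[str]]  # hypothetical stand-in for the DB queries

def build_sitemap(urls: Iterable[str]) -> str:
    """Render a <urlset> document; each <loc> value is XML-escaped."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return (f'<?xml version="1.0" encoding="UTF-8"?>'
            f'<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>')

class LiveSitemap:
    """Regenerates on every request: always current, but pays the full query cost each time."""
    def __init__(self, fetch_urls: UrlFetcher) -> None:
        self.fetch_urls = fetch_urls

    def handle_request(self) -> str:
        return build_sitemap(self.fetch_urls())

class ScheduledSitemap:
    """Serves a cached copy; an external scheduler calls regenerate() periodically."""
    def __init__(self, fetch_urls: UrlFetcher) -> None:
        self.fetch_urls = fetch_urls
        self.cached = build_sitemap([])   # empty until the first scheduled run
        self.generated_at = 0.0           # epoch seconds of the last run

    def regenerate(self) -> None:
        self.cached = build_sitemap(self.fetch_urls())
        self.generated_at = time.time()

    def handle_request(self) -> str:
        return self.cached  # no generation cost, but may be stale
```

A page appended to the source shows up immediately in the live handler, but the scheduled handler only includes it after the next `regenerate()` call.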

As I see it, generating on the fly has the advantage that the sitemap is always current, including content added only a minute earlier. On the flip side, it could take a while to generate, since the sitemap generator would have to run queries against a dozen different databases.
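
If the live generator queries the dozen databases one after another, its run time is roughly the sum of all the query times. One possible mitigation, sketched here as an assumption rather than anything from the original post, is to issue the queries concurrently so the total is closer to the slowest single query. The `fake_source` helper is hypothetical and simulates a slow query with `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

UrlSource = Callable[[], Iterable[str]]

def collect_urls(sources: List[UrlSource]) -> List[str]:
    """Run every source query in its own thread and merge results in source order."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # pool.map preserves input order, so the merged list is deterministic
        batches = list(pool.map(lambda fetch: list(fetch()), sources))
    return [url for batch in batches for url in batch]

def fake_source(name: str, delay: float) -> UrlSource:
    """Hypothetical stand-in for one database query that takes `delay` seconds."""
    def fetch() -> List[str]:
        time.sleep(delay)
        return [f"https://example.com/{name}/{i}" for i in range(3)]
    return fetch

if __name__ == "__main__":
    sources = [fake_source(f"db{n}", 0.2) for n in range(12)]
    start = time.perf_counter()
    urls = collect_urls(sources)
    # 12 sequential 0.2 s queries would take about 2.4 s; run concurrently, about 0.2 s
    print(len(urls), f"{time.perf_counter() - start:.2f}s")
```

Threads suit this because the work is I/O-bound; in practice the worker count should also respect each database's connection limits.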

Conversely, with a scheduled run the sitemap would load as fast as any other static file of the same size (no generation time needed), but the disadvantage is that it could be up to 3 hours out of date, which could amount to a few hundred pages missing from it.
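
For the scheduled option, one detail worth handling is that a crawler may request the file while a run is rewriting it. A common pattern, shown here as a minimal sketch, is to write the new sitemap to a temporary file in the same directory and swap it in with `os.replace`, which is atomic on POSIX systems when both paths are on the same filesystem. The `publish_sitemap` name and the target path are assumptions, and the scheduling itself (cron, a task queue, etc.) is left out.

```python
import os
import tempfile
from pathlib import Path

def publish_sitemap(xml: str, target: Path) -> None:
    """Atomically replace `target` with `xml` so readers never see a half-written file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must be on the same filesystem as the target for os.replace to be atomic
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(xml)
        # mkstemp creates the file with mode 0600; make it readable by the web server
        os.chmod(tmp_name, 0o644)
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the temp file if anything failed before the swap completed
        os.unlink(tmp_name)
        raise
```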

From my point of view, it seems to depend mainly on the system resources used to generate the sitemap file and how long each generation takes. Are there any hard limits on how long Google will wait while downloading a sitemap file before the download times out? And is there any generally accepted guidance on how frequently a sitemap should be updated?

I know the Stack Overflow sitemap is very large, with a massive number of records (this is well documented on Meta), but I can't find anything indicating whether it is generated in real time or as a batch job.


@Si4351233

I have worked this out.

A check of DeepCrawl shows that they tested this at some point in the past and found that, when crawling sitemap files, Google does not have a big problem with sitemaps that take a while to generate. However, sitemaps that took over a minute were hit or miss as to whether Google would accept them, and nothing that took over 200 seconds was ever indexed. Based on this testing, DeepCrawl suggested that sitemaps can be generated in real time, but that webmasters should aim for a generation time of no more than 60 seconds to ensure Google accepts the sitemap.
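
Given those reported figures, it seems sensible to measure every live generation against the 60-second target. A minimal sketch, assuming a hypothetical `generate` callable that returns the sitemap XML; the budget constant simply mirrors the DeepCrawl figure quoted above and is not a documented Google limit.

```python
import logging
import time
from typing import Callable, Tuple

# Target taken from the DeepCrawl testing quoted above, not an official Google limit
GENERATION_BUDGET_S = 60.0

def timed_generate(generate: Callable[[], str],
                   budget_s: float = GENERATION_BUDGET_S) -> Tuple[str, float]:
    """Run the generator, log a warning if it exceeds the budget, return (xml, seconds)."""
    start = time.perf_counter()
    xml = generate()
    elapsed = time.perf_counter() - start
    if elapsed > budget_s:
        logging.warning("sitemap generation took %.1fs (budget %.0fs)", elapsed, budget_s)
    return xml, elapsed
```

Logging the elapsed time on every run also gives a trend line, so you can see generation creeping towards the limit as the site grows.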

Further testing of my code showed that even if every single sitemap file were requested at the same time, generated in real time in parallel, and pulling a million rows from the database, generation would take only around 30 seconds. I believe this could be optimised further over time, so live generation seems to be a good fit here.
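
A million rows implies many sitemap files, because the sitemaps.org protocol caps each sitemap at 50,000 URLs and 50 MB uncompressed, with a sitemap index file listing the individual sitemaps; a million URLs therefore needs at least 20 files plus the index. The following is a minimal sketch of splitting a URL stream that way, assuming a hypothetical base URL and file naming scheme (`sitemap-N.xml`, `sitemap-index.xml`). It does not check the 50 MB size limit, and it holds the rendered files in memory, where a real deployment would more likely write or serve each file as it is produced.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemaps.org protocol

def chunked(urls: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` URLs without materialising the input."""
    it = iter(urls)
    while batch := list(islice(it, size)):
        yield batch

def render_urlset(batch: List[str]) -> str:
    body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in batch)
    return f'<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="{NS}">{body}</urlset>'

def render_index(locations: List[str]) -> str:
    body = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in locations)
    return f'<?xml version="1.0" encoding="UTF-8"?><sitemapindex xmlns="{NS}">{body}</sitemapindex>'

def build_sitemap_set(urls: Iterable[str], base_url: str,
                      size: int = MAX_URLS_PER_FILE) -> Dict[str, str]:
    """Map file names to XML: one urlset per chunk plus an index pointing at them."""
    files: Dict[str, str] = {}
    for n, batch in enumerate(chunked(urls, size), start=1):
        files[f"sitemap-{n}.xml"] = render_urlset(batch)
    # The list is built before the index itself is added to `files`
    files["sitemap-index.xml"] = render_index([f"{base_url}/{name}" for name in files])
    return files
```

Because each `sitemap-N.xml` covers an independent slice, the files can be generated separately, which fits the scenario of every sitemap file being requested at once.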

I should note that this may not be the right approach for everyone. Whether you use scheduled or live generation for your sitemaps depends on your specific use case, the underlying technology, and how critical it is to have the most up-to-date sitemap possible.
