
How to spread XML sitemaps over several web servers behind an AWS load balancer?

@Murphy175

Posted in: #AmazonAws #CloudHosting #XmlSitemap

We have a web portal with almost a million products and many more other URLs.
I wrote a script that checks the database: if a new URL is needed or an old one has changed, the script creates or updates the corresponding XML sitemap.
But we have several servers behind the load balancer in our rented AWS environment.

The script also checks the database for each URL to see whether there was an update, so that it updates the appropriate XML file too.

My question is: how do we distribute those XML sitemaps across all web servers behind this AWS load balancer?

Our approaches/ideas so far:

1. Generate the sitemaps on one server with a cron job and copy them to the other servers. This could get difficult because the number of servers changes automatically (autoscaling).
2. Put them on S3. But our bucket is not available through our domain, so I guess Google will have a problem with it (one possible workaround is sketched below this list).
3. Let my script run on every web server, but change it so that it generates all XML files whenever they do not exist. But then I would get conflicts with updated URLs in my database, where I store the last-changed timestamp of every URL.
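
To illustrate the second idea: a small passthrough endpoint on each web server could stream the S3 object so that it appears under our own domain. This is only a minimal sketch; the bucket name example-sitemaps, the URL layout, and the file query parameter are all hypothetical:

<?php
// sitemap.php?file=sitemap-products-1.xml.gz
// Streams a sitemap object from S3 so that bots see it under our own domain.
$file = basename($_GET['file'] ?? '');               // drop any path components
if (!preg_match('/^[\w-]+\.xml(\.gz)?$/', $file)) {  // allow only sitemap-like names
    http_response_code(404);
    exit;
}
$body = @file_get_contents('https://example-sitemaps.s3.amazonaws.com/' . $file);
if ($body === false) {
    http_response_code(404);
    exit;
}
header('Content-Type: ' . (substr($file, -3) === '.gz' ? 'application/x-gzip' : 'application/xml'));
echo $body;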


Is there another, better solution that I am not aware of? Does Amazon offer any special services for cases like this?


1 Comment

 

@Hamaas447

We decided to store all XML files gzipped in the database and to serve them from there whenever a bot requests a specific XML sitemap.

Here are the theoretical steps:


1. Create XML files with 45,000 URLs each (45k according to this post).
2. Save every URL to the database, together with the information about which XML sitemap it is in.
3. gzip those files (8 MB => 400-600 KB).
4. Save them to an XML-file table in the database (use MEDIUMBLOB, because the zipped files are 400-600 KB).
5. Store the index XML sitemap in this table too.
6. When Google requests a file, fetch its content from the database and set headers like this:

// The response body is the gzip file itself, so no Content-Encoding header
// (that would tell clients to decompress the payload in transit).
header('Content-Type: application/x-gzip');
header('Content-Disposition: attachment; filename="' . $xmlFile['name'] . '"');
echo $xmlFile['content']; // gzipped sitemap bytes, loaded from the database
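
For reference, the generation side (steps 1-5) could look roughly like this. It is only a sketch: the table names urls (with a loc column) and sitemap_files are hypothetical, and $pdo is an open PDO connection to a MySQL database:

<?php
// Sketch: build sitemap files of up to 45,000 URLs each, gzip them,
// and store them in a MEDIUMBLOB column of the (hypothetical) sitemap_files table.
$urls   = $pdo->query('SELECT loc FROM urls ORDER BY id')->fetchAll(PDO::FETCH_COLUMN);
$insert = $pdo->prepare('REPLACE INTO sitemap_files (name, content) VALUES (?, ?)');

foreach (array_chunk($urls, 45000) as $i => $chunk) {
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($chunk as $loc) {
        $xml .= '<url><loc>' . htmlspecialchars($loc) . '</loc></url>' . "\n";
    }
    $xml .= '</urlset>';
    // Step 2 (remembering which sitemap each URL belongs to) would be an
    // UPDATE on the urls table here.
    // Roughly 8 MB of XML compresses to 400-600 KB, well within MEDIUMBLOB limits.
    $insert->execute(['sitemap-' . ($i + 1) . '.xml.gz', gzencode($xml, 9)]);
}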


Besides this, I have scripts that keep those XML files up to date: they create new files, or add new URLs to existing XML files that still have free space.

So whenever anything changes, my XML files are updated and saved back to the database, and every web server behind the load balancer can serve up-to-date XML sitemaps.
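
The updater can then compare each URL's last-changed timestamp against the last generation run and rebuild only the affected files. Again a rough sketch, with hypothetical column names (changed_at, sitemap_name, generated_at) and a hypothetical helper rebuildSitemapFile():

<?php
// Sketch: regenerate only the sitemap files that contain URLs changed
// since the last run; rebuildSitemapFile() is a hypothetical helper that
// rebuilds, gzips, and REPLACEs a single file as in the snippet above.
$lastRun = $pdo->query('SELECT MAX(generated_at) FROM sitemap_files')->fetchColumn();
$stale   = $pdo->prepare('SELECT DISTINCT sitemap_name FROM urls WHERE changed_at > ?');
$stale->execute([$lastRun]);

foreach ($stale->fetchAll(PDO::FETCH_COLUMN) as $name) {
    rebuildSitemapFile($pdo, $name);
}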
