Sitemap.xml on separate server (can't use robots.txt)

@Pope3001725

Posted in: #Sitemap #XmlSitemap

We have several domains managed by the same code base. They share content and assets with each other but are separate entities online.

I have started generating sitemaps for these sites and hosting them on S3; this works and is a sane approach for us, as we have no shared file system as such.

But I obviously cannot submit the sitemap through the UI, as it requires the sitemap to be on the same domain, which it isn't.

Another option is to use robots.txt to specify your sitemap (as discussed here: Can I host a sitemap on another domain?) - but as all our sites share the same docroot (and therefore the same robots.txt), we can't do that either.
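
For reference, the robots.txt approach from that question would just be a Sitemap line pointing at the other host, something like this (the bucket name is a placeholder):

# robots.txt on domain1.com, pointing at a sitemap hosted on S3
User-agent: *
Allow: /
Sitemap: https://example-bucket.s3.amazonaws.com/sitemaps/domain1-sitemap.xml.gz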

Maybe we could do some fancy URL rewriting to map the sitemaps? I'm not sure what the best option is for us. Any help is much appreciated.

EDIT
I'm trying to leave the sitemap.xml.gz files on S3 but serve the XML index using PHP on the sites. So you can visit /xml/site1.xml and get the sitemap index, which points to S3. I'll report back on how it goes.
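
A minimal sketch of what I mean, with a hypothetical /xml/site1.xml route handled by a small PHP script (the bucket and file names are placeholders):

<?php
// Sketch: emit a sitemap index whose entries point at the sitemaps hosted on S3.
// The bucket and file names below are placeholders.
header('Content-Type: application/xml; charset=utf-8');

$sitemaps = array(
    'https://example-bucket.s3.amazonaws.com/sitemaps/site1-sitemap-1.xml.gz',
    'https://example-bucket.s3.amazonaws.com/sitemaps/site1-sitemap-2.xml.gz',
);

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($sitemaps as $url) {
    echo '  <sitemap><loc>' . htmlspecialchars($url) . '</loc></sitemap>' . "\n";
}
echo '</sitemapindex>' . "\n";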

EDIT 2
I have added an XML doc on each domain with the index of sitemaps. This is at /xmlsitemap/sitemap.domain.xml.
The index contains the externally hosted sitemap (on S3).
I then added 7 x 'Sitemap:' entries to robots.txt (sketched below), which should validate the external sitemaps. It doesn't: all the sitemap indexes return errors saying the sitemap is blocked by robots.txt.
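
The robots.txt entries are one absolute Sitemap line per domain's index, roughly like this (domain names and exact paths are placeholders; the real file has seven entries):

# Shared robots.txt, served identically on every domain
Sitemap: https://domain1.com/xmlsitemap/sitemap.domain1.xml
Sitemap: https://domain2.com/xmlsitemap/sitemap.domain2.xml
Sitemap: https://domain3.com/xmlsitemap/sitemap.domain3.xml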

The robots.txt is not blocking the sitemap, but it keeps reporting that it is.

Any suggestions?

EDIT 3
Some links ...
[Removed]

EDIT 4 & RESOLUTION
So, to resolve this issue I am generating the XML (both the sitemaps, with deltas, and the index) and storing it in the DB. Then I'm serving the generated XML using nginx/PHP through registered paths. As we're behind Varnish, we can cache the output for 24 hours or so. It's working well so far.
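
Roughly, the nginx side looks like this (the socket path, script path and cache lifetime are placeholders; the real config depends on your PHP-FPM setup):

# Route the registered sitemap paths to the PHP script that reads the
# generated XML out of the database (paths below are placeholders).
location ~ ^/xmlsitemap/ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME /var/www/app/sitemap.php;
    fastcgi_pass unix:/var/run/php-fpm.sock;

    # Let Varnish and other downstream caches keep the response for ~24h.
    add_header Cache-Control "public, max-age=86400";
}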

I also noticed that my S3 bucket did not have a robots.txt, so the 'blocked by robots.txt' message may have been caused by that rather than by the local one.
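
If anyone else hits this, a minimal, publicly readable robots.txt at the root of the bucket should cover it, e.g.:

# robots.txt in the S3 bucket
User-agent: *
Allow: /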

So, resolved by using none of my original options.


1 Comment

@LarsenBagley505

You can achieve this easily using the shared docroot, without needing to upload the sitemaps to the S3 bucket at all.

Use the following mod_rewrite rules:

RewriteEngine On

# Serve a per-domain sitemap file based on the Host header
RewriteCond %{HTTP_HOST} ^domain1\.com$ [NC]
RewriteRule ^sitemap\.xml$ /domain1-sitemap.xml [L]
RewriteCond %{HTTP_HOST} ^domain2\.com$ [NC]
RewriteRule ^sitemap\.xml$ /domain2-sitemap.xml [L]
RewriteCond %{HTTP_HOST} ^domain3\.com$ [NC]
RewriteRule ^sitemap\.xml$ /domain3-sitemap.xml [L]


Place this code in a single .htaccess file in the shared site root and generate your sitemap files using the naming convention domain-sitemap.xml; the rules will then rewrite requests for /sitemap.xml to the appropriate file for whichever host is being requested.
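
To sanity-check the rules, you can request the sitemap with an explicit Host header (assuming the vhost also answers on localhost; the domain is a placeholder):

# Should return the contents of /domain2-sitemap.xml
curl -H "Host: domain2.com" http://localhost/sitemap.xml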
