Serve a different robots.txt file for every site hosted in the same directory

@Twilah146

Posted in: #Google #Redirects #RobotsTxt #WebCrawlers

We have a global brand website project on which we are working only on the LATAM portion. The installation process here allows a single website installation to serve several ccTLDs, in order to reduce costs.

Because of this, the robots.txt at domain.com/robots.txt is the same file as the one at domain.com.ar/robots.txt.
We would like to implement custom robots.txt files for each LATAM country locale (AR, CO, CL, etc.). One solution we are considering is placing a 301 redirect from domain.com.ar/robots.txt to domain.com.ar/directory/robots.txt.
This way we could have a custom robots.txt file for each country locale.
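As a rough sketch, assuming the site runs on Apache and we can add rules to an .htaccess file (the /ar/ directory name is just a placeholder for wherever the country-specific file would live), the redirect might look like:

RewriteEngine On

# Hypothetical example: 301-redirect robots.txt to a per-country copy
# when the request comes in on the .com.ar host.
RewriteCond %{HTTP_HOST} ^(www\.)?domain\.com\.ar$ [NC]
RewriteRule ^robots\.txt$ /ar/robots.txt [R=301,L]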


Does this make sense?
Is it possible to redirect a robots.txt file to another robots.txt file?
Any other suggestions?


Thanks in advance for any input you might have.





2 Comments


 

@Michele947

While this should work, it has a few potential drawbacks:


Every crawler has to do two HTTP requests: one to discover the redirect, and another one to actually fetch the file.
Some crawlers might not handle the 301 response for robots.txt correctly; there's nothing in the original robots.txt specification that says anything about redirects, so presumably they should be treated the same way as for ordinary web pages (i.e. followed), but there's no guarantee that all the countless robots that might want to crawl your site will get that right.

(The 1997 Internet Draft does explicitly say that "[o]n server response indicating Redirection (HTTP Status Code 3XX) a robot should follow the redirects until a resource can be found", but since that was never turned into an official standard, there's no real requirement for any crawlers to actually follow it.)


Generally, it would be better to simply configure your web server to return different content for robots.txt depending on the domain it's requested for. For example, using Apache mod_rewrite, you could internally rewrite robots.txt to a domain-specific file like this:

RewriteEngine On
RewriteBase /

RewriteCond %{HTTP_HOST} ^(www\.)?domain(\.com?)?\.([a-z][a-z])$
RewriteCond %{DOCUMENT_ROOT}/robots_%3.txt -f
RewriteRule ^robots\.txt$ robots_%3.txt [NS]


This code, placed in an .htaccess file in the shared document root of the sites, should rewrite any requests for e.g. domain.com.ar/robots.txt to the file robots_ar.txt, provided that it exists (that's what the second RewriteCond checks). If the file does not exist, or if the host name doesn't match the regexp, the standard robots.txt file is served by default.

(The host name regexp should be flexible enough to also match URLs without the www. prefix, and to also accept the 2LD co. instead of com. (as in domain.co.uk), or even just a plain ccTLD directly after domain; if necessary, you can tweak it to accept even more cases. Note that I have not tested this code, so it could have bugs / typos.)
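For example (the directives below are purely illustrative), a per-country file such as robots_ar.txt could then hold whatever rules that locale needs:

# robots_ar.txt -- served in place of robots.txt for domain.com.ar by the rule above
# (example directives only; replace them with the real rules for the AR locale)
User-agent: *
Disallow: /checkout/

Sitemap: https://domain.com.ar/sitemap.xml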

Another possibility would be to internally rewrite requests for robots.txt to (e.g.) a PHP script, which can then generate the content of the file dynamically based on the host name and anything else you want. With mod_rewrite, this could be accomplished simply with:

RewriteEngine On
RewriteBase /

RewriteRule ^robots\.txt$ robots.php [NS]


(Writing the actual robots.php script is left as an exercise.)
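If you want a starting point, a minimal robots.php might look something like the sketch below; it is untested, and the country detection and per-country rules are only placeholder assumptions:

<?php
// Minimal sketch of robots.php: emit robots.txt content based on the host name.
// Untested; the country detection and the rules below are placeholders only.
header('Content-Type: text/plain; charset=UTF-8');

$host = isset($_SERVER['HTTP_HOST']) ? strtolower($_SERVER['HTTP_HOST']) : '';
$host = preg_replace('/:\d+$/', '', $host); // drop any port number

// Take the last two-letter label as the country code (e.g. "ar" from domain.com.ar).
$country = preg_match('/\.([a-z]{2})$/', $host, $m) ? $m[1] : 'default';

// Per-country rule blocks; anything unrecognised falls back to the default block.
$rules = array(
    'ar'      => "User-agent: *\nDisallow: /checkout/\n",
    'cl'      => "User-agent: *\nDisallow: /promo/\n",
    'default' => "User-agent: *\nDisallow:\n",
);

echo isset($rules[$country]) ? $rules[$country] : $rules['default'];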



 

@BetL925

I wouldn't count on all spiders being able to follow a redirect to get to a robots.txt file. See: Does Google respect a redirect header for robots.txt to a different file name?

Assuming you are hosted on an Apache server, you could use mod_rewrite from your .htaccess file to serve the correct file for the correct domain:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.([a-z.]+)$
RewriteRule ^robots\.txt$ /%1/robots.txt [L]


In that case, the robots.txt file for your .cl domain would be at /cl/robots.txt and your .com.au robots.txt file would be at /com.au/robots.txt.


