How can I use robots.txt to disallow a subdomain only?

@Sims2060225

Posted in: #Domains #MultiSubdomains #RobotsTxt #Subdomain

My code base is shared between several environments (live, staging, dev) and sub-domains (staging.example, dev.example, etc.), and only two should be allowed to be crawled (i.e. www.example and example). Normally I'd modify /robots.txt and add Disallow: /, but because the code base is shared I cannot modify /robots.txt without affecting all (sub)domains.

Any ideas how to go about it?

3 Comments


 

@Murray155

I'd remove the robots meta tag from the HTML page and build it dynamically depending on the subdomain. For example, we use the dev. subdomain for development, so in the Page_Load event we have this:

' Check if the domain is DEV or PROD and set the robots meta tag in the head accordingly
Dim CurrentURL As String = Request.Url.Host ' host name of the current request
Dim metatag As HtmlMeta = New HtmlMeta()
metatag.Attributes.Add("name", "robots")
If CurrentURL.Contains("dev.advertise-it") Then
    ' Development subdomain: keep it out of search engines
    metatag.Attributes.Add("content", "NOINDEX, NOFOLLOW")
Else
    ' Live site: allow indexing
    metatag.Attributes.Add("content", "INDEX, FOLLOW")
End If
Page.Header.Controls.Add(metatag)



 

@Shanna517

robots.txt works only if it is present in the root of the host it is served from.

You need to upload a separate robots.txt for each subdomain site, so that it can be accessed at subdomain.example.com/robots.txt.
Add the lines below to that robots.txt:

User-agent: *
Disallow: /


Another way is to insert a robots <META> tag in all pages:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">



 

@Ann8826881

You can serve a different robots.txt file based on the subdomain through which the site has been accessed. One way of doing this on Apache is by internally rewriting the URL using mod_rewrite in .htaccess. Something like:

RewriteEngine On
# Any host other than example.com / www.example.com is served the disallow file
RewriteCond %{HTTP_HOST} !^(www\.)?example\.com$ [NC]
RewriteRule ^robots\.txt$ robots-disallow.txt [L]


The above says that for all requests to robots.txt where the host is anything other than example.com or www.example.com, the request is internally rewritten to robots-disallow.txt, and robots-disallow.txt then contains the Disallow: / directive.
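For example, robots-disallow.txt could be as minimal as:

User-agent: *
Disallow: /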

If you have other directives in your .htaccess file then this rule will need to be nearer the top, before any routing directives, as in the sketch below.
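A minimal sketch, assuming the existing .htaccess routes requests through a front controller such as index.php (that routing rule is only an illustration, not part of the original answer):

RewriteEngine On

# robots.txt handling first, before any routing rules
RewriteCond %{HTTP_HOST} !^(www\.)?example\.com$ [NC]
RewriteRule ^robots\.txt$ robots-disallow.txt [L]

# Existing routing directives follow, e.g. a front controller
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]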


