
Robots.txt practices with .htaccess redirections (inherits)

@Radia820

Posted in: #301Redirect #RobotsTxt #WebCrawlers

I have a question regarding how to write robots.txt files for many domains and subdomains with redirects in place.

We have a hosting account with a primary domain and several add-on domains. All of our domains and subdomains, including the primary domain, are redirected via htaccess 301s to their own subdirectories in the primary domain's root directory.

I'm confused about how I would write the robots.txt for certain directories. First, I wanted to confirm I am right in understanding that, for domains and subdomains, crawlers will look to the directory that acts as that URL's root directory for the crawling rules (robots.txt). Also, that a directory will not be affected by a robots.txt in its parent directory if the directory has its own domain/subdomain and that URL is the one being accessed by crawlers. (I'm pretty sure, but I wanted to confirm I didn't have a fundamentally flawed understanding of robots.txt.)

In the original root directory on the account (where the primary domain pointed before the htaccess rules were put in place), what should the robots.txt contain? When crawlers look to crawl our primary domain, will they look to the original root directory for the robots.txt, or will they reference the file contained in the new subdirectory where all the primary domain's site files are located? And what should the root's robots.txt include, if anything at all?

Would I be right to include a simple 'Disallow: /' for all agents in the root, and then put more specific robots.txt files in each subdirectory? Would that affect the crawling of the directory where the primary domain is now redirected?

Any help is greatly appreciated. Thanks!


1 Comment


@Goswami781

The first thing to say is that crawlers don't know your file structure. They just request domain.com/robots.txt or sub.domain.com/robots.txt and are given whatever file your server is configured to return for that URL.
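To make that concrete, here is a minimal Python sketch of how a crawler might derive the robots.txt location from any page URL. The function name `robots_url` is just an illustrative assumption, not part of any crawler's actual code. The point is that only the scheme and hostname are used, never the directory layout on disk.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Only the scheme and host (plus port, if any) are kept;
    # the path, query string and fragment are discarded.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://domain.com/some/page.html"))
    print(robots_url("http://sub.domain.com/blog/post?id=7"))
```

Each hostname therefore gets its own robots.txt request, regardless of which folder the server maps that hostname to.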

So, to answer


When crawlers look to crawl our primary domain, will they look to the
original root directory for the robots.txt or will they reference the
file contained in the new subdirectory where all the primary domain's
site files are located?


they'll get the file from the new subdirectory, because that is what your server returns for /robots.txt on that domain.

So you don't need a robots.txt in your file system root, as the crawlers never have access to it.
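As a rough illustration of why rules for one hostname don't leak into another, the sketch below uses Python's standard `urllib.robotparser`. The hostnames and robots.txt bodies in `ROBOTS_BY_HOST` are made-up examples, and `is_allowed` is a hypothetical helper. It assumes each hostname serves exactly one robots.txt, whichever file the server maps to that host's /robots.txt.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, keyed by the hostname that serves them.
ROBOTS_BY_HOST = {
    "domain.com": "User-agent: *\nDisallow: /private/\n",
    "sub.domain.com": "User-agent: *\nDisallow: /\n",
}


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Check url against the robots.txt of its own hostname only."""
    host = urlsplit(url).netloc
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(ROBOTS_BY_HOST[host].splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://domain.com/index.html",
        "http://domain.com/private/a.html",
        "http://sub.domain.com/index.html",
    ):
        print(url, is_allowed(url))
```

In this sketch the blanket 'Disallow: /' on sub.domain.com has no effect on domain.com, because a crawler only consults the file returned for the host it is crawling.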

However, to check that I'm understanding your situation properly, it would be useful to know the htaccess rule you use to redirect the primary domain.
