How does robots.txt work with sites in subfolders?
I have a single web host with a number of other parked domains/sites in sub-directories, like this:
example.com is the primary site and root directory of the web hosting.
example.com/www.example2.com is one of the parked sites, but it is just a subfolder of the primary site.
Both example2.com and example.com/www.example2.com serve the same content, but I want to block access to the latter while allowing access to the former.
Will a robots.txt file in the primary site disallowing * allow example2.com to be crawled?
I guess what you are looking for is a robots.txt entry like this:
User-agent: *
Disallow: /www.example2.com
Suppose you have more than 100 'parked' exampleNR.com domains and don't want to write a line for every single one of them; use this instead:
User-agent: *
Disallow: /www.example
The catch is that wildcard patterns are not officially part of the original robots.txt standard, but many robots, Googlebot included, understand simple wildcards. Regular expressions are definitely not supported.
UPDATE
Deleted the trailing asterisk since robots.txt uses simple prefix matching anyway. Thanks for your attention, w3dk
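That prefix behavior can be checked locally with Python's standard-library robots.txt parser, which implements the original spec (simple prefix matching, no wildcards). This is just a sketch using the question's domain names; the paths are illustrative.

```python
# Verify that "Disallow: /www.example" blocks all parked-site subfolders
# by prefix, using the stdlib robots.txt parser (original spec: no wildcards).
from urllib import robotparser

rules = [
    "User-agent: *",
    "Disallow: /www.example",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Any path starting with the disallowed prefix is blocked:
print(rp.can_fetch("*", "https://example.com/www.example2.com/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/www.example3.com/"))           # False
# The primary site's own pages are unaffected:
print(rp.can_fetch("*", "https://example.com/index.html"))                  # True
```

Because matching is by prefix, a single `Disallow: /www.example` line covers every parked folder whose name begins with `www.example`, with no wildcard needed.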
From what you are saying, if everything is redirected to example.com, then yes.