How does robots.txt work with sites in subfolders?

@Cofer257

Posted in: #AccessControl #RobotsTxt #SearchEngines #Seo #WebCrawlers

I have a single web host with a number of other parked domains/sites in sub-directories, like this:

example.com is the primary site and root directory of the web hosting.

example.com/www.example2.com is one of the parked sites, but it is just a subfolder of the primary site.

Both example2.com and example.com/www.example2.com serve the same content, but I want to block access to the latter while allowing access to the former.

Will a robots.txt file in the primary site that disallows * still allow example2.com to be crawled?


2 Comments


 

@Carla537

I guess what you are looking for is a robots.txt entry like this:

User-agent: *
Disallow: /www.example2.com


Suppose you have more than 100 parked exampleNR.com domains but don't want to write a line for every single one of them; use this:

User-agent: *
Disallow: /www.example


The catch is that wildcard patterns are not officially part of the original robots.txt specification, although many robots, such as Googlebot, understand simple wildcards. Regular expressions are definitely not supported.
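The prefix-matching behaviour described above can be checked locally with Python's `urllib.robotparser`, feeding it the rule set from this answer directly (nothing is fetched over the network; the URLs are this thread's examples):

```python
from urllib.robotparser import RobotFileParser

# Parse the answer's rule set from literal lines instead of fetching a URL.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /www.example",
])

# Prefix matching: every parked-site subfolder is blocked...
print(rp.can_fetch("*", "http://example.com/www.example2.com/page.html"))  # False
print(rp.can_fetch("*", "http://example.com/www.example99.com/"))          # False
# ...while the primary site's own pages stay crawlable.
print(rp.can_fetch("*", "http://example.com/index.html"))                  # True
```

Note that `urllib.robotparser` implements plain prefix matching only; it does not interpret the `*` wildcards some crawlers support, which is another reason to prefer the plain prefix form.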

UPDATE

Deleted the trailing asterisk, since robots.txt uses simple prefix matching anyway. Thanks for pointing it out, w3dk.



 

@Reiling115

From what you are saying, if everything is redirected to example.com, then yes.
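One reason the answer is yes: crawlers request robots.txt separately for each hostname, so example.com/robots.txt and example2.com/robots.txt are independent files even when the content overlaps on disk. A minimal sketch with Python's `urllib.robotparser`, using hypothetical rule contents for each host:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at example.com/robots.txt:
# blocks the parked-site subfolder on the primary host.
primary = RobotFileParser()
primary.parse([
    "User-agent: *",
    "Disallow: /www.example2.com",
])

# Hypothetical robots.txt served at example2.com/robots.txt:
# an empty Disallow value means "allow everything".
parked = RobotFileParser()
parked.parse([
    "User-agent: *",
    "Disallow:",
])

# The duplicate URL on the primary host is blocked...
print(primary.can_fetch("*", "http://example.com/www.example2.com/index.html"))  # False
# ...but the parked domain itself remains crawlable.
print(parked.can_fetch("*", "http://example2.com/index.html"))                   # True
```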
