How does robots.txt work with sites in subfolders?

@Cofer257

Posted in: #AccessControl #RobotsTxt #SearchEngines #Seo #WebCrawlers

I have a single web host with a number of other parked domains/sites in sub-directories, like this:

example.com is the primary site and root directory of the web hosting.

example.com/www.example2.com is one of the parked sites, but it is just a subfolder of the primary site.

Both example2.com and example.com/www.example2.com serve the same content, but I want to block access to the latter while allowing access to the former.

Will a robots.txt file in the primary site that disallows * still allow example2.com to be crawled?


2 Comments


 

@Carla537

I guess what you are looking for is a robots.txt entry like this:

User-agent: *
Disallow: /www.example2.com


Suppose you have more than 100 parked exampleNR.com domains but don't want to write a line for every single one of them; use this:

User-agent: *
Disallow: /www.example


The catch is that wildcard patterns are not officially part of the original robots.txt specification, although many robots, such as Googlebot, understand simple wildcards. Regular expressions are definitely not supported.
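The prefix-matching behaviour described above can be checked locally with Python's `urllib.robotparser`, feeding it the rule set from this answer directly (nothing is fetched over the network; the URLs are this thread's examples):

```python
from urllib.robotparser import RobotFileParser

# Parse the answer's rule set from literal lines instead of fetching a URL.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /www.example",
])

# Prefix matching: every parked-site subfolder is blocked...
print(rp.can_fetch("*", "http://example.com/www.example2.com/page.html"))  # False
print(rp.can_fetch("*", "http://example.com/www.example99.com/"))          # False
# ...while the primary site's own pages stay crawlable.
print(rp.can_fetch("*", "http://example.com/index.html"))                  # True
```

Note that `urllib.robotparser` implements plain prefix matching only; it does not interpret the `*` wildcards some crawlers support, which is another reason to prefer the plain prefix form.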

UPDATE

Deleted the trailing asterisk, since robots.txt uses simple prefix matching anyway. Thanks for pointing it out, w3dk.



 

@Reiling115

From what you are saying, if everything is redirected to example.com, then yes.
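One reason the answer is yes: crawlers request robots.txt separately for each hostname, so example.com/robots.txt and example2.com/robots.txt are independent files even when the content overlaps on disk. A minimal sketch with Python's `urllib.robotparser`, using hypothetical rule contents for each host:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at example.com/robots.txt:
# blocks the parked-site subfolder on the primary host.
primary = RobotFileParser()
primary.parse([
    "User-agent: *",
    "Disallow: /www.example2.com",
])

# Hypothetical robots.txt served at example2.com/robots.txt:
# an empty Disallow value means "allow everything".
parked = RobotFileParser()
parked.parse([
    "User-agent: *",
    "Disallow:",
])

# The duplicate URL on the primary host is blocked...
print(primary.can_fetch("*", "http://example.com/www.example2.com/index.html"))  # False
# ...but the parked domain itself remains crawlable.
print(parked.can_fetch("*", "http://example2.com/index.html"))                   # True
```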
