Robots: block /lang/page from the index but keep /page
My URL structure looks a bit like this:
/group1/
/group2/
/group3/
Group3 (and its subpages) is also available in another language at the following URL, with a lang prefix added:
/lang/group3/
Unfortunately, a CMS upgrade has caused the following URLs to now be considered valid, even though they have not been translated:
/lang/group1
/lang/group2
I would like to allow /group3 and /lang/group3 but block /lang/group* from the index.
What is the correct robots syntax to do this?
If you literally only have a few "groups" you want to block then you would do something like:
User-agent: *
Disallow: /lang/group1
Disallow: /lang/group2
...and everything else would be allowed. This would work with all robots that obey the original "standard". Or, you could block all groups (group1, group2, etc.) and make an exception for "group3", like:
User-agent: *
Disallow: /lang/group
Allow: /lang/group3
Note that the Allow directive is not part of the original "standard", but it is supported by all the major search engine crawlers. The URL path in each directive is simply a prefix match.
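If you want to sanity-check the rules before deploying, Python's standard-library urllib.robotparser can evaluate them locally. A quick sketch; note that this parser applies rules in file order (first match wins) rather than Google's longest-match rule, so placing the Allow line before the Disallow line keeps both interpretations in agreement:

```python
from urllib import robotparser

# The second approach from above, with Allow listed first so that
# order-sensitive parsers agree with longest-match parsers.
ROBOTS_TXT = """\
User-agent: *
Allow: /lang/group3
Disallow: /lang/group
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

assert rp.can_fetch("*", "/lang/group3/page.html")      # translated section: crawlable
assert not rp.can_fetch("*", "/lang/group1/page.html")  # untranslated: blocked
assert rp.can_fetch("*", "/group1/")                    # non-prefixed URLs unaffected
```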
HOWEVER, I wouldn't rely on robots.txt alone here. Robots.txt only asks well-behaved crawlers not to crawl; it does nothing about stray visitors or bad bots, and it doesn't prevent pages from being indexed if they are inadvertently linked to. I would use .htaccess or your server config to actually block all traffic to these URLs. Something like the following in .htaccess:
RewriteEngine On
RewriteRule ^lang/group[12] - [R=404]
This responds with a 404 for all requests to these invalid URLs. Alternatively, use the F flag instead of R=404 to respond with a 403 "Forbidden".
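To double-check which paths the RewriteRule pattern catches, you can test the same regex locally, for example in Python. (In a per-directory .htaccess, mod_rewrite matches the URL path relative to that directory, i.e. without the leading slash.)

```python
import re

# The same pattern used in the RewriteRule above.
pattern = re.compile(r"^lang/group[12]")

assert pattern.match("lang/group1/page.html")  # blocked (404)
assert pattern.match("lang/group2/")           # blocked (404)
assert not pattern.match("lang/group3/page")   # translated section still served
assert not pattern.match("group1/")            # original URLs unaffected
```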