
Robots: block /lang/page from the index but keep /page

@Samaraweera270

Posted in: #RobotsTxt #Url #WebCrawlers

My URL structure looks a bit like this:


/group1/
/group2/
/group3/


Group3 (and its subpages) is also available in another language at the following URL, which adds a lang prefix:


/lang/group3/


Unfortunately, a CMS upgrade has caused the following URLs to now be considered valid, even though they have not been translated:


/lang/group1
/lang/group2


I would like to allow /group3 and /lang/group3 but block every other /lang/group* URL from the index.

What is the correct robots syntax to do this?


1 Comment


@Ogunnowo487

If you only have a few "groups" you want to block, then you could list them explicitly:

User-agent: *
Disallow: /lang/group1
Disallow: /lang/group2


...and everything else would be allowed. This would work with all robots that obey the original "standard". Alternatively, you could block all groups under /lang/ (group1, group2, etc.) and make an exception for "group3", like:

User-agent: *
Disallow: /lang/group
Allow: /lang/group3


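Note that the Allow directive is not part of the original "standard", but it is supported by all the major search engine crawlers. The path in each rule is simply a URL-path prefix, so /lang/group matches /lang/group1, /lang/group2/page, and so on.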
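To see how the prefix rules above resolve for a given URL, here is a minimal Python sketch. It assumes the "longest matching prefix wins, Allow wins a tie" precedence described in RFC 9309; a robot that simply takes the first matching rule may behave differently, so treat this as an illustration and not as how every crawler works. The is_allowed function and the RULES list are hypothetical names used only for this example, and wildcards are not handled.

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs.
    Assumption: the longest matching prefix wins and Allow wins a tie
    (the precedence described in RFC 9309). Wildcards are not handled.
    """
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, prefix in rules:
        # An empty prefix matches nothing; otherwise it is a plain prefix test.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The second rule set from the answer above.
RULES = [
    ("Disallow", "/lang/group"),
    ("Allow", "/lang/group3"),
]

for path in ["/group3/", "/lang/group1", "/lang/group2/page", "/lang/group3/page"]:
    print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

Under that assumption, /lang/group3/page stays allowed because /lang/group3 is the longer (more specific) prefix, while /lang/group1 and /lang/group2/page only match the Disallow rule.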

However, I wouldn't rely on robots.txt to block these pages from being "crawled". It does nothing about stray visitors or bad bots, and it doesn't prevent pages from being indexed if they are inadvertently linked to. I would use .htaccess or your server config to actually block all traffic to these URLs. Something like the following in .htaccess:

RewriteEngine On
RewriteRule ^lang/group[12] - [R=404]


This responds with a 404 for all requests to these invalid URLs. Alternatively, use the F flag to respond with a 403 "Forbidden".
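If you want to sanity-check which URLs that pattern would catch before deploying it, here is a rough Python sketch. It assumes the .htaccess file sits in the document root, so mod_rewrite matches the pattern against the URL path without its leading slash. The is_blocked helper is a hypothetical name for this example, and Python's re module stands in for Apache's regex engine, which is fine for a pattern this simple.

```python
import re

# Same pattern as in the RewriteRule above.
PATTERN = re.compile(r"^lang/group[12]")


def is_blocked(url_path):
    """Return True if the RewriteRule pattern would match this URL path.

    Assumption: .htaccess is in the document root, so the leading
    slash is removed before the pattern is applied.
    """
    relative = url_path[1:] if url_path.startswith("/") else url_path
    return PATTERN.search(relative) is not None


for path in ["/lang/group1", "/lang/group2/page", "/lang/group3/", "/group1/"]:
    print(path, "blocked" if is_blocked(path) else "served")
```

Because the pattern has no trailing anchor it is also a prefix match, so a path such as /lang/group12 would be caught too. If that matters, tighten the pattern (for example with a trailing (/|$)).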
