Prevent crawlers that don't honor robots.txt
I ran into a problem while writing the robots.txt for my site.
Searching on Google, I found that some crawlers honor robots.txt and others do not. How can I block the ones that ignore it? Can I do it with .htaccess, or is there another way?
More posts by @YK1175434
2 Comments
Simple: ban them all, with PHP and a regex on the user agent. For example:
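// Match any listed bot name in the User-Agent header, case-insensitively.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? ''; // the header may be missing
if (preg_match('/badbot1|badbot2|badbot3/i', $userAgent)) {
    header('HTTP/1.1 403 Forbidden');
    exit();
}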
The header call is optional: without it the bot still gets an empty page, just with a 200 status instead of 403.
Be careful: never start or end the pattern with a pipe "|". A leading or trailing pipe adds an empty alternative, which matches every user agent, so you would ban all your traffic!
So, use "badbot1|badbot2|badbot3".
Never "|badbot1|badbot2|badbot3" and
never "badbot1|badbot2|badbot3|".
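Since the question asks about .htaccess, roughly the same user-agent check can be sketched with Apache's mod_rewrite. This assumes mod_rewrite is enabled and that the host allows rewrite rules in .htaccess; badbot1, badbot2 and badbot3 are the same placeholder names as in the PHP version, and the same rule about stray pipes applies.

```apache
# Assumes mod_rewrite is enabled; badbot1..3 are placeholder names.
RewriteEngine On
# [NC] makes the user-agent match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} (badbot1|badbot2|badbot3) [NC]
# "-" leaves the URL untouched, [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

Doing the check at the server level means the request is rejected before any PHP runs, but a crawler that fakes its user agent will still get through either way.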
Good luck
If there are crawlers not following your robots.txt rules, you will need to ban them by IP. Listing their user agents in your robots.txt does nothing if they aren't following its rules.
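A minimal .htaccess sketch of such an IP ban, assuming Apache 2.4 (mod_authz_core) and a host that allows these directives in .htaccess. The addresses below are documentation placeholders, not real crawler IPs; the real ones would come from your own access log.

```apache
<RequireAll>
    # Allow everyone...
    Require all granted
    # ...except these placeholder addresses (a single IP, then a CIDR range).
    Require not ip 192.0.2.10
    Require not ip 198.51.100.0/24
</RequireAll>
```

On Apache 2.2 the older Order/Allow/Deny directives serve the same purpose. Keep in mind that crawlers can change IP addresses, so a list like this needs occasional maintenance.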