Baiduspider is crawling my site even when forbidden by robots.txt, how do I prevent it?
My site is getting heavy traffic from a bot. I checked access_log and found that Baiduspider requests my site 10-20 times per minute. I do not need Chinese traffic. I searched and read www.baidu.com/search/robots_english.html
I added the rule below to robots.txt and then restarted Apache, but it doesn't work. Baiduspider still crawls my site.
User-agent: Baiduspider
Disallow: /
User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
I found their feedback page, zhanzhang.baidu.com/feedback/index. I can translate the page into my language, but I cannot translate the captcha, so I cannot enter it.
Then I searched and found this article: www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html But when I added its rule to .htaccess, I could no longer access my site ("you do not have permission to access this site"). Did I insert it in the wrong position? I need some help.
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider.* [NC,OR]
RewriteRule ^.* - [F,L]
#some custom rewrite rule
RewriteRule ^article/([^/.]+)/?$ /article/.php [L,QSA]
RewriteRule ^(.*)$ www.example.com/ [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
By the way, my server runs CentOS 7 with Apache 2.4.6. I also tried httpd.conf, but I could not find any article that covers <IfModule setenvif_module> on Apache 2.4.6; all the articles use <IfModule mod_setenvif.c>. Apache 2.4 also deprecated the Order allow,deny directives, so I have no idea what to modify or add in my httpd.conf.
Anyway, I just want to block Baiduspider. Thanks.
2 Comments
I think the problem with your rewrite rule is the OR flag. That flag usually means that there is a second rewrite condition coming. You only have one condition.
Here is a site that provides a similar rule for blocking BaiduSpider with slightly different syntax:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider.* [NC]
RewriteRule .* - [F]
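One thing to watch with the rule above: Baiduspider usually sends a User-Agent that starts with "Mozilla/5.0 (compatible; Baiduspider/2.0; ...", so a pattern anchored with ^ may never match. A minimal sketch of an unanchored variant follows. It assumes the block is placed above the "# BEGIN WordPress" section of the .htaccess from the question, and it has not been tested on that server:

```apache
# Place above the "# BEGIN WordPress" block so it runs before the WordPress rules.
<IfModule mod_rewrite.c>
RewriteEngine On
# No leading ^ anchor: match "Baiduspider" anywhere in the User-Agent.
# [NC] makes the match case-insensitive. No [OR] flag, because no other
# RewriteCond follows this one.
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
# "-" means no substitution; [F] answers 403 Forbidden (and implies [L]).
RewriteRule ^ - [F]
</IfModule>
```

To check it, request the site with a spoofed User-Agent containing "Baiduspider" and confirm a 403, then confirm that a normal browser request still gets through.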
You can try blocking specific IP addresses in your .htaccess file. You can find the ranges here.
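On Apache 2.4 the Order/Allow/Deny directives are replaced by Require, so the IP block (and a User-Agent block without mod_rewrite) might look roughly like the sketch below. The variable name block_bot is made up for illustration, and 203.0.113.0/24 is a reserved documentation range used only as a placeholder, not a real Baidu range. Substitute the published ranges.

```apache
# Apache 2.4 syntax (replaces the 2.2-style Order/Allow/Deny directives).
# Tag any request whose User-Agent contains "Baiduspider" (case-insensitive).
# "block_bot" is a hypothetical variable name chosen for this example.
SetEnvIfNoCase User-Agent "Baiduspider" block_bot

<RequireAll>
    # Allow everyone by default...
    Require all granted
    # ...except requests tagged above...
    Require not env block_bot
    # ...and except this PLACEHOLDER range (not a real Baidu range).
    Require not ip 203.0.113.0/24
</RequireAll>
```

In .htaccess this needs AllowOverride to permit AuthConfig and FileInfo. In httpd.conf the <RequireAll> block has to sit inside a <Directory> section for the site's document root.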
In robots.txt you can also add the following
User-agent: Baiduspider
User-agent: baiduspider
User-agent: Baiduspider+
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
Also, if you use caching plugins or a CDN, make sure to clear all of your caches.