Robots.txt
The correct way to block all search engines is using a robots.txt file, a simple text file placed in the root of your webserver containing the following code:-
User-agent: *
Disallow: /
This works for all files on your webserver rather than a specific page, like the meta tag mentioned in @SivaCharan's answer.
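For comparison, the per-page approach mentioned there is a robots meta tag placed in the <head> of each page you want to keep out of the index; a minimal sketch of the standard tag looks like this:
<meta name="robots" content="noindex, nofollow">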
mod_rewrite and htaccess
Since not all spiders abide by the robots.txt protocol, you might also want to block a specific crawler from your site if it is still showing up in your server logs after being blocked in robots.txt.
You can achieve this using an .htaccess file and mod_rewrite. Here's an example that blocks the Baidu and Sogou spiders:
Add the following to the .htaccess file:-
RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sogou
RewriteRule ^.*$ - [F]
This only applies to the single site where the .htaccess file is placed, rather than the entire webserver.
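One way to sanity-check the rules once they are deployed (example.com below is just a placeholder for your own domain) is to spoof the User-Agent with curl:
curl -I -A "Baiduspider" http://example.com/
curl -I http://example.com/
The first request should come back 403 Forbidden, while the second should still return a normal 200 OK.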
httpd.conf
If you have access to httpd.conf you can block a specific crawler from the whole server by listing its User-Agent header there.
Include your new directives in the following section of Apache's httpd.conf file:
# This should be changed to whatever you set DocumentRoot to.
#
...
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bots
SetEnvIfNoCase User-Agent "^Sogou" bad_bots
SetEnvIf Remote_Addr "212.100.254.105" bad_bots
Order allow,deny
Allow from all
Deny from env=bad_bots
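Note that Order, Allow and Deny are the Apache 2.2 access-control directives; on Apache 2.4 they only work via mod_access_compat. A rough 2.4 equivalent using the Require syntax (assuming the same bad_bots variable and /var/www/html as a placeholder document root) would look like this:
<Directory "/var/www/html">
    SetEnvIfNoCase User-Agent "^Baiduspider" bad_bots
    SetEnvIfNoCase User-Agent "^Sogou" bad_bots
    <RequireAll>
        Require all granted
        Require not env bad_bots
    </RequireAll>
</Directory>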