Robots.txt
The correct way to block all search engines is using a robots.txt file, a simple text file placed in the root of your webserver containing the following code:-
User-agent: *
Disallow: /
This works for all files on your webserver rather than a specific page, like the meta tag mentioned in @SivaCharan's answer.
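For comparison, the per-page approach mentioned there is a robots meta tag placed in the <head> of each page you want to keep out of the index; a minimal sketch of the standard tag looks like this:
<meta name="robots" content="noindex, nofollow">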
mod_rewrite and htaccess
Since not all spiders abide by the robots.txt protocol, you might also want to block a specific crawler from your site if it is still showing up in your server logs after being blocked in robots.txt.
You can achieve this using an .htaccess file and mod_rewrite. Here's an example that blocks the Baidu and Sogou spiders:
Add the following to the .htaccess file:-
RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sogou
RewriteRule ^.*$ - [F]
This only applies to the single site where the .htaccess file is placed, rather than the entire webserver.
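One way to sanity-check the rules once they are deployed (example.com below is just a placeholder for your own domain) is to spoof the User-Agent with curl:
curl -I -A "Baiduspider" http://example.com/
curl -I http://example.com/
The first request should come back 403 Forbidden, while the second should still return a normal 200 OK.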
httpd.conf
If you have access to httpd.conf you can block a specific crawler from the whole server by listing its User-Agent header there.
Include your new directives in the following section of Apache's httpd.conf file:
# This should be changed to whatever you set DocumentRoot to.
#
...
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bots
SetEnvIfNoCase User-Agent "^Sogou" bad_bots
SetEnvIf Remote_Addr "212.100.254.105" bad_bots
Order allow,deny
Allow from all
Deny from env=bad_bots
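Note that Order, Allow and Deny are the Apache 2.2 access-control directives; on Apache 2.4 they only work via mod_access_compat. A rough 2.4 equivalent using the Require syntax (assuming the same bad_bots variable and /var/www/html as a placeholder document root) would look like this:
<Directory "/var/www/html">
    SetEnvIfNoCase User-Agent "^Baiduspider" bad_bots
    SetEnvIfNoCase User-Agent "^Sogou" bad_bots
    <RequireAll>
        Require all granted
        Require not env bad_bots
    </RequireAll>
</Directory>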