@Caterina187

Robots.txt


The correct way to block all search engines is to use a robots.txt file, a plain text file placed in the root of your web server containing the following code:-

User-agent: *
Disallow: /


This applies to every file on your web server, rather than to a specific page like the meta tag mentioned in @SivaCharan's answer.
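
For reference, that page-level approach is a meta tag placed in the <head> of each page you want kept out of the index; a minimal sketch (see @SivaCharan's answer for the details):

<!-- ask compliant crawlers not to index this particular page -->
<meta name="robots" content="noindex">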


mod_rewrite and .htaccess


Since not all spiders abide by the robots.txt protocol, you may also want to block a specific crawler outright if it keeps showing up in your server logs after being disallowed in robots.txt.
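
For reference, the per-crawler robots.txt rule that such a spider would be ignoring looks like this, using Baiduspider as the example:

User-agent: Baiduspider
Disallow: /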

You can block such a crawler using an .htaccess file and mod_rewrite. Here's an example that blocks the Baidu and Sogou spiders.

Add the following to the .htaccess file:-

RewriteEngine on
Options +FollowSymLinks
RewriteBase /

# Return 403 Forbidden to any request whose User-Agent starts with
# "Baiduspider" (case-insensitive) or "Sogou"
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sogou
RewriteRule ^.*$ - [F]


This only applies to the single domain the .htaccess file belongs to, rather than the entire web server.
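
You can check that the rule works by sending a request with a spoofed User-Agent via curl (example.com is a placeholder for your own domain); a matched request should be answered with 403 Forbidden:

curl -I -A "Baiduspider" http://example.com/
# expected first line of the response: HTTP/1.1 403 Forbidden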


httpd.conf


If you have access to httpd.conf, you can block a specific crawler from the whole server by listing the offending User-Agent header fields there.

Include your new directives in the following section of Apache's httpd.conf file:

# This should be changed to whatever you set DocumentRoot to.
#
<Directory "/var/www/html">

    ...

    # flag unwanted crawlers by User-Agent prefix (case-insensitive)
    SetEnvIfNoCase User-Agent "^Baiduspider" bad_bots
    SetEnvIfNoCase User-Agent "^Sogou" bad_bots

    # flag a single IP address; the variable name must match the Deny line below
    SetEnvIf Remote_Addr "212.100.254.105" bad_bots

    Order allow,deny
    Allow from all
    Deny from env=bad_bots
</Directory>
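
Note that Order, Allow and Deny are Apache 2.2 syntax. On Apache 2.4 the equivalent access control, a sketch reusing the same bad_bots variable and placeholder path, looks like this:

<Directory "/var/www/html">
    <RequireAll>
        Require all granted
        Require not env bad_bots
    </RequireAll>
</Directory>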
