@Welton855

The first step is to find out whether this bot reads robots.txt. If it does, you can define a Crawl-delay for this bot in there, or disallow the bot from crawling your site altogether. This question goes into more detail about what you have to do. Please note that rogue bots either never read robots.txt or simply ignore it, so changes to this file might have no effect.
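As a rough illustration of the robots.txt approach, the sketch below feeds a sample robots.txt to Python's standard urllib.robotparser and checks what a well-behaved crawler would conclude from it. The user-agent name BadBot, the 10-second delay and the /private/ path are placeholders, not values from the question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "BadBot" is a placeholder user-agent name.
ROBOTS_TXT = """\
User-agent: BadBot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# What a crawler that honours robots.txt would read from the file.
delay = parser.crawl_delay("BadBot")
allowed_public = parser.can_fetch("BadBot", "/index.html")
allowed_private = parser.can_fetch("BadBot", "/private/page.html")
other_bot_private = parser.can_fetch("SomeOtherBot", "/private/page.html")

print(delay, allowed_public, allowed_private, other_bot_private)
```

To keep the bot out entirely, the BadBot section would contain Disallow: / instead. None of this has any effect on a bot that ignores the file.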

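If you want to block a rogue bot the hard way, figure out its IP address. You can then use mod_authz_host to hard-block this IP. Note that a <Directory> block belongs in the server configuration (httpd.conf or a virtual host); in .htaccess, use the same directives without the <Directory> wrapper. You can do this with the following code: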

<Directory />
Order Allow,Deny
Allow from all
Deny from 127.0.0.1
</Directory>


The XML-like tags around this code say that these rules apply to the / directory (root). When a request is processed, Apache applies the rules for the requested directory as well as the rules for each of its parent directories, so a request to /asdf/ will eventually be checked against these rules too.

Order Allow,Deny tells Apache to process all Allow rules first, then all Deny rules; a client that matches both is denied. Allow from all gives everyone access, and Deny from 127.0.0.1 then blocks all requests coming from 127.0.0.1. Needless to say, you should change this to the bot's real IP. (With Order Deny,Allow the Allow from all would be evaluated last and would let the bot back in.) The bot will then be presented with a 403 Forbidden error.
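Which Order value actually blocks a client is easy to get backwards, so here is a small Python model of how Apache 2.2-style Order/Allow/Deny rules combine. It is only a sketch of the documented semantics, not Apache's code: the function name is_allowed is made up, it only understands "all", full IP addresses and CIDR ranges, and 203.0.113.7 stands in for the bot's IP.

```python
import ipaddress


def is_allowed(client_ip, order, allow, deny):
    """Simplified model of Apache 2.2 Order/Allow/Deny evaluation."""
    ip = ipaddress.ip_address(client_ip)

    def matches(rules):
        # "all" matches every client; anything else is read as an IP or CIDR range.
        return any(
            rule == "all" or ip in ipaddress.ip_network(rule, strict=False)
            for rule in rules
        )

    in_allow = matches(allow)
    in_deny = matches(deny)

    if order == "Allow,Deny":
        # Allow rules first, then Deny: Deny wins on overlap, default is deny.
        return in_allow and not in_deny
    if order == "Deny,Allow":
        # Deny rules first, then Allow: Allow wins on overlap, default is allow.
        return in_allow or not in_deny
    raise ValueError("unsupported Order value: " + order)


bot = "203.0.113.7"  # placeholder for the bot's IP

# "Allow from all" overrides the Deny under Deny,Allow, so the bot still gets in.
bot_with_deny_allow = is_allowed(bot, "Deny,Allow", ["all"], [bot])
# Under Allow,Deny the Deny is evaluated last, so the bot is blocked.
bot_with_allow_deny = is_allowed(bot, "Allow,Deny", ["all"], [bot])
# Everyone else keeps access.
visitor_with_allow_deny = is_allowed("198.51.100.1", "Allow,Deny", ["all"], [bot])

print(bot_with_deny_allow, bot_with_allow_deny, visitor_with_allow_deny)
```

In other words, combining Allow from all with a Deny for a single IP only blocks that IP when the order is Allow,Deny.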

More information about mod_authz_host and about the <Directory> directive can be found in the Apache HTTP Server documentation.
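Finding the bot's IP address, which the hard block depends on, usually means looking through the access log. The sketch below counts requests per IP for a given user-agent string. It assumes the log uses Apache's Combined Log Format; the function name ips_for_agent, the sample lines and the BadBot user-agent are invented for illustration.

```python
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def ips_for_agent(lines, agent_substring):
    """Return (ip, request_count) pairs for requests whose user-agent contains the substring."""
    needle = agent_substring.lower()
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group(2).lower():
            counts[match.group(1)] += 1
    # Most active IPs first.
    return counts.most_common()


# Made-up sample lines; in practice, iterate over the real access log file instead.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "BadBot/1.0"',
    '198.51.100.9 - - [10/Oct/2023:13:55:37 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2023:13:55:38 +0000] "GET /b HTTP/1.1" 200 734 "-" "BadBot/1.0"',
    '203.0.113.8 - - [10/Oct/2023:13:55:39 +0000] "GET /c HTTP/1.1" 404 209 "-" "BadBot/1.0"',
]

bot_ips = ips_for_agent(SAMPLE_LOG, "badbot")
print(bot_ips)
```

A bot that fakes its user-agent will not show up this way; in that case, counting requests per IP without the user-agent filter is the fallback.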
