The first step would be to figure out whether this bot reads robots.txt. You can define a crawl-delay in there for this bot; this question goes into more detail about what you have to do. You can also disallow this bot from crawling your site via robots.txt. Please note that rogue bots do not read, or simply ignore, robots.txt, so changes to that file might not work.
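As a rough sketch, the robots.txt rules could look like this; "ExampleBot" is just a placeholder for whatever user-agent name the bot actually announces:

User-agent: ExampleBot
Crawl-delay: 10

Or, to keep it off the site entirely:

User-agent: ExampleBot
Disallow: /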
If you want to block a rogue bot the hard way, figure out its IP address. You can then use mod_authz_host to hard-block that IP. Note that a <Directory> block like the one below belongs in the main server configuration; in a .htaccess file you would use only the directives inside it. The code looks like this:
<Directory />
    # Evaluate Deny rules first, then Allow rules
    Order Deny,Allow
    # Replace 127.0.0.1 with the bot's real IP address
    Deny from 127.0.0.1
    # Everyone else keeps access
    Allow from all
</Directory>
The XML-like tags around this code say that these rules apply to the / directory (root). When a request is processed, Apache applies all rules for that directory and for every parent directory above it, so a request to /asdf/ will eventually hit these rules too.
Order Deny,Allow tells Apache to process all Deny rules first, then all Allow rules. Deny from 127.0.0.1 blocks all requests coming from 127.0.0.1; needless to say, you should change this IP to the bot's real one. Allow from all grants everyone else access. This will present that bot with a 403 Forbidden error, I believe.
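As a side note, if you are running Apache 2.4, Order/Deny/Allow is deprecated and now lives in mod_access_compat; mod_authz_host itself uses the Require syntax there. A sketch of the equivalent rule, again with 127.0.0.1 standing in for the bot's real IP:

<Directory />
    <RequireAll>
        # allow everyone...
        Require all granted
        # ...except requests coming from this IP
        Require not ip 127.0.0.1
    </RequireAll>
</Directory>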
More information about mod_authz_host can be found here, and more information about the <Directory> directive can be found here.