
@Jennifer507

Posted in: #Apache2 #ApacheLogFiles #Googlebot #WebCrawlers

How do I realize a UA string block by regular expression in the config files of my Apache webserver?

For example: I would like to block all bots on my Debian server whose user-agent matches the regular expression /\b\w+[Bb]ot\b/ or /Spider/.
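Those patterns can be sanity-checked outside Apache first; here is a minimal Python sketch using the same regex syntax (the sample user-agent strings are invented for illustration):

```python
import re

# The two patterns from the question: a word ending in "bot"/"Bot"
# as a whole word, or the literal "Spider" (case-sensitive here;
# Apache's [NC] flag would make the match case-insensitive).
pattern = re.compile(r"\b\w+[Bb]ot\b|Spider")

# Invented sample user-agent strings, mapped to the expected decision.
samples = {
    "Mozilla/5.0 (compatible; Googlebot/2.1)": True,
    "ExampleSpider/1.0 (+http://example.com)": True,
    "Mozilla/5.0 (Windows NT 10.0) Firefox/115.0": False,
}

for ua, should_block in samples.items():
    blocked = bool(pattern.search(ua))
    print(("block" if blocked else "allow"), ua)
    assert blocked == should_block
```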

Those bots should not be able to see any page on my server, and they should appear in neither the access logs nor the error logs.
global-security.blogspot.de/2009/06/how-to-block-robots-before-they-hit.html suggests using mod_security for that, but isn't there a simpler directive for httpd.conf?



@Vandalay111

I enabled the rewrite module in Apache:

a2enmod rewrite


and added this block to my /etc/apache2/httpd.conf:

<Directory /var/www/>
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sosospider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BaiduSpider [NC]
# Keep robots.txt and the 403 error page reachable,
# otherwise the forbidden rule would loop on the error document
RewriteCond %{REQUEST_URI} !^/robots.txt$
RewriteCond %{REQUEST_URI} !^/403.shtml$
RewriteRule ^.* - [F,L]
</IfModule>
</Directory>


and restarted Apache:

apache2ctl graceful


Now requests from those spiders all produce 403 errors, which I can see with:

grep -E 'spider|bot' /var/log/apache2/*.log
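One caveat relative to the question: mod_rewrite's 403 responses still land in the access and error logs. To keep such requests out of the access log entirely, a mod_setenvif-based sketch could look like this (Apache 2.4 syntax; the regex, Directory path, and log path are illustrative and would need adapting):

```apache
# Tag matching user-agents with an environment variable
SetEnvIfNoCase User-Agent "\b\w+bot\b" bad_bot
SetEnvIfNoCase User-Agent "spider" bad_bot

<Directory /var/www/>
    <RequireAll>
        Require all granted
        # Apache 2.4; on 2.2 use "Deny from env=bad_bot" instead
        Require not env bad_bot
    </RequireAll>
</Directory>

# Write the access log only for requests not tagged above
CustomLog /var/log/apache2/access.log combined env=!bad_bot
```

Note that the error log cannot be filtered the same way, so denied requests may still surface there.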
