How to block a user-agent that has a space in its name?
I got a hit from a crawler with the user-agent DV CRAWLER, which is obviously a spam bot. I tried to block it in both .htaccess and the nginx configuration, as I'm running nginx as a reverse proxy in front of Apache.
Here is the code I used for .htaccess:
RewriteCond %{HTTP_USER_AGENT} ^.*(Baiduspider|DV CRAWLER).*$ [NC]
RewriteRule .* - [F,L]
It seems that the space in the user-agent name broke the code. I found that it only works with user agents that have no spaces. It is the same with nginx: it doesn't accept a space in the user-agent name and returns an error.
Nginx code:
if ($http_user_agent ~ (Baiduspider|DV CRAWLER) ) {
return 403;
}
So, what is the alternative for this? I don't want these spam bots to crawl my website. Any answer would be greatly appreciated.
More posts by @Vandalay111
3 Comments
The space is a delimiter (i.e. a special character) in .htaccess, so it must be backslash-escaped if you want to match a literal space in the regex, e.g. DV\ CRAWLER. (Otherwise you are likely to get a less-than-helpful 500 Internal Server Error.)
Or you can use the shorthand character class \s, which matches any whitespace character (space, tab, or newline / line break), so not strictly just a space.
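Applied to the rule from the question, the escaped form might look like the sketch below. It assumes mod_rewrite is available and adds RewriteEngine On, which the question does not show. The leading ^.* and trailing .*$ are dropped because an unanchored pattern already matches anywhere in the string.

```apache
# Assumption: rewriting is not already switched on elsewhere in this .htaccess
RewriteEngine On

# The backslash stops Apache from treating the space as an argument separator
RewriteCond %{HTTP_USER_AGENT} (Baiduspider|DV\ CRAWLER) [NC]
RewriteRule .* - [F,L]

# Alternative condition: \s matches any single whitespace character, not only a space
# RewriteCond %{HTTP_USER_AGENT} (Baiduspider|DV\sCRAWLER) [NC]
```

Wrapping the whole pattern in double quotes, as in "(Baiduspider|DV CRAWLER)", should be another way to keep the space inside a single argument.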
When in doubt, add parentheses and escaping to regular expressions. Try this first:
(Baiduspider|(DV CRAWLER))
I think your problem is that it is evaluating as "Baiduspider or DV, followed by CRAWLER" when you don't have the parentheses. If that doesn't work, then try escaping the space:
(Baiduspider|(DV\sCRAWLER))
Where \s is any whitespace character.
Your regex code in general is wrong.
Try instead something like this:
RewriteCond %{HTTP_USER_AGENT} (.*Baiduspider.*|.*DV.*CRAWLER.*) [NC]
The user agent is matched against each alternative between the parentheses (), separated by the pipe character |, where .* is a wildcard that matches anything. Optionally you can use \s or \s+ for spaces, but .* works too and may be better. Not knowing what the DV CRAWLER string looks like, I made a guess (SWAG). You may need to adjust this.
For example, the string "a line of red cars driving down the street" could be matched simply using .*red.*cars.*. There are slicker regular expressions for this, but this simple method can safely be repeated over and over.
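As a rough sketch, the wildcard approach could be written as a complete rule like this. RewriteEngine On is an assumption, since the original snippets do not show it. Because RewriteCond patterns are unanchored unless ^ or $ is used, the outer .* on each alternative can be left out without changing what matches.

```apache
# Assumption: rewriting is not already switched on elsewhere in this .htaccess
RewriteEngine On

# Blocks any user agent containing "Baiduspider", or "DV" followed later
# by "CRAWLER" with anything (including a space) in between
RewriteCond %{HTTP_USER_AGENT} (Baiduspider|DV.*CRAWLER) [NC]
RewriteRule .* - [F,L]
```

Note that .* is looser than a literal space: it would also match a user agent such as DV-CRAWLER or DVxyzCRAWLER, which may or may not be what you want.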