
How can I use .htaccess to respond with 403 Forbidden status for URLs that contain a query string?

@Shelley277

Posted in: #403Forbidden #Apache #Htaccess #QueryString

Some bots have been crawling my site for every link that ends with:

?utm_source=dlvr.it&utm_medium=twitter


I haven't checked out its IP.

Then 10+ other bots follow the same link pattern, ?utm_source=dlvr.it&utm_medium=twitter, and also crawl my site. This results in a huge amount of traffic, which then brings my site down. I have added those 10+ bots to my blacklist, so they receive an HTTP 403 status code when they access my site.

But I think the best way is to identify the first bot that crawled my site for every link that ends with:

?utm_source=dlvr.it&utm_medium=twitter


Or, return an HTTP 403 status code when the URL contains:

?utm_source=dlvr.it&utm_medium=twitter


I know of some ways to add .htaccess code that would prevent someone from crawling my xmlrpc.php page, such as:

<Files xmlrpc.php>
Order Deny,Allow
Deny from all
</Files>


But what about a query string in the URL?


2 Comments


@Jamie184

?utm_source=dlvr.it&utm_medium=twitter


utm_source and utm_medium are used by Google Analytics (and possibly other trackers) to monitor campaigns, so blocking access purely on this query string does not "feel right". However, if this is correct in your situation, then OK.

An important point to realise with query strings is that they cannot be matched using mod_rewrite's RewriteRule alone (or using the Request_URI variable in mod_setenvif - as suggested in comments). The query string is removed from the URL-path before it is matched against the RewriteRule pattern.
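To illustrate the point, here is a rough sketch of how a request is split up before matching. The URL is a made-up example, and the F (forbidden) flag is used only to keep the rule short:

```apache
# Hypothetical request:
#   /some-post/?utm_source=dlvr.it&utm_medium=twitter
#
# In .htaccess, the RewriteRule pattern is tested against:  some-post/
# %{QUERY_STRING} holds:  utm_source=dlvr.it&utm_medium=twitter
#
# This rule therefore does NOT fire for the request above, because
# "utm_source" never appears in the URL-path that the pattern sees:
RewriteRule utm_source - [F]
```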

Enable the rewrite engine (mod_rewrite) if not already:

RewriteEngine On


You need to use the RewriteCond directive. To serve a "403 Forbidden" for all requests that match the above query string, you can use something like:

RewriteCond %{QUERY_STRING} =utm_source=dlvr.it&utm_medium=twitter
RewriteRule .* - [R=403,L]


This is an exact match for the specified query string (it's not a regex, so the dot does not need to be escaped). If you need it to be less restrictive and match all query strings that start with the above query string, then use a regex (in which the dot must be escaped to match a literal dot):

RewriteCond %{QUERY_STRING} ^utm_source=dlvr\.it&utm_medium=twitter
RewriteRule .* - [R=403,L]


You can also make the regex case-insensitive by using the NC (NOCASE) flag. However, only use this if you specifically want to ignore case in the match. People tend to append this flag out of habit, but it's often unnecessary (or sometimes even incorrect) and just makes the regex engine work that bit harder.

This is not particularly efficient, since every request will be processed. If, for instance, only the URLs within the /path/to/files path are targeted, then you could make the RewriteRule pattern more restrictive:

RewriteRule ^path/to/files/ - [R=403,L]
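Putting the pieces above together, a minimal sketch of the whole block might look like this. It assumes mod_rewrite is loaded and that the .htaccess file sits in the document root. The F (forbidden) flag is used here as a shorthand for sending a 403; it also implies L, so rewriting stops at that rule.

```apache
# Sketch only: assumes mod_rewrite is loaded and .htaccess overrides are allowed.
RewriteEngine On

# Match query strings that start with the tracking parameters.
# The dot is escaped so that it matches a literal "." only.
RewriteCond %{QUERY_STRING} ^utm_source=dlvr\.it&utm_medium=twitter
# "-" means no substitution; F sends "403 Forbidden" and stops rewriting.
RewriteRule ^ - [F]
```

The pattern ^ matches every URL-path without having to consume it, so the RewriteCond alone decides whether the request is blocked. Swap in a narrower pattern such as ^path/to/files/ if only part of the site is affected.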


@Harper822

If you have the mod_rewrite module installed, then you can put this in your .htaccess file in the root folder of your website (which usually is the public_html folder):

RewriteEngine On
RewriteRule ^\?utm_source\=dlvr\.it&utm_medium\=twitter$ - [R=403,NC,L]


You might have to remove the \ from the =, I can't remember if equals needs escaping.

Another way would be this if you're searching for the string anywhere in the URL:

RewriteEngine On
RewriteRule ^(.*)\?utm_source\=dlvr\.it&utm_medium\=twitter(.*)$ - [R=403,NC,L]


The NC at the end means not case-sensitive, so even if the text is all upper-case, the bots would still be sent a 403 error.
