Strange Bingbot hits in my website access logs
I've recently been seeing many hits to my site in the access logs and I'm not sure what to do with them. The pages they are trying to reach do not exist, and the requests claim to come from Bingbot, but I don't think those are Bing IP addresses. Does anyone have any ideas on how I should handle these, either via .htaccess or by reporting it to Bing?
66.249.69.1 - - [11/Aug/2016:07:41:23 -0400] "GET /index.php/write-academic-papers-for-money/js/jquery-1.8.2.min.js HTTP/1.1" 200 10014 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
70.208.74.141 - - [11/Aug/2016:07:41:28 -0400] "GET /images/ways.jpg HTTP/1.1" 200 188202 "http://tt.tennis-warehouse.com/index.php?threads/nice-mean-pros-on-tour.570480/" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12D508 Safari/600.1.4"
40.77.167.6 - - [11/Aug/2016:07:41:30 -0400] "GET /index.php/buy-research-paper-no-plagiarism/gifts-gear.php HTTP/1.1" 200 9866 "-" "Mozilla/5.0 (compatible; bingbot/2.0;)"
More posts by @Hamm4606531
The three log records shown all look like legitimate traffic (both the Google and Bing IP addresses appear valid) and, as closetnoc has already pointed out, only the last one references Bingbot.
The pages they are trying to reach do not exist
But your server is returning a 200 OK status, which is potentially allowing these URLs to be indexed by the search engines. If these URLs returned a 404 Not Found then it wouldn't be such a problem.
It looks like your site has been the target of an XSS-like attack to create spammy links in the SERPs for keywords that are irrelevant to your site.
Is there something I can do to prevent any /index.php/XXXXXX requests
Yes. The additional XXXXXX in the URL after a valid filename is trailing pathname information (PATH_INFO). The default behaviour on Apache generally allows this additional path info (although it depends on the handler).
However, this can be disabled with the AcceptPathInfo directive in your server config or .htaccess file. For example:
AcceptPathInfo Off
This will result in Apache returning a 404 Not Found error on such requests.
Apache docs: https://httpd.apache.org/docs/2.4/mod/core.html#acceptpathinfo
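If other scripts on the site legitimately rely on path info, the directive could be limited to index.php instead of being applied everywhere. A minimal sketch, assuming the .htaccess file is allowed to override FileInfo settings (AllowOverride) and that nothing on the site depends on URLs of the form /index.php/something:

```apache
# Reject trailing path info for index.php only; other scripts keep
# the default behaviour of their handler.
<Files "index.php">
    AcceptPathInfo Off
</Files>
```

Note that some CMSs and frameworks route requests through /index.php/..., either in their public URLs or via an internal rewrite, in which case turning path info off for index.php would break them. It is worth checking how the site builds its URLs before applying this.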
Depending on your website URL structure, you could just block any direct requests to index.php. Something like the following, using mod_rewrite in the root .htaccess file:
RewriteEngine On
RewriteCond %{THE_REQUEST} ^GET\s/index\.php [NC]
RewriteRule ^index\.php - [F]
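This would need to go before any URL routing directives (e.g. WordPress).
THE_REQUEST contains the first line of the HTTP request exactly as the client sent it (for example, GET /index.php/foo HTTP/1.1) and is not changed by internal rewrites, so you are still OK to internally rewrite to index.php if you are using a front controller (for example).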
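To illustrate the ordering, here is a sketch of how the block might sit in front of a front-controller rule. The routing section is a hypothetical WordPress-style example, not taken from the site in question, and the condition is a slight variation that uses [A-Z]+ to match any request method rather than only GET:

```apache
RewriteEngine On

# 1. Block direct client requests for /index.php, with or without
#    trailing path info. THE_REQUEST is the raw request line, so
#    internal rewrites to index.php do not trigger this rule.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.php [NC]
RewriteRule ^index\.php - [F]

# 2. Hypothetical front-controller routing, placed after the block:
#    send anything that is not an existing file or directory to index.php.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```

With this layout a request for /some-page is still routed to index.php internally, while a request typed as /index.php/some-page is refused. The [F] flag returns 403 Forbidden; if a 404 is preferred, replacing [F] with [R=404] should have that effect. Because this also blocks /index.php?..., it only suits sites whose public URLs never reference index.php directly.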