Googlebot-Mobile ignoring robots.txt, pretending to be Googlebot
I'm very disappointed that Googlebot is apparently ignoring my robots.txt.
I have the following sole entry for my /robots.txt:
location = /robots.txt {
    return 200
"## $host ##
\n #Dear Google, we do not appreciate fake User-Agent strings
#that span 3 lines, and quadruplicate requests per page.
User-agent: Googlebot-Mobile\nDisallow: /
\n
";
}
It shows up as follows:
## constantine.su ##
#Dear Google, we do not appreciate fake User-Agent strings
#that span 3 lines, and quadruplicate requests per page.
User-agent: Googlebot-Mobile
Disallow: /
Yet now, years after having had to adopt the robots.txt above, which has always worked, I'm repeatedly getting the following misleading User-Agent entries in my access_log:
66.249.66.163 - - [20/May/2016:07:19:50 -0700] "GET / HTTP/1.1" 200 3314 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
What's going on? Where do I report these misbehaving bots and Google's clear abuse of the crawling privilege?
Apparently, everyone was sick and tired of receiving useless Googlebot-Mobile traffic and must have blocked it, so Google has decided to ignore the relevant robots.txt entries, leaving us less choice over what to allow and disallow.
support.google.com/webmasters/answer/1061943
http://webmasters.googleblog.com/2014/01/a-new-googlebot-user-agent-for-crawling.html
webmasters.googleblog.com/2016/03/updating-smartphone-user-agent-of.html
http://productforums.google.com/forum/#!category-topic/webmasters/crawling-indexing--ranking/MTVZ55aUYNU
If you use nginx, the following is one way to prevent Googlebot-Mobile from wasting the site's bandwidth and to enforce the robots.txt rules it ignores:
add_header Vary-Not User-Agent;    #add_header Vary-Not User-Agent always;    # "always" needs nginx 1.7.5+
default_type text/html;
# Reject the smartphone Googlebot, whose User-Agent contains "Mobile", "Safari" and "Googlebot/" in that order
if ($http_user_agent ~ ".*Mobile.*Safari.*Googlebot/.*") {
    return 400
"<!DOCTYPE html><title>Googlebot Bad Request</title>
<h1>Dear Google, we do not appreciate fake User-Agent strings that
span 3 lines, ignore robots.txt and duplicate requests per page.</h1>
";
}
location = /robots.txt {
    return 200
"#Dear Google, we do not appreciate fake User-Agent strings
#that span 3 lines, and quadruplicate requests per page.
User-agent: Googlebot-Mobile\nDisallow: /\n
";
}
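To check which requests the `if ($http_user_agent ~ ...)` rule above would reject, the same pattern can be tried outside nginx. This is a sketch in Python's `re`; nginx uses PCRE rather than Python's `re`, but for a plain `.*` pattern like this one both perform the same case-sensitive, unanchored search. The desktop Googlebot string and the "phone browser" string (the smartphone string with its Googlebot part removed) are illustrative inputs, not entries from the question's log.

```python
import re

# The pattern from the nginx `if ($http_user_agent ~ ...)` rule above.
# nginx's `~` is a case-sensitive, unanchored PCRE match; re.search
# behaves the same way for this simple pattern.
GOOGLEBOT_SMARTPHONE = re.compile(r".*Mobile.*Safari.*Googlebot/.*")

USER_AGENTS = {
    # Smartphone Googlebot, copied from the access_log entry in the question.
    "smartphone Googlebot": (
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
        "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
        "+http://www.google.com/bot.html)"
    ),
    # Desktop Googlebot (illustrative input).
    "desktop Googlebot": (
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    ),
    # Illustrative phone browser: the smartphone string without the Googlebot part.
    "phone browser": (
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
        "Mobile Safari/537.36"
    ),
}


def is_rejected(user_agent: str) -> bool:
    """Return True if the nginx rule would answer this request with 400."""
    return GOOGLEBOT_SMARTPHONE.search(user_agent) is not None


if __name__ == "__main__":
    for name, ua in USER_AGENTS.items():
        print(f"{name}: {'400' if is_rejected(ua) else 'passed through'}")
```

Only the smartphone Googlebot is rejected: the rule needs `Mobile`, `Safari` and `Googlebot/` to appear in that order, so the desktop Googlebot and an ordinary phone browser pass through.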
Robots.txt
Google has on occasion been known to ignore robots.txt, so it should never be treated as a guarantee. Sensitive pages should always use some form of authentication, or you could opt to block those user agents by returning a 403 Forbidden status.
However, this is not the issue.
Google Probing
Googlebot is known to probe sites with different user agents. It does this to ensure there's no abuse and to understand what your site does and doesn't support.
Seeing Android, iOS or similar in the user agent doesn't mean that Google's mobile bot is crawling your site. In fact, if you take a closer look at your access log entry, you'll see that it does not mention the Mobile Bot, which it would when indexing your site for Google's mobile results. Google lists its user agent strings on its website.
Google Crawlers:
Normal Bot: (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mobile Bot: (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
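The difference between these two tokens matters for the question's robots.txt, which only names Googlebot-Mobile. As a rough illustration, the sketch below feeds that robots.txt to Python's standard-library `urllib.robotparser`. This is not Google's parser and Google's matching rules may differ; it only shows generic token matching, where a group addressed to `Googlebot-Mobile` does not apply to a crawler that names itself plain `Googlebot`, the token that appears in the question's access_log entry. The helper name `may_fetch` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt body served by the question's nginx location block.
ROBOTS_TXT = """\
## constantine.su ##
#Dear Google, we do not appreciate fake User-Agent strings
#that span 3 lines, and quadruplicate requests per page.
User-agent: Googlebot-Mobile
Disallow: /
"""


def may_fetch(product_token: str, path: str = "/") -> bool:
    """Hypothetical helper: ask Python's robots.txt parser (not Google's)
    whether a crawler naming itself `product_token` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(product_token, path)


if __name__ == "__main__":
    # The Mobile Bot token is covered by the Disallow group...
    print("Googlebot-Mobile:", may_fetch("Googlebot-Mobile"))
    # ...but the plain Googlebot token is not, so it falls back to "allowed".
    print("Googlebot:", may_fetch("Googlebot"))
```

`urllib.robotparser` treats a group as applying when its `User-agent` value (lowercased) appears in the crawler's name, so `googlebot-mobile` never matches `googlebot`, and with no `User-agent: *` group the default is to allow.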
Pads are not Mobiles
It should also be noted that Google does not treat Windows, iOS and Android pads as mobile devices. Your site may simply be being probed by Google, but it's more likely that Google treats its own Nexus 5X as both a pad and a mobile, because of the device's high resolution (1920x1080), which in landscape mode is comparable to a desktop.
SOURCE
Tablets: We consider tablets as devices in their own class, so when we speak of mobile devices, we generally do not include tablets in the definition. Tablets tend to have larger screens, which means that, unless you offer tablet-optimized content, you can assume that users expect to see your site as it would look on a desktop browser rather than on a smartphone browser.
However...
A while back, Google announced that it would be changing its smartphone user agent:
SOURCE
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
(Googlebot smartphone user-agent starting from April 18, 2016)
So maybe the user-agent page I linked is outdated, but going back to what I said about probing: Google does this to better understand pages that use newer web technologies. Also, Google may treat some mobiles as pads depending on viewport resolution. I'm unable to confirm this, but it would make sense, since such phones are capable of displaying the same content as a desktop PC.