Googlebot-Mobile ignoring robots.txt, pretending to be Googlebot

@Murphy175

Posted in: #Googlebot #GooglebotMobile #GoogleSearchConsole #RobotsTxt #UserAgent

I'm very disappointed that Googlebot is apparently ignoring my robots.txt.

I have the following sole entry for my /robots.txt:

location = /robots.txt {
    return 200
"## $host ##
\n#Dear Google, we do not appreciate fake User-Agent strings\n#that span 3 lines, and quadruplicate requests per page.
User-agent: Googlebot-Mobile\nDisallow: /
\n
";
}


It shows up as follows:

## constantine.su ##

#Dear Google, we do not appreciate fake User-Agent strings
#that span 3 lines, and quadruplicate requests per page.
User-agent: Googlebot-Mobile
Disallow: /


Yet years after adopting the robots.txt above, which always worked, I'm now repeatedly getting the following misleading User-Agent entries in my access_log:

66.249.66.163 - - [20/May/2016:07:19:50 -0700] "GET / HTTP/1.1" 200 3314 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

What's going on? Where do I report these misbehaving bots and Google's clear abuse of the crawling privilege?


2 Comments


@Murray432

Apparently, everyone was sick and tired of receiving useless Googlebot-Mobile traffic and must have blocked it, so Google has decided to ignore the relevant robots.txt entries, giving us less choice over what to allow and disallow.

support.google.com/webmasters/answer/1061943
http://webmasters.googleblog.com/2014/01/a-new-googlebot-user-agent-for-crawling.html
webmasters.googleblog.com/2016/03/updating-smartphone-user-agent-of.html
http://productforums.google.com/forum/#!category-topic/webmasters/crawling-indexing--ranking/MTVZ55aUYNU


If you use nginx, the following prevents Googlebot-Mobile from wasting the site's bandwidth and from ignoring robots.txt rules:

add_header Vary-Not User-Agent;
#add_header Vary-Not User-Agent always;    # "always" needs nginx 1.7.5+
default_type text/html;

if ($http_user_agent ~ ".*Mobile.*Safari.*Googlebot/.*") {
    return 400
"<!DOCTYPE html><title>Googlebot Bad Request</title>
<h1>Dear Google, we do not appreciate fake User-Agent strings that
span 3 lines, ignore robots.txt and duplicate requests per page.</h1>
";
}

location = /robots.txt {
    return 200
"#Dear Google, we do not appreciate fake User-Agent strings\n#that span 3 lines, and quadruplicate requests per page.
User-agent: Googlebot-Mobile\nDisallow: /\n
";
}
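To see which requests the "if" rule above would reject, the same regular expression can be tried against a few user-agent strings. This is only a rough sketch in Python rather than nginx; it assumes Python's re.search behaves like nginx's case-sensitive "~" operator for a pattern this simple. The smartphone string is the one from the access log in the question; the other two are built from the crawler tokens quoted elsewhere in this thread, and the function name is made up for illustration.

```python
import re

# Same pattern as in the nginx "if" block; nginx's "~" is a case-sensitive match.
SMARTPHONE_GOOGLEBOT = re.compile(r".*Mobile.*Safari.*Googlebot/.*")

USER_AGENTS = {
    # Taken from the access_log entry in the question.
    "smartphone": (
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
        "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
        "+http://www.google.com/bot.html)"
    ),
    "desktop": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    # Only the token part of the older mobile crawler's string.
    "feature phone (token only)": "(compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
}


def is_rejected(user_agent):
    """Return True if the nginx rule would answer this request with 400."""
    return SMARTPHONE_GOOGLEBOT.search(user_agent) is not None


for name, ua in USER_AGENTS.items():
    print(f"{name}: {'400' if is_rejected(ua) else 'passes through'}")
```

Under these assumptions only the smartphone string is rejected; requests carrying the older Googlebot-Mobile token or the plain desktop string pass through to the rest of the configuration.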


@Speyer207

Robots.txt

Google has on occasion been known to ignore robots.txt, so it should never be treated as a guarantee. Sensitive pages should always sit behind some form of authentication, or you could opt to block those user agents by returning a 403 Forbidden status.

However, this is not the issue.
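For sites where the blocking has to happen in application code rather than in the web server, the 403 approach mentioned above might look roughly like the following. It is a minimal sketch of a WSGI wrapper in Python; the pattern, the names block_agents and hello, and the choice of which agent to block are all assumptions made for illustration.

```python
import re

# Hypothetical choice of agents to refuse; adjust the pattern as needed.
BLOCKED_AGENTS = re.compile(r"Googlebot-Mobile/")


def block_agents(app):
    """Wrap a WSGI app so that blocked user agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_AGENTS.search(user_agent):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else reaches the wrapped application unchanged.
        return app(environ, start_response)
    return middleware


def hello(environ, start_response):
    """Stand-in application used only for this example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


application = block_agents(hello)
```

Unlike a robots.txt rule, this does not depend on the crawler's cooperation, but it only helps if the crawler actually sends the user agent being matched.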

Google Probing

Googlebot is known to probe sites with different user agents; it does this to ensure there's no abuse and to understand what your site does and doesn't support.

If you see Android, iOS or anything similar in the user agent, it doesn't mean that Google's mobile bot is crawling your site. In fact, if you take a closer look at your access log entry, you'll see that it does not mention the mobile bot (Googlebot-Mobile), which it would when indexing your site for Google's mobile results. Google lists its user agent strings on its website.

Google Crawlers:


Normal Bot: (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mobile Bot: (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
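Checking which of the two tokens a log entry carries can be scripted. The following is a small sketch in Python, assuming the access log uses the common "combined" format with the User-Agent as the last quoted field; the function name is hypothetical, and the sample line is the one from the question.

```python
import re

# In a combined-format log line, the last double-quoted field is the User-Agent.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')
# The crawler token as it appears in the two strings listed above.
CRAWLER_TOKEN = re.compile(r"\(compatible; (Googlebot(?:-Mobile)?)/")


def crawler_token(log_line):
    """Return "Googlebot", "Googlebot-Mobile", or None for a log line."""
    field = LAST_QUOTED_FIELD.search(log_line)
    if field is None:
        return None
    token = CRAWLER_TOKEN.search(field.group(1))
    return token.group(1) if token else None


LOG_LINE = (
    '66.249.66.163 - - [20/May/2016:07:19:50 -0700] "GET / HTTP/1.1" 200 3314 "-" '
    '"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile '
    'Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

print(crawler_token(LOG_LINE))  # prints: Googlebot
```

For the entry in the question the token is Googlebot, not Googlebot-Mobile, even though the rest of the string describes an Android phone.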


Pads are not Mobiles

It should also be noted that Google does not treat Windows, iOS and Android pads (tablets) as mobile devices. It's likely that your site is being probed by Google, but more likely that Google treats its own Google Nexus 5X as both a pad and a mobile, due to the high resolution offered by this device (1920x1080) in landscape mode, which is comparable to a desktop.


SOURCE

Tablets: We consider tablets as devices in their own class, so when we
speak of mobile devices, we generally do not include tablets in the
definition. Tablets tend to have larger screens, which means that,
unless you offer tablet-optimized content, you can assume that users
expect to see your site as it would look on a desktop browser rather
than on a smartphone browser.


However....

Google announced a while back that it would be changing its smartphone user agent:


SOURCE

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile
Safari/537.36 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html) (Googlebot smartphone user-agent starting from April 18, 2016)


So maybe the page I linked about user agents is outdated. But going back to what I said about probing: Google does this to better understand pages that use newer web technologies. Google may also treat some mobiles as pads depending on viewport resolution, since they are capable of displaying the same content as a desktop PC; I'm unable to confirm this, but it would make sense.
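One way to see why the robots.txt from the question no longer stops these requests is to feed it to a robots.txt parser and ask about both tokens. The sketch below uses Python's standard urllib.robotparser; how that module matches User-agent groups is its own interpretation and is only assumed here to resemble what Google does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt body from the question, without the host banner line.
ROBOTS_TXT = """\
#Dear Google, we do not appreciate fake User-Agent strings
#that span 3 lines, and quadruplicate requests per page.
User-agent: Googlebot-Mobile
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask about each crawler token separately.
for token in ("Googlebot-Mobile", "Googlebot"):
    verdict = "allowed" if parser.can_fetch(token, "/") else "disallowed"
    print(f"{token}: {verdict}")
```

With this parser the Googlebot-Mobile token is disallowed while the plain Googlebot token is allowed, because no group in the file names it. If Google reads the file the same way, a crawler that now identifies as Googlebot is simply not covered by the existing rule; covering it would take a group that names Googlebot, which would then apply to the desktop crawler too.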
