Nickens628

Bingbot requests from Google IP address

@Nickens628

Posted in: #Bingbot #Google #Googlebot #Spam

We are seeing some suspicious requests to our server:

74.125.186.46 - - [24/Aug/2014:23:24:11 -0500] "GET <url> HTTP/1.1" 200 16912 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
74.125.187.193 - - [24/Aug/2014:23:24:12 -0500] "GET <url> HTTP/1.1" 200 20119 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"


The user agent claims to be bingbot, but whois data for the IP addresses (74.125.186.46 and 74.125.187.193) shows that they belong to Google.

So is it Google, Bing, or some other content scraper?
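
To find other requests like these, the log lines can be split into fields so that the client IP and the claimed user agent can be compared. This is only a rough sketch, assuming the standard "combined" access log format shown above; the names LOG_PATTERN, parse_line and claims_bingbot are made up for illustration.

```python
import re

# Assumes the Apache/Nginx "combined" log format, as in the lines quoted above.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None

def claims_bingbot(entry):
    """True if the user agent string says it is bingbot (this proves nothing)."""
    return "bingbot" in entry["agent"].lower()

sample = (
    '74.125.186.46 - - [24/Aug/2014:23:24:11 -0500] "GET <url> HTTP/1.1" '
    '200 16912 "-" "Mozilla/5.0 (compatible; bingbot/2.0; '
    '+http://www.bing.com/bingbot.htm)"'
)
entry = parse_line(sample)
print(entry["ip"], claims_bingbot(entry))
```

The user agent is just a header the client sends, so a match here only says what the client claims to be. The IP address still has to be checked separately.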


3 Comments


 

@Odierno851

As others mentioned, you can verify real Googlebots, and this isn't a Googlebot IP address.

I double-checked with the team about these requests, and they appear to be for the PageSpeed service, which can act as a cache/proxy for websites. If search engines - like Bing or Google - crawl URLs like that, the service will forward those requests to your website when needed. That can make it look like these requests are coming from Google IP addresses, even though they initially originated elsewhere.



 

@Kevin317

You can verify all Google crawlers by using PTR records.

See: Verifying Googlebot

I find this very accurate. Google now also offers hosting services as well as many other services, so there can easily be requests from Google assigned IP ranges that are not Google's search agents.

There is also a good list of Google User Agents.
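
The PTR check can be sketched roughly as follows: look up the PTR record for the IP, make sure the hostname ends in a domain the crawler's operator uses, then resolve that hostname forward and confirm it points back to the same IP. This is a minimal sketch, not an official implementation. The suffix list is my assumption of what Google and Bing currently document and should be checked against their own pages, and the function names are hypothetical.

```python
import socket

# Assumed hostname suffixes for genuine crawlers; verify against the
# search engines' own documentation before relying on them.
CRAWLER_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}

def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def forward_lookup(hostname):
    """Return the IPv4 addresses that hostname resolves to."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []

def is_verified_crawler(ip, crawler, reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for the named crawler."""
    hostname = reverse(ip)
    if hostname is None:
        return False  # no PTR record, so it cannot be verified
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(CRAWLER_SUFFIXES[crawler]):
        return False  # PTR exists but is not in the crawler's domain
    # The forward lookup guards against someone faking their own PTR record.
    return ip in forward(hostname)
```

Calling is_verified_crawler("74.125.186.46", "bingbot") performs real DNS lookups. The reverse and forward parameters are only there so the lookups can be swapped out for caching or testing.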



 

@Alves908

These are Google IP addresses, as you stated. However, that does not mean they are part of the search engine. Google has expanded its business lately, and not everything that happens from a Google IP address lives up to the standards we have all grown accustomed to, unfortunately.

There are no PTR (reverse DNS) records for these IP addresses. The associated domain name would tell me more.

I looked up both IP addresses in my database. I only found 74.125.186.46. There is nothing suspicious from this IP address and the last access I have is from 2012.

However, I found these:
www.projecthoneypot.org/ip_74.125.187.193
-and-
www.projecthoneypot.org/ip_74.125.186.46
You will see that these IP addresses have used various agent names and are tagged as content spammer IP addresses. However, I do not see a bingbot agent name there, which probably means that this behavior is new.

Why is this?

Without the domain name, I cannot tell you specifically what happened. However, I can tell you this.

Google Code has been used for spidering and data mining. Nerdydata.com uses Google Code, for example.

Google is also now offering web hosting. I have seen accesses from these hosted sites consistent with spidering and data mining activity, and I have seen hacker activity from Google-hosted IPs as well.

At one point Google decided to create a large pool of IP addresses and point the reverse DNS of all of them to 1e100.net subdomains. The idea was that any IP address and computer could be quickly and dynamically allocated for different purposes according to need. This added confusion, because search engine IP addresses could now be used for other things and became impossible to block or whitelist. Google stated that IP addresses should not be blocked and that checking the domain name on each request should validate that the access really came from Google. However, this check is not easy to set up on a web server, it is certainly not a native feature, and it was not required before Google. Pity.

What remains are many huge IP address allocations listed in ARIN as Google. This confuses people, and researching which division (for lack of a better term) is responsible for bad behavior is nearly impossible without a domain name.

Now Google is in the domain name registration business along with hosting. It seems to me that this is a conflict of interest at best. Certainly, these are not business ventures I would have signed off on. Diversity is one thing, but sticking with the core business model is another. It seems that Google is siding with the enemy, so to speak: as a search engine it is constantly at odds with hosted sites that change registration, IP addresses, hosts, and so on in order to evade detection, especially when spamdexing, data mining, and content theft are involved.

I did find these IP address ranges assigned to googlebot.com, google.com, and 1e100.net subdomains in my database. This does not mean that they are currently being used by the search engine, only that they have been in the past. It is unlikely that the search engine's IP allocation is what hit you; however, these addresses could be allocated to it tomorrow.

I wish I could tell you more.

Block these IP addresses if you feel it is important. Otherwise, consider posting this question on the Google forums in the hopes of waking Google up to the mess they have made. Perhaps they need to rethink their policies a bit. Actually, no perhaps about it!
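
If you do decide to block them, one option is an application-level check like the rough sketch below. It assumes you only want to block the two addresses from the question; BLOCKED_NETWORKS and is_blocked are hypothetical names, and a firewall or web server rule would do the same job.

```python
import ipaddress

# The two addresses from the question, written as single-host /32 networks.
# Wider ranges could be listed here too, at the risk of blocking real crawlers.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("74.125.186.46/32"),
    ipaddress.ip_network("74.125.187.193/32"),
]

def is_blocked(ip, networks=BLOCKED_NETWORKS):
    """Return True if ip falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

Keeping the list as networks rather than plain strings means a whole range can be added later without changing the check.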


