MS Bing web crawler out of control causing our site to go down

@Sent6035632

Posted in: #Bing

Here is a weird one that I am not sure what to do about. Today our company's e-commerce site went down. I tailed the production log and saw that we were receiving a ton of requests from the IP range 157.55.98.0 to 157.55.100.0. I googled around and found out that it is an MSN web crawler.

So essentially the MS web crawler overloaded our site, causing it to stop responding, even though our robots.txt file contains the following:

Crawl-delay: 10
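
To be clear, Crawl-delay only takes effect inside a User-agent group, so a complete robots.txt would pair the directive with a User-agent line. A minimal file aimed at Bing's crawler might look like this (bingbot is the user-agent token Bing documents):

User-agent: bingbot
Crawl-delay: 10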


So what I did was just ban the IP range in iptables.
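
For reference, a ban along those lines might look like this; the /23 CIDR is an approximation covering most of the range above, and the exact command used may have differed:

iptables -A INPUT -s 157.55.98.0/23 -j DROP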

But what I am not sure about is how to follow up from here. I can't find anywhere to contact Bing about this issue, and I don't want to keep those IPs blocked, because I am sure we will eventually get de-indexed from Bing. And it doesn't really seem like this has happened to anyone else before.

Any suggestions?

Update: My Server / Web Stats

Our web server runs Nginx, Rails 3, and 5 Unicorn workers. We have 4 GB of memory and 2 virtual cores. We have been running this setup for over 9 months now and have never had an issue; 95% of the time our system is under very little load. On average we receive 800,000 page views a month, and this never comes close to bringing down or slowing our web server.

Looking at the logs, we were receiving anywhere from 5 to 40 requests/second from this IP range.
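
To put that in perspective: 800,000 page views a month averages out to roughly 0.3 requests/second, so a crawler sustaining 40 requests/second is more than 100 times our normal load.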

In all my years of web development I have never seen a crawler hit a website so many times.

Is this new with Bing?

3 Comments


@Welton855

There are two ways of controlling the Bingbot; see www.bing.com/webmaster/help/crawl-control-55a30302 for details.

If you don't want to use their control panel, just use a robots.txt file.

"If we find a crawl-delay: directive in your robots.txt file then it will take always precedence over the information from this feature."


@Kaufman445

Use PHP plus a regex, and forget robots.txt. Several bad bots don't respect it...

// Block any request whose User-Agent claims to be Bingbot.
if (isset($_SERVER['HTTP_USER_AGENT'])
    && preg_match('/bingbot/i', $_SERVER['HTTP_USER_AGENT'])) {
    http_response_code(403); // send 403 Forbidden before stopping
    exit();
}


And that tells Bing: the door is closed for you!


@Si4351233

Sign up with Bing Webmaster Tools and fill out their crawl speed chart. Set it for the fastest crawling during your off hours and a much reduced rate during your busiest times.

If Bing is knocking over your website, you need to rethink your web server capacity. The best test is to see if you can survive Google, Bing, Yahoo and Baidu all hitting your system at once. If it remains in service during the onslaught, then you're ready for a live customer load.
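
As a rough sketch of that kind of test (assuming ApacheBench is installed and www.example.com stands in for your own site), you can replay a crawler-scale burst yourself:

ab -n 10000 -c 40 http://www.example.com/

Here -c 40 keeps 40 requests in flight at once, in the same ballpark as the burst described above; if response times stay reasonable under that concurrency, a crawler at a similar rate shouldn't take the site down.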

Yes, Bing can hit you pretty hard if you haven't given them a limit. It was causing me serious issues two months ago. I just tuned the system up to handle it, and it was a good thing; otherwise, Black Friday would have resulted in a very Blue Monday once I saw the server stats.
