MS Bing web crawler out of control causing our site to go down
Here is a weird one that I am not sure what to do about. Today our company's e-commerce site went down. I tailed the production log and saw that we were receiving a ton of requests from the IP range 157.55.98.0–157.55.100.0. I googled around and found out that it belongs to the MSN/Bing web crawler.
So essentially the MS web crawler overloaded our site, causing it to stop responding, even though our robots.txt file contains the following:
Crawl-delay: 10
So what I did was just ban the IP range in iptables.
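For reference, a minimal sketch of that kind of block using the iptables iprange match (the address range is the one from the logs above; adjust the chain to your own setup):

# Drop all traffic from the crawler's source range (temporary measure)
iptables -A INPUT -m iprange --src-range 157.55.98.0-157.55.100.0 -j DROP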
But what I am not sure about is how to follow up from here. I can't find anywhere to contact Bing about this issue, and I don't want to keep those IPs blocked, because I am sure we will eventually get de-indexed from Bing. And it doesn't really seem like this has happened to anyone else before.
Any suggestions?
Update: My Server / Web Stats
Our web server is running Nginx, Rails 3, and 5 Unicorn workers. We have 4 GB of memory and 2 virtual cores. We have been running this setup for over 9 months now and never had an issue; 95% of the time our system is under very little load. On average we receive 800,000 page views a month, and this never comes close to bringing down or slowing our web server.
Taking a look at the logs, we were receiving anywhere from 5 up to 40 requests per second from this IP range.
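If you want to quantify this from your own logs, a quick sketch (assuming Nginx's default combined log format, where the client IP is the first field, and the usual log path):

# Count requests per client IP, busiest first
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20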
In all my years of web development I have never seen a crawler hit a website so many times.
Is this new with Bing?
3 Comments
There are two ways of controlling the Bingbot; see www.bing.com/webmaster/help/crawl-control-55a30302 for details.
If you don't want to use their control panel, just use a robots.txt file. Their documentation states:
"If we find a crawl-delay: directive in your robots.txt file then it will always take precedence over the information from this feature."
Use PHP plus a regex. Forget robots.txt; several bad bots don't respect it...
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (preg_match('/bingbot/i', $ua))
{
    http_response_code(403); // return 403 Forbidden rather than an empty 200
    exit();
}
And that tells Bing: the door is closed for you!
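Since the question's stack is Nginx in front of Rails rather than PHP, a roughly equivalent sketch at the Nginx layer (goes inside the relevant server block):

# Refuse requests whose User-Agent matches "bingbot", case-insensitively
if ($http_user_agent ~* bingbot) {
    return 403;
}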
Sign up with Bing Webmaster Tools and fill out their crawl-speed chart. Set it for the fastest crawling during your off hours and a much reduced rate during your busiest times.
If Bing is knocking over your website, you need to rethink your web server capacity. The best test is to see if you can survive Google, Bing, Yahoo and Baidu all hitting your system at once. If it remains in service during the onslaught, then you're ready for a live customer load.
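One rough way to rehearse that kind of concurrent load is a standard benchmarking tool such as ApacheBench; the URL and numbers below are placeholders (40 concurrent requests mirrors the peak rate from the question):

# Send 10,000 requests at a concurrency of 40 against a test URL
ab -n 10000 -c 40 http://www.example.com/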
Yes, Bing can hit you pretty hard if you haven't given them a limit. It was causing me serious issues here two months ago. I just tuned the system up to handle it, and it was a good thing; otherwise Black Friday would have resulted in a very Blue Monday after viewing the server stats.