Why deny access to website for msnbot/bingbot?

@Caterina187

Posted in: #Bing #Seo #WebCrawlers

I've seen quite a lot of tutorials that recommend banning user agents containing the strings libwww-perl and msnbot. I understand why one would ban libwww-perl: it's mainly, if not only, used for hacking and spamming.

But why do so many sites recommend banning msnbot/bingbot?
Since it's a search engine, even if only with a marginal market share, I would expect one to want this bot to crawl one's sites.

What is it that msnbot does that makes people ban it?
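
For context, the user-agent bans those tutorials describe boil down to a case-insensitive substring test on the User-Agent header. A minimal Python sketch of that test follows; the names `BLOCKED_SUBSTRINGS` and `is_blocked` are hypothetical, and the header values in the usage lines are only sample strings.

```python
# Hypothetical blocklist mirroring the two strings named in the question.
BLOCKED_SUBSTRINGS = ("libwww-perl", "msnbot")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked substring."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_SUBSTRINGS)


# Sample header values, used only for illustration.
print(is_blocked("msnbot/2.0b (+http://search.msn.com/msnbot.htm)"))
print(is_blocked("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"))
```

Note that a rule written for "msnbot" does not match a header that only says "bingbot", since the test is a plain substring match.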


@Radia820

Recent changes in the relationship between Bing and Edge make this question interesting again. Should we accept bingbot's behaviour?

Over the last couple of weeks we have seen, in northern Europe at least, Bing starting to index content based on URLs opened in Edge. This makes a massive amount of "secret" data public that was never meant to be, because Edge is now feeding Bing all those "secret" URLs that only you visit. So the email with an obfuscated link to a private receipt after your hotel stay is suddenly indexed and published by Bing, just because you opened the link and viewed the receipt in the Edge browser. Searches with the "site:" operator are now starting to reveal people's private material, from hotel stays to art purchases, and are even showing bank and credit card invoices, because a lot of web services serve this material through long, secret URLs that would normally be impossible to guess. But Edge gives it all away to Bing, for free. And you probably signed it off in the user agreement anyhow.

Of course this kind of data should never be accessible without proper authentication, but in real life secret links like this are widely used.

I use obfuscated links on one of my websites for one specific purpose, but it doesn't reveal any private or sensitive data, so it is harmless. Still, I don't think all these links should be indexed by Bing just because users visit them through Edge; they should go to those they are meant for and nobody else. So I temporarily blocked Bing until a solution was in place.

I have found little information about this new and dodgy Bing/Edge behaviour on the internet so far, other than newspapers writing about the small scandals it began to create in our country a few weeks ago.


@Pierce454

Whilst BingBot has a Webmaster Tools section that allows you to limit the speed at which the bot crawls your site, there are three major problems with their approach.


1. They don't allow you to select a crawl rate by number of seconds like Google do. Instead they have a crappy low to high range, but make no attempt to explain what low and high actually mean in terms of seconds between hits.
2. BingBot may adhere to your wishes to crawl at a slower rate, but they often have multiple spiders crawling your site at the same time. Many spiders crawling at a low rate can be far worse than one spider crawling at a high rate.
3. Microsoft don't care. I have contacted them about instances where they had around 20 individual bot connections to our server loading pages every few seconds and bringing the server to a halt. Their response was that there was nothing they could do about it.


A simple bit of programming skill by Microsoft could easily ensure that only one bot crawls a site at any time.

My solution is to limit the MSN IP ranges in iptables. I'm still experimenting with this, but I believe it still allows them access to the sites while forcing them to slow down. When the connections become too aggressive, they are rejected.
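
The iptables rules themselves are not shown above, so the following is only an application-level Python sketch of the same idea: count connections from a whole address range together and reject them once they exceed a limit within a time window. The class name `RangeLimiter`, the limits, and the address range are all assumptions; the range is a documentation placeholder, not a real Microsoft crawler range.

```python
import ipaddress
import time
from collections import deque

# Placeholder range from the documentation block (TEST-NET-3), NOT a real
# crawler range; substitute ranges you have verified yourself.
THROTTLED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


class RangeLimiter:
    """Allow at most `max_hits` connections per `window` seconds for all
    addresses in the throttled networks combined; other clients pass."""

    def __init__(self, max_hits=5, window=10.0, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.clock = clock
        self.hits = deque()  # timestamps of accepted throttled connections

    def allow(self, client_ip: str) -> bool:
        addr = ipaddress.ip_address(client_ip)
        if not any(addr in net for net in THROTTLED_NETWORKS):
            return True  # not in a throttled range
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) >= self.max_hits:
            return False  # too aggressive: reject the connection
        self.hits.append(now)
        return True
```

Counting the range as a whole, rather than per address, is what covers the case of many spiders each crawling slowly from different addresses.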


@BetL925

MSNBot is extremely aggressive and has sucked up over 2.5GB of bandwidth from many of my sites in less than a month (that's 2.5GB+ for each site). Microsoft really needs to straighten that out but probably never will. Until then, I'm treating MSNBot as the malicious program it is and banning it from my systems.


@Annie201

One of my clients was doing ,000 monthly from Bing shopping alone. Organic traffic from Bing brought in even more. Banning them would cause a big loss of revenue. Anyone suggesting it must have their own personal reasons. Bing generates visits, so if you want to decrease your traffic, go ahead and ban Bing. Otherwise, like Anthony said, you can work with their Webmaster Tools to improve your site for Bing.com.


@Sue5673885

I don't think people should ban bingbot.

Bing has an equivalent, Bing Webmaster Tools, at www.bing.com/toolbox/webmaster/, which also has 'Crawl Settings' where you can adjust the crawl rate, as seen in this video: www.bing.com/videos/watch/video/bing-webmaster-tools-crawl-rate-settings/1ii1ej9jz
Googlebot is just as notorious for excessive crawling of sites as msnbot. Also, the better your site does (traffic/linkage), the more Googlebot crawls. Just look at how fast stackexchange questions get indexed after being posted. You can see how much these bots hit your server if you check your access logs.
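
As a rough way to do that log check, here is a Python sketch that counts requests per crawler. It assumes the common "combined" log format, where the User-Agent is the last double-quoted field on each line; the function name `count_bot_hits` and the token list are hypothetical.

```python
import re
from collections import Counter

# Assumes the "combined" log format: the User-Agent is the last
# double-quoted field on each line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')
# Substrings to look for in the User-Agent; extend as needed.
BOT_TOKENS = ("googlebot", "bingbot", "msnbot")


def count_bot_hits(lines):
    """Count requests per bot token found in the User-Agent field."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line does not end in a quoted field
        ua = match.group(1).lower()
        for token in BOT_TOKENS:
            if token in ua:
                counts[token] += 1
                break
    return counts
```

It accepts any iterable of lines, so an open log file can be passed directly, for example `count_bot_hits(open("access.log"))` with your own log path.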

I also discovered that msnbot supports the robots.txt Crawl-delay parameter: www.bing.com/community/site_blogs/b/webmaster/archive/2009/08/10/crawl-delay-and-the-bing-crawler-msnbot.aspx
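
To illustrate what such a rule looks like, the sketch below parses a small robots.txt with Python's standard `urllib.robotparser` and reads back the delay for msnbot. The 10-second value and the `examplebot` agent are arbitrary assumptions, and the parser only shows how the file is read; how a given crawler actually interprets the value is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt; the 10-second delay is an arbitrary illustrative value.
CRAWL_DELAY_ROBOTS = """\
User-agent: msnbot
Crawl-delay: 10
Disallow:

User-agent: *
Disallow:
"""

delay_parser = RobotFileParser()
delay_parser.parse(CRAWL_DELAY_ROBOTS.splitlines())

# Delay from the msnbot group.
msnbot_delay = delay_parser.crawl_delay("msnbot")
# "examplebot" is a made-up agent that falls under the "*" group,
# which sets no Crawl-delay.
other_delay = delay_parser.crawl_delay("examplebot")
print(msnbot_delay, other_delay)
```

The empty `Disallow:` lines mean nothing is blocked, so this file slows msnbot down without denying it access.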


@Gretchen104

msnbot is quite prolific when it comes to spidering servers, and if you have a lot of pages to index it can quite easily cripple your server. As traffic from MSN is considerably less than what Google can give, it's quite common just to deny msnbot via .htaccess, iptables or robots.txt. With Googlebot you can limit the crawl speed quite easily at google.com/webmasters
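
Of the three options, the robots.txt one is the easiest to show. The Python sketch below uses the standard `urllib.robotparser` to check what a deny rule for msnbot would permit; the file contents and the test path are assumptions, and such a rule only affects crawlers that choose to honour robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that denies msnbot and leaves other agents unrestricted.
DENY_ROBOTS = """\
User-agent: msnbot
Disallow: /

User-agent: *
Disallow:
"""

deny_parser = RobotFileParser()
deny_parser.parse(DENY_ROBOTS.splitlines())

# "/any/page.html" is an arbitrary path used for the check.
msnbot_allowed = deny_parser.can_fetch("msnbot", "/any/page.html")
googlebot_allowed = deny_parser.can_fetch("Googlebot", "/any/page.html")
print(msnbot_allowed, googlebot_allowed)
```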
