Is there a reason msnbot crawls in spikes?
I've been experiencing high RPM spikes recently. Something like this:
When I debugged, I found reason to believe that msnbot suddenly makes a massive crawl and then stops. I assume I'm not the only site owner who has trouble suddenly handling 5x the normal RPM, so why does msnbot do this? Is there a valid explanation or technical reason for such a hit-and-run?
More posts by @Jessie594
1 Comments
The msnbot was retired from active web crawling in 2010 and replaced with bingbot - is that what you meant?
Regardless, as covered here, factors that can affect its crawl rate are:
The total number of pages on a site (is the site small, large, or somewhere in-between?)
The size of the content (PDFs and Microsoft Office files are typically much larger than regular HTML files)
The freshness of the content (how often is content added/removed/changed?)
The number of allowed concurrent connections (a function of the web server infrastructure)
The bandwidth of the site (a function of the host’s service provider; the lower the bandwidth, the lower the server’s capacity to serve page requests)
How highly the site ranks (content judged as not relevant won’t be crawled as often as highly relevant content)
Taking the above into account might help explain the spikes in your requests per minute.
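Before changing anything, it may be worth confirming from your access log which user agent is actually behind the spike. Here is a rough sketch, assuming your server writes the common "combined" log format; the function name, the regex, and the sample lines are made up for illustration, so adjust them to your own log layout.

```python
import re
from collections import Counter

# Assumes the "combined" access-log format: the timestamp sits in [..] and the
# user agent is the last quoted field. Adjust the pattern for other formats.
LOG_PATTERN = re.compile(
    r'\[(?P<minute>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]*\]'
    r'.*"(?P<agent>[^"]*)"\s*$'
)

def requests_per_minute(lines, agent_substring):
    """Count requests per minute whose user agent contains agent_substring."""
    counts = Counter()
    needle = agent_substring.lower()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and needle in match.group("agent").lower():
            counts[match.group("minute")] += 1
    return counts

# Hypothetical sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '203.0.113.5 - - [10/Oct/2023:13:55:02 +0000] "GET /b HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '198.51.100.7 - - [10/Oct/2023:13:55:03 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0"',
    '203.0.113.5 - - [10/Oct/2023:13:56:10 +0000] "GET /c HTTP/1.1" 200 298 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

counts = requests_per_minute(sample, "bingbot")
for minute, hits in sorted(counts.items()):
    print(minute, hits)
```

Run it once with "msnbot" and once with "bingbot" against your real log (for example, by passing an open file object as `lines`) and compare the per-minute counts with the times of your spikes.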
To slow down the crawl rate, specify in your site's robots.txt:
User-agent: msnbot
Crawl-delay: 1
Change msnbot to bingbot if you determine that's the user agent causing the spike, and use a Crawl-delay of 5 (very slow) or 10 (extremely slow) if your server's performance is suffering.
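If you want to sanity-check the file before deploying it, one option is Python's standard-library `urllib.robotparser` (its `crawl_delay()` method needs Python 3.6 or later). This is only a sketch: the robots.txt text below is an example, "SomeOtherBot" is a made-up name, and the result shows how Python's parser reads the file, not how Bing's crawler will necessarily behave.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; swap in your own file's text.
robots_txt = """\
User-agent: bingbot
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# crawl_delay() returns the delay for a matching user agent,
# or None when no matching group sets one.
bing_delay = parser.crawl_delay("bingbot")
other_delay = parser.crawl_delay("SomeOtherBot")  # hypothetical bot name

print("bingbot:", bing_delay)
print("SomeOtherBot:", other_delay)
```

If the first value is not the number you expected, check the spelling of the user agent and that the Crawl-delay line sits directly under its User-agent line.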