What is a tolerable request rate for bots?

@Sent6035632

Posted in: #WebCrawlers

I'm writing an indexing crawler for my hobby search engine. What would be a safe figure for requests per second, so that I'm not mistaken for a DoS attack and don't get blocked by firewalls and the like?


2 Comments


@Michele947

I've written a search engine bot before. Building it was fun! There is a lot of cleanup you'll have to do on URLs, though: malformed and mangled URLs will constantly crash your bot at runtime unless you normalize them first.
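To give a feel for the kind of cleanup involved, here is a minimal normalization sketch in Python. The rules shown are common conventions, not anything specific from this answer, and `normalize_url` is a name I made up:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def normalize_url(base, href):
    """Resolve a possibly relative link and canonicalize it.

    Returns None for links a crawler should skip.
    """
    url = urljoin(base, href.strip())           # resolve relative links against the page URL
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):   # skip mailto:, javascript:, etc.
        return None
    return urlunsplit((
        parts.scheme,
        parts.netloc.lower(),                   # hostnames are case-insensitive
        parts.path or "/",                      # empty path means the site root
        parts.query,
        "",                                     # drop #fragment: same document
    ))
```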

I'd set it to a 5-second sleep timer per site. Websites probably won't care as long as you aren't trying to hit them with 10K requests in under a second. A real user can't read two or more pages simultaneously, so a client that requests pages from the same domain faster than once every 5 seconds is easy to flag as a bot. And the timer doesn't stall your crawler: during those 5 seconds it can be scanning 10-15 other websites and indexing their pages. It only sleeps on each site, not globally; see the sketch below.
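A rough sketch of that scheduling idea, assuming a simple single-threaded crawler. `fetch_and_index` is a hypothetical stand-in for your own fetch-and-parse routine:

```python
import time
from collections import deque
from urllib.parse import urlsplit

PER_HOST_DELAY = 5.0   # seconds between requests to the same host
next_allowed = {}      # host -> earliest monotonic time we may hit it again

def crawl(frontier: deque):
    while frontier:
        url = frontier.popleft()
        host = urlsplit(url).netloc
        now = time.monotonic()
        if now < next_allowed.get(host, 0.0):
            frontier.append(url)        # host is still "sleeping"; try another URL
            if len(frontier) == 1:      # nothing else to crawl: actually wait
                time.sleep(next_allowed[host] - now)
            continue
        next_allowed[host] = now + PER_HOST_DELAY
        fetch_and_index(url)            # hypothetical: fetch, parse, index, enqueue links
```

A real crawler would keep a per-host queue or use an async event loop instead of cycling the frontier like this, but the effect is the same: only the host sleeps, not the bot.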

Make sure to name and identify your bot in the code too, so that webmasters know it's a search engine bot and can white-list it. You can do that by setting the User-Agent header.
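With the `requests` library that's one line of configuration; the bot name and info URL below are placeholders:

```python
import requests

HEADERS = {
    # Convention: BotName/version plus a "+URL" where webmasters can read about the bot.
    "User-Agent": "MyHobbyBot/0.1 (+https://example.com/bot-info)",
}

resp = requests.get("https://example.com/", headers=HEADERS, timeout=10)
```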


@Ann8826881

Google will not make requests to a site less than 2 seconds apart unless you specifically set a crawl rate. That is a good guideline to follow.
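If you'd rather honor each site's own preference than a fixed rate, robots.txt can carry a Crawl-delay directive, which Python's standard library can read. A sketch that falls back to the 2-second guideline above (the agent name is a placeholder):

```python
from urllib.robotparser import RobotFileParser

def polite_delay(robots_url, agent="MyHobbyBot"):
    """Return the site's requested crawl delay in seconds, else 2.0."""
    rp = RobotFileParser(robots_url)
    rp.read()                        # fetch and parse robots.txt
    delay = rp.crawl_delay(agent)    # None if the site sets no Crawl-delay
    return float(delay) if delay is not None else 2.0

# e.g. polite_delay("https://example.com/robots.txt")
```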

My question is: why write a bot when there are so many open source bots that are fully developed and fully vetted? It would be a lot safer to use one that has already been thrashed out than to make all the same mistakes yourself, get banned, and possibly face retaliation.
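For instance, Scrapy is one widely used, well-tested open-source framework (my example, not one the answer names); a polite spider takes only a few lines, with the spider name, start URL, and user agent as placeholders:

```python
import scrapy

class HobbySpider(scrapy.Spider):
    name = "hobby"
    start_urls = ["https://example.com/"]
    custom_settings = {
        "DOWNLOAD_DELAY": 2.0,     # seconds between requests to the same site
        "ROBOTSTXT_OBEY": True,    # respect robots.txt out of the box
        "USER_AGENT": "MyHobbyBot/0.1 (+https://example.com/bot-info)",
    }

    def parse(self, response):
        # Index this page, then follow its links.
        yield {"url": response.url, "title": response.css("title::text").get()}
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```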
