What is a tolerable request rate for bots?
I'm writing an indexing crawler for my hobby search engine. What would be a safe figure for requests per second, so that I wouldn't be mistaken for a DoS attack and wouldn't get blocked by firewalls and the like?
2 Comments
I've written a search engine bot before. Building it was fun! There is a lot of cleanup you'll have to do on the URLs, though: munged URLs will keep crashing your crawler if you don't sanitize them before fetching.
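As a rough idea of what that cleanup might look like (this is just a sketch, assuming Python's standard urllib.parse; the exact rules are up to you):

```python
# Minimal URL cleanup sketch: reject unusable URLs, normalize the rest.
from urllib.parse import urlparse, urlunparse

def normalize_url(raw):
    """Return a cleaned absolute URL, or None if it can't be crawled."""
    parsed = urlparse(raw.strip())
    # Skip anything that isn't plain HTTP(S) or has no host at all.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    # Lower-case the host and drop the fragment; keep path and query.
    cleaned = parsed._replace(netloc=parsed.netloc.lower(), fragment="")
    return urlunparse(cleaned)

print(normalize_url("HTTPS://Example.com/Page?x=1#section"))
# -> https://example.com/Page?x=1
```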
I'd set it to a 5-second sleep timer per domain. Websites probably won't care as long as you aren't trying to hit them with 10K requests in under a second. It's pretty easy to tell that a client pulling more than one page every 5 seconds from the same domain is a bot: real users can't read 2+ pages simultaneously. In those 5 seconds, though, you can be scanning 10-15 other websites and indexing their pages, so your bot isn't blocked while it sleeps; it only sleeps on each site for 5 seconds.
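One way to do the per-domain delay while other hosts keep getting crawled, as a sketch (the names CRAWL_DELAY, next_allowed, may_fetch are mine, not from any particular crawler):

```python
# Per-domain politeness delay: each host is hit at most once every
# CRAWL_DELAY seconds, while URLs on other hosts can still be fetched.
import time
from urllib.parse import urlparse

CRAWL_DELAY = 5.0     # seconds between requests to the same host
next_allowed = {}     # host -> earliest time we may hit it again

def may_fetch(url):
    host = urlparse(url).netloc
    now = time.monotonic()
    if now < next_allowed.get(host, 0.0):
        return False                      # too soon for this host
    next_allowed[host] = now + CRAWL_DELAY
    return True

queue = ["https://example.com/a", "https://example.org/b", "https://example.com/c"]
while queue:
    url = queue.pop(0)
    if may_fetch(url):
        print("fetching", url)            # real code would download and parse here
    else:
        queue.append(url)                 # requeue and move on to other hosts
        time.sleep(0.1)                   # avoid spinning when everything is throttled
```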
Make sure to name/identify your bot in the code too, so that webmasters will know it's a search engine bot and will want to white-list it. You can do that by setting the User-Agent request header.
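For example, something like this (assuming the third-party requests library; the bot name and contact URL here are placeholders you'd replace with your own):

```python
# Identify the crawler via its User-Agent so webmasters can recognize it.
import requests

HEADERS = {
    "User-Agent": "MyHobbyBot/0.1 (+https://example.com/bot-info)"
}

resp = requests.get("https://example.com/", headers=HEADERS, timeout=10)
print(resp.status_code)
```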
Google won't make requests to a site less than 2 seconds apart unless you specifically set a crawl rate. That's a good guideline to follow.
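If you want to respect a site's own stated preference before falling back to a default, robots.txt can declare a Crawl-delay. A small sketch using Python's standard urllib.robotparser, with a 2-second fallback matching the guideline above (the agent name is a placeholder):

```python
# Check robots.txt for permission and a preferred crawl delay.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

agent = "MyHobbyBot"
delay = rp.crawl_delay(agent) or 2.0                      # site's preference, else ~2 s
allowed = rp.can_fetch(agent, "https://example.com/some/page")
print(delay, allowed)
```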
My question is, why write a bot when there are so many open-source bots that are fully developed and vetted? It would be a lot safer to use one that's already been thrashed out than to make all the same mistakes, get banned, and possibly face retaliation.