Is there a way to make Alexa's ia_archiver slow down its crawling of my website?
Alexa's ia_archiver bot is the main contributor to the Internet Archive's "Wayback Machine" web collection, and there are advantages to having my website included in that collection. There are other robots that do other useful things too.
What's a quick and easy way to make ia_archiver crawl my website more slowly, in order to put less load on the server? I haven't tested the Crawl-delay directive: if you have, and it works, please tell me so. If it doesn't work, please leave a comment. If you've never tested Crawl-delay, then instead please recommend some other solution that takes fifteen minutes or less to implement. Maybe there exists some easy-to-implement software-based solution which will let me throttle too-rapid hits from ia_archiver?
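For reference, a Crawl-delay rule aimed only at ia_archiver might look like the robots.txt text embedded below. The sketch uses Python's standard `urllib.robotparser` (its `crawl_delay()` method needs Python 3.6 or later) to show how a parser that honors the directive would read the file. The 10-second value and the catch-all `*` group are illustrative assumptions, and the sketch says nothing about whether ia_archiver itself obeys the directive.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask ia_archiver to wait 10 seconds between
# requests (an illustrative value) and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: ia_archiver
Crawl-delay: 10
Disallow:

User-agent: *
Disallow:
"""


def delay_for(robots_txt: str, user_agent: str) -> Optional[int]:
    """Return the Crawl-delay (seconds) that a parser honoring the directive
    would apply to user_agent, or None if no delay is set for it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, delay_for(ROBOTS_TXT, agent))
```

Python's parser only accepts whole-number delays (a value such as `0.5` is ignored), so an integer is the safer choice in case other crawlers parse the value the same way.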
Please assume my website is running on Apache 2.4.3 on Debian Linux 6.0.6 on a dedicated server which I administer.
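One way to test Crawl-delay yourself is to measure the gaps between ia_archiver's requests in the Apache access log before and after adding the directive. The sketch below assumes Apache's standard `combined` log format and Debian's default log location `/var/log/apache2/access.log`; both are assumptions to check against your own configuration, since virtual hosts often log elsewhere. It needs Python 3.6 or later, which is newer than the Python packaged with Debian 6.0, so you may need to run it elsewhere on a copy of the log.

```python
import re
import sys
from datetime import datetime
from typing import Iterable, List

# Assumed default Debian location; pass a different path on the command line.
DEFAULT_LOG = "/var/log/apache2/access.log"

# Assumes Apache's "combined" format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Captures the [timestamp] and the last quoted field (the User-Agent).
LINE_RE = re.compile(r'\[(?P<time>[^\]]+)\].*"(?P<agent>[^"]*)"\s*$')
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def request_gaps(lines: Iterable[str], agent_token: str = "ia_archiver") -> List[float]:
    """Return the seconds between consecutive requests whose User-Agent
    contains agent_token (case-insensitive), in log order."""
    times = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and agent_token.lower() in match.group("agent").lower():
            times.append(datetime.strptime(match.group("time"), TIME_FORMAT))
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


def main(path: str) -> None:
    """Print a short summary of request gaps for ia_archiver in one log file."""
    try:
        with open(path, encoding="utf-8", errors="replace") as log:
            gaps = request_gaps(log)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return
    if gaps:
        print(f"{len(gaps) + 1} requests, min gap {min(gaps):.0f}s, "
              f"mean gap {sum(gaps) / len(gaps):.1f}s")
    else:
        print("fewer than two matching requests found")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG)
```

If a delay of N seconds is being honored, the minimum gap should stay at or above N. The sketch lumps all of the crawler's IP addresses together, so parallel fetches from several hosts will show up as short gaps.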
More posts by @Angela700
Crawling for the Internet Archive is done both by Alexa and by the Internet Archive's own crawlers. Support for the Crawl-delay directive in robots.txt is fairly haphazard between the two, because the directive is not part of the official robots.txt standard. In addition, in my experience the way both companies treat Crawl-delay, when they do accept it, seems to change over time. I have tried this in the past and found that sometimes both crawlers respected Crawl-delay, sometimes only one did, and sometimes neither did, with no apparent pattern to when it is respected. The only thing I can suggest that will definitely work is to add a Disallow directive for both the Alexa crawler (ia_archiver) and the Internet Archive's own crawler.
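A minimal sketch of the Disallow approach, again checked with Python's standard `urllib.robotparser`. `ia_archiver` is Alexa's crawler as named in the question; `archive.org_bot` is an assumed user-agent token for the Internet Archive's own crawler, so confirm the exact name in your access logs before relying on it. Note that this trades away the Wayback Machine inclusion that the question lists as an advantage.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out Alexa's crawler and the Internet
# Archive's own crawler. "archive.org_bot" is an ASSUMED token; verify it
# against the User-Agent strings in your own access logs.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if a compliant crawler named user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("ia_archiver", "archive.org_bot", "Googlebot"):
        print(agent, "allowed" if allowed(ROBOTS_TXT, agent) else "blocked")
```

The empty `Disallow:` in the `*` group leaves all other crawlers unrestricted, so only the two named bots are shut out.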
© vmapp.org 2024 All Rights reserved.