What determines how often the Wayback Machine crawls a website?
I understand that archive.org takes its list of websites to crawl from Alexa, but I don't understand how it decides the snapshot frequency for each website. Some websites are crawled multiple times per day, while others are crawled less than once a month. How does the Wayback Machine determine how often a website is archived?
2 Comments
The Wayback Machine archive is a combination of data from a large number of different crawls:
Alexa crawls, which appear after a 6 month delay
Our own crawls, which are seeded from the Alexa top million list and others
ArchiveTeam crawls, done by volunteers
Archive-It crawls, done by our 400+ partners, mostly libraries, many of which allow their data to be included in the general Wayback Machine
We have an experimental Wayback Machine search and explore interface at web-beta.archive.org/, which shows why each capture was made.
There's some information about this on Wikipedia:
Snapshots usually become available more than 6 months after they are archived or in some cases even later, 24 months or longer. The frequency of snapshots is variable, so not all tracked web site updates are recorded. There are sometimes intervals of several weeks or years between snapshots.
en.wikipedia.org/wiki/Wayback_Machine
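If you want to see how often a particular site has actually been captured, one option is to list its capture timestamps and measure the gaps between them. The following is a minimal sketch, not something taken from the answers above. It assumes the public CDX endpoint at web.archive.org/cdx/search/cdx and its `url`, `output`, `fl` and `limit` parameters behave as I understand them, and that timestamps come back in the 14-digit YYYYMMDDhhmmss form. The function names `fetch_timestamps` and `capture_gaps` are made up for this example.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback CDX endpoint; not mentioned in the thread.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def fetch_timestamps(site, limit=1000):
    """Return capture timestamps (YYYYMMDDhhmmss strings) for `site`.

    Hypothetical helper: requires network access and assumes the CDX
    parameters named in the text above.
    """
    query = urlencode(
        {"url": site, "output": "json", "fl": "timestamp", "limit": limit}
    )
    with urlopen(f"{CDX_ENDPOINT}?{query}", timeout=30) as resp:
        rows = json.load(resp)
    # Assumption: with output=json the first row is a header row, so skip it.
    return [row[0] for row in rows[1:]]


def capture_gaps(timestamps):
    """Return the gaps, in days, between consecutive captures."""
    times = sorted(datetime.strptime(ts, "%Y%m%d%H%M%S") for ts in timestamps)
    return [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]
```

For example, `capture_gaps(fetch_timestamps("example.com"))` would give the spacing between captures in days. This only shows the observed frequency; it does not reveal which of the crawls listed above produced each capture.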