
What determines how frequently the Wayback Machine crawls one's website?

@Pope3001725

Posted in: #InternetArchive #WebCrawlers

I understand archive.org takes its list of websites to crawl from Alexa, but I don't understand how it decides the snapshot frequency for each website. Some websites are crawled multiple times per day, while others are crawled less than once a month. How does the Wayback Machine determine how often a website is archived?


2 Comments


 

@Shelton105

The Wayback Machine archive is a combination of data from a large number of different crawls:


Alexa crawls, which appear after a 6-month delay
Our own crawls, which are seeded from the Alexa top million list and others
ArchiveTeam crawls, done by volunteers
ArchiveIt crawls, done by our 400+ partners, mostly libraries, many of which allow their data to be included in the general Wayback Machine


We have an experimental Wayback Machine search-and-explore interface at web-beta.archive.org/, which shows why each capture was made.



 

@Annie201

There's some information about this on Wikipedia:


Snapshots usually become available more than 6 months after they are archived or in some cases even later, 24 months or longer. The frequency of snapshots is variable, so not all tracked web site updates are recorded. There are sometimes intervals of several weeks or years between snapshots.
en.wikipedia.org/wiki/Wayback_Machine
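To see how variable the snapshot frequency is for a particular site, one option is to pull its capture timestamps and summarize them. The following is a minimal sketch, not an official tool: it assumes the Wayback Machine's public CDX endpoint at web.archive.org/cdx/search/cdx with its `url`, `from`, `to`, `output` and `fl` query parameters, and the function names (`fetch_timestamps`, `captures_per_month`, `longest_gap_days`) are made up for illustration.

```python
import json
from collections import Counter
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine's public CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def fetch_timestamps(site, year):
    """Fetch capture timestamps (YYYYMMDDhhmmss strings) for `site` in one year."""
    query = urlencode({
        "url": site,
        "from": str(year),
        "to": str(year),
        "output": "json",
        "fl": "timestamp",
    })
    with urlopen(f"{CDX_ENDPOINT}?{query}", timeout=30) as resp:
        rows = json.load(resp)
    # With output=json the first row is a header row; skip it.
    return [row[0] for row in rows[1:]]


def captures_per_month(timestamps):
    """Count captures per calendar month, keyed by 'YYYYMM'."""
    return Counter(ts[:6] for ts in timestamps)


def longest_gap_days(timestamps):
    """Return the longest interval between consecutive captures, in days."""
    times = sorted(datetime.strptime(ts, "%Y%m%d%H%M%S") for ts in timestamps)
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]
    return max(gaps, default=0.0)
```

For example, `captures_per_month(fetch_timestamps("example.com", 2016))` would give a month-by-month capture count, and `longest_gap_days` on the same list shows the largest interval between snapshots. Note that this only reports what was captured; it does not reveal why a given crawl was scheduled.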


