Bryan171: Best way to archive a website for scholarly purposes?

@Bryan171

Posted in: #Linux

A non-profit is going through a major website renovation. It's a large website, with about 3,000 pages, built using HTML, CSS, JPGs and lots of Adobe PDFs. It was designed long before modern CMS systems became popular. Their new website is going to be small because they don't want to convert all the old content, but they would prefer to archive it for future use by scholars.

They don't want to continue hosting the old web pages, so the question came up of how to archive the content of the old 3,000-page website in a way that keeps it accessible to scholars.

Wget method:
One suggestion is to create a DVD-ROM of the content. With this method, I was thinking of using the Linux 'wget' command to suck down all the pages, so the result would still be browsable like the old hosted website, but would run entirely off the DVD-ROM. A DVD-ROM was suggested so they could charge a small fee for it and wouldn't have to host it. Alternatively, the entire contents could be packaged as a .zip file and offered for a small download fee.
If this sounds workable, what are the best wget options for downloading an entire website so it can run as a sort of self-contained kiosk, where the user would only need a web browser for it to work?
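For concreteness, this is roughly the sort of invocation I had in mind (the hostname is just a placeholder, and I haven't verified these are the right options):

$ wget --mirror --convert-links --page-requisites --adjust-extension www.oldsite.example.org/

The idea is that --convert-links rewrites links to point at the local copies, and --page-requisites pulls in the CSS and images each page needs, so the result could be browsed straight off the disc.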

I fully realize that in years to come, modern computers might not support web browsers as they are used today, but converting this content into some other format or database would be the job of the next archive project. So please refrain from a debate about archive formats for the next 100 years. :-)

What would be the best way to use wget to achieve this objective? Or is there another, better method of archiving the old website so it remains accessible? Thanks!





1 Comment


@Karen161

You can use wget in mirror mode, like this:

$ wget -mk www.website.com/

You'll get a local mirror of the site on your disk.
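If you want the copy to be fully self-contained for kiosk-style browsing, a couple of extra flags usually help (the hostname is still just a placeholder):

$ wget -m -k -p -E www.website.com/

Here -p (--page-requisites) also downloads the CSS, images and other files each page needs, and -E (--adjust-extension) saves pages with a .html extension so they open cleanly from a local file system.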

As for the archive format, you can skip packaging entirely, which lets users browse directly from the disk without unpacking anything first, or you can produce a WARC (Web ARChive) file, since some web-archive replay tools can browse WARC files directly.
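If you go the WARC route, wget can write the archive while it mirrors; a minimal sketch (the filename prefix and hostname are just examples):

$ wget -mk --warc-file=old-site www.website.com/

That produces a WARC file (old-site.warc.gz by default) alongside the normal mirror, which can later be replayed with web-archive tooling.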
If you are backing up the result, I would advise compressing it, because copying will take a long time due to the huge number of files.
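For example, assuming the mirror landed in a directory named www.website.com, a single compressed tarball is much faster to copy around than thousands of small files:

$ tar -czf old-site.tar.gz www.website.com/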


