How can I archive (for historical record) my website?

@Angela700

Posted in: #Backups #Database #WebCrawlers

If I launch a website that displays thousands of pages of content (text) delivered dynamically from data in a database, and I want some sort of record, usable in the future, to prove that this data had been published on a given date, what are the options?

I have seen a commercial web archiving service that charges per page (!). I understand that you can run software yourself on the server, but I am unsure how this works with dynamically generated pages. Is backing up the database (with some type of timestamping and securing) enough? Would you then have to prove that your app was up and running and the website available as well? What are the alternatives?


2 Comments


@Candy875

Backing up the database isn't enough, as you can't prove the contents of the database were displayed on the pages.

If the dynamically generated pages look the same for every user, and aren't generated in response to user choices (options picked from lists, say), then you can use spidering software. It will take a snapshot of the pages as they appeared when it visited.

I use wget for this sort of thing. It's a command-line tool with a scary number of options. However, the advantage of a command-line tool is that you can run it automatically as often as you want. To get you going, here's how I use it to take a snapshot of a site:

"c:program fileswgetwget" -k -p -r -X video -w 1 example.com

-X video excludes the video directory, which I don't want in the snapshot.
-w 1 means wait one second between grabbing each page, so I don't hammer the site.
-k means convert the links in the downloaded files so that they work when you open those files later, rather than pointing back at the original website.
-p downloads all files used on a page, e.g. images.
-r means recursive, so it follows all the links on the site.
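
If the point of the snapshot is to show later that the pages existed on a given date, one option is to wrap the crawl in a small script that packs the downloaded files into a dated archive and records a checksum of it. Here is a minimal sketch in Python, assuming wget is on the PATH; example.com and the snapshots directory are placeholders for your own site and storage location:

import hashlib
import subprocess
import tarfile
from datetime import datetime, timezone
from pathlib import Path

SITE = "example.com"          # placeholder: the site to snapshot
OUT_DIR = Path("snapshots")   # placeholder: where dated archives are kept

def snapshot_site() -> Path:
    """Crawl the site with wget and pack the result into a dated tarball."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    crawl_dir = OUT_DIR / stamp
    crawl_dir.mkdir(parents=True, exist_ok=True)

    # Same flags as above: convert links, page requisites, recursive,
    # exclude the video directory, wait one second between requests.
    subprocess.run(
        ["wget", "-k", "-p", "-r", "-X", "video", "-w", "1",
         "-P", str(crawl_dir), SITE],
        check=False,  # wget exits non-zero on any broken link; don't abort
    )

    archive = OUT_DIR / f"{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(crawl_dir, arcname=stamp)

    # Record a SHA-256 digest next to the archive. The digest only proves
    # the archive hasn't changed since it was hashed; to carry weight as
    # evidence of the date, it would need to be lodged somewhere outside
    # your own control.
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    (OUT_DIR / f"{stamp}.sha256").write_text(f"{digest}  {archive.name}\n")
    return archive

if __name__ == "__main__":
    print("Wrote", snapshot_site())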


@Frith620

What I would do, if I wanted to archive everything I publish on my website, is write the content at publish time into a file that looks like the dynamically created page. Then I would make weekly backups of all the files I created.

The easiest way is to make backups of the DB, or even keep a second DB that receives an exact copy whenever you publish anything.
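
A rough sketch of the publish-time copy in Python; archive_rendered_page, the archive directory, and the manifest file are all hypothetical names, and the call would go wherever your app actually publishes a page:

import hashlib
from datetime import datetime, timezone
from pathlib import Path

ARCHIVE_DIR = Path("archive")  # placeholder: the directory the weekly backups cover

def archive_rendered_page(slug: str, html: str) -> Path:
    """Write the rendered HTML to a dated file so the archive holds exactly
    what the dynamic site served at publish time."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    out = ARCHIVE_DIR / f"{slug}-{stamp}.html"
    out.write_text(html, encoding="utf-8")

    # Keep a running manifest of filename, timestamp and content hash; the
    # weekly backups then preserve the manifest alongside the files.
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    with (ARCHIVE_DIR / "manifest.txt").open("a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{out.name}\t{digest}\n")
    return out

# Example call from wherever a page gets published:
if __name__ == "__main__":
    archive_rendered_page("example-article", "<html><body>Hello</body></html>")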
