How can I archive (for historical record) my website?
If I launch a website that displays thousands of pages of content (text) delivered dynamically from data in a database, and I want some sort of record that I can use in the future to prove that this data had been published on that date, what are the options?
I have seen a commercial web archiving service that charges per page (!). I understand that you can run software yourself on the server, but I am unsure how this works with dynamically generated pages. Is backing up the database (with some type of timestamping and securing) enough? Would you then also have to prove that your app was up and running and the website was available? What are the alternatives?
More posts by @Angela700
2 Comments
Backing up the database isn't enough, as you can't prove the contents of the database were displayed on the pages.
If the dynamically generated pages look the same for every user and don't depend on choices the visitor makes (picking options from lists, say, with the resulting pages varying by those options), then you can use spidering software. It will take a snapshot of the pages as they appeared at the moment it crawled them.
I use wget for this sort of thing. It's a command line tool with a scary number of options. However, the advantage of a command line tool is that you can run it automatically as often as you want. To get you going, here's how I use it to get a snapshot of a site:
"c:\program files\wget\wget" -k -p -r -X video -w 1 example.com
-X video excludes video, which is a directory I don't want to get a snapshot of.
-w 1 means wait a second between grabbing each page, so I don't hammer the site.
-k means convert the links in the downloaded files so that they work when you open those files again and don't go back to the original website.
-p downloads all files used on a page, e.g. images.
-r means recursive, so it follows all links which are on the site.
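As a rough sketch of how that command could be run automatically, the Python below builds the same wget call and points it at a dated folder, so each run keeps its own snapshot. The function names, the snapshots folder and the extra -P option (wget's directory prefix, which is not in the command above) are assumptions added for illustration, and wget is assumed to be on the PATH.

```python
import datetime
import subprocess
from pathlib import Path


def build_wget_command(site, snapshot_root, day, exclude_dirs=("video",)):
    """Return the wget argument list for one dated snapshot (hypothetical helper)."""
    target = Path(snapshot_root) / day.isoformat()  # e.g. snapshots/2025-01-31
    cmd = ["wget", "-k", "-p", "-r", "-w", "1"]     # same flags as the command above
    if exclude_dirs:
        cmd += ["-X", ",".join(exclude_dirs)]       # directories to skip
    cmd += ["-P", str(target)]                      # assumption: save under a dated folder
    cmd.append(site)
    return cmd


def take_snapshot(site, snapshot_root="snapshots"):
    """Run wget for today's date and return its exit code."""
    cmd = build_wget_command(site, snapshot_root, datetime.date.today())
    return subprocess.run(cmd, check=False).returncode
```

Calling take_snapshot("example.com") from a scheduled task (cron, Windows Task Scheduler) would then produce one folder per day. Two runs on the same day write into the same folder.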
If I wanted to archive everything I publish on my website, I would write the content into static files that look like the dynamically created pages. Then I would make weekly backups of all the files I created.
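A minimal sketch of that idea, assuming each published item has a title, a body and a URL-safe slug (all hypothetical names), could look like the following. The HTML here is only a stand-in: a real version would reuse the site's own templates so the file matches what visitors saw.

```python
import html
from datetime import datetime, timezone
from pathlib import Path


def render_page(title, body, published):
    """Return a minimal static HTML page for one published item."""
    stamp = published.isoformat()
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n"
        f"<p>Published: {html.escape(stamp)}</p>\n"
        f"<div>{html.escape(body)}</div></body></html>\n"
    )


def archive_page(archive_root, slug, title, body, published=None):
    """Write the page to <archive_root>/<slug>.html and return the path.

    Assumes slug is already safe to use as a file name.
    """
    published = published or datetime.now(timezone.utc)
    root = Path(archive_root)
    root.mkdir(parents=True, exist_ok=True)
    target = root / f"{slug}.html"
    target.write_text(render_page(title, body, published), encoding="utf-8")
    return target
```

Calling archive_page from the same code path that publishes an item keeps the archive in step with the live site, and the resulting folder is what the weekly backup would copy.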
The easiest way is to make backups of the DB, or even have a second DB where an exact copy will be sent whenever you publish anything.
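On the "timestamping and securing" part of the question, one possible approach is to record a SHA-256 digest of every backup file together with the time the record was made. The sketch below does that with Python's standard library; the function names and the manifest layout are assumptions, not an established format. A digest only shows that a file has not changed since it was recorded. The date in the manifest comes from your own machine's clock, so on its own it would not convince a third party.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir, now=None):
    """Map every file under backup_dir to its digest, plus a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    root = Path(backup_dir)
    files = {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {"created_utc": now.isoformat(), "files": files}


def write_manifest(backup_dir, manifest_path):
    """Save the manifest as JSON; keep manifest_path outside backup_dir."""
    manifest = build_manifest(backup_dir)
    text = json.dumps(manifest, indent=2, sort_keys=True)
    Path(manifest_path).write_text(text, encoding="utf-8")
    return manifest
```

Running write_manifest right after each database dump gives a small file that can be stored separately from the backup and compared against it later.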