How can I archive (for historical record) my website?

@Angela700

Posted in: #Backups #Database #WebCrawlers

If I launch a website that displays thousands of pages of content (text) delivered dynamically from data in a database, and I want some sort of record, usable in the future, to prove that this data had been published on a given date, what are the options?

I have seen a commercial web archiving service that charges per page (!). I understand that you can run software yourself on the server, but I am unsure how this works with dynamically generated pages. Is backing up the database (with some type of timestamping and securing) enough? Would you then have to prove that your app was up and running and the website available as well? What are the alternatives?


2 Comments


@Candy875

Backing up the database isn't enough, as you can't prove the contents of the database were displayed on the pages.

If the dynamically generated pages look the same for every user, and aren't generated in response to user choices (options picked from lists, say), then you can use spidering software. It will take a snapshot of the pages as they appeared when it visited.

I use wget for this sort of thing. It's a command-line tool with a scary number of options. However, the advantage of a command-line tool is that you can run it automatically as often as you want. To get you going, here's how I use it to take a snapshot of a site:

"c:program fileswgetwget" -k -p -r -X video -w 1 example.com

-X video excludes the video directory, which I don't want in the snapshot.
-w 1 means wait one second between grabbing each page, so I don't hammer the site.
-k means convert the links in the downloaded files so that they work when you open those files later, rather than pointing back at the original website.
-p downloads all files used on a page, e.g. images.
-r means recursive, so it follows all the links on the site.
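
If the point of the snapshot is to show later that the pages existed on a given date, one option is to wrap the crawl in a small script that packs the downloaded files into a dated archive and records a checksum of it. Here is a minimal sketch in Python, assuming wget is on the PATH; example.com and the snapshots directory are placeholders for your own site and storage location:

import hashlib
import subprocess
import tarfile
from datetime import datetime, timezone
from pathlib import Path

SITE = "example.com"          # placeholder: the site to snapshot
OUT_DIR = Path("snapshots")   # placeholder: where dated archives are kept

def snapshot_site() -> Path:
    """Crawl the site with wget and pack the result into a dated tarball."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    crawl_dir = OUT_DIR / stamp
    crawl_dir.mkdir(parents=True, exist_ok=True)

    # Same flags as above: convert links, page requisites, recursive,
    # exclude the video directory, wait one second between requests.
    subprocess.run(
        ["wget", "-k", "-p", "-r", "-X", "video", "-w", "1",
         "-P", str(crawl_dir), SITE],
        check=False,  # wget exits non-zero on any broken link; don't abort
    )

    archive = OUT_DIR / f"{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(crawl_dir, arcname=stamp)

    # Record a SHA-256 digest next to the archive. The digest only proves
    # the archive hasn't changed since it was hashed; to carry weight as
    # evidence of the date, it would need to be lodged somewhere outside
    # your own control.
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    (OUT_DIR / f"{stamp}.sha256").write_text(f"{digest}  {archive.name}\n")
    return archive

if __name__ == "__main__":
    print("Wrote", snapshot_site())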


@Frith620

What I would do, if I wanted to archive everything I publish on my website, is write the content at publish time into a file that looks like the dynamically created page. Then I would make weekly backups of all the files I created.

The easiest way is to make backups of the DB, or even keep a second DB that receives an exact copy whenever you publish anything.
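
A rough sketch of the publish-time copy in Python; archive_rendered_page, the archive directory, and the manifest file are all hypothetical names, and the call would go wherever your app actually publishes a page:

import hashlib
from datetime import datetime, timezone
from pathlib import Path

ARCHIVE_DIR = Path("archive")  # placeholder: the directory the weekly backups cover

def archive_rendered_page(slug: str, html: str) -> Path:
    """Write the rendered HTML to a dated file so the archive holds exactly
    what the dynamic site served at publish time."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    out = ARCHIVE_DIR / f"{slug}-{stamp}.html"
    out.write_text(html, encoding="utf-8")

    # Keep a running manifest of filename, timestamp and content hash; the
    # weekly backups then preserve the manifest alongside the files.
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    with (ARCHIVE_DIR / "manifest.txt").open("a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{out.name}\t{digest}\n")
    return out

# Example call from wherever a page gets published:
if __name__ == "__main__":
    archive_rendered_page("example-article", "<html><body>Hello</body></html>")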
