YK1175434

How often should a website's disaster recovery plan be tested?

@YK1175434

Posted in: #Backups #BestPractices #DisasterRecovery

We all take backups hoping to never use them, but of course some day we are going to.

I was wondering if there has been any best practice guide written about how often a disaster recovery plan should be tested.

At the moment I do dry runs every so often when I am feeling particularly paranoid, but there is no schedule and nothing is automated.


3 Comments


 

@Karen161

As with almost anything, there are trade-offs you need to weigh before deciding on the appropriate approach.

The first thing is to evaluate your business and infrastructure and draw up a risk management plan (the disaster recovery and business continuity plans are items within it), answering questions like:


"What is the monetary loss if the website goes off for 10 minutes? For 1 hour? For 12 hours?"
"What is the reputation loss?"
"What are the major impacts other than financial and reputational?"
"How much data can I recover? Are there any redundancies for the database? For the server?"
"Will configuration management and versioning help me out? How good are they?"

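The first question can be turned into rough numbers. This is only a minimal sketch: it assumes revenue accrues evenly over time, ignores reputation and other indirect losses, and the hourly revenue figure is a hypothetical placeholder you would replace with your own.

```python
def downtime_cost(revenue_per_hour, outage_minutes):
    """Estimated direct revenue lost, assuming revenue accrues evenly over time."""
    return revenue_per_hour * outage_minutes / 60


# Hypothetical figure: a site earning 600 (in any currency) per hour.
HYPOTHETICAL_REVENUE_PER_HOUR = 600

# The three outage lengths from the question: 10 minutes, 1 hour, 12 hours.
for minutes in (10, 60, 720):
    cost = downtime_cost(HYPOTHETICAL_REVENUE_PER_HOUR, minutes)
    print(f"{minutes:>4} min outage: about {cost:.0f} lost")
```

Even a crude estimate like this helps decide how much testing effort is justified.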

Beyond this, you need to plan disaster recovery at several levels: a web server crash, an intrusion by a cracker, a flood or earthquake, a national backbone failure, a power outage...

As you can see, some risks can only be mitigated, some will demand a lot of work, and some you simply can't act on directly.

After all these evaluations, you can determine which tests to run and how often. Whenever a component changes, you should actively test it; e.g., if you buy a new server, you should test a power-down, a crash, and so on. When everything stays much the same apart from the data in the database, a fortnightly or monthly backup test is enough (if your risk management allows it). Major corporations usually simulate this once a quarter, along with everything else.
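A fixed cadence like this is easy to check mechanically. The sketch below is one possible way to do it; the function name and the 14-day default (for the fortnightly case) are illustrative assumptions, and your own risk plan would dictate the real interval.

```python
from datetime import date, timedelta


def restore_test_overdue(last_test, today, interval_days=14):
    """True when at least `interval_days` have passed since the last backup test."""
    return today - last_test >= timedelta(days=interval_days)


# Example: last test on 1 January, checked on 15 January with a fortnightly interval.
if restore_test_overdue(date(2024, 1, 1), date(2024, 1, 15)):
    print("Backup restore test is due")
```

Run from a daily scheduled job, a check like this could send a reminder instead of relying on memory.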

Just one more point: you should automate whatever can be automated. If your risk management plan shows that going down implies a massive loss, then it makes sense to speed up recovery through automation as much as possible to minimize that impact.



 

@Becky754

I'm gonna have to go with "it depends" on this one, but more on that later.
I think (and it's a fairly subjective question) that what you test matters more than how often you test it. As Catcall said, you might have to set up a completely new server (obviously not if you're using shared hosting, or in general if you don't run your own server), so you need to make sure that everything can be brought back fairly quickly. It doesn't help you at all to be able to restore every page you host from a backup within minutes if the whole server won't run, doesn't have the mods the site needs, doesn't have the required database, etc. You might restore everything that was shared with the world, only to find you're missing a weird plugin or include that all the pages needed, and you're left figuring out what's missing, how to install it, and how to go from there.

After you have tested bringing back the currently working environment and restoring data from the backup, I really don't think you need to test it too often. I would just make sure to test it whenever anything at all changes on the system (a new version of whatever servers are running, new hardware, a new operating system, a new database version, etc.).
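One way to notice that "anything at all" has changed is to fingerprint the list of components and compare it with the fingerprint recorded at the last successful restore test. This is just a sketch: the component names and version strings are made-up examples, and how you collect the real ones depends on your system.

```python
import hashlib
import json


def fingerprint(components):
    """Stable hash of a {component: version} mapping (key order does not matter)."""
    canonical = json.dumps(components, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def restore_test_due(current_components, last_tested_fingerprint):
    """True when the environment differs from the one last tested."""
    return fingerprint(current_components) != last_tested_fingerprint


# Hypothetical environment recorded at the last restore test.
tested = fingerprint({"os": "1.0", "webserver": "2.4", "database": "5.7"})

# Later, the database has been upgraded, so a new restore test is due.
now = {"os": "1.0", "webserver": "2.4", "database": "8.0"}
print(restore_test_due(now, tested))
```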

As far as the "it depends" goes: what kind of website are you running? Obviously it doesn't make sense to invest hundreds of hours into testing the backup solution if you're hosting a small website with 100 visitors a month. If 12 hours of downtime would severely cut into your profit, though, you should probably test at least once a month.



 

@Cugini213

Your disaster recovery plan should be tested often enough that it will work correctly 100% of the time when you have a disaster it's supposed to cope with. That means it has to work when you're woken up in the middle of the night, given no caffeine, screamed at during the whole process, and so on.

My plan is supposed to cope with someone breaking in and stealing the server, but I'm not expected to cope with that in just two hours in the middle of the night. I'm allowed time to buy a new server locally, reload the OS, install a spare tape drive, and restore from tape.

I rarely run a full restore. But several times a week I test my ability to restore from tape, including my ability to restore really large files. I actually restore files and compare MD5 checksums.
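A restore-and-compare check along these lines can be scripted. The following is a minimal sketch, not my actual tooling: it assumes the restored files have been written to a separate directory next to the originals, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1024 * 1024):
    """MD5 of a file, read in chunks so really large files don't fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_restore(original_dir, restored_dir):
    """Return (relative_path, problem) pairs for files missing or altered after restore."""
    original_dir = Path(original_dir)
    restored_dir = Path(restored_dir)
    problems = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        dst = restored_dir / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif md5sum(src) != md5sum(dst):
            problems.append((rel.as_posix(), "checksum mismatch"))
    return problems
```

An empty result means every original file was restored with identical contents. MD5 is fine here for catching corruption, though it is not suitable for defending against deliberate tampering.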

The main web site I take care of (just a small part of my job) doesn't need to be restorable from tape, but the server it's on does. The web site builds from a Subversion repository, and I can rebuild it from the sources in about 20 minutes after bringing the server up.


