Planning for catastrophe

@Pope3001725

Posted in: #Backups #Planning

I work for a small marketing company that also does web design and development. We host all of our web design and development customers on a dedicated server at Hostgator. We have a dedicated server with RAID 1 configured hard drives. We also do weekly backups which is automated through cPanel and downloaded by automated FTP software locally.

Today we were discussing what we would do if Hostgator had a catastrophic failure of some kind. It could be that the server exploded, Hostgator had serious network issues, the FBI did one of their famous "take every server we see" raids, etc. Basically, any scenario where an extended outage is expected. We then took it to the next level and wondered what we would do if Hostgator had an extended outage and we were also unable to access our local backups, due to fire, flood, etc. I know the odds of our server being down for an extended period of time while our local files are simultaneously inaccessible are remote, but all it takes is two bad things happening at once and that's where we would stand. (If you have ever gotten a flat tire and found out your spare was flat or missing, you know how easily two bad things can happen at the same time.)

Needless to say, we want to be prepared for "worst case scenario" type events, as any of them would almost certainly put us out of business. So my two questions are:


What could we do to be prepared for an extended outage by Hostgator? An ideal scenario will have our clients' websites, and hopefully emails, up and running again quickly.
What would a robust backup plan include so important data is never lost? An ideal solution will be automated.


You can assume cost is not an issue in your answers, but the more affordable a solution is, the better.


4 Comments


@Chiappetta492

Disaster recovery can be a huge task, especially when dealing with multiple servers, sites, and databases. Two key items to take into account with the solution you select are recovery time objectives (RTOs) and recovery point objectives (RPOs).

RTO is essentially the expectation of how long it should take until the sites are back up. If you have an RTO of a minute or two (or less), then you should be considering a solution in line with what Nick suggested: real-time replication of your files and data to a secondary data center, plus automatic DNS failover, which could be done with a paid service or with hardware at both data centers (such as the BIG-IP Global Traffic Manager from F5 Networks). This can get costly, but it largely depends on answering the question "What is the cost of downtime?" If your RTO is a few hours or even a few days, then you can consider disaster recovery procedures that involve more manual work, such as bringing servers online, switching DNS, etc. Tedious, but certainly cost effective if your RTO allows for it.

RPO is basically how frequently backups are done and, therefore, how much data you are willing to lose in the event of a disaster. If changes to content and/or data happen frequently, then you are likely to have an RPO of minutes or hours and may be dealing with real-time replication or high-frequency backups. If content doesn't change that often, or you have customers who don't particularly mind losing a few days of data, your backups can happen less often. (With your current weekly cPanel backups, for example, your effective RPO is up to a week.)

As I mentioned, I agree with much of what Nick had to say. Another alternative you may wish to consider is to utilize cloud services from one of the larger providers such as Rackspace or Amazon. Both of those providers in particular have massive infrastructure in place to handle just about any disaster thrown at them. With something like a cloud site or cloud server (terms used by Rackspace), you also gain the ability to scale and don't necessarily have to worry about the physical hardware aspect of it.

Rackspace also has custom options available where you can intermix your infrastructure, having a combination of cloud servers, physical servers, and cloud files as part of your solution. A hybrid approach may be something to consider depending on your customer needs if you don't want to take a one size fits all approach.

If it helps, there is a page dedicated to disaster recovery on the Rackspace site as well. (Also, for the record, I am not affiliated with Rackspace, but I have used their services in the past.)

Hope this has helped.

EDIT: Thought this might help if you are evaluating cloud solutions. The Gartner Magic Quadrant report for Infrastructure as a Service and Web Hosting may give you some insight into other solution providers.


@Harper822

I'd suggest that you:


Automatically mirror the entire contents and configuration of your main server to a secondary backup server on a completely separate network in a different data centre. Use rsync, FXP, cPanel voodoo, or whatever method you wish to automate the syncing (a rough rsync sketch follows this list).
Use DNS failover switching to automatically route traffic to the backup server should the Hostgator server prove unresponsive.
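
As a rough illustration, the file-mirroring half of this could be a cron-driven rsync job along the lines of the sketch below. The hostnames, paths, and schedule are placeholders rather than a recommendation, and databases should be dumped and synced separately instead of rsyncing live database files.

    # Hypothetical crontab entries on the backup server -- hostnames and paths are placeholders.
    # Pull the account home directories and key config from the primary over SSH every hour.
    0 * * * *  rsync -az --delete -e ssh root@primary.example.com:/home/ /backup/home/ >> /var/log/mirror.log 2>&1
    30 * * * * rsync -az --delete -e ssh root@primary.example.com:/etc/  /backup/etc/  >> /var/log/mirror.log 2>&1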


This means that you constantly have a 'hot' backup waiting to go should the worst happen, rather than a 'cold' backup that requires manual intervention and much scrambling around and panicking. It also means that your clients will never find out that their site went down before you do, which is distressing for everyone.

You can set up failover DNS using a provider such as DNS Made Easy. For each domain you're hosting, you can configure up to five backup IP addresses, one for each of your backup servers. Once that's done...


DNS Made Easy checks your primary server every two to four minutes and, if it doesn't detect a response, it routes traffic to the secondary IP address.
DNS Made Easy continues to check the primary server. When it comes up, it will reroute traffic to the first server, or—if you prefer—keep it at the backup while you diagnose what went wrong and fix the primary server.
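
To make the failover logic concrete, here is a conceptual sketch of the kind of check such a service performs. You wouldn't run this yourself (that is exactly what you pay DNS Made Easy for), and the DNS-switching step is only a placeholder comment, since each provider exposes its own API; the URL, threshold, and interval below are assumptions.

    #!/bin/sh
    # Conceptual failover monitor -- URL, threshold, and interval are placeholders.
    PRIMARY="http://www.example.com/"
    FAILURES=0
    while true; do
        if curl --silent --fail --max-time 10 --output /dev/null "$PRIMARY"; then
            FAILURES=0
        else
            FAILURES=$((FAILURES + 1))
        fi
        if [ "$FAILURES" -ge 3 ]; then
            echo "primary unresponsive, switch the A record to the backup IP"  # provider's API call would go here
        fi
        sleep 180   # roughly the two-to-four minute window described above
    done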


Of course, this solution will raise your operating costs, which you'll have to pass on to clients somehow, but—if you're in an industry where downtime would put you out of business—paying for a largely redundant server is probably worth it for that one time it saves the company.

Beyond that:

Duplicate, duplicate, duplicate

The more independent backups you have, the better. I store remote backups on a local hard drive, which is mirrored to an external hard drive, to Dropbox, a git repository, and a remote FTP account. Take no chances. Duplicate as much as you can. If you have to restore from a manual backup, it's better to have a choice of five than a choice of one. Paranoia is underrated.

Practise restoring the backups manually

If you've never tried to recover from one of your backups, how do you know that they work? It's worth doing emergency drills to see what would happen should your automated procedures fail.
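
For example, a drill against one of the weekly cPanel backups mentioned in the question could look roughly like this on a scratch machine; the archive, directory, and database names are placeholders, and the goal is simply to prove that the backup actually restores.

    # Hypothetical restore drill -- all names are placeholders.
    tar -xzf backup-example.com.tar.gz               # unpack the cPanel backup archive
    ls backup-example.com/homedir/public_html        # spot-check that the site files are present

    # Load the bundled SQL dump into a throwaway database and poke at it.
    mysql -e "CREATE DATABASE restore_test"
    mysql restore_test < backup-example.com/mysql/example_db.sql
    mysql restore_test -e "SHOW TABLES"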


UPDATE: A few other services I've discovered recently that are worth mentioning in relation to site backup, disaster recovery, and maintaining uptime:


Cloudflare, who provides security and caching features to keep sites up when your server goes down. (They mirror your site and serve it from their globally distributed cache instead of from your server directly.)
Codeguard, who provides automated backups and rollback of website code (FTP only).
Site Auto Backup, who provides automated backups and rollback of website code, email data, and MySQL info via cPanel backups. Note that this is run by Hostgator, so it's not necessarily suitable if you host your site with them as well, but might help others.


Cloudflare in particular looks like it would be useful to avoid downtime and to generally improve site responsiveness.


@Murphy175

Make sure you are keeping all of your code under version control in a source code repository (SVN or Git). Are you already using one?

You can get an account (free or paid) with a third-party repository host, like Project Locker, and if you version all of your code while you're working, you essentially have it all backed up to your repository in a third location, further decreasing your chances (almost to nil) of losing all your work at once.

You can perform your SVN commits and checkouts either from the command line or via a client like Versions (for Mac) or TortoiseSVN (for Windows).
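
If you go the command-line route with SVN, the day-to-day workflow is only a handful of commands; the repository URL below is a placeholder for whatever your repository host gives you.

    # One-time: put an existing project under version control (URL is a placeholder).
    svn import ./client-site https://svn.example-host.com/repos/client-site -m "Initial import"

    # Day to day: check out a working copy, make changes, and commit them.
    svn checkout https://svn.example-host.com/repos/client-site
    cd client-site
    svn add new-page.html
    svn commit -m "Add new landing page"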


@Sent6035632

Complete replication of the server at a facility run by a different hosting company seems the most obvious solution.

Files can be kept in sync with tools like rsync and unison.
SQL backups can be rsynced too and then loaded into the slave database by a script.
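
A bare-bones version of such a script, run from cron on the primary server, might look like the sketch below; the hostnames, paths, and database handling are placeholders, and it assumes MySQL credentials are available via ~/.my.cnf on both machines.

    #!/bin/sh
    # Sketch of a nightly sync to a standby server at another host -- names are placeholders.

    # 1. Mirror the web files.
    rsync -az --delete /home/sites/ standby.example.net:/home/sites/

    # 2. Dump the databases and ship the dump across.
    mysqldump --all-databases > /tmp/all-databases.sql
    rsync -az /tmp/all-databases.sql standby.example.net:/tmp/

    # 3. Load the dump so the standby's database is ready to serve.
    ssh standby.example.net 'mysql < /tmp/all-databases.sql'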
