Disaster Recovery - What Will You Do When Your Drupal Site Really Breaks?
Disasters Happen - It's Just a Question of When
Spurred by a completely broken server at one of our clients, we are re-evaluating our own backups and disaster recovery plans. As we all know, most people give little thought to backups, and even less to how to use them for recovery. In the meantime our servers quietly hum along and then act up at the most inopportune time. It is not a question of whether a server will fail. It will. It is just a question of when it will fail and what you will do to recover from the failure.
Know the Risks
Your first line of defense is knowing your site, identifying single points of failure, and estimating how likely each one is to fail. Then you figure out how much mitigation you need to reduce your exposure to risk.
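One simple way to turn this into numbers is to multiply each failure's estimated probability by what it would cost you, and then rank the results. The sketch below does exactly that. The components, probabilities and costs are made-up placeholders for illustration, not measurements, so substitute your own estimates.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    component: str
    annual_failure_probability: float  # your own estimate, between 0 and 1
    cost_of_failure: float             # estimated cost of one failure

    @property
    def expected_annual_loss(self) -> float:
        # Probability times impact: a rough measure of exposure.
        return self.annual_failure_probability * self.cost_of_failure


def rank_risks(risks):
    """Return the risks sorted from highest to lowest expected loss."""
    return sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)


# Hypothetical example values, for illustration only.
RISKS = [
    Risk("web server hardware", 0.10, 5000.0),
    Risk("database server disk", 0.05, 20000.0),
    Risk("botched code deployment", 0.50, 600.0),
]

if __name__ == "__main__":
    for risk in rank_risks(RISKS):
        print(f"{risk.component}: {risk.expected_annual_loss:.0f}")
```

The items at the top of the list are where mitigation pays off most. The numbers will always be rough, but even a rough ranking beats guessing.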
Mitigate the Risk
Code and databases are most prone to break or otherwise get corrupted. Fortunately we have easy solutions in code repositories and backups. The harder problem is how you will make use of those tools to recover from a failure, but we will address this later. For now, make sure you create backups on a regular basis and that at least some of these backups do not live on the same server. Also make sure that you have backups for everything: the server OS and its configuration, your site code, and the database(s).
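As a minimal sketch of the "regular basis" part, the function below packs a directory into a timestamped archive using only the Python standard library. The function name and the directory paths are hypothetical. The sketch covers files only, so a database dump made with your database's own dump tool would have to be written into the source directory (or archived separately) first.

```python
import tarfile
import time
from pathlib import Path


def archive_directory(source_dir, backup_dir):
    """Pack source_dir into a timestamped .tar.gz inside backup_dir."""
    source = Path(source_dir)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the name keeps older backups from being overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = target_dir / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the site folder.
        tar.add(str(source), arcname=source.name)
    return archive
```

Run from a scheduler such as cron, a call like archive_directory("/var/www/mysite", "/mnt/backups") (both paths are placeholders) would leave one archive per run in the backup directory.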
The Importance of Offsite Backups
Or maybe we should rather call this "the importance of off-server backups". Regardless, you absolutely need to make sure most of the backups are not stored on the server itself. If the server crashes completely and takes its storage with it, you need those off-server backups. Believe me, it happens! I have seen it many times, most recently last week. So be absolutely sure there are backups elsewhere and that they are recent enough for your comfort level. You will need them some day.
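"Recent enough" is easy to check automatically. The sketch below assumes the off-server backups are visible as files in a directory, for example a mounted remote share, and reports whether the newest one is younger than a limit you choose. The function names, the file pattern and the 24-hour default are assumptions made for illustration.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, pattern="*.tar.gz"):
    """Return the age in hours of the newest matching file, or None if none exist."""
    files = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    return (time.time() - newest_mtime) / 3600.0


def backups_are_fresh(backup_dir, max_age_hours=24.0, pattern="*.tar.gz"):
    """True only if at least one backup exists and is younger than the limit."""
    age = newest_backup_age_hours(backup_dir, pattern)
    return age is not None and age <= max_age_hours
```

Run this from a machine other than the server being backed up, so that the check itself survives the crash it is meant to guard against.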
Develop a Recovery Procedure
Having backups and spare parts in hand is well and good, but you want to be able to do something with them. Moreover, when you need them you will be very stressed or, worse, not there yourself. You need a recovery procedure, fully documented and ready to go. Take your time with this; you will be glad you did. Make sure that all steps are covered, that someone only vaguely familiar with the setup can understand it, and, most importantly, that it explains how to get at passwords and other credentials in a secure manner.
Test the Integrity of Your Backups
Finally, you want to be sure you can actually get a server back with the help of your recovery procedure and the necessary parts and backups. At the very least, review the recovery procedure once or twice a year, check it for accuracy against the current server configuration, and look through your backups to ensure they are readable and can be unpacked. Better still, perform the basic recovery steps to test whether it all works as planned.
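The "readable and can be unpacked" check can be partly automated. The sketch below assumes the backups are tar archives (optionally compressed) and reads every file in an archive from start to finish, which surfaces most corruption without writing anything to disk. The function name is hypothetical, and a passing check is no substitute for an actual restore test.

```python
import tarfile
import zlib


def verify_archive(path):
    """Return True if every member of the tar archive can be read in full."""
    try:
        with tarfile.open(str(path), "r:*") as tar:
            member_count = 0
            for member in tar:
                if member.isfile():
                    handle = tar.extractfile(member)
                    # Read in chunks so large files do not exhaust memory.
                    while handle.read(1024 * 1024):
                        pass
                member_count += 1
        # An archive with no members at all is treated as a failed backup.
        return member_count > 0
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False
```

A file that is not a tar archive at all is reported as False rather than raising an exception, so the function can be run over a whole backup directory in a loop.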