Test Your Restores. TEST. YOUR. RESTORES.
In the past few months I've come across not one but two shining examples of how having a backup strategy meant absolutely nothing when it came time to recover from a disaster. All the backup planning in the world is completely useless if you can’t successfully restore your backups.
The Ransomware
I had a client that was hit by ransomware, which encrypted several of their systems, including the database server. Not to worry, though: they had "full backups" of all the affected machines, made with a third-party backup utility. After taking a day to clean up their network, they restored these backups onto their servers. Now it was just a simple matter of bringing all the applications back online, right? Well, not exactly…
On the database server in particular, all of the files were present after the restore, so no data was lost. But even though every file was in its proper location, Windows didn’t recognize SQL Server (or any other application) as being installed. It was as if the backup application had copied all of the files but not the Windows configuration or registry. It turned out that the backup utility hadn’t been configured correctly. They also had no SQL Server native backups, because there was no need for them with that fancy utility and all…
After trying in vain to fix this problem, they ended up having to re-install Windows and all the applications on their servers. No data was lost, but the company was essentially out of business for several days longer than it needed to be.
The Storage Failure
I was contacted about a second incident that was eerily similar. A database server suffered a disk controller failure, and all of its data was lost. But never fear: IT used a backup application and had a strict policy ensuring that all files were backed up, including the SQL Server data and log files. With replacement hardware stood up, it was time to reattach the database files from backup and bring the databases online. Piece of cake, right?
Once again, not exactly. The third-party backup software was either not application-aware or not configured to be. Either way, the data and log files didn’t match up with each other, and the database could not be brought online. It couldn’t even be placed in emergency mode. After spending about a day attempting to recover it, they ended up falling back to an older backup that did work, losing about a month of data in the process.
What Caused Both These Problems?
I think it’s very important to note that both of these incidents involved backups that were completing successfully. The failure was that nobody had ever tried restoring them. Had anyone attempted a restore, the problems would have become patently obvious. Even when using SQL Server native backups (which I fully believe would have helped in both of these cases), it's important to test them and make sure they can actually be restored.
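One way to make restore testing routine rather than a one-off is to script it: restore a native backup under a scratch database name on a non-production server, then run DBCC CHECKDB against the result. The Python sketch below only builds that T-SQL text (RESTORE DATABASE ... WITH MOVE, followed by DBCC CHECKDB); you would run the output with sqlcmd or SSMS. It is a minimal illustration under stated assumptions, not a complete procedure: the backup path, the scratch database name, the target file paths, and the helper names (`FileMove`, `build_restore_test`) are all hypothetical, and the logical file names must match what RESTORE FILELISTONLY reports for your backup. Note that it only exercises the database restore; it would not have caught the missing Windows configuration in the ransomware case, which calls for rehearsing a full server rebuild as well.

```python
"""Hypothetical helper that builds a T-SQL restore-test script.

Assumptions (not from the original post): backups are SQL Server native
.bak files, the restore targets a scratch database on a test server, and
the logical file names are known (RESTORE FILELISTONLY reports them).
"""
from dataclasses import dataclass


@dataclass
class FileMove:
    logical_name: str  # logical file name stored inside the backup
    target_path: str   # where the test server should place the physical file


def quote_ident(name: str) -> str:
    # Bracket-quote an identifier, doubling any closing brackets.
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text: str) -> str:
    # Unicode string literal (N'...'), doubling any single quotes.
    return "N'" + text.replace("'", "''") + "'"


def build_restore_test(backup_file: str, test_db: str, moves: list[FileMove]) -> str:
    """Return T-SQL that restores a backup under a new name, then checks it."""
    # WITH MOVE relocates each file so the scratch copy never collides
    # with the original database's files.
    options = [
        f"MOVE {quote_literal(m.logical_name)} TO {quote_literal(m.target_path)}"
        for m in moves
    ]
    # RECOVERY brings the database online so it can be checked and queried.
    options += ["RECOVERY", "STATS = 10"]
    db = quote_ident(test_db)
    lines = [
        f"RESTORE DATABASE {db}",
        f"    FROM DISK = {quote_literal(backup_file)}",
        "    WITH " + ",\n         ".join(options) + ";",
        # A restore that completes is not proof of a healthy database.
        f"DBCC CHECKDB ({db}) WITH NO_INFOMSGS, ALL_ERRORMSGS;",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical paths and names, for illustration only.
    print(build_restore_test(
        r"D:\Backups\Sales_full.bak",
        "Sales_RestoreTest",
        [FileMove("Sales", r"E:\RestoreTest\Sales_RestoreTest.mdf"),
         FileMove("Sales_log", r"E:\RestoreTest\Sales_RestoreTest_log.ldf")],
    ))
```

Restoring under a different name, rather than over the original, keeps the test safe to run on a schedule; drop the scratch database once the check passes, and record any failures as part of your documented restore procedure.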
Remember, a backup is worthless if you can’t restore it. TEST YOUR RESTORES. Be familiar with your restore procedures, and keep them well documented and stored in a safe place.
Oh, did I mention you should test your restores?