Do your backups run at the right time?
Many articles on database backups reference running database backups and maintenance processes overnight. In lots of situations this is a great idea, especially if you’re dealing with a transactional system that has users connecting to the database during the day and minimal activity outside of business hours. But for systems with different schedules, night time is not always the right time.
While meeting with a client, they mentioned to me that their database backups take around 7-8 hours to complete. Knowing that their hardware is very similar to mine and that I can back up a database that’s 40 times larger than theirs in a fraction of the time, I suggested they let me investigate the issue.
The first thing I noticed when I took a look at their existing backups was that the backup files were more than twice the size of the entire database. A full backup contains a copy of each page of the database as well as the active portion of the transaction log. (For more details on exactly how much of the log is included, see Paul Randal’s blog post on the subject.) It’s not uncommon for a full backup to be slightly larger than the size of the database due to the portion of the log, however what I saw got me thinking that there was an awful lot of active transaction log present in the backup.
I started looking around to see what time the backup was running and what else it was competing with. It turned out that backups were running overnight, and while end users weren’t connected to the database during that time, a rather large bulk insert process was. The backups and bulk inserts in this case were both significantly hindering each other in what I'll lovingly call a "paradox of suckitude":
- The backup had to include all of the transaction log being generated by the ongoing load process, which made the backup file very large, and also made the backup process take longer since it had to copy all that extra data.
- The bulk insert process was affected not only by all the extra IO generated by the backup, but also by the fact that minimal logging is disabled while a backup is taking place.
There were plenty of other steps that could be taken to increase the speed of their backups, such as striping the backup across multiple files and volumes, utilizing backup compression, and adding additional database files, but before playing with any of that I decided to tackle the timing issue first. I changed the backup schedule so that backups would run during the middle of the workday. Sure the backups would be competing with user activity, but testing showed their impact on database performance to be minimal. Simply changing the schedule caused the backup to finish 4x faster, and the bulk insert process sped up by a factor of 10.
It should come as no surprise that scheduling is very important, and even if a schedule makes sense when it's initially created, things can change over time. Job schedules are definitely something that should be periodically reviewed to make sure that simple conflicts aren't robbing you of performance.