Why I Hate Row Compression
This post is part of T-SQL Tuesday #52, which is being hosted this month by Michael J. Swart (@MJSwart). Michael is asking us to argue against a popular opinion, and I'm more than happy to do so, as this is a belief that I've kept to myself for quite a while.
SQL Server's row compression feature can be an amazing tool. Not only is it lightweight on CPU usage, especially when compared to page compression, but it can save a significant amount of disk space as well. Your data also remains compressed while in the buffer pool, meaning more rows can be stored in memory, reducing the need to make slower requests to disk. On top of all that, some queries (especially those involving index scans) can see dramatic performance improvements.
In fact, row compression is so good that Microsoft's data compression whitepaper actually states, "If row compression results in space savings and the system can accommodate a 10 percent increase in CPU usage, all data should be row-compressed."
Yes, row compression is a wonderful thing, and the databases I maintain frequently benefit from its use.
But I hate it.
Why? Because all too often, features designed to help make things easier also make people lazy.
By far, the biggest issue that row compression addresses is poor data typing: the use of a data type that isn't appropriate for the values at hand. For example, if a column is only going to store the values 1 through 5, an int is not necessary. A tinyint would be just as effective, and at 1 byte instead of 4 it would consume only one quarter of the space. However, if you are unable to change the data type, perhaps because the database is part of an application written by a third party, row compression can be a big help.
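To put a number on the tinyint-versus-int example, here is a minimal Python sketch of the arithmetic. The sizes and ranges are SQL Server's fixed widths for its integer types; the helper names (`smallest_int_type`, `bytes_saved`) and the 10 million row count are hypothetical, chosen only for illustration. It counts column data only and ignores row overhead, indexes, and page fill.

```python
# Fixed storage size (bytes) and value range of SQL Server's integer types.
INT_TYPES = [
    ("tinyint", 1, 0, 255),
    ("smallint", 2, -2**15, 2**15 - 1),
    ("int", 4, -2**31, 2**31 - 1),
    ("bigint", 8, -2**63, 2**63 - 1),
]


def smallest_int_type(min_value, max_value):
    """Return (type name, bytes per value) of the narrowest type that fits the range."""
    for name, size, lo, hi in INT_TYPES:
        if lo <= min_value and max_value <= hi:
            return name, size
    raise ValueError("range does not fit in any integer type")


def bytes_saved(rows, current_size, min_value, max_value):
    """Column bytes saved by switching from current_size to the narrowest fitting type."""
    _, best_size = smallest_int_type(min_value, max_value)
    return rows * (current_size - best_size)


# A column holding only 1 through 5, currently declared as int (4 bytes).
name, size = smallest_int_type(1, 5)
print(name, size)
print(bytes_saved(10_000_000, 4, 1, 5))
```

For the 1-through-5 column, the narrowest fit is tinyint at 1 byte, so each row saves 3 bytes: roughly 30 MB of column data across the hypothetical 10 million rows.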
Row compression allows the storage engine to treat fixed-width data types as if they were variable-width. A fixed-width column normally reserves its full size for every row, whether or not the value needs it; with row compression, each value takes only the bytes it actually requires, and the space that would otherwise go unused can be put to work. The savings can be tremendous, and SQL Server's data compression features are completely transparent to end users and applications – all the magic happens behind the scenes, which is why a third-party application would be none the wiser.
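As a rough illustration of that idea, the Python sketch below models how many bytes an integer value might occupy once it is stored in only the bytes it needs. This is a simplified model, not SQL Server's actual storage format: the function name `compressed_int_bytes` is hypothetical, the signed minimal-width rule is an assumption, and the small per-column metadata overhead that row compression adds is ignored. The treatment of zero follows Microsoft's documented behavior that 0 and NULL values take no bytes.

```python
def compressed_int_bytes(value):
    """Approximate bytes needed for one integer under the simplified model."""
    if value == 0:
        return 0  # zero is documented as taking no bytes under row compression
    # Assumption: the value is kept in the narrowest signed width that holds it.
    for width in range(1, 9):
        lo = -(1 << (8 * width - 1))
        hi = (1 << (8 * width - 1)) - 1
        if lo <= value <= hi:
            return width
    raise ValueError("value does not fit in 8 bytes")


# An int column (4 bytes per value uncompressed) that only ever holds 1 through 5.
values = [1, 2, 3, 4, 5]
uncompressed = 4 * len(values)
compressed = sum(compressed_int_bytes(v) for v in values)
print(uncompressed, compressed)
```

Under this model, the five values need 5 bytes rather than 20, which is the same footprint a tinyint column would have had from the start.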
But what if you are able to change those data types, and just don't feel the need to do so anymore? Data compression gives you most of the advantages of proper data typing, but all you have to do to get them is flip a switch – no pesky forethought necessary. And that's why it's terrible. Because for every person out there who designs their databases and data types with care, there are many more who aren't interested, don't care, or don't even realize it's an option. Features like row compression that mask these issues aren't going to interest anyone in solving them the right way.
So while row compression is a wonderful tool and can do amazing things when used properly, don't forget it's also an enabler.