Data compression features in SQL Server 2008
October 10, 2011. Posted by fofo in Sql Server, SQL Server 2008.
Tags: Backup Compression, Data Compression, Page-level Data compression, Row-level data compression
In this post I would like to talk about data compression/backup compression features available to us in SQL Server 2008 and R2 editions.
I will talk about Data Compression and Backup Compression.
Data Compression is only available in the Enterprise and Developer editions of SQL Server 2008 and SQL Server 2008 R2; it is not available in the Standard edition of either release.
Backup compression was introduced in SQL Server 2008 Enterprise edition. Starting with SQL Server 2008 R2, it is also available in the Standard edition.
I know there are samples of how to use data compression/backup compression in BOL, MSDN and elsewhere, but I thought it would be a good idea to walk through some hands-on examples.
Many people have told me they find this style of posting useful: they can follow the steps and master the technology.
We need an instance of SQL Server running in our machine. I have SQL Server 2008 R2 Developer edition installed in my machine.
In case you do not have an instance of the Enterprise or Developer edition of SQL Server 2008 R2, you can download a trial version from here.
We need a database as well. I will use the Northwind database.
If you need the installation scripts for the sample Northwind database, click here
One thing is definitely true: data grows. That means the size of our databases grows, and that means the size of our backups grows. With data compression we not only reduce the size of the physical files, we also reduce disk I/O. We can also take compressed backups and reduce the amount of disk space our backups take.
We cannot apply data compression to an entire database. Data compression can only be used for these database objects:
• A table stored as a heap
• A table stored as a clustered index
• A non-clustered index
• An indexed view
• Partitioned tables and indexes
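As a quick illustration of the objects in this list, compression is requested per object, not per database. A minimal sketch (the table and index names below are hypothetical, not from the examples later in this post):

```sql
-- Compress a table at creation time with PAGE compression
CREATE TABLE dbo.DemoOrders
(
    OrderID    INT NOT NULL,
    CustomerID NCHAR(5) NULL
)
WITH (DATA_COMPRESSION = PAGE);
GO

-- A non-clustered index is compressed separately from its table,
-- and may even use a different compression setting
CREATE NONCLUSTERED INDEX IX_DemoOrders_CustomerID
ON dbo.DemoOrders (CustomerID)
WITH (DATA_COMPRESSION = ROW);
GO
```

Note that a table and each of its indexes carry their own compression setting, which is why evaluating object by object, as discussed next, matters.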
As experienced or novice DBAs we must evaluate the objects in the database that should be compressed and what type of compression we should use.
We have row-level data compression, which takes fixed-length data types and stores them as variable-length data types. Zero and NULL values are also stored in an optimized way so that they take up no space. With row-level data compression, more rows fit in a single page.
Let’s see an example. I will use the tempdb database. We will also use the sp_spaceused system stored procedure, which displays the number of rows, the disk space reserved, and the disk space used by a table. In this case, through row-level compression, the two CHAR(100) columns are stored as variable-length data: only the characters that actually “make up the data” are stored, so trailing blank characters are not stored.
1) Launch SQL Server Management Studio and connect to the local instance of SQL Server 2008/R2 Enterprise or Developer edition.
I create the table, populate it with some values, and then use the sp_spaceused system stored procedure. Execute the code up to the first sp_spaceused call and see the results.
USE tempdb
GO

CREATE TABLE MyCompressionTable (ID INT, phrase1 CHAR(100), phrase2 CHAR(100))
GO

INSERT INTO MyCompressionTable VALUES (1, 'SQL Server is the best RDBMS', 'I love T-SQL')
GO 15000

SELECT * FROM MyCompressionTable

EXEC sp_spaceused MyCompressionTable
GO

-- DATA_COMPRESSION = ROW
ALTER TABLE MyCompressionTable REBUILD WITH (DATA_COMPRESSION = ROW);
GO

EXEC sp_spaceused MyCompressionTable
GO
Have a look at the picture below.
Now execute the remaining lines. In those lines I apply row compression on the table and then use the sp_spaceused system stored procedure again. Notice the difference in the size of the table. Have a look at the picture below.
We have another form of data compression: page-level data compression. This is a more sophisticated, advanced form of compression which combines row-level data compression with prefix and dictionary compression algorithms. With this type of compression we get greater compression than with row-level compression alone.
I won’t go into details on the dictionary algorithms, but their main idea is to search for repeated patterns of data in columns and then place that data in a special “place” in the page. Each row then holds a reference to this “place” in the page where the common value is stored.
Let’s say we have a column, e.g. FootballTeamName, with values that share the common value “United”.
Through the column-prefix phase, the common “United” value is detected and stored in an anchor record, and the column values then hold a reference to this anchor record.
If you need to learn more about those algorithms and Data compression in general have a look at this whitepaper.
In this code we change the data compression from row level to page level and then execute the sp_spaceused stored procedure. Execute the code and see the results.
ALTER TABLE MyCompressionTable REBUILD WITH (DATA_COMPRESSION = PAGE);
GO

EXEC sp_spaceused MyCompressionTable
GO
Have a look at the picture below. You can compare the differences in the size of the table when the different forms of data compression are applied and when no data compression was applied to the table.
I would like to explain a few things about how data compression works internally. I know a lot of people who are totally confused about how SQL Server handles data compression internally. You might ask, “Do I need to know this?”.
Well, the answer is not really… I mean, you do not have to know how the iPhone works internally in order to use it.
Having said that, it is sometimes useful to dive into the inner workings of the Storage Engine, the Relational Engine and the SQLOS. When we compress data, it is the Storage Engine that compresses the data and stores it.
When the Storage Engine passes the data back to us through the Relational Engine, it has to uncompress it. Obviously we need more CPU power to do that. Always keep in mind, though, that of all the subsystems SQL Server uses, the CPU is the fastest one.
With data compression the amount of I/O is reduced, and that is the important thing. If we want to update a row that is stored on disk compressed with row-level compression, the row must be brought into memory for the update to take place. The Storage Engine gets the compressed data from disk and brings it into memory still compressed (that means more data fits in the buffer cache, and that means less I/O).
Then the Relational Engine steps up. But before it performs its magic (executing the UPDATE statement), the row is uncompressed by the Storage Engine. When the update takes place, the row is handed back to the Storage Engine, where it is compressed again. At some point, during a CHECKPOINT operation, the row is written to disk again, always in compressed format.
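If you want to verify which compression setting the Storage Engine is actually using for a table, the sys.partitions catalog view exposes it. A small sketch, assuming the example table created earlier in this post exists in the current database:

```sql
-- Check the compression setting per partition of the example table
SELECT OBJECT_NAME(p.object_id) AS TableName,
       p.index_id,
       p.partition_number,
       p.data_compression_desc   -- NONE, ROW or PAGE
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('MyCompressionTable');
```

After the ALTER TABLE … REBUILD statements above, the data_compression_desc column should reflect the setting you applied.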
Now let’s see how to perform data compression using SSMS. We need a database, a table and lots of data in it. I will create a database “testdata” and a table called “ThePerson”.
There will be 20,000 rows of data in the table. I used the SQL Data Generator tool from Red Gate. You can download the script that creates the database and the table and populates the table from here.
Open the script in a new query window and execute it.
Now that you have the database and the table ready, we can use SSMS to perform data compression. Select the table and right-click on it. From the popup menu select Storage –> Manage Compression. The Data Compression Wizard appears. Click Next, select the compression type (select Row), and then press Calculate. This allows us to see how much space will be saved once compression has been turned on.
Have a look at the picture below
Now choose from the drop-down menu under Compression Type the option Page and then press Calculate.
Have a look at the picture below.
I will choose to go with “Page” compression, since it offers me greater compression. We now click the Next button. I will choose to create a script and have it scripted to a new query window. Then I click Next again and then the Finish button. I chose to save the script to a new query window so I can execute it later, when there is not so much activity on the server. Since this is a CPU-heavy task, leave it until a time when your server has less workload to process.
Have a look at the picture below to see a snapshot of this step.
This is the T-SQL code that was generated in my query window
USE [testdata]
GO

ALTER TABLE [dbo].[ThePerson] REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE)
We can use the sp_estimate_data_compression_savings system stored procedure to find out the estimated storage space savings. In our case we can use the following T-SQL.
USE testdata
GO

EXEC sp_estimate_data_compression_savings 'dbo', 'ThePerson', NULL, NULL, 'ROW';
GO

EXEC sp_estimate_data_compression_savings 'dbo', 'ThePerson', NULL, NULL, 'PAGE';
GO
There is another very useful free tool called SQL Server Compression Estimator (SSCE), which you can download from here.
This tool lets you quickly and easily determine the best compression setting for a database with many objects. It works for SQL Server 2008 and later.
After you download and install it, you can use it. I provide some snapshots of how to use it. It is very easy to use. First you connect to the instance of SQL Server.
Then we select a database, decide the percentage of compression we would like, and decide whether the Index Maintenance Ratio will be calculated.
Then the tool will process all the tables in the database calculating the potential compression savings for each one
We have the “Save Results” button, which saves the entire contents of the results window to a comma-separated value (CSV) file, and the “Create Script” button, which creates a Transact-SQL script. Have a look at the picture below.
This is a very nice tool that lets us quickly and easily implement data compression in a database with a large number of objects.
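If you prefer to stay in T-SQL instead of using an external tool, the same idea can be sketched with a loop that calls sp_estimate_data_compression_savings for every user table. This is only a rough equivalent, assuming you run it in the target database:

```sql
-- Estimate PAGE compression savings for every user table (sketch)
DECLARE @schema SYSNAME, @table SYSNAME;

DECLARE table_cursor CURSOR FOR
    SELECT s.name, t.name
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON t.schema_id = s.schema_id;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @schema, @table;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One result set per table; NULLs mean all indexes and partitions
    EXEC sp_estimate_data_compression_savings @schema, @table, NULL, NULL, 'PAGE';
    FETCH NEXT FROM table_cursor INTO @schema, @table;
END

CLOSE table_cursor;
DEALLOCATE table_cursor;
```

Each call returns one result set, so for a database with many tables the SSCE tool’s single consolidated view is more convenient.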
Now let’s move on to backup compression. As mentioned earlier, backup compression was introduced in SQL Server 2008 Enterprise edition and, starting with SQL Server 2008 R2, is also available in the Standard edition.
When we compress backups we save backup space and also reduce backup times. Obviously we need more CPU resources to perform the compression during a backup and the decompression during a restore. One thing to keep in mind is that if we have databases that already use data compression or are encrypted, backup compression will not be very efficient. Backup compression does not use row-level or page-level data compression; it happens at the time of the backup and uses its own compression algorithms. Let’s see an example. We will back up the Northwind database without backup compression and then with backup compression.
- without backup compression
We will select Northwind database then Tasks –> Back up.
Have a look at the picture below.
- with backup compression
We will select the Northwind database, then Tasks –> Back up, and then Options. Choose “Compress backup”.
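The same two backups can also be taken in T-SQL instead of through the SSMS dialogs. A small sketch; the disk paths below are placeholders, adjust them for your server:

```sql
-- Backup without compression
BACKUP DATABASE Northwind
TO DISK = 'C:\Backups\Northwind_uncompressed.bak'
WITH INIT;
GO

-- Backup with compression
BACKUP DATABASE Northwind
TO DISK = 'C:\Backups\Northwind_compressed.bak'
WITH INIT, COMPRESSION;
GO
```

The only difference is the COMPRESSION option; everything else about the backup statement stays the same.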
We can set, at the server level, the option to have backup compression on by default for every database on the server. This is the T-SQL code to achieve that.
EXEC sp_configure 'Show Advanced Options', 1
GO
RECONFIGURE
GO

-- See the current values
EXEC sp_configure

-- Turn on backup compression by default
EXEC sp_configure 'backup compression default', 1
GO
RECONFIGURE
GO
Now we have two backups. One compressed backup of the Northwind database and one uncompressed.
Have a look at the picture below to see the difference in size between the compressed and the uncompressed backup files. There is a notable difference.
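Instead of comparing file sizes in Windows Explorer, you can also query the backup history that SQL Server keeps in msdb, where both the original and the compressed size are recorded for every backup:

```sql
-- Compare original and compressed backup sizes for Northwind
SELECT database_name,
       backup_finish_date,
       backup_size / 1048576.0            AS SizeMB,
       compressed_backup_size / 1048576.0 AS CompressedSizeMB
FROM msdb.dbo.backupset
WHERE database_name = 'Northwind'
ORDER BY backup_finish_date DESC;
```

For an uncompressed backup the two size columns will be equal; for a compressed one, compressed_backup_size will be noticeably smaller.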
When restoring a compressed backup, it is uncompressed automatically. We can create compressed backups with the Enterprise edition of SQL Server 2008 or the Standard edition (and higher) of SQL Server 2008 R2, but we can restore compressed backups on any edition of SQL Server 2008. You cannot restore a compressed SQL Server 2008 backup on any previous version of SQL Server.
That is all folks!!!
Hope it helps!!!