Apr 26 2012

“Statistics means never having to say you’re certain” is a well-known quote that came up often in my AP Statistics class back in high school. The same can be said of estimation: estimates need not be perfect, but they should at least be logical. While estimating the effects of data compression, I recently came across a behavior that I don’t think Mr. Spock would approve of.

The sp_estimate_data_compression_savings procedure is an excellent tool for estimating how compression will affect an index or partition. It works by sampling 5% of the object’s pages and applying the desired level of compression to them in TempDB. The results are then extrapolated to give an estimated size for the entire index or partition with that level of compression applied. It’s not perfect, but again, it’s an estimate. I recently found, however, that depending on the data types present in the table, that estimate can get very bad. I noticed this in SQL Server 2008 R2, but it also shows up in 2012.

For this demo, you’ll need the AdventureWorks database or one of its variants. I used AdventureWorks2008R2 and AdventureWorks2012 in my tests. You’ll also need tables a bit larger than AdventureWorks provides, so I used Adam Machanic’s bigAdventure script to create the dbo.bigTransactionHistory table we’ll be working with.

Once dbo.bigTransactionHistory is set up, we’re going to focus on its nonclustered index, which the script creates along with the table. Let’s rebuild this index with a fill factor of 100 so there’s as little fragmentation as possible.

ALTER INDEX IX_ProductID_TransactionDate ON dbo.bigTransactionHistory
REBUILD WITH (FILLFACTOR=100);
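
Before running the estimates, it helps to note the index’s current size as a baseline. Here’s a minimal sketch using sys.dm_db_partition_stats (index_id 2 matches the @index_id we’ll pass to the estimates below):

SELECT i.name AS index_name,
       SUM(ps.used_page_count) * 8 AS used_kb -- 8 KB per page
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID('dbo.bigTransactionHistory')
  AND i.index_id = 2
GROUP BY i.name;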

Now check its estimated size using all three compression settings: NONE, ROW, and PAGE.
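The parameters, in order, are @schema_name, @object_name, @index_id (here 2, the nonclustered index), @partition_number (NULL for all partitions), and @data_compression.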

EXEC sp_estimate_data_compression_savings 'dbo','bigTransactionHistory', 2, NULL, NONE;

EXEC sp_estimate_data_compression_savings 'dbo','bigTransactionHistory', 2, NULL, ROW;

EXEC sp_estimate_data_compression_savings 'dbo','bigTransactionHistory', 2, NULL, PAGE;

Screenshot1

The index is currently not compressed, and you can see that the first result set’s estimated size (with no compression) is close to the current size. It won’t be exactly the same, and your values will vary, but the main idea is that using no compression gives an estimate closest to the current value, while row and page compression yield progressively smaller values.

Now let’s change things up by adding another column to bigTransactionHistory and giving it a value:

ALTER TABLE dbo.bigTransactionHistory ADD vbm VARBINARY(MAX) NULL;

UPDATE dbo.bigTransactionHistory SET vbm = CONVERT(VARBINARY(MAX), '0xAB');
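-- Note: this stores the string '0xAB' as its character bytes (0x30784142), not the single byte 0xAB; any non-null value works for this demo.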

Then check the estimated sizes again just like before:

EXEC sp_estimate_data_compression_savings 'dbo','bigTransactionHistory', 2, NULL, NONE;

EXEC sp_estimate_data_compression_savings 'dbo','bigTransactionHistory', 2, NULL, ROW;

EXEC sp_estimate_data_compression_savings 'dbo','bigTransactionHistory', 2, NULL, PAGE;

Screenshot2

Look at that first result (with no compression) – it’s now way smaller than the current size. It’s actually even smaller than the estimated sizes for row and page compression, and it all seems to be due to the VARBINARY(MAX) column. If you remove that column and rebuild the table, the estimated size with no compression will be back to where it should be.
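
If you want to verify that yourself, here’s a quick sketch of the cleanup (the table rebuild reclaims the space left behind by the dropped LOB column):

ALTER TABLE dbo.bigTransactionHistory DROP COLUMN vbm;
ALTER TABLE dbo.bigTransactionHistory REBUILD;
ALTER INDEX IX_ProductID_TransactionDate ON dbo.bigTransactionHistory REBUILD;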

Strange, huh? I can only assume this has something to do with VARBINARY being a type that doesn’t get compressed, but I don’t see how that should affect the estimation.

BOL states that “If the requested compression setting is same as the current compression setting, the stored procedure will return the estimated size with no data fragmentation and using the existing fill factor.” I can’t see the assumption of no fragmentation being responsible for the estimated size shrinking so much and being smaller than the estimates using compression. Compression can increase the size of an object in certain cases, but the differences here are just too great.

I have submitted this on MS Connect – feel free to vote up if you can reproduce it!

Apr 24 2012

This past weekend I was very fortunate to be able to attend and speak at SQL Saturday #118 in Madison, Wisconsin. This was a top-notch event; all the hard work and planning that went into it really showed in how smoothly things ran.

My trip began around noon on Friday, which was a great decision because I hit no traffic at all on the way up there. I checked into the Crowne Plaza Madison and ended up with a little time to kill before heading out to help set things up. The hotel was excellent, featuring a pool and free wi-fi that was surprisingly fast. I’ve been gouged at other hotels for a much slower connection!

A large group of volunteers converged on the venue around 4pm Friday to help with setup, stuff goodie bags, and make sure everything was ready to go. So many showed up that things got done significantly faster than planned, meaning more downtime before the speaker dinner started. Downtime’s never a bad thing at these events though – we all hung out at Ale Asylum, a brew pub with some very tasty beer!

The speaker dinner was held at Benvenuto’s Italian Grill, where much merriment was had, along with some excellent Italian food. Afterwards a lot of us headed back to the hotel bar before turning in for the night.

Doors opened for registration at 7:00 the next morning. The event was held at Madison Area Technical College, which was a terrific venue. There was excellent classroom space featuring large projector screens, plenty of desk space for laptops, and tiered seating in most rooms to help with visibility. The hallways were wide, with plenty of open space to mingle and network with others. Much like the hotel, it also featured some impressively fast wi-fi.

My session, where I talked about data compression, was at 8:00 am, the first timeslot of the day. This was fine with me, as I am very much a morning person. Not only was I awake and ready to go, but I had a pretty lively crowd of attendees who were eager to get the day started as well. My presentation filled the 75-minute timeslot perfectly, with enough time for some great questions afterwards.

Lunch was a delicious Wisconsin picnic featuring bratwurst, hamburgers, baked beans, potato salad, and cookies for dessert. To facilitate conversation on specific topics, this was a “Cows of a Spot” lunch, where speakers were assigned to specific tables marked with topics and attendees could come with their questions. I manned a table on performance and had some great conversations with those who stopped by.

The sessions themselves appeared very well-organized, and those I attended were of excellent quality. I was particularly impressed that room monitors were assigned to each room and evaluation forms were given to speakers immediately following their sessions. Of all the SQL Saturdays I’ve spoken at, this was the first time I’ve gotten to see my evaluations. I can remember one event where the evaluation forms got lost before we received them, and another where it wasn’t communicated that speakers had to come pick them up, so they were thrown out instead. Handing the forms to speakers immediately after their sessions eliminated any confusion and got the evals into the right hands as quickly as possible.

After a full day of training, closing ceremonies and the highly anticipated raffle drawings took place. There were some great prizes, including many books, Amazon Kindles, and even an HP Mini! The after party was held at Wilson’s Sports Bar across town. Unfortunately I don’t have a whole lot to say about it, as I left for home shortly after it started. I’m sure an amazing time was had by all, and you’ll be able to find blog posts by others who stayed.

All in all I thought this was an excellent event. While it was the first SQL Saturday in Wisconsin, it sure felt like it was put on by a bunch of seasoned pros. Many thanks to the organizing team, which consisted of Jes Borland (blog | @grrl_geek), Ted Krueger (blog | @onpnt), Gina Meronek (@equerystrian), Leonard Murphy (blog | @phonetictalk), and the MADPASS Board: Tim Benninghoff (blog), Matt Cherwin (@control_group), Tony Sebion (@tonysebion), and Steve Novoselac (blog | @scaleovenstove). Also a huge thanks to the sponsors – without their support it wouldn’t have been possible.

Apr 12 2012

Every once in a while I find myself building a report in SQL Server Reporting Services. I know my way around SSRS enough to get a report looking the way I like, but I’m definitely not an expert in the subject. While developing a report recently I noticed some very strange behavior that I have a hard time believing is intended: SSRS is changing font sizes on me depending on what I’m viewing! I’m happy to provide screenshots and instructions to reproduce this behavior, and also the solutions I found. If you’re curious, I first observed this in SQL Server 2008 R2 SP1 CU2 (10.50.2772), though I’m guessing it appears in earlier builds as well. It is still present in the latest build, SQL Server 2008 R2 SP1 CU5 (10.50.2806).

Let’s start off by creating a very simple table of soft drinks in a test database (mine is named “Sandbox”) and filling it with some data:

USE [Sandbox]
GO

CREATE TABLE [dbo].[SoftDrinks](
	[Name] [varchar](50) NOT NULL PRIMARY KEY,
	[Style] [varchar](50) NOT NULL,
	[CaffeineLevel] [varchar](10) NULL
);
GO

INSERT INTO [dbo].[SoftDrinks] VALUES (N'50/50', N'Citrus', NULL);
INSERT INTO [dbo].[SoftDrinks] VALUES (N'7 Up', N'Citrus', NULL);
INSERT INTO [dbo].[SoftDrinks] VALUES (N'Coca-Cola Classic', N'Cola', N'Medium');
INSERT INTO [dbo].[SoftDrinks] VALUES (N'Diet Pepsi', N'Diet', N'Medium');
INSERT INTO [dbo].[SoftDrinks] VALUES (N'Diet Rite', N'Diet', N'Medium');
INSERT INTO [dbo].[SoftDrinks] VALUES (N'Jolt', N'Cola', N'High');
INSERT INTO [dbo].[SoftDrinks] VALUES (N'Mello Yello', N'Citrus', N'High');
INSERT INTO [dbo].[SoftDrinks] VALUES (N'Mountain Dew', N'Citrus', N'High');
INSERT INTO [dbo].[SoftDrinks] VALUES (N'Pepsi', N'Cola', N'Medium');
INSERT INTO [dbo].[SoftDrinks] VALUES (N'RC Cola', N'Cola', N'Medium');

Running that query should now get you this:
Query Result

I should add that the values in the CaffeineLevel column are completely made up. I just wanted a few different values with some NULLs mixed in.

Now that the table is set up, we need a report. To briefly describe the report I created, I’m using the List object with text boxes for Name and CaffeineLevel in the detail section. The report is grouped by Style, and the groups are expandable and collapsed by default. The name of each group also displays a count of the records it contains.
Visual Studio
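
The group header showing each Style with its record count can be built with an expression along these lines (a sketch of the idea; the exact expression in the downloadable project may differ):

=Fields!Style.Value & " (" & CStr(CountRows()) & ")"

Inside a group header, CountRows() with no scope argument counts the rows in the current group.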

To make things a bit quicker, you can download the report project here. It’s set up to use Windows authentication to connect to the Sandbox database on localhost, so you may need to adjust the connection information for your test environment. I wasn’t able to observe this behavior when previewing in Business Intelligence Development Studio, so you will need to deploy the report to a report server.

Once the report is deployed, run it and you should see the three Style values with counts; the details aren’t visible yet:
Report

Open the Cola and Diet groups and you’ll see the values for Name and CaffeineLevel. Everything looks fine, right?
Report

Now open the Citrus group. Notice how the size of the text in the CaffeineLevel column shrinks?
Report

Closing the Citrus group will return the text sizes for the other groups to normal. For fun, let’s try this again with a non-null value for the first row of the CaffeineLevel column. Add another row to the SoftDrinks table:

INSERT [dbo].[SoftDrinks] ([Name], [Style], [CaffeineLevel])
VALUES (N'123_NotNull', N'Citrus', N'Low');

Refresh the report and open all groups. You’ll see the font size doesn’t change, which should be expected.
Report

It appears that the cause is related to the top value in the column: if the value in the first row is NULL, the font size gets smaller.

Solutions

I found a couple of ways to resolve this. The first is to remove the NULL values from the CaffeineLevel column by adjusting the query. Using the ISNULL() function to replace the NULL values with an empty string had no effect, but replacing them with a single space did fix the problem.

SELECT
   Name,
   Style,
   --CaffeineLevel -- (The original query)
   --ISNULL(CaffeineLevel,'') AS CaffeineLevel -- (This doesn't work)
   ISNULL(CaffeineLevel,' ') AS CaffeineLevel -- (This works)
FROM dbo.SoftDrinks;

Another solution is to filter out the NULL values by using an expression in SSRS. For this I used the IsNothing() function to replace NULL values with a string containing a single space, which resolved the issue. Much like above, replacing NULL values with an empty string had no effect. The expression I used was:
=IIF(IsNothing(Fields!CaffeineLevel.Value)," ",Fields!CaffeineLevel.Value)

Whichever of the above you choose, you’ll arrive at the correct (and visually pleasing) result of having the font sizes be the same:
Report

Personally I prefer to alter the query rather than add an SSRS expression, as I am not a fan of seeing <<expr>> in report definitions when it’s not truly necessary.

If anyone else has encountered this situation, I hope you find this solution helpful!