Jun 26, 2014

I’ve been told on two occasions that the absence of the MEDIUMINT datatype is a serious shortcoming of SQL Server. If you’re not familiar with MEDIUMINT, it’s an integer type available in MySQL that consumes 3 Bytes and is capable of storing values ranging from -8,388,608 to 8,388,607 (signed) or from 0 to 16,777,215 (unsigned).

While SQL Server has many datatypes, it has nothing that exactly matches MEDIUMINT. In SQL Server, if you wanted to store a value between 32,768 and 2,147,483,647 you would need to use the INT datatype, which takes up 4 Bytes per value. The next smallest integer datatype is SMALLINT, which has a maximum value of 32,767 and needs only 2 Bytes. I’m not sure the lack of a MEDIUMINT datatype is really a shortcoming, but if you find yourself in a situation where it’s necessary to store a significant number of values in the 3 Byte range, I’m going to let you in on a little secret: SQL Server does have 3 Byte integers, they’re just lurking behind the scenes.

While being able to choose a 3 Byte integer may have some value, not having to choose it and reaping the benefits as if you did has even more value. Row Compression can do just that. SQL Server’s integer types (TINYINT, SMALLINT, INT, and BIGINT) are fixed-width, meaning they all consume the same amount of storage space whether or not the value they are storing warrants it. For instance, if a column (perhaps a foreign key reference) is only storing the values 1 through 5, only one Byte is necessary to store those values. TINYINT would be the most appropriate in this case, as it only stores a single Byte. Were that column defined as an INT, it would consume 4 Bytes, three of which would not be necessary. Enabling Row Compression changes all this, because it can free up space that’s not necessary for storing a value in a particular row.
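To see that fixed width for yourself, a quick check with DATALENGTH (which returns the number of Bytes used to store an expression) makes the point. Just an illustrative sketch:

-- DATALENGTH reports storage size in Bytes
SELECT
	DATALENGTH(CAST(5 AS TINYINT)) AS TinyIntBytes, -- 1 Byte
	DATALENGTH(CAST(5 AS INT)) AS IntBytes;         -- 4 Bytes for the same value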

Row compression lets the storage engine vary the storage format of fixed-width types, effectively treating them as if they were variable-width. This means that if an integer column has a row storing the value 5, only one Byte will be used. Similarly, if the next row stores the value 500, only two Bytes are necessary for that row. But what if an application is storing the value 40,000? The SMALLINT data type (2 Bytes) can only store values up to 32,767, so a third Byte would be necessary to get to 40,000. Can SQL Server store an integer as 3 Bytes internally, or can it only go up to 4 Bytes, since the next largest datatype available to users is INT? Fortunately, we can experiment and find out.

To begin, let’s create a database

CREATE DATABASE [RowCompressionTest];
GO
USE [RowCompressionTest]
GO

and add a table

CREATE TABLE dbo.NumberData (
	Value INT,
	CONSTRAINT PK_NumberData PRIMARY KEY (Value)
);

and populate it with some values

INSERT INTO dbo.NumberData (Value) VALUES (0), (5), (500), (5000), (40000);

Now let’s look at how those values are stored. To view this information, trace flag 3604 must be enabled. This trace flag redirects the output of certain DBCC commands to the client so we can see it.

DBCC TRACEON (3604);

From here, two undocumented and unsupported commands are needed to find what we’re looking for. DBCC IND and DBCC PAGE are both completely safe and used internally by the SQL Server team at Microsoft; however, please exercise caution should you decide to use them on a production system. Paul Randal has written great blog posts explaining both DBCC IND and DBCC PAGE in detail, so I won’t duplicate the effort here.

First, use DBCC IND to find the first (and, in this case, the only) data page used by the dbo.NumberData table. The file number is in the PageFID column, the page number is in the PagePID column, and you want the row where PageType = 1. In my case this is page 288, but your page number will probably differ.

DBCC IND ('RowCompressionTest','dbo.NumberData',1);

Next you will use DBCC PAGE to examine the contents of the data page you just located. To do this, substitute your file number and page number in the query below. Mine are 1 and 288, respectively.

DBCC PAGE ('RowCompressionTest', <File_ID>, <Page_ID>, 3);

There’s a lot of information returned here, and again please refer to Paul Randal’s posts if you are curious about what it all means. For the purpose of this demonstration, scroll down to where it says:

Slot 1 Column 1 Offset 0x4 Length 4 Length (physical) 4

Value = 5

This is the first column of the first row (slot). The next line down shows that the value is 5, which is exactly what we inserted. What’s of interest here is that it says the Length is 4, and the physical length is also 4. The length is in Bytes, and since this column was created with the INT type, the length of 4 is indeed correct. The value 5 is using 4 Bytes of storage, even though it could just as easily fit into a single Byte.

Now scroll down to:

Slot 4 Column 1 Offset 0x4 Length 4 Length (physical) 4

Value = 40000

The value 40,000 also has a (logical) length of 4 and a physical length of 4. Even though this value could be stored in 3 Bytes, choosing an INT means it’s taking up 4 Bytes whether we’d like it to or not.

Now let’s see what happens when Row Compression enters the game. Run the statement below to enable it.

ALTER INDEX PK_NumberData ON dbo.NumberData REBUILD WITH (DATA_COMPRESSION = ROW);
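If you’d like to confirm the rebuild actually enabled row compression, one quick way (a side check, not required for what follows) is to look at sys.partitions:

-- data_compression_desc should now read ROW
SELECT partition_number, data_compression_desc
FROM sys.partitions
WHERE [object_id] = OBJECT_ID('dbo.NumberData');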

Because the index was rebuilt, all the page numbers change. Now you’ll need to run DBCC IND again to get the file and page number, and plug that into DBCC PAGE.

DBCC IND ('RowCompressionTest','dbo.NumberData',1);
DBCC PAGE ('RowCompressionTest', <File_ID>, <Page_ID>, 3);

Scroll down to Slot 1 Column 1 again and you’ll see something like this:

Slot 1 Column 1 Offset 0x3 Length 4 Length (physical) 1

Value = 5

Almost identical to what we saw before, but now the physical length is only 1 instead of 4. This means that with row compression enabled, the value 5 is stored physically (on disk) using a single Byte instead of 4. But what about that 40,000 value?

Slot 4 Column 1 Offset 0x3 Length 4 Length (physical) 3

Value = 40000

The value 40,000 consumes 3 Bytes instead of 4 when row compression is enabled, just like the MEDIUMINT datatype. So if anyone ever gives you a hard time for SQL Server not having a 3-Byte integer type, be sure to tell them that a) they really need to get a life, and b) SQL Server can do it very intelligently behind the scenes!
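One parting tip: if you’re wondering how much space row compression might save on one of your own tables before committing to a rebuild, SQL Server can estimate it for you. A quick sketch using our demo table:

-- estimate row compression savings without rebuilding anything
EXEC sys.sp_estimate_data_compression_savings
	@schema_name = 'dbo',
	@object_name = 'NumberData',
	@index_id = NULL,
	@partition_number = NULL,
	@data_compression = 'ROW';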

May 13, 2014

This month’s T-SQL Tuesday (the 54th!) is brought to us by Boris Hristov (@BorisHristov), and he’s asking us to write about interviews and hiring. I think many of us have at least one tale of an interview or interaction with a recruiter that’s gone good, bad, or ugly. I know I’ve got plenty of them, two of which I’ve shared before.

This time, I have a story of an interview that went well, but with an epilogue that completely threw me for a loop.

A while back I was contacted by “Shirley” (not their real name), a DBA at a company that had an opening for a Senior SQL Server DBA. They were wondering if I would be interested in applying, and sent a job description that was fairly vague, but with the potential to be something interesting. While I was perfectly happy in my current position, it never hurts to see what other opportunities are out there. I submitted my resume and was quickly granted an interview.

At the interview I talked to a few different people and got a much better idea of what they were looking for than the job description provided. I also spoke with Shirley and of course got that tour of the office where I saw the kitchen and the room with the Xbox and foosball table. After the tour, I did the standard wrap-up with human resources where they ask things like “tell me about a time where you had to work with others towards a common goal” or “what would you say is your biggest flaw?” They asked what my current salary was. They did not ask what my salary expectations were, but that’s ok because I probably wouldn’t have told them anyway. I tend to answer that question with a statement like “an increase based on my experience and the value I and my skills would bring to your company.” That was pretty much the end of the interview. My opinion at this point was that I wasn’t particularly impressed by the position, though the right amount of money might persuade me (but probably not).

Fast forward a few days, and I got a call extending me an offer. As is my standard practice, I never accept an offer immediately; I always give myself at least one night to think about it and confer with my wife. I considered this offer to be a non-starter: the position didn’t excite me and the pay wasn’t compelling. They offered me more than my current salary, but only a tiny bit more (less than 1%). In some circumstances I might have taken it; after all, there’s way more to a job than the pay. But this place totally wasn’t worth it.

Having slept on it and talked it over with Michelle, I called their HR manager back and declined the offer. I thanked them for their time, their consideration, and for extending the offer to me, but said I was happier staying where I am at this point. HR was equally polite, thanked me for getting back to them, and wished me well. I’m sure in the back of their mind they weren’t quite so happy, because it’s their job to fill positions and now they had to find someone else to offer it to, but (in my mind) an HR professional will never let that show. A job offer is just that, an offer, and the person it is extended to is under no obligation to accept.

That was that. Or so I thought. A while later I got an email from Shirley. All it contained was the following:

“So you did not what [sic] to negotiate with us but simply use it as leverage to get a salary bump? Good luck in the future.”

Wow.

First of all, the “salary bump” comment was totally unfounded, but even if that were the case, it’s none of Shirley’s business.

Second (and more importantly), as an interviewer, I would never ever contact a candidate for any reason, especially in a negative tone like that. I consider communication with candidates outside of the interview room to be completely within the domain of human resources.

I wanted so badly to be the bigger person and just brush it off, but a few hours later I was still taken aback by the tone of that email. I forwarded it to Shirley’s HR department and told them I was very sorry Shirley felt the need to send that to me, as their remarks did a rather poor job of representing the company. I added that if that’s the kind of person Shirley is, then I’m very glad I won’t be working with them. HR replied and thanked me for bringing it to their attention. The end.

Why not negotiate?

As wrong as Shirley’s email to me was, they did make a valid point. I certainly could have negotiated and probably would have gotten closer to my desired salary. But I wasn’t interested enough in the job to do so. If you’re not interested in taking a job at any price, there’s nothing to negotiate. Not to mention that if you ask for $x and they get approval to raise the offer to that amount, it would be really unprofessional to say no at that point.

Final Thoughts

As an interviewer, whether individually or as part of a team, you are the face of your employer. In addition to vetting candidates and trying to determine if they would be successful in a position, it’s your job to portray your workplace in the best possible light. If you can’t do that, you probably shouldn’t be taking part in interviews.

Mar 11, 2014

This post is part of T-SQL Tuesday #52, which is being hosted this month by Michael J. Swart (@MJSwart). Michael is asking us to argue against a popular opinion, and I’m more than happy to do so, as this is a belief that I’ve kept to myself for quite a while.

SQL Server’s row compression feature can be an amazing tool. Not only is it lightweight on CPU usage, especially when compared to page compression, but it can save a significant amount of disk space as well. Your data also remains compressed while in the buffer pool, meaning more rows can be stored in memory, reducing the need to make slower requests to disk. On top of all that, some queries (especially those involving index scans) can see dramatic performance improvements.

In fact, row compression is so good that Microsoft’s whitepaper actually states “If row compression results in space savings and the system can accommodate a 10 percent increase in CPU usage, all data should be row-compressed.”

Yes, row compression is a wonderful thing, and the databases I maintain frequently benefit from its use.

But I hate it.

Why? Because all too often, features designed to help make things easier also make people lazy.

By far, the biggest issue that row compression addresses is poor data typing, the use of a data type that isn’t appropriate for the values at hand. For example, if a column is only going to store the values 1 through 5, an integer data type is not necessary. A tinyint data type would be just as effective, and would consume only one quarter of the space. However if you are unable to change the data type, perhaps because the database is part of an application written by a third party, row compression can be a big help.

Row compression allows the storage engine to treat fixed-width data types as if they were variable-width. This means the disk space that isn’t needed to store a value in a fixed-width column and would typically go unused can be put to work. The savings can be tremendous, and SQL Server’s data compression features are completely transparent to end users or applications – all the magic happens behind the scenes, which is why a third party application would be none the wiser.

But what if you are able to change those data types, and just don’t feel the need to do so anymore? Data compression gives you most of the advantages of proper data typing, but all you have to do to get them is flip a switch – no pesky forethought necessary. And that’s why it’s terrible. Because for every person out there who designs their databases and data types with care, there are many more who aren’t interested, don’t care, or don’t even realize it’s an option. Features like row compression that mask these issues aren’t going to interest anyone in solving them the right way.
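To put that switch next to the fix it stands in for, here’s a sketch with hypothetical table and column names:

-- flipping the switch: compress in place, poor data typing and all
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = ROW);

-- the proper fix: size the column to the values it actually stores
ALTER TABLE dbo.Orders ALTER COLUMN PriorityCode TINYINT NOT NULL;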

So while row compression is a wonderful tool and can do amazing things when used properly, don’t forget it’s also an enabler.

Feb 21, 2014

I know most of my important file locations by heart. Whenever I need to do things that require typing out a file’s full path, such as copying a backup or taking a peek at logs, I can type those paths from memory without issue.

On the other hand, when I’m working with systems I’m less familiar with or have never seen before, my memory can’t help me. In those cases I’ve always opted for copying and pasting the path of the file into SSMS or wherever else I might need it. This works just fine, but I’ve always been annoyed that the path and the file name need to be selected and copied into the editor separately.

Two rounds of copy and paste seemed a little much. There has to be a better way, and in fact there is. This has probably been present in Windows for quite a while, but I found out about it just this week. I’ve long known that holding down the shift key while right-clicking on a file brings up extra options in the context menu, but what I didn’t realize is that one of those options is “copy as path”.

This will copy the full path of the file, including the name, to the clipboard, and you can now paste it wherever you like.

The Catch

The only downside is that the copied path is surrounded in double quotes. This is great for pasting into a command prompt, but you’ll need to replace them with single quotes when working in SSMS.
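For example (with a made-up file name), a pasted path arrives ready for the command prompt, but needs its quotes swapped before T-SQL will accept it:

-- pasted straight from "copy as path":  "C:\Backups\AdventureWorks2012.bak"
-- in SSMS, swap the double quotes for single quotes:
RESTORE HEADERONLY FROM DISK = 'C:\Backups\AdventureWorks2012.bak';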

Jan 31, 2014

The ability to keep track of what’s being executed on a SQL Server instance, or to create a read-only copy of a database at a specific point in time, can be very valuable if required by the solution you are developing. But a lesser-known fact about SQL Auditing and Database Snapshots is that not only do they work well on their own, but they also play nicely together.

The Setup

Let’s say you work for that awesome company AdventureWorks, which embraces open data so much that you can download their corporate database from CodePlex. That database contains some super-sensitive payroll data, and they’d like to keep track of everyone who has accessed it. To do this, you set up SQL Auditing to capture all SELECT operations on the HumanResources.EmployeePayHistory table and log them to a file.

-- create a server audit
USE [master]
GO

CREATE SERVER AUDIT [AWAudit]
TO FILE (
FILEPATH = N'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA')
WITH (QUEUE_DELAY = 1000, ON_FAILURE = CONTINUE);
GO

-- * in a production system you'd probably want a more stringent ON_FAILURE
--   value than "continue", but for demonstration purposes it's fine

ALTER SERVER AUDIT [AWAudit] WITH (STATE = ON);

-- create a database audit specification
USE [AdventureWorks2012]
GO

CREATE DATABASE AUDIT SPECIFICATION [EmployeePayAuditSpec]
FOR SERVER AUDIT [AWAudit]
ADD (SELECT ON OBJECT::[HumanResources].[EmployeePayHistory] BY [public])
GO

ALTER DATABASE AUDIT SPECIFICATION [EmployeePayAuditSpec] WITH (STATE = ON);

So now anytime anyone selects from that table, there will be a record of it:

-- select from the table being audited
SELECT TOP (100) * FROM HumanResources.EmployeePayHistory;

-- query the audit file
SELECT
	DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), event_time) AS EventTime,
	session_server_principal_name,
	database_name,
	statement
FROM sys.fn_get_audit_file('C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\*',NULL,NULL);

So far, so good. Now let’s continue with our story and say that management wants to be able to run reports throughout the workday that show the state of the database at 6:00 am. You find the appropriate solution to be a database snapshot, so you set up a job to create a snapshot daily at 6:00 am, and another one to drop it late in the evening. All the reports can then query the snapshot and get the exact state of the database as it was at 6:00 that morning.

-- create snapshot
CREATE DATABASE AdventureWorks2012_Snap ON
(NAME = AdventureWorks2012_Data,
FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\AW2012.ss')
AS SNAPSHOT OF AdventureWorks2012;
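The late-evening job mentioned above simply removes the snapshot; a database snapshot is dropped just like a regular database:

-- drop snapshot (run by the late-evening job)
DROP DATABASE AdventureWorks2012_Snap;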

But what about that audit? Since a snapshot contains absolutely everything in the database at the moment it was created, it will have all that sensitive pay history data as well. How can you audit that?

The Free Lunch

The answer is that you don’t have to worry. A snapshot contains everything in the database at the moment it was created, and that doesn’t just apply to data – that goes for settings too, including audits. If someone were to query an audited table in the snapshot, it will be recorded by the audit just like on the base database. You’ll be able to tell the query was run on the snapshot by looking at the database name column:

-- query snapshot
USE [AdventureWorks2012_Snap]
GO

SELECT TOP (100) * FROM HumanResources.EmployeePayHistory;

-- query audit file
SELECT
	DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), event_time) AS EventTime,
	session_server_principal_name,
	database_name,
	statement
FROM sys.fn_get_audit_file('C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\*',NULL,NULL);

But what if you don’t want the snapshot to be audited?

The Free Lunch is at Grandma’s House

I can only speak from personal experience, but whenever I go visit my grandmother, I go knowing that I will be eating whether I’m hungry or not. She loves cooking and watching her grandchildren eat, and – let’s face it – there’s just no way to say no to grandma.

Similarly, there’s no way to say no to a snapshot. However the database was configured at the moment the snapshot was created is exactly how it will be reflected in the snapshot. It’s important to remember that database snapshots are read-only, so even if you try to disable the audit on a snapshot, it won’t work:

USE [AdventureWorks2012_Snap]
GO

ALTER DATABASE AUDIT SPECIFICATION [EmployeePayAuditSpec] WITH (STATE = OFF);

The only way to create a snapshot without the audit in this case would be to stop the audit, create the snapshot, and then restart the audit. This raises other issues, namely that you probably shouldn’t be stopping an audit of sensitive data like that.
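For completeness, that workaround would look something like the sketch below (the snapshot name here is hypothetical). Just keep in mind that every moment the audit is off is a hole in your audit trail.

-- pause auditing on the source database
USE [AdventureWorks2012]
GO
ALTER DATABASE AUDIT SPECIFICATION [EmployeePayAuditSpec] WITH (STATE = OFF);
GO

-- create the snapshot while the audit is off
CREATE DATABASE AdventureWorks2012_NoAuditSnap ON
(NAME = AdventureWorks2012_Data,
FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\AW2012_NoAudit.ss')
AS SNAPSHOT OF AdventureWorks2012;
GO

-- resume auditing
ALTER DATABASE AUDIT SPECIFICATION [EmployeePayAuditSpec] WITH (STATE = ON);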