May 08 2015
 

It’s 10PM*. Do you know who your sysadmin role members are?

SQL Server’s fixed server roles control incredibly broad permissions across your entire instance. For this reason, membership in these roles should be granted carefully and reviewed often. Fortunately, reviewing role membership is extremely easy, so you can always answer the above question better than Homer.

The Easy Way: SSMS GUI

In the SSMS Object Explorer, open the “Security” folder for an instance, and then open “Server Roles”.

You’ll see it displays the 9 fixed server roles. Double-clicking on any role brings up a dialog that shows all its members:

If you hate typing, then this method is for you. Otherwise, you’ll probably like

The Easier, Faster Way: T-SQL

In SQL Server, where there’s a will, there’s a way to do something in T-SQL. And it will probably be better/faster/cheaper too. Here’s the query I use to check server role memberships:

SELECT
   rp.name AS Server_Role,
   mp.name AS Role_Member,
   mp.type_desc AS Member_Type,
   mp.is_disabled AS Login_Is_Disabled
FROM sys.server_role_members srm
INNER JOIN sys.server_principals rp ON srm.role_principal_id = rp.principal_id
INNER JOIN sys.server_principals mp ON srm.member_principal_id = mp.principal_id;

On my test server here at home, it returns the following results:

You can see the 4 columns it gives us: the name of the server-level role, the name of the login, what type of login it is, and whether or not the login is disabled. The SSMS role member dialog shown above gives no indication if a login is disabled or not, so I find that column especially handy in the query.

It’s important to note that this query returns members of all server-level roles at once. This example shows only the members of the sysadmin role because it’s the only one with members on this instance.
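
If you only want to check one or two roles, a WHERE clause on the same query narrows things down (the role names below are just examples):

SELECT
   rp.name AS Server_Role,
   mp.name AS Role_Member,
   mp.type_desc AS Member_Type,
   mp.is_disabled AS Login_Is_Disabled
FROM sys.server_role_members srm
INNER JOIN sys.server_principals rp ON srm.role_principal_id = rp.principal_id
INNER JOIN sys.server_principals mp ON srm.member_principal_id = mp.principal_id
WHERE rp.name IN (N'sysadmin', N'securityadmin');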

What to look for

  • First and foremost, do you recognize all the members of all roles, and should they have that level of permissions?
  • You’ll most likely see Windows groups included in this list as well. Do all members of the group belong there?
  • You cannot remove the sa login from the sysadmin role, so I recommend either disabling it or changing its name (see the example just below this list).
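
For that last point, both options are a single statement. A minimal sketch; the new name is only a placeholder, so pick something meaningful to you:

-- disable the sa login
ALTER LOGIN [sa] DISABLE;

-- or rename it (the new name here is just an example)
ALTER LOGIN [sa] WITH NAME = [NotTheRealSA];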

Securityadmin: the most dangerous role

Ask anyone familiar with SQL Server what the most dangerous role is, and they’ll probably say “sysadmin.” While sysadmin has the keys to the kingdom in the sense that its members can do absolutely anything on the server, there is something securityadmin has that sysadmin doesn’t: secrecy.

Members of the securityadmin role can manage any login on the server, including membership in server-level roles. This means someone with securityadmin rights can:

  • add themselves to the sysadmin role
  • do whatever dastardly deeds they want
  • and then remove themselves from sysadmin (spelled out in T-SQL below)
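
Spelled out in T-SQL, those bookend steps are nothing exotic (the login name is a placeholder, and this assumes SQL Server 2012 or later syntax):

-- step 1: quietly join the sysadmin role
ALTER SERVER ROLE [sysadmin] ADD MEMBER [SomeLogin];

-- step 3: quietly leave it again
ALTER SERVER ROLE [sysadmin] DROP MEMBER [SomeLogin];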

If you’re only looking at members of the sysadmin role, you will be none the wiser. When checking server-level roles, remember that securityadmin is every bit as powerful and dangerous as sysadmin. It just might save your bacon!

*It’s always 10PM somewhere

Mar 11 2014
 

This post is part of T-SQL Tuesday #52, which is being hosted this month by Michael J. Swart (@MJSwart). Michael is asking us to argue against a popular opinion, and I’m more than happy to do so, as this is a belief that I’ve kept to myself for quite a while.

SQL Server’s row compression feature can be an amazing tool. Not only is it lightweight on CPU usage, especially when compared to page compression, but it can save a significant amount of disk space as well. Your data also remains compressed while in the buffer pool, meaning more rows can be stored in memory, reducing the need to make slower requests to disk. On top of all that, some queries (especially those involving index scans) can see dramatic performance improvements.

In fact, row compression is so good that Microsoft’s whitepaper actually states “If row compression results in space savings and the system can accommodate a 10 percent increase in CPU usage, all data should be row-compressed.”

Yes, row compression is a wonderful thing, and the databases I maintain frequently benefit from its use.

But I hate it.

Why? Because all too often, features designed to help make things easier also make people lazy.

By far, the biggest issue that row compression addresses is poor data typing, the use of a data type that isn’t appropriate for the values at hand. For example, if a column is only going to store the values 1 through 5, an integer data type is not necessary. A tinyint data type would be just as effective, and would consume only one quarter of the space. However, if you are unable to change the data type, perhaps because the database is part of an application written by a third party, row compression can be a big help.
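
If you’re curious how much space row compression would actually save on a given table, SQL Server will estimate it for you; a quick sketch (the schema and table names are placeholders):

-- estimate row compression savings before committing to it
EXEC sys.sp_estimate_data_compression_savings
   @schema_name = N'dbo',
   @object_name = N'YourTable',
   @index_id = NULL,          -- all indexes
   @partition_number = NULL,  -- all partitions
   @data_compression = N'ROW';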

Row compression allows the storage engine to treat fixed-width data types as if they were variable-width. This means the disk space that isn’t needed to store a value in a fixed-width column and would typically go unused can be put to work. The savings can be tremendous, and SQL Server’s data compression features are completely transparent to end users or applications – all the magic happens behind the scenes, which is why a third party application would be none the wiser.
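
Enabling it is a single rebuild. A minimal sketch, again with a placeholder table name:

-- compress the table (heap or clustered index)
ALTER TABLE dbo.YourTable REBUILD WITH (DATA_COMPRESSION = ROW);

-- or compress all of its indexes
ALTER INDEX ALL ON dbo.YourTable REBUILD WITH (DATA_COMPRESSION = ROW);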

But what if you are able to change those data types, and just don’t feel the need to do so anymore? Data compression gives you most of the advantages of proper data typing, but all you have to do to get them is flip a switch – no pesky forethought necessary. And that’s why it’s terrible. Because for every person out there who designs their databases and data types with care, there are many more who aren’t interested, don’t care, or don’t even realize it’s an option. Features like row compression that mask these issues aren’t going to interest anyone in solving them the right way.

So while row compression is a wonderful tool and can do amazing things when used properly, don’t forget it’s also an enabler.

Jan 31 2014
 

The ability to keep track of what’s being executed on a SQL Server instance, or to create a read-only copy of a database at a specific point in time can be very valuable if required by the solution you are developing. But a lesser-known fact about SQL Auditing and Database Snapshots is that not only do they work well on their own, but they also play nicely together.

The Setup

Let’s say you work for that awesome company AdventureWorks, which embraces open data so much that you can download their corporate database on CodePlex here. That database contains some super-sensitive payroll data, and they’d like to keep track of everyone who has accessed it. To do this, you set up SQL Auditing to capture all SELECT operations on the HumanResources.EmployeePayHistory table and log them to a file.

-- create a server audit
USE [master]
GO

CREATE SERVER AUDIT [AWAudit]
TO FILE (
FILEPATH = N'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA')
WITH (QUEUE_DELAY = 1000, ON_FAILURE = CONTINUE);
GO

-- * in a production system you'd probably want a more stringent ON_FAILURE
--   value than "continue", but for demonstration purposes it's fine

ALTER SERVER AUDIT [AWAudit] WITH (STATE = ON);

-- create a database audit specification
USE [AdventureWorks2012]
GO

CREATE DATABASE AUDIT SPECIFICATION [EmployeePayAuditSpec]
FOR SERVER AUDIT [AWAudit]
ADD (SELECT ON OBJECT::[HumanResources].[EmployeePayHistory] BY [public])
GO

ALTER DATABASE AUDIT SPECIFICATION [EmployeePayAuditSpec] WITH (STATE = ON);

So now anytime anyone selects from that table, there will be a record of it:

-- select from the table being audited
SELECT TOP (100) * FROM HumanResources.EmployeePayHistory;

-- query the audit file
SELECT
	DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), event_time) AS EventTime,
	session_server_principal_name,
	database_name,
	statement
FROM sys.fn_get_audit_file('C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\*',NULL,NULL);

So far, so good. Now let’s continue with our story and say that management wants to be able to run reports throughout the workday that show the state of the database at 6:00 am. You find the appropriate solution to be a database snapshot, so you set up a job to create a snapshot daily at 6:00 am, and another one to drop it late in the evening. All the reports can then query the snapshot and get the exact state of the database as it was at 6:00 that morning.

-- create snapshot
CREATE DATABASE AdventureWorks2012_Snap ON
(NAME = AdventureWorks2012_Data,
FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\AW2012.ss')
AS SNAPSHOT OF AdventureWorks2012;
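
The late-evening job is even simpler, since a snapshot is dropped like any other database:

-- drop snapshot
DROP DATABASE AdventureWorks2012_Snap;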

But what about that audit? Since a snapshot contains absolutely everything in the database at the moment it was created, it will have all that sensitive pay history data as well. How can you audit that?

The Free Lunch

The answer is that you don’t have to worry. A snapshot contains everything in the database at the moment it was created, and that doesn’t just apply to data – that goes for settings too, including audits. If someone was to query an audited table in the snapshot, this will be recorded by the audit just like on the base database. You’ll be able to tell the query was run on the snapshot by looking at the database name column:

-- query snapshot
USE [AdventureWorks2012_Snap]
GO

SELECT TOP (100) * FROM HumanResources.EmployeePayHistory;

-- query audit file
SELECT
	DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), event_time) AS EventTime,
	session_server_principal_name,
	database_name,
	statement
FROM sys.fn_get_audit_file('C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\*',NULL,NULL);

But what if you don’t want the snapshot to be audited?

The Free Lunch is at Grandma’s House

I can only speak from personal experience, but whenever I go visit my grandmother, I go knowing that I will be eating whether I’m hungry or not. She loves cooking and watching her grandchildren eat, and – let’s face it – there’s just no way to say no to grandma.

Similarly, there’s no way to say no to a snapshot. However the database was configured at the moment the snapshot was created is exactly how it will be reflected in the snapshot. It’s important to remember that database snapshots are read-only, so even if you try to disable the audit on a snapshot, it won’t work:

USE [AdventureWorks2012_Snap]
GO

ALTER DATABASE AUDIT SPECIFICATION [EmployeePayAuditSpec] WITH (STATE = OFF);

The only way to create a snapshot without the audit in this case would be to stop the audit, create the snapshot, and then restart the audit. This raises other issues, namely that you probably shouldn’t be stopping an audit of sensitive data like that.
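
If you did go down that road, the sequence would look something like this (a sketch using the demo objects above; the second snapshot name and file path are just examples, and remember the audit has a blind spot while the specification is off):

USE [AdventureWorks2012]
GO

-- pause auditing on the base database
ALTER DATABASE AUDIT SPECIFICATION [EmployeePayAuditSpec] WITH (STATE = OFF);
GO

-- create the snapshot while the audit specification is disabled
CREATE DATABASE AdventureWorks2012_Snap2 ON
(NAME = AdventureWorks2012_Data,
FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\AW2012_2.ss')
AS SNAPSHOT OF AdventureWorks2012;
GO

-- resume auditing
ALTER DATABASE AUDIT SPECIFICATION [EmployeePayAuditSpec] WITH (STATE = ON);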

Nov 04 2013
 

I’m very proud to have contributed a chapter to Tribal SQL, which is now available for purchase at Amazon and finer bookstores everywhere. This is the first book I’ve had any part in writing, something I never thought I would do. I always hated English and literature classes in school – if it wasn’t math, hard science, or architectural history, I wanted no part of it. I credit the SQL community and blogging for slowly getting me excited about the written word.

Getting this book from idea to print was not a sprint but a marathon. While I’m sure there was plenty going on behind the scenes that I wasn’t aware of, here’s how everything came together from my point of view.

How I got Involved

The first I heard about “SQL Tribal Knowledge” was from Matt Velic’s (@mvelic) blog post almost two years ago. Despite being a fan of MidnightDBAs (@MidnightDBA) Jen and Sean McCown, I missed their blog post announcing the project. Basically they wanted to get a whole bunch of unpublished people together, have everyone write a chapter, and then assemble them into a book. Seemed like a great idea to me, so I submitted a few topic ideas, and ended up writing about data compression – a feature I’ve done lots of work with and know rather well.

Writing

Actually writing the chapter was both the hardest and the easiest part for me. Getting started was difficult because I wasn’t sure about what style of writing to use. I ended up reading through about a dozen other books not for content, but to see what tone the authors took when putting their words to paper. I ended up going with something not much different from a blog post – perhaps slightly more formal. With a style established, everything flowed easily. After about 2 months, my initial draft was complete and to my liking.

Peer Editing

After the first drafts came peer editing. All the authors picked two other chapters to read and comment on. It was a great gut-check on things like grammar, effectiveness of explanations, etc. With both reviews in hand, it was back to the drawing board (er, keyboard) for revisions.

Volunteer Editing

After peer editing there was a second round of reviews, this time from outside the author pool. Over 20 bloggers and speakers from the SQL Server community volunteered their time to read and comment on our work. This was followed by another round of revisions.

Publishing

Throughout the process, Jen and Sean were trying to find a publisher. There was some brief talk about self-publishing, but then Red Gate got involved and agreed to publish the book, now named “Tribal SQL”. This involved – you guessed it – another round of editing, this time by Red Gate’s Tony Davis (@TonyTheEditor). Tony was delightful to work with and I have tremendous respect for what he does. After a few rounds of back-and-forth with him, my chapter was done.

From this point on I’m sure there were a lot of things taking place that I didn’t know about, but the end result is that the book was published, and Red Gate was kind enough to hold a launch event at the PASS Summit and distribute copies. They also held a book signing for all the authors that were in attendance.

Would I do it again?

I would. Writing a chapter is a lot of work – writing an entire book is obviously much more, but I found it enjoyable and rewarding. Given the right topic and/or group of co-authors, I’m sure I could have just as much fun working on another one.

Write Book, Get Published, ?????, Profit?

HA! I should mention that all royalties from the sale of this book are being donated to Computers4Africa. Even if that weren’t the case, any money we might get would work out to a paltry rate when divided by the number of hours spent working on this project. If you ever decide to write a book, make sure you’re doing it for the right reasons. Money is not one of them.

Thank You

Finally, some thanks are in order. Thanks to Jen and Sean McCown for putting this all together, and to Matt Velic for blogging and making me aware! Thanks to all the other co-authors and editors for their time and hard work. Finally, thanks to the wonderful people at Red Gate, not only for publishing this book, but for the tremendous support they give to the SQL Server community as a whole.

Aug 06 2013
 

When I’m presenting on SQL Server data compression, I frequently get asked if data compression and backup compression can be used together. I keep meaning to put together a blog post with a demo so I have something to direct people towards for further reading. So here we are – let’s give it the old college try and find out!

Compression Primer

Methods for compressing data fall into 2 categories: lossy and lossless. Lossy compression reduces the size of data by strategically removing parts of it that are unlikely to be missed. This data is lost forever, hence the term lossy. Prime examples of lossy compression can be found in digital images, videos, and music. It’s very easy to remove a little color depth from an image or reduce the bitrate of a sound or video file in a way that shrinks its size without significantly changing its appearance to an observer. Lossy compression is great in these cases because our eyes and ears are far from perfect, but for things like a document or a database it’s not a good choice. Nobody would be happy if data randomly started disappearing to save space (well, maybe the DBA would be, but only if the data that’s disappearing are those copies of tables with “_old” in the name that nobody has looked at in years…).

For types of data where nothing can be lost, there’s lossless compression. Lossless compression can’t remove anything, so it generally does things like build dictionaries of repeating patterns and create pointers to the dictionary when appropriate. Centralizing duplicate data in a dictionary means multiple copies no longer need to be stored, so the compressed version will be smaller. Both SQL Server data compression and backup compression use lossless compression methods.

Data Compression and Backup Compression, circa 1966.

SQL Server data compression and backup compression are completely separate features; they do very different things in different ways. At a high level, data compression reduces the size of records in memory (and on disk as well), allowing more records to fit on a data page thereby decreasing the space needed to store the data. Backup compression compresses the database as it’s being written to a backup file, which means backups will require less storage space. They build off each other well because data compression can only do so much: it can de-duplicate data on a single page, but not across multiple pages. Even with data compression enabled in page mode, its highest setting, it’s very likely that similar data will exist on multiple pages. Backup compression can pick up where data compression leaves off and compress things a bit more by eliminating those duplicates. Like Batman and Superman, they combine their powers for good, not evil. If you have the ability to compress both data and backups, it’s definitely worth your time experimenting to see if your environment benefits from using them together.

Demo Time!

Let’s take a look at how they complement each other with a demo. I used the AdventureWorks2012 database, which is available for download on CodePlex if you don’t already have it.

The AdventureWorks databases are great for most examples, but none of the included tables are very big, and to make things more apparent I wanted to show some larger tables. Fortunately several tools to increase the size of tables have been written over the past few years. I used the script written by Jonathan Kehayias. This script creates 2 new tables: Sales.SalesOrderDetailEnlarged and Sales.SalesOrderHeaderEnlarged. Running the entire script the first time and then the INSERT block 3 additional times will bring the total size of the AdventureWorks2012 database up to about 4.5 GB.

Once our tables have been enlarged, let’s rebuild their clustered indexes and set the fill factor to 100. This will ensure that fragmentation is minimized so we know we’re comparing apples to apples when looking at sizes later.

ALTER INDEX [PK_SalesOrderDetailEnlarged_SalesOrderID_SalesOrderDetailID]
ON [Sales].[SalesOrderDetailEnlarged]
REBUILD WITH (SORT_IN_TEMPDB=ON,FILLFACTOR=100);

ALTER INDEX [PK_SalesOrderHeaderEnlarged_SalesOrderID]
ON [Sales].[SalesOrderHeaderEnlarged]
REBUILD WITH (SORT_IN_TEMPDB=ON,FILLFACTOR=100);

We’ll also want to shrink the transaction log. Enlarging tables and rebuilding indexes grew the log, and the restore portion of this demo will give a more accurate picture, especially in terms of time, if the log is as small as possible.

DBCC SHRINKFILE('AdventureWorks2012_Log',1);

Now for the data collection part. We’ll take a total of 4 backups of the AdventureWorks2012 database. For each backup we want to monitor:

– the size of the backup file
– the time the backup took
– average CPU utilization while the backup was running

The size of the backup file is easy – we can just look at the file’s properties in Windows after the fact to find it.
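
If you’d rather not leave SSMS, msdb records it as well; a quick sketch that returns both raw and compressed sizes for recent AdventureWorks2012 backups:

-- backup sizes straight from msdb
SELECT TOP (10)
   bs.backup_start_date,
   bs.backup_size / 1048576.0 AS backup_size_mb,
   bs.compressed_backup_size / 1048576.0 AS compressed_backup_size_mb
FROM msdb.dbo.backupset bs
WHERE bs.database_name = N'AdventureWorks2012'
ORDER BY bs.backup_start_date DESC;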

The elapsed time of the backup is also fairly easy. When you run a backup in SQL Server Management Studio, the “Messages” window will return some information about the backup, including the time it took to run:

Average CPU utilization is a little more work because SSMS doesn’t tell you about it. I use Perfmon since it’s built into Windows. When first opening Perfmon, the counter it shows by default is “% Processor Time”. This may seem like the counter you’ll want to use, but it isn’t. % Processor Time includes all operations on the processor, many of which are kernel processes we can’t control. There’s another counter called “% User Time” that will only show user processes. This still isn’t perfect; there will undoubtedly be other user processes running, but with all applications closed this counter is usually pretty close to zero.

The “Average” box will show the average CPU utilization since the counter was last cleared. For the most accurate value you will want to clear the counter immediately before starting a backup by right-clicking inside the chart and selecting “clear”.

Now to take some backups and monitor the results using the above methods. The first backup will be the default with backup compression disabled; we will enable it for the second.

BACKUP DATABASE [AdventureWorks2012]
TO DISK = '_path_\AW_NoDataCompression_NoBackupCompression.bak';

BACKUP DATABASE [AdventureWorks2012]
TO DISK = '_path_\AW_NoDataCompression_YesBackupCompression.bak'
WITH COMPRESSION;

Now let’s add data compression to the mix by enabling it on the clustered indexes of the 2 enlarged tables. We could enable it on all indexes, but these 2 represent the vast majority of the database’s size. I used page compression to get the maximum possible effect, and once again set the fill factor to 100.

ALTER INDEX PK_SalesOrderDetailEnlarged_SalesOrderID_SalesOrderDetailID
ON Sales.SalesOrderDetailEnlarged
REBUILD WITH (DATA_COMPRESSION=PAGE,SORT_IN_TEMPDB=ON,FILLFACTOR=100);

ALTER INDEX PK_SalesOrderHeaderEnlarged_SalesOrderID
ON Sales.SalesOrderHeaderEnlarged
REBUILD WITH (DATA_COMPRESSION=PAGE,SORT_IN_TEMPDB=ON,FILLFACTOR=100);
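
To confirm the rebuilds did what we expect, the catalog views will show the compression setting for each index (a quick sketch against the two enlarged tables):

-- check compression settings on the enlarged tables
SELECT
   OBJECT_NAME(p.object_id) AS table_name,
   i.name AS index_name,
   p.partition_number,
   p.data_compression_desc
FROM sys.partitions p
INNER JOIN sys.indexes i
   ON p.object_id = i.object_id AND p.index_id = i.index_id
WHERE p.object_id IN (OBJECT_ID('Sales.SalesOrderDetailEnlarged'),
                      OBJECT_ID('Sales.SalesOrderHeaderEnlarged'));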

The rebuild operations necessary for compression will again grow the log, so we will shrink it again for consistency, and then run 2 more backups – with and without backup compression.

DBCC SHRINKFILE('AdventureWorks2012_Log',1);

BACKUP DATABASE [AdventureWorks2012]
TO DISK = '_path_\AW_YesDataCompression_NoBackupCompression.bak';

BACKUP DATABASE [AdventureWorks2012]
TO DISK = '_path_\AW_YesDataCompression_YesBackupCompression.bak'
WITH COMPRESSION;

At this point we have 4 backup files:

- No data or backup compression
- Backup compression but not data compression
- Data compression but not backup compression
- Both data and backup compression

I also restored each of these backups and captured the restore time and average CPU for each. I won’t cover that here, but feel free to try restoring on your own.
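
If you’d like to try the restores yourself, something along these lines should work; the restored database name is just an example, the logical file names come from AdventureWorks2012, and the paths are placeholders like the ones above:

-- restore a copy from one of the backups taken earlier
RESTORE DATABASE [AdventureWorks2012_Restored]
FROM DISK = '_path_\AW_YesDataCompression_YesBackupCompression.bak'
WITH
   MOVE 'AdventureWorks2012_Data' TO '_path_\AdventureWorks2012_Restored.mdf',
   MOVE 'AdventureWorks2012_Log' TO '_path_\AdventureWorks2012_Restored_log.ldf',
   STATS = 10;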

Now that we’ve completed our trials, let’s look at what we have. Since we’re talking about compression, the size of the backup files is what we’re most interested in. Here are the results I observed:

As you can see, the backup with no compression at all is the largest. Data compression yields a smaller backup file, backup compression gives us an even smaller file, and combining both data and backup compression is slightly smaller yet. This is great news, as it shows that backup and data compression can work together to produce a result that’s smaller than either can do on their own.

As for backup and restore times, I would expect the results to vary based on the hardware being used. I wrote my backups out to an external USB hard disk and restored from there as well.

The backup and restore times correlate pretty closely with the backup sizes above. This is no surprise – the vast majority of time for backup and restore operations is spent waiting on I/O either reading the database from disk or writing the backup to disk. The less data SQL Server has to read or write due to compression, the faster the operations will complete.

Here’s the average CPU usage for the duration of the backup and restore operations. This is a bit more interesting, but still makes sense. Backups and restores that don’t involve backup compression use very little CPU; these operations are mostly I/O. Processor usage shoots up when backup compression is enabled because there’s a lot more work involved with compressing and decompressing the backup. Based on the above results, compressing a backup stream is more CPU-intensive than decompressing it for a restore. It also looks like slightly less CPU is required for backup compression of databases utilizing data compression as opposed to those that don’t. I imagine this is because the page-level data compression we used makes a big effort to reduce duplicated values, so there’s slightly less work to be done when it comes time to compress the backup. Backup compression will cause elevated CPU utilization only when backups or restores are running, while data compression uses extra CPU power throughout the course of normal DML operations, but not at time of backup.

So what’s the catch?

Everything we just did shows us that there’s an advantage to using data compression and backup compression together. The differences weren’t very significant in this case, only about 100MB for overall backup size, but I tried this with many datasets and the advantage only became more apparent as the sizes increased. It sounds like a total win-win, but the saying goes that there’s no free lunch in computing, and that holds true for compression as well.

The first catch is cost. Data compression is only available in SQL Server Enterprise Edition, and it’s not cheap.

After that, the biggest thing to look out for is CPU utilization. Both data and backup compression involve extra work for your CPU, and if your CPU utilization is already high, you may find that data and/or backup compression perform poorly.

Final Thoughts

Both backup and data compression can provide tremendous advantages in terms of performance and reduced storage space, but these advantages are highly dependent on the data they are manipulating. Be sure to take some time and test whether your environment comes out ahead by using them either individually or together. Based on what I’ve seen, it probably will, but it’s always better to make sure beforehand rather than turning something on and seeing no benefit or, even worse, a performance hit.