Jan 262012
 

Knowing when and how often your indexes are used can really come in handy. Indexes that are never utilized aren’t worth keeping around, and knowing their usage patterns can be a big help when making decisions regarding things such as filegroup placement and compression settings. SQL Server 2005 brought some great advancements in the form of the sys.dm_db_index_usage_stats DMV, which returns statistics on index usage that are automatically being kept by the server. While it’s a step forward, I feel it still leaves a few things to be desired:

  • The values are cumulative since the server was last restarted
  • Once the server is shut down or restarted, there’s no way to get the previous values back

CollectableWith this in mind, I came up with my own system for archiving non-cumulative index usage stats on a daily basis. Microsoft introduced Management Data Warehouse in SQL Server 2008 which provides a way to keep track of these statistics over time (along with many others), but I’ve been using my own method longer than that. This will also work in SQL Server Express Edition, which doesn’t support MDW. I’ll give descriptions of each part below, but if you just want the code right now download it here.

Tables

IndexUsageStats_LastCumulative
As I mentioned before, the output from sys.dm_db_index_usage_stats is cumulative since the last time the server was restarted, and I was looking for a way to find daily values that weren’t cumulative. Restarting the server immediately after collecting the values would have accomplished this, but users probably wouldn’t be very happy :). Instead I just keep track of the previous cumulative values and subtract them from the current ones. If the output shows an index was used 7 times on Tuesday at Midnight, and the same number was 12 on Wednesday at Midnight, then it must have been used 5 times in that 24 hour period. IndexUsageStats_LastCumulative holds the previous cumulative values for all indexes to make this calculation possible.

Names
The only true way to identify an object in SQL Server is by its name, but DMVs typically return object IDs instead of names. It’s easy to translate between names and IDs, but over time names and IDs can change:
– If an object is dropped and re-created with the same name, chances of it having the same ID are virtually zero.
– If an object is renamed, its ID will not change, but of course its name will.
Using either IDs or Names exclusively can make historical values become useless over time – you might have an ID that’s no longer valid or a name that changed a while back. To give myself more options for tracking changes over time, I store both. Since storing each object’s name in a character data type would consume a lot of space and be repetitive, I prefer to map object names to numeric values, and this table facilitates that.

IndexUsageStats
This table does the heavy lifting storing for our system – it contains all the historical index usage statistics by both object ID and name. I’ve found it to be a great candidate for compression if you’re running enterprise edition, but it’s still pretty compact even when uncompressed thanks to the Names table.

Stored Procedures

CollectIndexUsageStats
The stored procedure that drives statistics collection. You’ll want to set up a job to run this – mine runs at midnight. Basically it iterates through each online database in the instance collecting usage stats and then compares them to the previous cumulative values and writes the differences to the IndexUsageStats table. You’ll find comments in-line.

Views

vw_IndexUsageStats
Since the data contained in the IndexUsageStats table is all numeric, it’s not the easiest to browse. This view makes some joins back to the Names table so things are a little more user-friendly. Most of the queries I write are against this view. Object IDs aren’t returned here as they’re typically not needed. When they are necessary, I’ll write more detailed queries against IndexUsageStats.

A few notes before the code:

  • As always, this code is offered as-is with no warranty whatsoever. Do not deploy this in a production environment before you have tested it and understand exactly what it does.
  • I recommend putting all these objects in their own database. If you already have a DBA database for management information that should work fine too. My database is called “iDBA” because I inherited it with that name.
  • This was written and tested on SQL Server 2008 and 2008R2. It can probably be tweaked to run in 2005 by removing the datetime2 datatype and writing substitute queries for the MERGE statement.

Sep 142010
 

This post is my contribution to T-SQL Tuesday #10, which is being hosted by Michael J. Swart (blog | twitter)

T-SQL TuesdayEach year my Boy Scout troop has a rummage sale as a fundraiser.  All the scouts and their families gather up their old clothes, books, toys, electronics, and any other treasures they no longer need and donate them to the troop, which sells them.  It’s basically a giant garage sale, except it’s done in the gym of the school we meet at.  Since everything sold comes from someone associated with the troop, many years ago it was decided to create an award for the person that donates the weirdest piece of crap most unique item each year.  This award comes in the form of a travelling trophy which started out as a child’s potty chair that was painted gold.  Each year, the winner is instructed to embellish the trophy with something new.  Here’s how it looks now!

As you can see, it’s grown from a simple potty chair mounted on a board into a lamp with a whole bunch of accessories, including a barbecue grill. I’m surprised nobody’s added a TV or fridge yet – then there’d really be no reason to leave!

So what does this have to do with indexes?  The bowels of this post contain something that I have embellished myself.  Last month Tom LaRock (blog | twitter) posted a script for helping to detect duplicate indexes.  He couldn’t claim credit for it as he said it was passed on to him a while ago and he’s unsure of the original source.  Similarly, I won’t claim credit for this other than to say I made a few changes I thought would be handy.  If you feel this script is lacking functionality that you think would be helpful, I invite you to change it and please share with the community when you’re done!

Why Duplicate Indexes Are Bad

Aside from serving no purpose, duplicate indexes can cause problems to occur which could require your attention, such as:

  • Wasted disk space – just like any other type of data, storing something more than once means it will take up space more than once as well.
  • Slower backups, restores, and integrity checks – more data means more time to read/write/check it all
  • Slower data modification operations – inserting, updating, and deleting rows will take longer when there’s more indexes that need updating
  • Increased chances of deadlock and blocking – More objects requiring locks for an operation means there’s a greater chance that another process has already acquired at least some of said locks
  • Increased transaction log growth and/or usage – The more writing/locking that’s happening due to an increased number of objects means that more will be written to the log

And I’m sure there’s plenty more potential issues than those I just listed. The simplest way to avoid this is to maintain only as many indexes as necessary. This script can help you find if any duplicates exist.

What’s New In This Version

  • I removed the columns that returned individual component columns of the indexes.  Since there was already a column that showed them all concatenated together, I couldn’t think of any reason for listing them individually other than it might help when programmatically dropping indexes, which shouldn’t be done anyway.  Indexes should only be dropped after careful review.
  • Added a column for displaying the size of the index.  Duplicate indexes are bad no matter what their size is, however this information may be helpful.
  • Added columns showing whether the index is filtered or has included columns.

Disclaimer

I have tested this script on SQL Server 2005 and later, and it will not run on earlier versions.  You should not trust any script you find on the internet (including this one!)  Make sure you completely understand what a script is doing before running it.  Always test on a testing or development server before using it in any production environment.

The Script

What you’ve really been waiting for.  You can see the entire script below, or just download the .zip of it here.  Enjoy!

Jun 072010
 

T-SQL Tuesday ImageThis post is my contribution to T-SQL Tuesday #007, hosted this month by the SQLChicken himself, Jorge Segarra (Blog | Twitter).

Since I’m a bad DBA and have yet to get my hands on a copy of R2, I’ll make use of the rule that we can discuss any feature from 2008.  My favorite is filtered indexes, as they were the solution to an issue I had been trying to fix for 5 years.

The Background

In a former life, I spent several years working as a building inspector for one of Chicago’s western suburbs.  It quickly became evident that they were in need of a system for keeping track of permits, inspections, and other day-to-day activities, as their gigantic spreadsheet clearly wasn’t cutting it.  Wanting to learn about databases and see if I could solve said problem, I picked up a book on Access 2000 and my solution was born.  It’s grown up a lot in the past 7 years, and is now used by 2 other municipalities as well.

The Problem

In my system, building permits can have one or two unique identifiers depending on their stage in the permit lifecycle.  That probably doesn’t make sense, so I’ll elaborate.  When a resident applies for a building permit, a record is created in the permit table with a primary key known as the PermitId.  This permit has a status of pending, as it has only been applied for and not issued.  Many permits never make it to the issued state, as they may be denied or their application withdrawn.  Permits that make it through the approval process and are issued receive a Permit Number, another unique identifier.  Enforcing uniqueness on the PermitNumber column was desired, but impossible as all pending permits had a NULL value for PermitNumber.  The best I could do for this (given my knowledge at the time) was to create a non-unique index on PermitNumber and hope that duplicate values didn’t end up occurring.  While that may be good enough for some, it always irked me.

Some may ask why PermitId and PermitNumber can’t just be combined.  The answer is that it’s a business requirement.  All three of the cities I do business with issue PermitNumbers in sequential order based on the permit issue date, so there’s no way the PermitNumber can be determined ahead of time.  Also the PermitId is an internal value and is only known to the database for use in joins.  PermitNumber, on the other hand, is known to the public and is commonly searched on, so indexing it is desirable.

The Solution

The Filtered Index feature allowed me to create a unique index on PermitNumber where the value is not null.  I was able to enforce uniqueness with permit numbers, and all was right in the world once again.  The syntax to do this was extremely simple:

And there you have it, a simple solution using elegant syntax.  ‘Tis a shame that this feature didn’t appear until 2008, but I’m sure glad it’s there now!