Mar 29 2011

Another SQL Saturday has come and gone, but for me this one was the best by far. I’ve now had the pleasure of attending 3 SQL Saturday events but this was the first one I’ve had a hand in planning. It was a great experience and I can’t wait until next year!

Friday

On Friday I spent the day at Brent Ozar’s (blog | twitter) super-awesome FreeCon (I’ll post about that soon!). From there I headed to the event site, DeVry University in Addison (thank you for letting us use your building!), where I joined a group already hard at work stuffing goody bags, moving tables and sorting registration info. Afterwards we headed to the speaker dinner at Home Run Inn, which was sponsored by the great folks at SQL Sentry. There I got to meet some wonderful people I had previously known only from twitter, including Allen White (blog | twitter), Hope Foley (blog | twitter), Jose Chinchilla (blog | twitter) and Nicholas Cain (blog | twitter).

Saturday

Saturday started bright and early with setting up breakfast and hauling in water and pop for 400. We were fortunate to have many great volunteers willing to help, which made the day go very smoothly. There are too many of them to list here, but if you helped out with anything, please know that we really do appreciate it!

Helping to plan and run the event is a wonderful thing, and I can’t wait to do it again. That being said, probably the biggest downside to being on staff is that you really can’t attend all of the sessions. I only made it to one of them: Allen White’s (blog | twitter) talk on automating Policy-Based Management using PowerShell. The rest of my day was spent helping out, getting stuff ready, setting up and tearing down lunch, and preparing for my presentation.

Lunch was very well-received and I heard many compliments about the food. A big thanks to Chef Phillip Foss of The Meatyballs Mobile for working with us and delivering our sandwiches on time. We were able to get everyone fed in about 15 to 20 minutes, which I thought was excellent. With pizza it takes time to cut and scoop slices onto your plate, and often people will stop to take from more than one pie. Having pre-wrapped sandwiches seemed much more efficient because everyone could just grab one and go.

Presentation

The presentation I gave on Backup and Restore internals seemed to go very well. I had an acceptable audience size for the last timeslot of the day and got many excellent questions. Hopefully my review cards will agree with this once I get them back. The only real snag I had was that the projector wasn’t displaying my VM window quite right, so the entire SSMS window wasn’t visible at the same time. I’ll have to figure out how to remedy this problem for the future.

No SQL Saturday would be complete without an afterparty. This one was a great time with much karaoke and merriment had by all. Unfortunately what happens at the afterparty stays at the afterparty, so if you’d like to see how much fun they really are, you’ll have to join us next time!

In Closing

If you still haven’t made it to a SQL Saturday event, I implore you to look over the schedule for one in your area. Even if you aren’t a DBA per se, there’s still a lot of information there that can benefit developers, sysadmins, and other IT folk alike. They also offer incredible opportunities for networking, which can help anyone. Thank you so much to our generous sponsors for making this event possible, as well as to the members of the community who made it a success. Hope to see you all next year!

Mar 22 2011

After months of planning, SQL Saturday #67 is finally upon us! While this is the 3rd SQL Saturday I’ll be attending, it’s the first one I’ve had a hand in planning. It’s been a real pleasure to be part of such an awesome team putting it all together along with:
– Wendy Pastrick (blog | twitter)
– Norman Kelm (blog | twitter)
– Ted Krueger (blog | twitter)
– Bill Lescher (twitter)
– Rich Rousseau (twitter)
– Aaron Lowe (blog | twitter)
– Jes Borland (blog | twitter)

Schedule

We have an amazing schedule put together with over 40 great sessions! I personally am having trouble deciding which sessions to attend should I have the time, and I doubt I’m the only one with that predicament. I’m honored to say that one of my topics was also selected, so I’ll be talking about Backups and Recovery In-Depth at 4pm.

Networking

The attendee list also has an amazing number of bloggers/tweeters/authors of note, some of whom I haven’t yet had the pleasure of meeting in person. SQL Saturdays are always wonderful opportunities for networking. (I’m also hoping I can get a few more of my books signed!)

Food

Planning the menu for lunch was my primary responsibility, and while lunch was not free this year, I am excited that we can provide something a little more interesting than pizza. Lunch is being catered by Chef Phillip Foss of The Meatyballs Mobile. After moving on from Lockwood at The Palmer House, Chef Foss took his food on the road in the form of most excellent gourmet meatball sandwiches. He’ll have several varieties to choose from (including vegetarian for those who requested them).

For those who work in downtown Chicago, you can keep track of The Meatyballs Mobile on twitter: @FossFoodTrucks – locations are tweeted throughout the day. If you’d like to learn a little more, check out this video on ChicagoNow.

It’ll be a great event! Hope to see you there Saturday!

Mar 15 2011

I enjoy watching college basketball but am absolutely horrible at making my NCAA tournament bracket each year. This time around, rather than spending an hour making guesses, I decided to let SQL Server do the work for me. Besides, it’s way cooler to be able to justify your bracket with T-SQL, right?

My bracket is based on an expression using 3 data points:
– Name of the school
– Number of overall games won
– Number of overall games lost

Since most college/university names contain the words “college” or “university”, I consider those to be noise words, along with “of”, “at”, and “and”. Noise words were removed from the full (not abbreviated) name of each school. Spaces were also removed. For example, “UCLA” expands to “University of California, Los Angeles” and then becomes “CaliforniaLosAngeles” for my purposes.
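
To make that concrete, here’s a minimal sketch of the cleanup for one school. The nested REPLACE approach (and doing it ad-hoc in T-SQL at all) is just my illustration; the original list may well have been prepared differently:

-- Illustrative only: strip noise words, commas and spaces from one school name
DECLARE @FullName VARCHAR(100) = 'University of California, Los Angeles';

-- Remove the noise words first...
SET @FullName = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@FullName,
	'University', ''), 'College', ''), ' of ', ' '), ' at ', ' '), ' and ', ' ');

-- ...then drop commas and spaces
SELECT REPLACE(REPLACE(@FullName, ',', ''), ' ', '') AS CleanedName;  -- CaliforniaLosAngeles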

Now that I had my list of school names, I decided to utilize the SOUNDEX() function to make things a little more interesting. Applying the soundex algorithm to a word yields a 4-character code that represents the way the word sounds. The first character of the soundex code is always the first letter of the word, and the other three characters are numeric and represent the sound of the remaining letters. Many U.S. states, including Illinois, use a soundex code as part of their driver’s license numbers.
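
For example, running it against the cleaned-up UCLA name from above (output assumes SQL Server’s standard soundex behavior):

-- First character = first letter of the word, remaining three digits encode the sound
SELECT SOUNDEX('CaliforniaLosAngeles') AS SoundexCode,         -- C416
	RIGHT(SOUNDEX('CaliforniaLosAngeles'), 3) AS NumericPart;  -- 416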

Soundex codes are neat, but the letter they start with wasn’t of any use to me so I dropped it and just focused on the numeric portion. This numeric portion of the soundex was multiplied by the school’s overall record for the year (games won / games lost) to yield a score for each school. Higher scores win. Here’s the resulting bracket (click to enlarge).

Bracket

Of course this wouldn’t be very much fun if you couldn’t play with it yourself. Feel free to download my raw data here, and when imported into a table called “Brackets” the following query will give you all that I just described.

-- Create the table
CREATE TABLE dbo.Brackets (
	Team VARCHAR(30) NOT NULL,
	Wins TINYINT NOT NULL,
	Losses TINYINT NOT NULL
);

-- (Import data here)

-- Calculate score value
SELECT Team,
	(CAST(RIGHT(SOUNDEX(Team), 3) AS SMALLINT) * (CAST(Wins AS NUMERIC)/Losses)) AS score
FROM Brackets
ORDER BY Team;

Enjoy, and here’s to hoping your bracket is way better than mine! (You won’t have to try very hard to dream up a better one!)

Factoid: It turns out that the “A&M” part of “Texas A&M” actually has no meaning. According to Wikipedia it was founded as the “Agricultural & Mechanical College of Texas” in 1876, but the name was changed to “Texas A&M” in 1963, with the “A&M” part being purely symbolic.

Further reading
US Driver’s License Numbers (For FL, IL, and WI at least.) I always thought this was pretty nifty!

Mar 08 2011

T-SQL Tuesday this month is being hosted by the one and only Jes Borland (blog | twitter), and the topic of the month is aggregation. Powerful and fast aggregation tools are one of the biggest reasons to use a database. Without them, there would be a lot less awesomeness to look forward to – kind of like a meal without bacon.

Even almighty bacon has its downsides (fat, sodium, etc.), and as awesome as aggregates are, they have weaknesses too. The issue with aggregates is that, by their nature, all applicable values must be scanned to generate the result. For instance, you can’t figure out the average of a set of numbers without scanning the entire set. The same holds true for the sum, min, max, and other statistics. This means that the time it takes to calculate an average can grow with the size of the data set. As sets get increasingly large, performance can degrade to the point where it’s no longer acceptable, in which case other tools may be necessary to deliver the requested aggregate information in an appropriate amount of time.

One such tool is a data warehouse, which can be implemented using SQL Server Analysis Services (SSAS). SSAS has its advantages and disadvantages like any product, and probably the biggest thing it has working against it is its learning curve. There’s also the issue that it’s only available in Standard Edition and above, and you don’t get all the real goodies unless you’re using Enterprise Edition.

A few times I’ve been able to get the desired results with a much lower-tech solution than SSAS, which I call the “Poor Man’s Data Warehouse”. It’s a table of pre-aggregated data. (I really wasn’t kidding when I said it was low-tech!) Basically, another table is created to store the data in its already-aggregated form. Rather than querying the “primary” tables and performing the costly aggregations on them, these pre-aggregated tables can be used instead, typically with much faster results because fewer records need to be scanned to arrive at the result.

For some sample data I’ll be using the Production.TransactionHistory table from the AdventureWorks2008 database. If you don’t already have AdventureWorks2008 loaded somewhere, you can download it from here. Querying the entire table shows there are a little over 113,000 rows. Let’s say that we’re only interested in the total number of each product sold each month. First let’s create a table for storing the aggregates:

CREATE TABLE Production.TransactionHistory_PMDW (
	MonthStart DATE,
	ProductID INT,
	TotalQty INT,
	CONSTRAINT PK_TransactionHistory_PMDW PRIMARY KEY CLUSTERED (MonthStart, ProductID)
);

You’ll notice this table only contains the bare minimum for storing this aggregate data: the start of the month (stored as a DATE), the ProductID, and the total quantity of that product sold during that month. Aggregating by another field in addition to Month and Product would simply involve adding another column, as in the hypothetical variation below.
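
As a hypothetical example, if you also wanted totals broken out by the TransactionType column from the source table, the aggregate table might look like this instead (the table and constraint names here are made up for illustration):

CREATE TABLE Production.TransactionHistory_PMDW_ByType (
	MonthStart DATE,
	ProductID INT,
	TransactionType NCHAR(1),
	TotalQty INT,
	CONSTRAINT PK_TransactionHistory_PMDW_ByType
		PRIMARY KEY CLUSTERED (MonthStart, ProductID, TransactionType)
);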

Now that the table has been created it needs to be filled with aggregate data, which can be accomplished with the following:

INSERT INTO Production.TransactionHistory_PMDW (MonthStart, ProductID, TotalQty)
SELECT DATEADD(dd, -(DAY(TransactionDate)-1), TransactionDate) AS MonthStart, ProductID, SUM(Quantity) AS n
FROM Production.TransactionHistory
GROUP BY DATEADD(dd, -(DAY(TransactionDate)-1), TransactionDate), ProductID;

The above code will aggregate the entire contents of the Production.TransactionHistory table and load it into Production.TransactionHistory_PMDW. Since this is a monthly aggregation, the above statement would only need to be run once per month, and with a few modifications it could add to the TransactionHistory_PMDW table only the data from the previous month. From here there are all kinds of different things you can do, such as manipulating the date grouping to get aggregations of a different granularity.
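
Here’s roughly what that previous-month-only modification might look like; the date math is my own sketch, so adapt as needed:

-- Sketch: aggregate and load only last month's transactions
-- (assumes this runs sometime during the current month)
DECLARE @ThisMonth DATE = DATEADD(dd, -(DAY(GETDATE()) - 1), CAST(GETDATE() AS DATE));
DECLARE @LastMonth DATE = DATEADD(mm, -1, @ThisMonth);

INSERT INTO Production.TransactionHistory_PMDW (MonthStart, ProductID, TotalQty)
SELECT DATEADD(dd, -(DAY(TransactionDate)-1), TransactionDate) AS MonthStart, ProductID, SUM(Quantity) AS n
FROM Production.TransactionHistory
WHERE TransactionDate >= @LastMonth AND TransactionDate < @ThisMonth
GROUP BY DATEADD(dd, -(DAY(TransactionDate)-1), TransactionDate), ProductID;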

The advantages of pre-aggregating become apparent rather quickly once you start retrieving data. The following batch will query both the original and aggregate tables for total sales of all products during the month of May, 2004:

-- original table
SELECT DATEADD(dd, -(DAY(TransactionDate)-1), TransactionDate) AS MonthStart, ProductID, SUM(Quantity) AS n
FROM Production.TransactionHistory
WHERE TransactionDate >= '20040501' AND TransactionDate < '20040601'
GROUP BY DATEADD(dd, -(DAY(TransactionDate)-1), TransactionDate), ProductID
ORDER BY MonthStart, ProductID;

-- aggregate table
SELECT MonthStart, ProductID, TotalQty
FROM Production.TransactionHistory_PMDW
WHERE MonthStart = '20040501'
ORDER BY MonthStart, ProductID;

Viewing the execution plans in Management Studio shows that 100% of the query cost goes towards grouping and filtering the original table, whereas scanning the aggregate table is trivial by comparison. Adding the suggested index doesn’t level the playing field either – the aggregate table is still significantly faster to query.
Query Plan
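
If you’d rather compare actual numbers than relative plan costs, wrapping the batch above in the standard statistics options will show the difference in reads and CPU time as well (the exact figures will vary from system to system):

-- Report I/O and timing for each statement in the batch
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ...run the original-table and aggregate-table queries from above here...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;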

So is this incredibly low-tech approach appropriate? The answer, of course, is “it depends.” Having used this method a few times myself, I can say it definitely comes in handy.

Pros

– If the data has already been aggregated you can usually find your answer by scanning fewer rows, which means your results will most likely come back faster and at a lower CPU cost.
– Querying a table that’s not “live” (being updated by production processes) translates to fewer locks (hopefully none at all!) on those live tables.
– Since the aggregate table is only being loaded at specific times and is separate from any production processes, you can heavily index this table with little worry of INSERT or UPDATE operations becoming slower as a result (see the index sketch below).
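
Something like the following could be added without worrying about slowing down a production workload; the index name and key choice here are just illustrative:

-- Hypothetical index to support lookups by product across all months
CREATE NONCLUSTERED INDEX IX_TransactionHistory_PMDW_ProductID
	ON Production.TransactionHistory_PMDW (ProductID)
	INCLUDE (MonthStart, TotalQty);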

Cons

– Since the aggregate table is separate from the “original” (live) table, changes made to the original table after the data has been loaded won’t automatically be reflected in the aggregate table. The aggregate table would have to be re-loaded for any changes to show up (a sketch of this follows below).
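
A rough sketch of what re-loading a single month could look like (the hard-coded month is purely for illustration):

-- Refresh one month of aggregate data after the underlying rows have changed
DECLARE @Month DATE = '20040501';

DELETE FROM Production.TransactionHistory_PMDW
WHERE MonthStart = @Month;

INSERT INTO Production.TransactionHistory_PMDW (MonthStart, ProductID, TotalQty)
SELECT DATEADD(dd, -(DAY(TransactionDate)-1), TransactionDate), ProductID, SUM(Quantity)
FROM Production.TransactionHistory
WHERE TransactionDate >= @Month AND TransactionDate < DATEADD(mm, 1, @Month)
GROUP BY DATEADD(dd, -(DAY(TransactionDate)-1), TransactionDate), ProductID;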

Tips

– I’ve found this works best when you’re aggregating by date, since data from the past is (hopefully) unlikely to need updating after the fact!