Mar 152011

I enjoy watching college basketball but am absolutely horrible at making my NCAA tournament bracket each year. This time around, rather than spending an hour making guesses, I decided to let SQL Server do the work for me. Besides it’s way cooler to be able to justify your bracket with T-SQL, right?

My bracket is based on an expression using 3 data points:
– Name of the school
– Number of overall games won
– Number of overall games lost

Since most college/university names contain the words “college” or “university”, I consider those to be noise words, along with “of”, “at”, and “and”. Noise words were removed from the full (not abbreviated) name of each school. Spaces were also removed. For example, “UCLA” expands to “University of California, Los Angeles” and then becomes “CaliforniaLosAngeles” for my purposes.

Now that I had my list of school names, I decided to utilize the SOUNDEX() function to make things a little more interesting. Applying the soundex algorithm to a word yields a 4-character code that represents the way the word sounds. The first character of the soundex code is always the first letter of the word, and the other three characters are numeric and represent the sound of the remaining letters. Many U.S. states, including Illinois, use a soundex code as part of their driver’s license numbers.

Soundex codes are neat, but the letter they start with wasn’t of any use to me so I dropped it and just focused on the numeric portion. This numeric portion of the soundex was multiplied by the school’s overall record for the year (games won / games lost) to yield a score for each school. Higher scores win. Here’s the resulting bracket (click to enlarge).


Of course this wouldn’t be very much fun if you couldn’t play with it yourself. Feel free to download my raw data here, and when imported into a table called “Brackets” the following query will give you all that I just described.

Enjoy, and here’s to hoping your bracket is way better than mine! (You won’t have to try very hard to dream up a better one!)

Factoid: It turns out that the “A&M” part of “Texas A&M” actually has no meaning. According to Wikipedia it was founded as the “Agricultural & Mechanical College of Texas” in 1876, but the name was changed to “Texas A&M” in 1963, with the “A&M” part being purely symbolic.

Further reading
US Driver’s License Numbers (For FL, IL, and WI at least.) I always thought this was pretty nifty!

  7 Responses to “T-SQL and the Big D520”

  1. Reminds me of how dear old mom used to pick horses at the track. Still paying off the college loans.

    • I should try filling out another bracket completely randomly and seeing how it compares to this one. I’ll guess the random one ends up better. Thanks for reading!

  2. San Diego for the win, huh? I love how you modified SOUNDEX()!

    • And it has my beloved Fighting Illini losing in the second round! Oh well, I’ll have to stick by my pseudo-random choices…

      I didn’t really modify SOUNDEX() itself – just transformed the output a little bit. Glad you enjoyed it!

  3. Hey, if IL makes it to the 2nd round I’d be happy, and a bit surprised considering our history of 1-and-out appearances. When I first saw the title “big d520” I first thought “now that would be one huge die to roll” (left over from a brief stint as an RPG-er.)

    • Agreed, having them make it to the 2nd round is probably wishful thinking. I never would have thought of the “other” d520 – didn’t ever get into playing any games like that, though I watched many a D&D game on winter campouts.

  4. […] more constructive, such as developing a formula in T-SQL to make my bracket picks for me. Much like last year I’ve done it […]

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">