I enjoy watching college basketball but am absolutely horrible at making my NCAA tournament bracket each year. This time around, rather than spending an hour making guesses, I decided to let SQL Server do the work for me. Besides it’s way cooler to be able to justify your bracket with T-SQL, right?
My bracket is based on an expression using 3 data points:
– Name of the school
– Number of overall games won
– Number of overall games lost
Since most college/university names contain the words “college” or “university”, I consider those to be noise words, along with “of”, “at”, and “and”. Noise words were removed from the full (not abbreviated) name of each school. Spaces were also removed. For example, “UCLA” expands to “University of California, Los Angeles” and then becomes “CaliforniaLosAngeles” for my purposes.
Now that I had my list of school names, I decided to utilize the SOUNDEX() function to make things a little more interesting. Applying the soundex algorithm to a word yields a 4-character code that represents the way the word sounds. The first character of the soundex code is always the first letter of the word, and the other three characters are numeric and represent the sound of the remaining letters. Many U.S. states, including Illinois, use a soundex code as part of their driver’s license numbers.
Soundex codes are neat, but the letter they start with wasn’t of any use to me so I dropped it and just focused on the numeric portion. This numeric portion of the soundex was multiplied by the school’s overall record for the year (games won / games lost) to yield a score for each school. Higher scores win. Here’s the resulting bracket (click to enlarge).
Of course this wouldn’t be very much fun if you couldn’t play with it yourself. Feel free to download my raw data here, and when imported into a table called “Brackets” the following query will give you all that I just described.
-- Create the table
CREATE TABLE dbo.Brackets (
Team VARCHAR(30) NOT NULL,
Wins TINYINT NOT NULL,
Losses TINYINT NOT NULL
-- (Import data here)
-- Calculate score value
(CAST(RIGHT(SOUNDEX(Team), 3) AS SMALLINT) * (CAST(Wins AS NUMERIC)/Losses)) AS score
ORDER BY Team;
Enjoy, and here’s to hoping your bracket is way better than mine! (You won’t have to try very hard to dream up a better one!)
Factoid: It turns out that the “A&M” part of “Texas A&M” actually has no meaning. According to Wikipedia it was founded as the “Agricultural & Mechanical College of Texas” in 1876, but the name was changed to “Texas A&M” in 1963, with the “A&M” part being purely symbolic.
– US Driver’s License Numbers (For FL, IL, and WI at least.) I always thought this was pretty nifty!