Mar 14 2012
 

March Madness is once again upon us, which means it’s time for one of the biggest time sinks of the year: filling out a bracket for the pseudo-sanctioned office NCAA pool. Along with the Super Bowl, this is a hallowed time when bosses turn a blind eye to all the cash heading towards the person who has been designated as both trustworthy enough to manage the pool and cool enough to forget that gambling is against company policy.

Super Bowl pools are usually pretty easy – pick a square (or two or three), hand over the cash, and the numbers get filled in later. Building an NCAA bracket, on the other hand, is a lot of work. Even when picking winners completely at random, there are over 60 choices to make. For someone like me, who doesn’t watch nearly enough basketball to make educated guesses about most of the teams, it’s a lot of wasted effort. I’d rather spend my time doing something more constructive, such as developing a formula in T-SQL to make my bracket picks for me. Much like last year, I’ve done just that.

This year’s bracket is based on two things. The first is the idea that a #16 seed team will never beat a #1 seed team. (Since it has never happened, it seems like a safe bet.) After that, game winners are determined by a formula.

Every team in the tournament is searched for on Bing twice, using the following templates:
<School Name> <Team Name> Basketball rocks
<School Name> <Team Name> Basketball sucks

For example:
“Illinois Fighting Illini Basketball rocks”
and
“Michigan Wolverines Basketball sucks”
would both be valid queries.

The total number of results returned by Bing for each query is recorded in a table. To represent a game, the following values are compared for teams A and B:
Team A: A.rocks / B.sucks
Team B: B.rocks / A.sucks

The team with the higher value wins the game.
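
As a rough sketch of the two rules above, here is the comparison written out in Python rather than the T-SQL used in the downloadable code. The `Team` class, the `pick_winner` function, the zero-count guard, the tie-break rule, and the result counts are all assumptions made up for illustration; none of them come from the actual script.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    seed: int
    rocks: int  # result count for "<School Name> <Team Name> Basketball rocks"
    sucks: int  # result count for "<School Name> <Team Name> Basketball sucks"


def pick_winner(a: Team, b: Team) -> Team:
    """Apply the seed rule first, then compare the rocks/sucks ratios."""
    # Rule 1: a #1 seed always beats a #16 seed.
    if {a.seed, b.seed} == {1, 16}:
        return a if a.seed == 1 else b

    # Rule 2: Team A's value is A.rocks / B.sucks, Team B's is B.rocks / A.sucks.
    # Assumption: a zero "sucks" count is treated as 1 to avoid dividing by zero.
    a_value = a.rocks / max(b.sucks, 1)
    b_value = b.rocks / max(a.sucks, 1)

    # Assumption: a tie goes to the better (lower-numbered) seed.
    if a_value == b_value:
        return a if a.seed <= b.seed else b
    return a if a_value > b_value else b


# Hypothetical teams with made-up result counts.
team_a = Team("Team A", seed=4, rocks=1200, sucks=300)
team_b = Team("Team B", seed=5, rocks=900, sucks=450)
top_seed = Team("Top Seed", seed=1, rocks=10, sucks=5000)
bottom_seed = Team("Bottom Seed", seed=16, rocks=9000, sucks=10)

print(pick_winner(team_a, team_b).name)
print(pick_winner(top_seed, bottom_seed).name)
```

With these invented numbers, Team A scores 1200 / 450 and Team B scores 900 / 300, so Team B advances. The second call never reaches the formula, because the seed rule decides it first.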

One flaw in last year’s formula was that it generated a single value for each school, which never changed. This meant the school with the highest generated value would always win the championship. This time around the outcome differs based on the teams that are compared, which makes things much more interesting. Here’s the bracket that resulted.
[Image: Bracket]

If you’d like to try this yourself, you can download the code here. Since search results are constantly changing, you’ll likely get a different outcome than I did. Also be forewarned that while this code does work, it was put together very quickly and is neither elegant nor efficient.

Why Bing?
In case you’re curious, I decided to use Bing search instead of Google because I’m a tightwad and it turns out Google’s Search API isn’t free. If you want to make more than 100 queries per day on Google, you need to pay. Since I needed to make over 100 queries to gather my data, plus a few extra during development, I went with Bing, which has a much friendlier policy of restricting your query rate instead of limiting the total number of queries per day.

Enjoy the tournament! Should you use my method and come up with a winning bracket, please contact me to discuss my share of the winnings.

  7 Responses to “T-SQL and B-Ball: My 2012 NCAA Bracket”

  1. Wow, Iowa State in the Final Four? I’m an ISU fan and I don’t even have them going that far.

    Another good post.

  2. Wow, #12 seed winning the whole thing.

    • Yeah, but I can blame the algorithm :)

      Apparently the farthest a 12 seed has ever made it is the Elite Eight (Missouri in 2002).

  3. Is this how they decide who plays in the national championship for college football?

  4. “Illinois Fighting Illini Basketball rocks”
    and
    “Michigan Wolverines Basketball sucks”
    would both be valid queries.

    Spoken like a true Illinois alumnus! Now if only we had actually made it to the dance this year…
