T-SQL and B-Ball: My 2012 NCAA Bracket
March Madness is once again upon us, which means it's time for one of the biggest time sinks of the year: filling out a bracket for the pseudo-sanctioned office NCAA pool. Along with the Super Bowl, this is a hallowed time when bosses turn a blind eye to all the cash flowing toward whoever has been deemed both trustworthy enough to manage the pool and cool enough to forget that gambling is against company policy.
Super Bowl pools are usually pretty easy: pick a square (or two or three), hand over the cash, and the numbers get filled in later. Building an NCAA bracket, on the other hand, is a lot of work. Even when picking winners completely at random, there are over 60 choices to make. For someone like me who doesn't watch nearly enough basketball to make educated guesses about most of the teams, that's a lot of wasted effort. I'd rather spend my time on something more constructive, such as developing a formula in T-SQL to make my bracket picks for me. As I did last year, that's exactly what I've done.
This year's bracket is based on two things. The first is the assumption that a #16 seed will never beat a #1 seed. (Since it has never happened, it seems like a safe bet.) The second is a formula that decides the winner of every other game.
Every team in the tournament is searched on Bing twice, using the following templates:
<School Name> <Team Name> Basketball rocks
<School Name> <Team Name> Basketball sucks
For example:
"Illinois Fighting Illini Basketball rocks"
and
"Michigan Wolverines Basketball sucks"
would both be valid queries.
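Generating the queries from the templates is plain string formatting. Here is a minimal Python sketch; the original code is T-SQL and isn't shown in this post, and build_queries is a hypothetical name used only for illustration.

```python
from typing import Tuple


def build_queries(school: str, team: str) -> Tuple[str, str]:
    """Return the ("rocks", "sucks") search strings for one team.

    Follows the two templates:
      <School Name> <Team Name> Basketball rocks
      <School Name> <Team Name> Basketball sucks
    """
    base = f"{school} {team} Basketball"
    return f"{base} rocks", f"{base} sucks"
```

For example, build_queries("Illinois", "Fighting Illini") returns ("Illinois Fighting Illini Basketball rocks", "Illinois Fighting Illini Basketball sucks").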
The total number of results Bing returns for each query is recorded in a table. To decide a game between teams A and B, the following values are compared:
Team A: A.rocks / B.sucks
Team B: B.rocks / A.sucks
The team with the higher value wins the game.
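The two game rules (a #1 seed always beats a #16 seed; otherwise compare A.rocks / B.sucks with B.rocks / A.sucks) can be sketched in Python as follows. Team and pick_winner are hypothetical names, the sketch assumes every result count is positive, and ties, which the rules above don't address, go to b as an arbitrary choice. fractions.Fraction keeps the ratios exact rather than rounding them to floats.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Team:
    name: str
    seed: int
    rocks: int  # result count for "<School Name> <Team Name> Basketball rocks"
    sucks: int  # result count for "<School Name> <Team Name> Basketball sucks"


def pick_winner(a: Team, b: Team) -> Team:
    """Pick the winner of a single game between a and b."""
    # Rule 1: a #1 seed always beats a #16 seed.
    if {a.seed, b.seed} == {1, 16}:
        return a if a.seed == 1 else b
    # Rule 2: compare A.rocks / B.sucks against B.rocks / A.sucks.
    # Assumes all counts are positive (a zero "sucks" count would raise
    # ZeroDivisionError).
    a_value = Fraction(a.rocks, b.sucks)
    b_value = Fraction(b.rocks, a.sucks)
    # Ties are not covered by the original rules; this sketch gives them to b.
    return a if a_value > b_value else b
```

For example, with made-up counts, pick_winner(Team("C", 5, 300, 20), Team("D", 12, 400, 50)) returns D, since 300 / 50 = 6 is less than 400 / 20 = 20.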
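One flaw in last year's formula was that it generated a single value for each school that never changed, which meant the school with the highest value was guaranteed to win the championship. This time around, each team's score depends on the opponent it faces. Be aware, though, that with positive result counts the comparison A.rocks / B.sucks > B.rocks / A.sucks rearranges to A.rocks × A.sucks > B.rocks × B.sucks, so apart from the #1-versus-#16 rule the winner of each game still comes down to one fixed number per school. Here's the bracket that resulted.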
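Turning individual game picks into a full bracket means advancing winners round by round. A minimal Python sketch, assuming the teams are listed in bracket order (slots 0 and 1 meet first, then 2 and 3, and so on) and that the First Four games have already been decided, leaving a power-of-two field such as 64 teams. play_bracket is a hypothetical name, and the game rule is passed in as a function rather than hard-coded.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def play_bracket(teams: List[T], pick_winner: Callable[[T, T], T]) -> List[List[T]]:
    """Play a single-elimination bracket; return the winners of each round.

    Assumes teams are in bracket order and that len(teams) is a power of two.
    """
    # n & (n - 1) is zero only when n is a power of two.
    if not teams or len(teams) & (len(teams) - 1):
        raise ValueError("bracket size must be a power of two")
    rounds: List[List[T]] = []
    current = list(teams)
    while len(current) > 1:
        # Adjacent slots meet; each winner moves on to the next round.
        current = [
            pick_winner(current[i], current[i + 1])
            for i in range(0, len(current), 2)
        ]
        rounds.append(current)
    return rounds
```

The last list in the result holds the champion. For example, with a toy rule where the lower seed number always wins, play_bracket([1, 16, 8, 9, 5, 12, 4, 13], min) returns [[1, 8, 5, 4], [1, 4], [1]]. To follow the approach described in this post instead, pass a function that applies the #1-over-#16 rule and then the ratio comparison.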
If you'd like to try this yourself, you can download the code here. Since search results are constantly changing, you'll likely get a different outcome than I did. Also, be forewarned that while this code works, it was put together very quickly and is neither elegant nor efficient.
Why Bing?
In case you're curious, I used Bing instead of Google because I'm a tightwad, and it turns out Google's Search API isn't free: making more than 100 queries per day on Google requires payment. Since I needed over 100 queries to gather my data, plus a few extra during development, I went with Bing, which has a friendlier policy of limiting your query rate rather than the total number of queries per day.
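Because Bing limits the query rate rather than the daily total, a collection script only needs to pace itself. A minimal Python sketch, assuming a one-second gap between queries (an illustrative value, not Bing's actual limit) and a hypothetical fetch_count function that stands in for whatever call returns a query's total result count; no real Bing client library is used here.

```python
import time
from typing import Callable, Dict, Iterable


def collect_counts(
    queries: Iterable[str],
    fetch_count: Callable[[str], int],
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Run each query once, pausing min_interval seconds between calls.

    fetch_count is a hypothetical stand-in for a real search call.
    sleep is injectable so the pacing can be tested without waiting.
    """
    counts: Dict[str, int] = {}
    for i, query in enumerate(queries):
        if i > 0:
            sleep(min_interval)  # throttle to stay under a per-second rate limit
        counts[query] = fetch_count(query)
    return counts
```

Assuming all 68 teams in the 2012 field are searched, two queries per team comes to 136 queries: over Google's 100-per-day free limit, but only 135 one-second pauses (a little over two minutes) with this pacing.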
Enjoy the tournament! Should you use my method and come up with a winning bracket, please contact me to discuss my share of the winnings.