Tuesday, July 8, 2008

An Interesting Null

My interest in this blog is, first, to rate and rank teams but, ultimately, to quantify college football teams accurately enough that I can better forecast game outcomes. While I'm revving up for another season, I thought it might be interesting to take a closer look at the industry standard in college football forecasting--the Vegas line. And since I'm writing about it now, you can guess I found something that I, at least, consider interesting.

I'll start with a quick note on the Vegas line. The line is not created to forecast results--its sole existential purpose is to split bets 50/50 above and below. If too many bets are made above or below the line, then the line is adjusted. Therefore, the line is a product of the interaction of two forecasting methods. The first method uses a single model--part statistical, part qualitative--that attempts to predict the public attitude. The second method employs market forces, allowing the public to aggregate information and, thus, move the line up or down according to public sentiment. The public responds to the line and the line responds to the public. The Efficient Market Hypothesis (EMH) tells us that if the Vegas casinos provide an open market, all available information should be aggregated in adjusting the line, and it should be impossible to consistently outperform the line without special insider information (which can be purchased from your neighborhood crooked NBA ref). If someone can find a model that consistently outperforms the Vegas line (after it has been adjusted to bettor response), they can establish that the line does not satisfy the EMH--and they can make themselves millionaires. I will not, here, provide any evidence that the line does not satisfy the EMH.

Now, to the numbers. In 2007, the Vegas line and the actual game outcome (both in terms of point differentials) had a correlation of r=.4368. This is relatively high; as I mentioned before, this is the industry standard, but it is not overwhelming. For a little interpretation, if we were to guess the point differential using the line, we would, on average, be about 18% closer than if we just guessed that every game would end in a tie. And that's the industry standard.
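For anyone who wants to run this kind of check on their own data, here is a minimal sketch. It assumes the line and the actual margin are both expressed from the same team's perspective, and it assumes "closer" means a lower mean absolute error than the guess-a-tie baseline (the post does not spell out the exact measure). The function names and the six sample games are hypothetical, not the 2007 data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def mean_abs_error(preds, actuals):
    """Average absolute miss, in points."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)

def improvement_over_tie(lines, margins):
    """Fraction by which the line beats a 'every game is a tie' guess."""
    baseline = mean_abs_error([0] * len(margins), margins)
    with_line = mean_abs_error(lines, margins)
    return 1 - with_line / baseline

# Hypothetical sample: predicted and actual point differentials.
lines = [-7.0, 3.5, -14.0, 10.0, -3.0, 21.0]
margins = [-10, -3, -31, 17, 4, 14]

print(f"r = {pearson_r(lines, margins):.4f}")
print(f"improvement over tie guess = {improvement_over_tie(lines, margins):.1%}")
```

A squared-error version would give a different percentage, so the choice of error measure matters when comparing against the 18% figure.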

The line is 12.25 points off from the actual point differential on average. But as you can see in the graph, the distribution is skewed--the average is pulled up by a few cases where the Vegas gamblers really missed the boat.
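One quick way to see this kind of skew without a graph is to compare the mean absolute error with the median. The sketch below uses five hypothetical games, one of them a big miss. The helper name `abs_errors` is mine, not something from the original analysis.

```python
from statistics import mean, median

def abs_errors(lines, margins):
    """Absolute miss of the line in each game, in points."""
    return [abs(line - margin) for line, margin in zip(lines, margins)]

# Hypothetical sample: the third game is a blowout the line badly missed.
errors = abs_errors([-7.0, 3.5, -14.0, 10.0, -3.0], [-10, -3, -45, 17, 4])

# A mean well above the median suggests a few large misses pull the average up.
print(f"mean = {mean(errors):.2f}, median = {median(errors):.2f}")
```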

My first theory was that the Vegas line would have a tougher job accurately predicting the point differential in higher scoring games or games with a larger expected point differential. But the correlation between total combined scoring (total) and the absolute difference between the line and the actual outcome (difference) is only .0635. There is a slight increase in difference as the total score increases, but when we consider that the total has to be large in many cases for the difference to be large, we have to rule this out as a viable theory. So, is the line less accurate when one team is definitely better than the other (which leads to quirky 4th quarters with backups and such)? The answer is, again, a resounding no. In fact, if anything, the trend runs in the opposite direction.

Does the line give preference to favorites or underdogs? If you were to put one dollar, at even money, on the underdog in each of the 688 games in 2007, you would have gone home a winner 346 times (50.3%), racking up a $4 profit. More impressive than a split that is almost exactly 50/50 is the fact that the mean and standard deviation of outcomes on both sides are almost exactly the same--in other words, the line is right in the middle of its own error distribution.
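The arithmetic behind that $4 is easy to reproduce. This sketch assumes even-money payouts, no bookmaker commission, and no pushes, which is what the 346-win, $4 figure implies. The `payout` parameter is a hypothetical addition for trying other prices.

```python
def flat_bet_result(wins, games, stake=1.0, payout=1.0):
    """Win rate and net profit from betting `stake` on every game.

    `payout` is the profit per dollar staked on a winning bet
    (1.0 means even money). Pushes are assumed not to occur.
    """
    losses = games - wins
    win_rate = wins / games
    profit = stake * (wins * payout - losses)
    return win_rate, profit

win_rate, profit = flat_bet_result(wins=346, games=688)
print(f"win rate = {win_rate:.1%}, profit = ${profit:.2f}")
```

At a commonly quoted price of -110 (risk $110 to win $100, so `payout=100/110`), the same record would lose money, since the break-even win rate at that price is about 52.4%.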

These null results were to be expected, and they fit nicely with the efficient market hypothesis--the actual outcomes are normally distributed around the line. But one other null result was not expected. The line does not become a more accurate predictor of outcomes as the season progresses. One would think that, as the season progresses, we get a larger data set that we can use to make more accurate predictions, but instead the predictions don't get more accurate. My only explanation is that injuries through the season cause enough fluctuations to offset the increased sample size--but I still find it surprising that the average error doesn't have more of a downward trend as the season progresses.
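One way to look for a trend like this is to average the absolute error by week and fit a straight line through the weekly averages. A slope near zero means no improvement over the season. This is only a sketch of that idea, not necessarily how the numbers above were produced. The `(week, line, margin)` tuple layout and the sample games are hypothetical.

```python
def weekly_mean_abs_error(games):
    """Map each week to the mean absolute miss of the line that week.

    `games` is a list of (week, line, margin) tuples.
    """
    by_week = {}
    for week, line, margin in games:
        by_week.setdefault(week, []).append(abs(line - margin))
    return {week: sum(errs) / len(errs) for week, errs in sorted(by_week.items())}

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical sample: (week, line, actual margin).
games = [
    (1, -7.0, -10), (1, 3.0, 0),
    (2, 10.0, 0), (2, -3.0, 4),
    (3, -14.0, -20), (3, 6.0, -4),
]

weekly = weekly_mean_abs_error(games)
slope = ols_slope(list(weekly.keys()), list(weekly.values()))
print(f"change in average error per week = {slope:+.2f} points")
```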
