Wednesday, October 24, 2012

Diagnosing statistical indigestion: the incongruency index

The goal of CFBTN is not to have the fanciest looking/sounding model for predicting outcomes, but the most robust. This means that the model draws on more than a hundred variables, evaluates teams in dozens of different ways, and then weights and combines those different evaluations using past results to make predictions. That makes it hard for me to explain exactly how the model works - it is, in fact, many different models that I slam into each other in hopes of making the Captain Planet of models. Not exactly an elegant process. It also means that some pieces might not agree with the rest, giving the model statistical indigestion.

Now we could look at these incongruencies as a problem, a source of error. Or we could see these incongruencies as information that gives us insight into a team's and a matchup's idiosyncrasies - a "what to watch for". I choose the latter.

The table below is my best effort to manipulate this unique source of data into a readable format. S1 is the predicted score for each team, and S2 is the predicted score for its opponent. E1 is the magnitude of the incongruencies; a higher score means the different pieces of the model are more at odds. Pt, Po, Y, and YP are different sets of predictors that draw on different types of data. Pt focuses on total game results (e.g., the final score), Po on per-possession statistics (e.g., points per possession), Y on total yardage statistics (e.g., total passing yards or total plays), and YP on yards per play.

Air Force, for example, has a very high E1. The negative values for Pt and Po mean that predictors of this sort see Air Force scoring fewer than 37 points. But this is offset by Air Force's yards/play (YP) predictors, which point to more than 37 points. One interpretation of this result is that the model foresees bad field position or turnovers for the Falcons, so more yards per play are not translating into more points. That the total yards (Y) value is also positive is more consistent with bad field position than with turnovers, presumably because long fields still let a team pile up yards while turnovers cut drives short.
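
For readers who think better in code, here is a minimal sketch of how a row like Air Force's could be read. It is not how CFBTN computes anything: the offsets are made-up numbers that only mirror the signs described above, the function names are hypothetical, and the mean absolute offset is just one assumed stand-in for a magnitude like E1.

```python
from statistics import mean


def incongruency(offsets):
    """Assumed stand-in for E1: the mean absolute offset across predictor sets."""
    return mean(abs(value) for value in offsets.values())


def split_by_direction(offsets):
    """Return the predictor sets pointing below and above the predicted score."""
    below = sorted(name for name, value in offsets.items() if value < 0)
    above = sorted(name for name, value in offsets.items() if value > 0)
    return below, above


# Made-up offsets relative to the predicted score S1 (NOT the real table values).
# Only the signs follow the Air Force description: Pt and Po negative, Y and YP positive.
example_offsets = {"Pt": -4.0, "Po": -2.0, "Y": 1.0, "YP": 5.0}

e1_like = incongruency(example_offsets)
below, above = split_by_direction(example_offsets)

print(f"magnitude of disagreement: {e1_like}")
print(f"sets pointing below the predicted score: {below}")
print(f"sets pointing above the predicted score: {above}")
```

The split is the "what to watch for" part: when the result-based sets (Pt, Po) land on one side and the yardage-based sets (Y, YP) on the other, the disagreement itself is the thing to interpret.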

Figuring out exactly what the incongruencies mean is something of a Sisyphean task in most cases, but that doesn't mean it can't be fun.
