
Probabilities, models, and reality

Tue 10 December 2013

Alternate title: statistics and smugness

If you follow me on Twitter, you may have seen me rant here and there about sports analytics people treating statistical questions as settled and being too smug about the conclusions they draw from existing analytical work. This might sound a little crazy coming from someone who builds statistical models for a living and for a hobby. Am I actually advocating for journalistic narratives of grit and momentum?

Not at all -- in fact, the best data scientists are up front about their uncertainty and careful not to treat models as if they were reality. There's even a term for that mistake: reification. The models that we build are abstractions of reality. This is a feature, not a bug. Model building is all about identifying the information that best characterizes a phenomenon and generalizes across multiple scenarios.

Recently, the New York Times introduced a new feature in conjunction with Advanced NFL Stats, the Fourth Down Bot. Fourth down is one of those situations in which the analytically savvy know that coaches are far, far too conservative. This bot tracks every fourth-down situation each weekend in the NFL and crunches the numbers to determine the optimal call -- and whether the coaches actually made it.
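At its core, that calculation is a comparison of expected win probabilities across the available options. Here's a minimal sketch of the arithmetic -- the function and every number in it are hypothetical, for illustration only, not the bot's actual model:

```python
def fourth_down_wp(p_convert, wp_convert, wp_fail,
                   p_fg_good, wp_fg_good, wp_fg_miss,
                   wp_punt):
    """Expected win probability of each fourth-down option, given
    (hypothetical) success probabilities and the win probability a
    model assigns to each resulting game state."""
    return {
        "go for it": p_convert * wp_convert + (1 - p_convert) * wp_fail,
        "field goal": p_fg_good * wp_fg_good + (1 - p_fg_good) * wp_fg_miss,
        "punt": wp_punt,
    }

# 4th-and-2 at the opponent's 35, with made-up inputs:
for play, wp in fourth_down_wp(0.60, 0.55, 0.38,
                               0.55, 0.50, 0.36, 0.42).items():
    print(f"{play}: {wp:.3f}")
# go for it: 0.482 / field goal: 0.437 / punt: 0.420
```

Whichever option maximizes win probability is the 'right call' -- but note that the ranking can flip if those inputs move by a few points, and every one of them is itself an estimate.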

So what's the problem? See Aaron Schatz below, or Michael Lopez.

The problem with @NYT4thDownBot as currently built is it encourages the idea that 4th D is always a cut-and-dry decision.

— Aaron Schatz (@FO_ASchatz) December 3, 2013

If you read Grantland on a regular basis, you'll know that win probabilities are often given as arguments for why coaches did or didn't do the right thing. Bill Barnwell has a regular feature, Thank You For Not Coaching, in which he routinely makes exactly this kind of argument.

As I mentioned in an earlier post, these win probabilities are estimates. There's some error and uncertainty associated with these estimates. Unfortunately, we don't usually know their magnitude. Treating these numbers as if they're the actual probability of winning is a fallacy. 
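One way to make that uncertainty visible is to refit the model on bootstrap resamples of its training data and watch how much the estimate for a single game state moves. The sketch below does exactly that with a simple logistic regression on synthetic data -- the features, the data, and the model are all stand-ins for illustration, not the ANS model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "historical" game states: score differential and minutes
# remaining, with outcomes drawn from a made-up true process in which
# a lead matters more as time runs out.
n = 2000
score_diff = rng.normal(0, 10, n)
minutes_left = rng.uniform(0, 60, n)
logit = 0.08 * score_diff * (1 + (60 - minutes_left) / 30)
won = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([score_diff, minutes_left,
                     score_diff * (60 - minutes_left)])

# Point estimate for one game state: up 3 with 10 minutes to play.
state = np.array([[3, 10, 3 * 50]])
fit = LogisticRegression(max_iter=1000).fit(X, won)
print(f"point estimate: {fit.predict_proba(state)[0, 1]:.3f}")

# Bootstrap: refit on resampled data, collect the estimate each time.
estimates = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    boot = LogisticRegression(max_iter=1000).fit(X[idx], won[idx])
    estimates.append(boot.predict_proba(state)[0, 1])
lo, hi = np.percentile(estimates, [2.5, 97.5])
print(f"95% bootstrap interval: [{lo:.3f}, {hi:.3f}]")
```

If that interval runs from, say, 0.55 to 0.68, then arguing that a coach blew the game because one choice was 'worth' 0.62 and the other 0.60 is reading precision into the model that simply isn't there.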

To expand on this, take the example of the Advanced NFL Stats Win Probability model. If it seems like I'm picking on Brian Burke's work, let me assure you quite the opposite is true. Brian is a giant in the field, and his work is the standard against which other work is compared (which is why the Times partnered with him).

Recently, some major enhancements were added to the ANS model. Most notably? Incorporating team strength. Think about this for a second -- the existing model that most everyone was using to make arguments about coaching had not yet accounted for whether one team was known to be better than the other. Now, Brian lays out the case -- and it is a good one -- for being agnostic about team quality when building a win probability model; the model that we initially build here won't incorporate it, either.

[Technical sidenote: As far as I know, Brian's model is not Bayesian, but it uses the spread for a game as a sort of prior that decays in its impact as the game is played. You could think of this as a Bayesian prior with influence that is eventually overtaken by the likelihood. There needs to be a lot more Bayesian work in sports.]
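Here's one way such a decaying prior might look -- to be clear, the decay schedule and the normal approximation for the spread are my guesses for illustration, not Brian's actual formula:

```python
import math

def spread_prior_wp(spread, sd=13.5):
    """Pregame win probability implied by the spread, treating the
    final margin as roughly normal around the spread (a common
    approximation; spread > 0 means the team is favored)."""
    return 0.5 * (1 + math.erf(spread / (sd * math.sqrt(2))))

def blended_wp(in_game_wp, spread, minutes_played, total=60, k=2.0):
    """Blend a team-agnostic in-game estimate with the spread-implied
    prior, letting the prior's weight decay to zero by game's end.
    The (1 - t/T)**k schedule is an assumption, not Burke's method."""
    w = (1 - minutes_played / total) ** k
    return w * spread_prior_wp(spread) + (1 - w) * in_game_wp
```

For example, blended_wp(0.40, 7, 30) nudges a 7-point favorite's halftime estimate from 0.40 up to about 0.47; by the fourth quarter the prior has mostly washed out and the in-game evidence dominates, just as a likelihood eventually overtakes a prior.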

Yet this is a major update. I have no idea how much it changes the estimates, or what earlier estimates would look like under the new model. But it serves as an important corrective -- the model you're using is never the model. It's just the current iteration in a series of models. The estimates that a model produces aren't reality. They're the model's best guess about reality, using a limited set of simplified inputs.

This is why it's so tremendously important, as data scientists and as sports analysts, to iterate, iterate, iterate, then validate, validate, validate. Models become stale the minute they're deployed. Are they robust to outliers and extreme values (so-called 'black swans')? Has the data-generating process changed? Are there heterogeneous populations being modeled as homogeneous?
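A calibration check is one concrete form validation can take: on held-out games, bin the model's predicted win probabilities and ask whether teams given, say, a 70 percent chance actually won about 70 percent of the time. A minimal sketch, with synthetic predictions standing in for real held-out data:

```python
import numpy as np

def calibration_table(pred_wp, won, n_bins=10):
    """For each probability bin, compare the mean predicted win
    probability with the observed win rate. In a well-calibrated
    model the two columns track each other."""
    bins = np.clip((pred_wp * n_bins).astype(int), 0, n_bins - 1)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            print(f"n={mask.sum():5d}  "
                  f"predicted={pred_wp[mask].mean():.2f}  "
                  f"actual={won[mask].mean():.2f}")

# Synthetic held-out data, perfectly calibrated by construction --
# a real model should be checked against real held-out games.
rng = np.random.default_rng(1)
pred = rng.uniform(0, 1, 5000)
actual = rng.random(5000) < pred
calibration_table(pred, actual)
```

Systematic gaps between the two columns -- say, predictions of 0.90 that win only 80 percent of the time -- are a sign the model has gone stale or was never right to begin with.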

These are the questions you need to ask of yourself and your models.
