the **spread**: the (data) science of sports

Wed 23 September 2015

In the last post, we built a basic expected points model and showed how we can estimate uncertainty using a statistical procedure called the *bootstrap*. Now I want to push our assumptions a little further and look at how expected points have changed over time, and I want to talk about why we want to estimate uncertainty in the first place.

FiveThirtyEight recently published a very interesting article demonstrating that kickers have continually improved in accuracy over time, and that this is likely not taken into account in the expected points models used in many fourth-down decision arguments. My initial reaction is that this is probably an overreaction -- the more sophisticated fourth-down models out there often have a more rigorous kicking input than historical averages that don't adjust over time. My second reaction is to take a look and see how expected points have changed over time.

Let's take a look at how first downs (across all yards to go) have changed over the years.

Well, now we see why people use smoothers for these kinds of things. That's a noisy mess. Let's take another look, using the same kind of smoother that Burke uses in his expected points model. To help show how expected points are changing over time, I've set the blue lines to get darker as the data becomes more recent (i.e., the darkest line is the 2013 season). The overall average expected points is the line in red.
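To make the smoothing step concrete, here's a minimal sketch of a LOWESS smoother (the locally weighted regression family Burke's model uses), via statsmodels, applied to made-up per-yard-line expected points. The yard lines and point values below are synthetic stand-ins, not the real play-by-play data.

```python
# A sketch of LOWESS smoothing on synthetic expected points data.
# The "raw_ep" values are invented for illustration only.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
yard_line = np.arange(1, 100)  # distance from own goal line
# hypothetical noisy expected points, roughly linear in field position
raw_ep = 0.065 * yard_line - 1.2 + rng.normal(0, 0.6, yard_line.size)

# frac controls the width of the local window; ~0.3 gives a gently
# varying curve rather than chasing every bump in the raw averages
smoothed = lowess(raw_ep, yard_line, frac=0.3, return_sorted=False)
```

The key knob is `frac`: too small and you're back to the noisy mess, too large and you flatten real structure near the end zones.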

It definitely appears that expected points have risen over time, at least for first downs. But the whole point is to look at uncertainty in these estimates, so let's bootstrap a confidence interval and see how this changes our perception. This adds a new dimension of complexity, though, so let's take a specific game situation. For illustrative purposes, I'll show first and ten from the opponent's 35-yard line.
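The bootstrap itself is simple: resample the plays with replacement many times, recompute the statistic each time, and take percentiles of the resampled statistics. Here's a sketch for a single game situation. The `next_score` values and their probabilities are invented for illustration; in the real model they would be the observed next-score values for every 1st-and-10 play from the opponent's 35.

```python
# Bootstrap a 95% CI for expected points in one game situation.
# The next-score distribution below is made up for illustration.
import numpy as np

rng = np.random.default_rng(42)
# hypothetical point value of the next score after each such play
# (7 = own TD, 3 = own FG, 0 = no score, negatives = opponent scores)
next_score = rng.choice([7, 3, 0, -3, -7], size=500,
                        p=[0.35, 0.25, 0.2, 0.1, 0.1])

# resample with replacement 10,000 times, recomputing the mean each time
boot_means = rng.choice(next_score, size=(10_000, next_score.size),
                        replace=True).mean(axis=1)

lo, hi = np.percentile(boot_means, [2.5, 97.5])
```

The interval `(lo, hi)` is the percentile bootstrap CI: the range of expected points values that are plausible given the plays we actually observed.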

It definitely looks like expected points have risen *some*, but the expected points for a first down in 2013 are still within the 95% confidence interval for 2000. That doesn't seem to mesh with the earlier statements at all!

There's a catch. Each of the points in that plot is only using a single season's worth of data. This illustrates an important fact about confidence intervals: as your sample size *increases*, your confidence interval gets *narrower*. In other words, we can more precisely estimate the statistic we're interested in as we have more observations. This is statistics 101 stuff, but it's easy to forget.
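You can see the effect directly by bootstrapping the same quantity at two sample sizes. Everything here is synthetic; the sample sizes are loosely meant to suggest one season's worth of plays versus several seasons pooled.

```python
# Demonstration that bootstrap CIs narrow as the sample grows.
# The "population" of play values is invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=2.5, scale=4.0, size=100_000)

def boot_ci_width(sample, n_boot=5_000):
    """Width of the percentile-bootstrap 95% CI for the mean."""
    means = rng.choice(sample, size=(n_boot, sample.size),
                       replace=True).mean(axis=1)
    lo, hi = np.percentile(means, [2.5, 97.5])
    return hi - lo

small = rng.choice(population, size=200, replace=False)    # ~one season
large = rng.choice(population, size=3_200, replace=False)  # seasons pooled

width_small = boot_ci_width(small)
width_large = boot_ci_width(large)
```

Since the standard error of a mean shrinks like $1/\sqrt{n}$, a 16x larger sample should give an interval roughly a quarter as wide.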

To better compare, I computed a four-year rolling average for expected points and looked at how the same game situation, first and ten at the opponent's 35-yard line, has changed in expected points over time, and bootstrapped a 95% confidence interval. This allows us to observe if the value is changing over time, gives us a better sample size for estimating uncertainty, and doesn't let earlier seasons' data affect later seasons' data.
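The rolling-window computation can be sketched as follows. The per-season play values are synthetic (with a small upward trend baked in for illustration); the real input would be the observed next-score values for this game situation in each season.

```python
# Sketch of a four-year rolling bootstrap CI. Season data is made up,
# with a gentle upward trend added purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
seasons = {yr: rng.normal(2.0 + 0.05 * (yr - 2000), 4.0, size=400)
           for yr in range(2000, 2014)}

rolling_ci = {}
for yr in range(2003, 2014):  # 2003 is the first full four-year window
    # pool this season with the three before it -- no later data leaks in
    pooled = np.concatenate([seasons[y] for y in range(yr - 3, yr + 1)])
    means = rng.choice(pooled, size=(2_000, pooled.size),
                       replace=True).mean(axis=1)
    rolling_ci[yr] = tuple(np.percentile(means, [2.5, 97.5]))
```

Each window only looks backward, so later seasons never contaminate earlier estimates, and each estimate rests on four seasons' worth of plays instead of one.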

That definitely looks like an increase, and it certainly appears that expected points are on the rise! Let's compare the expected points from 2004-2007 with 2010-2013.

Looking at the distribution of the expected points from the same point on the field, there's very little overlap between the two (the purple area reflects where the densities overlap).
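One way to put a number on that purple region is an overlap coefficient: histogram both bootstrap distributions on a common grid and sum the bin-wise minima, giving 1 for identical distributions and 0 for disjoint ones. The two distributions below are stand-ins, not the actual bootstrap output.

```python
# Quantifying the overlap between two bootstrap distributions.
# The distributions here are invented stand-ins for the real ones.
import numpy as np

rng = np.random.default_rng(7)
ep_2004_2007 = rng.normal(2.90, 0.12, size=10_000)
ep_2010_2013 = rng.normal(3.35, 0.12, size=10_000)

bins = np.linspace(2.4, 3.9, 76)
p, _ = np.histogram(ep_2004_2007, bins=bins, density=True)
q, _ = np.histogram(ep_2010_2013, bins=bins, density=True)

bin_width = bins[1] - bins[0]
overlap = np.sum(np.minimum(p, q)) * bin_width  # 0 = disjoint, 1 = identical
```

A small overlap coefficient is the numerical version of "very little purple": the two eras' plausible expected points barely share any common ground.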

OK, so we've looked at expected points from a variety of angles and have found that they do, indeed, appear to be rising over time. We've also found that this has appeared to be the case more for the opponent's half of the field than one's own half of the field. Potential explanations for this include:

- More accurate kickers
- Play-calling has gotten more aggressive closer to the opponent's end zone
- As passing has risen, so has scoring

The answer isn't immediately clear from this analysis. One thing we *do* know, however, is that by estimating the uncertainty associated with the expected points statistic, we're in a much better position to say if that change is meaningful or not.

Why have I spent so much time banging on about uncertainty? Because we're often making arguments about which play calls are better based on *differences* in expected points (expected points added) across the plausible range of outcomes -- for instance, going for it on fourth down vs. kicking a field goal vs. punting. If we don't know how variable the statistic is, we're not really doing better than random guessing.

Take the above example. Using data from 2010-2013, the 95% confidence interval for expected points for a 1st and 10 at the opponent's 35 ranges from 3.0 to 3.68. We think that plausible values for the 'true' expected points in that scenario lie in that range, based on the data we've collected. Suppose the 'true' value is closer to 3.0 (say, 3.15), and we make a decision that is supposed to net us half a point in expected points. It's entirely possible that we haven't really made any positive gains at all!

Simply put, without stating uncertainty, it's hard to know when we're making progress or losing ground.

We haven't really discussed whether expected points is a 'good' statistic. I think it's an entirely reasonable statistic and an entirely reasonable approach to a difficult problem. However, it's worth noting that it has some problems. For instance, the number of plays and possessions between scores is highly variable. Is it plausible that the expected points on a drive that ultimately results in a touchdown are the same as those from the same field position where the 'next score' comes only after four changes of possession? It's hard to say.

An alternative exists, but it's more methodologically complicated. David Romer, an economist, wrote a famous paper [PDF] on fourth downs using a method called *dynamic programming.* This paper has grown a little long in the tooth, so perhaps it is time to revisit it with modern data! A project for another day.