the **spread** the (data) science of sports

Wed 23 September 2015

Super Bowl Sunday is finally here, and discussion of {Ballghazi, Deflategate} has dominated much of the sports analytics world for the past two weeks. So, I thought I'd totally ignore that topic and talk about something else: building an expected points (EP) model. Expected points did get some renewed attention lately, in this FiveThirtyEight article on the rise of kicking accuracy over time and how fourth-down decision-making could be affected.

However, lots of expected points models already exist, so my goals are to accomplish the following:

- Provide code examples for building an expected points model.
- Interrogate the assumptions that go into such a model.
- Show how to incorporate uncertainty into the model using the bootstrap (previous discussion).

If you're not familiar with expected points, I encourage you to read excellent descriptions from Brian Burke at Advanced Football Analytics for a more in-depth overview. In the spirit of educating sports analytics newcomers, Brian has also created two YouTube tutorials (1, 2) on building an expected points model. This is seriously a great service to the community. These tutorials and Brian's explanation were instrumental in writing this post.

Here's the basic idea behind expected points. Given any combination of down, yards to go, and distance from the end zone, the expected value of the points from that position is equal to the average of every *next score* from that position. That next score could come on that very play via a field goal or touchdown, or it could come several or many plays later at the end of a successful drive. It could also be negative, meaning the next points are scored by the other team.

So, you can imagine that the expected points from one's own one-yard line are probably negative, because even if you punt the ball away, your opponent will probably have very good field position to start their next drive and will likely get at least a field goal out of that possession.

Similarly, you can imagine that the expected points on 1st and goal from your opponent's one-yard line are somewhere between 3 and 7 because you'll have nearly four tries (barring fumbles and interceptions) to score a touchdown or kick a field goal.
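To make the "next score" idea concrete, here is a minimal sketch of how each play could be labeled with the next score from the offense's point of view. The function name and the dictionary keys (`offense`, `points`, `scoring_team`) are made up for illustration, and the sketch ignores half and game boundaries, which a real model would probably need to handle.

```python
def next_score_values(plays):
    """Return, for each play, the next score from the offense's point of view.

    `plays` is a chronological list of dicts with hypothetical keys:
      "offense"      - team with the ball on this play
      "points"       - points scored on this play (0 if none)
      "scoring_team" - team that scored, or None
    """
    values = [0] * len(plays)
    next_team, next_points = None, 0
    # Walk backwards so each play can see the next scoring event at or after it.
    for i in range(len(plays) - 1, -1, -1):
        play = plays[i]
        if play["points"]:
            next_team, next_points = play["scoring_team"], play["points"]
        if next_team is not None:
            # Positive if the offense scores next, negative if the opponent does.
            sign = 1 if play["offense"] == next_team else -1
            values[i] = sign * next_points
    return values
```

Plays with no later score in the list get a value of 0 here, which is one possible convention rather than the only one.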

You get the idea. The reason we build these kinds of models is to place a value on every position on the field, which in turn supports in-game decision-making. By comparing the expected points from a variety of possible outcomes, we can choose the play call that maximizes expected points. There may be game scenarios when you're more interested in maximizing expected points than win probability (for instance, early in the game, when an individual play may not have much impact on overall win probability).

Building the model itself is just a bit of Python, made easier by the indexing and grouping capabilities of pandas. It's just data manipulation, and the only statistical procedure involved is taking the mean. You can find all of the code in an IPython notebook on GitHub (NBViewer).
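As a rough illustration of that "group and take the mean" step (not the notebook's actual code), here is a small sketch. The column names `down`, `yards_from_goal`, and `next_score` are assumptions, and the five rows of data are invented purely to show the shape of the computation.

```python
import pandas as pd

def expected_points(plays, keys=("down", "yards_from_goal")):
    """Average next score for every combination of the grouping keys."""
    return plays.groupby(list(keys))["next_score"].mean()

# Toy data, invented for illustration only.
plays = pd.DataFrame({
    "down":            [1, 1, 1, 4, 4],
    "yards_from_goal": [1, 1, 1, 99, 99],
    "next_score":      [7, 7, 3, -3, -7],
})

ep = expected_points(plays)
print(ep)
```

Adding a yards-to-go column to `keys` would give the full down, distance, and field position breakdown described earlier, at the cost of fewer plays in each group.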

This is where the fun starts. There are a number of assumptions that go into building this kind of model. For a start, Burke recommends throwing out plays where the score difference is greater than 10 points, as well as all plays from the 2nd and 4th quarters. The reasoning is that teams operate differently when facing or delivering a blowout, or when the half is about to end. For instance, a winning team may just run their RB into the wall repeatedly towards the end of the game, not really trying to gain yards or score more points. This could distort the effects of these plays on points scored. Seems like a logical assumption.
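Here is one way that trimming rule might look in pandas. This is a sketch under assumed column names (`quarter` and `score_diff`, neither of which comes from the original data), with the thresholds exposed as parameters so they are easy to vary.

```python
import pandas as pd

def trim_plays(plays, max_margin=10, quarters=(1, 3)):
    """Keep only close-game plays from the chosen quarters.

    Assumes hypothetical columns "score_diff" (offense score minus
    defense score) and "quarter" (1 through 4).
    """
    close = plays["score_diff"].abs() <= max_margin
    in_quarters = plays["quarter"].isin(quarters)
    return plays[close & in_quarters]
```

Keeping the cutoffs as arguments makes it simple to rerun the analysis with and without the filter, or with a different margin.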

However, I'm always a fan of presenting how assumptions change analyses, so I'll present it both ways. This is one way of measuring the effects of your assumptions, but it's also a good way to see how *robust* your conclusions are to changes in the data. Let's take a look at expected points as a function of field position on first down with and without these plays removed.

Surprisingly, not much of a difference! It looks like the trimmed data produce slightly higher estimates of expected points than the complete data in the opponent's half of the field. But *how much* of a difference is "not much"? Great question.

Burke uses a smoother, a kind of local regression known as LOESS. This is definitely one approach to smoothing out those bumps and getting a better sense of the 'true' expected points contained in those noisy lines. I'm going to take a slightly different approach and use a statistical technique known as the *bootstrap* to build confidence intervals around those expected point values. Why do this?

The expected points we've plotted above only represent the plays we've actually seen happen. But they are just an estimate. We want to make some inferences about the range of possible outcomes we didn't see. We assume that the plays we saw are drawn from some distribution of outcomes from alternate universes or whatever. We can simulate what this distribution looks like by taking repeated samples with replacement from the plays we actually saw. This procedure has some nice properties that I won't get into in too much depth, but one of the nicest things is that it doesn't assume anything about the distribution of the statistic we're interested in.
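For a single field position, the resampling step could be sketched like this with NumPy. The function name and defaults (`n_boot=1000`, a fixed seed) are my own choices for illustration, not necessarily what the notebook does.

```python
import numpy as np

def bootstrap_means(values, n_boot=1000, seed=0):
    """Means of `n_boot` resamples of `values`, drawn with replacement."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of `idx` picks len(values) observations with replacement.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    return values[idx].mean(axis=1)
```

Each entry of the returned array is the expected points estimate we would have computed had we happened to observe that particular resample of plays.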

The confidence interval that we build up here will give us some idea of how much variation we might expect in our estimator (expected points) if we were to keep sampling from the distribution that generated the observations we already have. Let's take a look at the 95% confidence interval for the original expected points.
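One common way to turn bootstrap resamples into a confidence interval is the percentile method: take the 2.5th and 97.5th percentiles of the resampled means. The sketch below does this separately for each field position. The column names are assumptions, and resampling within each yard line (rather than resampling the whole play-by-play data set at once) is a simplification I'm making for the example.

```python
import numpy as np
import pandas as pd

def bootstrap_ci(plays, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for expected points at each yard line.

    Assumes hypothetical columns "yards_from_goal" and "next_score".
    """
    rng = np.random.default_rng(seed)
    alpha = (1 - level) / 2
    rows = {}
    for yard, group in plays.groupby("yards_from_goal"):
        scores = group["next_score"].to_numpy(dtype=float)
        # Resample this yard line's plays with replacement, n_boot times.
        idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
        means = scores[idx].mean(axis=1)
        lower, upper = np.quantile(means, [alpha, 1 - alpha])
        rows[yard] = {"ep": scores.mean(), "lower": lower, "upper": upper}
    return pd.DataFrame.from_dict(rows, orient="index")
```

The width `upper - lower` at each yard line gives a quick read on where the estimate is most and least certain.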

As it turns out, the expected points estimated using only the 1st and 3rd quarters of close games fall outside of our confidence interval quite often in the opponent's half of the field! This is very interesting. Note also that the uncertainty around expected points is greatest close to your own end zone and smallest close to your opponent's end zone. This is intuitive, but it's always good to know whether your estimator has constant variance or not.

This post is already growing too long, so I'll split it into two posts. Next up, I'll look at FiveThirtyEight's comments that kicking accuracy has changed expected points over time. I'll also discuss the pros and cons of inventing one's own measure (such as expected points).