the spread: the (data) science of sports

Why is it so hard to know if changing coaches has any effect?

Wed 23 September 2015

Most football fans know the pain of suffering through seasons with what is obviously a terrible coach. If only your team would move on from that terrible coach, your losing days would be over (or at least fewer in number)! You look wistfully at teams with stable coaching situations, who plan for the long term and don't make stupid decisions, both strategic and tactical. Once the season's over and you get that new coach in, your problems will be solved.

Of course, your problems probably won't be solved. Most of the people who have asked "how much will changing the coach help the team" have found the answer to be somewhere between "a little" and "it won't." Yet, it seems so incredibly obvious that some coaches are bad and getting rid of them will help the team improve. So why can't we demonstrate that this is true?

I propose that this is essentially a case of what economists call an identification problem. We're trying to estimate the effect of a coaching change on success, but we're having a hard time doing it for a number of reasons. I'll outline them below, giving both "practical" explanations and "statistical" explanations.

Measurement error

Notice above I said we were interested in the effects of coaches on "success." Well, what does that mean? In football, we usually mean wins, but we all know that the NFL is characterized by small sample sizes and high variance. A coach who's given two years to prove himself (I wish I could say or herself here) only has 32 games to do so, barring playoff appearances. So, obviously wins aren't a great metric.

There are adjusted metrics out there, like Pythagorean wins, strength-adjusted wins, etc., but ultimately we're facing a problem where there is a lot of measurement error in the outcome. The noisier the outcome of interest is (i.e., the higher its variance), the larger the sample we need to establish a correlation between our treatment (coaching) and our outcome (wins). We're facing the worst of both worlds here: a noisy outcome and a small sample.

Additionally, we're facing low statistical power due to our small sample size and (likely) small effect size. Even if there is a true but small causal effect of coaching changes, it would take a much larger sample size to detect it. Underpowered studies are a real problem, especially given how many people misinterpret non-significant statistical tests as "accepting the null hypothesis."
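To make the power problem concrete, here's a minimal Monte Carlo sketch. All the numbers are invented for illustration: suppose a new coach genuinely raises the team's win probability from .45 to .55, and we compare two 32-game stretches with a two-proportion z-test.

```python
import numpy as np

rng = np.random.default_rng(42)

def power_sim(p_old=0.45, p_new=0.55, n_games=32, n_sims=20_000, z_crit=1.96):
    """Monte Carlo power of a two-proportion z-test on win totals."""
    wins_old = rng.binomial(n_games, p_old, n_sims)
    wins_new = rng.binomial(n_games, p_new, n_sims)
    p1, p2 = wins_old / n_games, wins_new / n_games
    # pooled proportion for the z-test; clip to avoid division by zero
    pooled = np.clip((wins_old + wins_new) / (2 * n_games), 1e-9, 1 - 1e-9)
    se = np.sqrt(pooled * (1 - pooled) * 2 / n_games)
    z = (p2 - p1) / se
    return np.mean(np.abs(z) > z_crit)  # fraction of "significant" results

power = power_sim()
print(f"power to detect a .10 jump in win probability: {power:.2f}")
```

With these (hypothetical) numbers, the test flags the effect only a small fraction of the time — nowhere near the conventional 80% target. A true .10 jump in win probability is a big coaching effect, and 32 games still isn't enough to reliably see it.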

Collinearity in the predictors

The vast majority of teams change coaches because they have either fired their coach or their coach has been recruited to replace a recently fired coach. All of the things that cause a team to be successful or unsuccessful co-occur with the coaching changes.

When you have a lot of collinearity in the predictors, it's very difficult to precisely estimate the effects of each individual predictor. Most regression techniques require you to be able to identify the independent impact of each predictor on the outcome. If the variables mostly vary together, you can't identify this individual impact, because you've never observed what happens when one thing changes and the other doesn't! Unfortunately, once again, the solution is usually a larger sample.

In other words, variables have to vary -- which sounds ridiculous when you put it that way, but it's a fact often overlooked.
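You can see the cost of collinearity directly in OLS standard errors. This is a toy simulation with invented variables: `quality` stands in for everything else about the team, and we compare a "coaching change" variable that moves almost in lockstep with quality against a hypothetical one that varies independently.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
quality = rng.normal(size=n)  # stands in for "everything else about the team"

def ols_se(x2):
    """Standard errors from regressing y on [1, quality, x2]; true betas are (1.0, 0.5)."""
    y = 1.0 * quality + 0.5 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), quality, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# a "coaching change" that co-occurs with quality vs. one that doesn't
se_collinear = ols_se(quality + 0.1 * rng.normal(size=n))
se_independent = ols_se(rng.normal(size=n))
print(se_collinear[2], se_independent[2])
```

The standard error on the collinear predictor blows up by roughly an order of magnitude: same sample size, same model, but the data simply never show one variable moving without the other.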

Lack of variation

This time I'm referring to variation in the underlying abilities of coaches. It's clear that there are some truly special coaches out there; Bill Belichick obviously comes to mind. It's equally clear that there are some truly bad coaches out there (insert your favorite loser here). Yet I'm proposing that the vast majority of coaches are probably roughly equal in skill / talent / ability / whatever you want to call the latent characteristic that predicts success. Essentially, I'm arguing that most coaches are replacement level.

I'm definitely not saying that you could just stick anyone in a head coach job and expect similar results to Norv Turner. I'm merely saying that most coaching changes are probably like-for-like exchanges. Combine this with the fact that very often when teams make coaching changes, they pull from the same rotating pool of coaches and coordinators that are always mentioned for job openings. Why would you expect a coach to be wildly successful at the Raiders if he wasn't wildly successful at the Jets?

Making matters even worse, we don't actually have a measure of talent / skill / whatever (that's what we're trying to get at here!), but even if we did, we would probably observe a lot of the Peter Principle at work. When a successful offensive or defensive coordinator is recruited to be the head coach of a (likely losing) team, he is required to use an entirely different set of skills than what made him successful in his previous position.


Endogeneity

It's pretty obvious that teams that change coaches don't do so randomly. Many of the factors behind a team's success are completely unrelated to the coach as a person, but may well have been related to his hiring and firing. These are often pointed to as the "culture" of an organization, a sponge that soaks up a lot of variance. In reality, it could be many things -- the way the owner treats the GM, the way contracts are structured, the quality of scouting, etc.

In other words, there is some unobserved variable that is causing both the team's success and the coach's hiring and firing. Many social sciences call these "omitted variables" or "confounding variables" and economists call this problem endogeneity. This introduces bias into our estimates of the effects of coaching because we think we've controlled for all relevant variables related to the coaching change and success, but we haven't.

What to do about it?

Of course, there are lots of other, more mundane explanations -- reversion to the mean being a notable one. Teams that have especially bad years may fire their coach, but teams having especially bad years are more likely to have slightly less bad years the next year, coaching change or not. Don't forget the Hawthorne effect either: a team may improve for a while simply because everyone knows a new regime is watching.
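Mean reversion is easy to see in a toy simulation (all parameters invented): hold every team's true quality fixed across two seasons, pick out the teams with the worst first seasons — the ones who would fire their coach — and watch them "improve" anyway.

```python
import numpy as np

rng = np.random.default_rng(7)
n_teams = 10_000  # many simulated team-seasons, not 32 real teams

quality = rng.normal(8.0, 1.5, n_teams)          # true expected wins per season
wins_y1 = quality + rng.normal(0, 2.0, n_teams)  # season 1 = quality + luck
wins_y2 = quality + rng.normal(0, 2.0, n_teams)  # season 2: NOTHING changed

worst = wins_y1 < np.quantile(wins_y1, 0.2)      # bottom 20% "fire the coach"
improvement = (wins_y2[worst] - wins_y1[worst]).mean()
print(f"average improvement with no coaching change at all: {improvement:.2f} wins")
```

The bottom-20% teams gain a couple of wins on average purely because their terrible season was partly bad luck -- exactly the gain a new coach would get credit for.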

So what do we do about it? There aren't easy answers, but running more simple regressions is definitely not the solution. Thinking hard about causal inference is the bread and butter of econometrics. I see a few possible approaches here (and definitely read Chris Blattman for thinking about when each is appropriate).

Experiments. The gold standard for establishing causation is a randomized experiment, meaning we would randomly assign coaching changes so that they are uncorrelated with everything that drives success. Obviously this is a non-starter.

Instrumental variables. One realistic solution is to find an instrument for the effect of coaching changes. An instrument is some variable that affects our outcome only through its effect on the causal variable of interest. This is a really tricky topic to think about, and I recommend reading through some examples to get a grip on it. Much of the "Freakonomics" movement in economics revolved around finding clever instruments (such as rainfall) to solve tricky causal problems.
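Here's a stylized numpy sketch of the IV logic. Every name and effect size is invented: the true coaching effect is zero, an unobserved "culture" variable confounds the naive regression, and `z` is a valid instrument purely by construction (it shifts coaching changes and nothing else).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

culture = rng.normal(size=n)  # unobserved confounder
z = rng.normal(size=n)        # instrument: shifts coaching changes, nothing else
change = 0.8 * z - 0.7 * culture + rng.normal(size=n)
wins = 0.0 * change + 1.0 * culture + rng.normal(size=n)  # true effect of change: 0

# naive OLS slope is biased because "culture" is omitted
b_ols = np.cov(change, wins)[0, 1] / np.var(change, ddof=1)

# IV (Wald) estimate: cov(z, wins) / cov(z, change) recovers roughly zero
b_iv = np.cov(z, wins)[0, 1] / np.cov(z, change)[0, 1]
print(f"naive OLS: {b_ols:.3f}, IV: {b_iv:.3f}")
```

The naive regression confidently reports a sizable (and entirely spurious) coaching effect; the IV estimate sits near the true value of zero. The catch, of course, is that finding a real-world `z` that plausibly satisfies the exclusion restriction is the hard part.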

Matching. Another approach is to use matching methods, which involve pairing up observations that are as similar as possible on all of the other independent variables except the one of interest (a coaching change). You then estimate the causal impact of that change by looking at mean differences in success within the pairs. This does not solve endogeneity, however: matching can only balance the variables you actually observe.
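A minimal nearest-neighbor matching sketch, again with invented data: the true coaching effect is zero, but weaker rosters are more likely to change coaches, so the naive group comparison is badly confounded.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

talent = rng.normal(size=n)                              # observed roster talent
changed = rng.random(n) < 1 / (1 + np.exp(2 * talent))   # weak rosters change more
wins = 8 + 2 * talent + 0.0 * changed + rng.normal(size=n)  # true effect: 0

treated, control = np.where(changed)[0], np.where(~changed)[0]

# naive comparison is confounded: changers had worse rosters to begin with
naive = wins[treated].mean() - wins[control].mean()

# match each changer to the non-changer with the closest roster talent
nearest = control[np.abs(talent[control][None, :]
                         - talent[treated][:, None]).argmin(axis=1)]
matched = (wins[treated] - wins[nearest]).mean()
print(f"naive: {naive:.2f}, matched: {matched:.2f}")
```

The naive difference makes coaching changes look harmful by a couple of wins; the matched estimate lands near zero. But note the matching only worked because `talent` -- the one confounder in this toy world -- was observed.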

Natural experiments. Sometimes "nature" presents us with an experiment such as expansion, unexpected retirements or deaths, and so on. In these cases, we observe a coaching change (the treatment) in a way that is (hopefully) uncorrelated with the outcome. Unfortunately, these are few and far between, which puts us back in the underpowered scenario due to small sample sizes. So, when your grandkids are analyzing this question, maybe we'll have enough data (doubtful).


As you can see, the cards are basically stacked against being able to precisely estimate the "true" effect of a coaching change. This doesn't mean we should stop trying! It just means we need to be more creative and perhaps a bit more sophisticated in how we try to answer the question.
