the spread: the (data) science of sports

Counterintuitive findings are not (necessarily) better findings

Wed 23 September 2015

It's a common scenario -- you've got some methods under your belt, you've got some data, and you're out to prove to the rest of the world just how wrong they are about everything. You massage the data, you build dozens of models, and you finally find a way to prove that Andy Dalton really is better than Tom Brady. You take to Twitter and point out that actually any idiot can see this using your new metric, Boosted Estimated Net Guard-Adjusted Losses (BENGAL).

Not so fast. You've fallen for the trap of the counterintuitive finding. Don't feel bad; it's an alluring trap, and one that people much smarter and more accomplished than you and I fall into regularly. In fact, there's an entire subfield of economics built around it. Unfortunately, this has also led, in a lot of academic research, to a desire to produce clever, counterintuitive findings that grab headlines and convince you that "everything you know about ___ is wrong." Taking down a sacred cow is an express ticket to attention. The reality, however, is that many of these findings are statistical artifacts and false positives, and they are not reproducible.

How does this relate to sports? One of the fundamental principles of the original Moneyball movement was to identify market inefficiencies and exploit them. Obviously, in a highly competitive and mostly efficient market, finding these small inefficiencies is hugely valuable. But they're just not that common, and the chances that you've just stumbled upon one are pretty low (but not zero). Let's be clear: I'm not arguing that we know everything there is to know (see my recent post on where I see football research headed), or that Phil Simms has fourth-down logic nailed, or that team-employed analysts have cornered the market and identified all the inefficiencies -- I'd argue the opposite, that there's still quite a bit of immature research being conducted, even on teams.

What I am arguing is that you need to check your findings for robustness. Do small changes in how you construct your new metric (BENGAL) significantly change the results? Does omitting a small number of cases change your conclusions? Is it stable season-over-season, or is it essentially random? Is your new measure or model the only one that reaches the conclusions you've reached? These are the questions you need to ask before you decide you've found something new that upsets conventional wisdom. Building a bunch of models and testing hypotheses over and over on the same data set will always produce false positives, often more quickly than you might think.
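To see how quickly repeated testing manufactures false positives, here's a minimal sketch. Everything in it is illustrative -- the metrics and thresholds are made up, not from any real football data -- but it shows that if you invent enough metrics and test them all against pure noise, a handful will look "significant" by chance alone.

```python
import random

random.seed(42)

def fake_metric_looks_significant(n_games=16):
    """Test one made-up metric against purely random outcomes.

    Both the metric and the outcomes are noise, so any apparent
    relationship is a false positive. Returns True if the sample
    correlation clears a naive significance threshold anyway.
    """
    metric = [random.gauss(0, 1) for _ in range(n_games)]
    outcome = [random.gauss(0, 1) for _ in range(n_games)]
    mean_m = sum(metric) / n_games
    mean_o = sum(outcome) / n_games
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(metric, outcome))
    var_m = sum((m - mean_m) ** 2 for m in metric)
    var_o = sum((o - mean_o) ** 2 for o in outcome)
    r = cov / (var_m * var_o) ** 0.5
    # |r| > 0.5 is roughly the p < 0.05 cutoff for 16 observations
    return abs(r) > 0.5

n_metrics = 100
false_positives = sum(fake_metric_looks_significant() for _ in range(n_metrics))
print(f"{false_positives} of {n_metrics} noise metrics look 'significant'")
```

At a 5% threshold you should expect about five of every hundred junk metrics to "work" on a single season of data -- which is exactly why a finding needs to survive the robustness checks above before you tweet it.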

We're often interested in measuring unobservable things like "talent" or "skill," but all we have are measures of outcomes like "touchdowns" and "rushing yards." Error is introduced when we move between the measure and the latent thing we're actually interested in. Sometimes that error is exacerbated by the way we build models or measures, and we mistake it for signal when, in fact, it's just noise.
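A toy sketch of that gap between latent skill and observed outcomes (all numbers here are assumptions for illustration): give each simulated player a fixed "talent," observe two seasons of outcomes that are talent plus season-specific noise, and check how well one season predicts the next. Even though talent never changes, the season-over-season correlation is well below 1.0 -- the rest is noise that a metric built on one season will happily soak up.

```python
import random

random.seed(0)

n_players = 500
# Latent, unobservable skill -- fixed across seasons.
talent = [random.gauss(0, 1) for _ in range(n_players)]

def season_outcomes(talent, noise_sd=1.0):
    """Observed outcome = latent talent + season-specific noise."""
    return [t + random.gauss(0, noise_sd) for t in talent]

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

season1 = season_outcomes(talent)
season2 = season_outcomes(talent)
r = correlation(season1, season2)
print(f"season-over-season correlation: {r:.2f}")
```

With talent and noise of equal variance, the true season-over-season correlation is 0.5: fully half the variance in any one season's outcomes is noise, even though the underlying skill is perfectly stable.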

Does your research simply confirm what we thought we already knew about something? That's OK! As sociologist Duncan Watts says, everything is obvious -- once you know the answer. Conventional wisdom is sometimes the conventional wisdom because it's correct, but we wouldn't know it without repeatedly testing it against data. There is great value in replicating findings and confirming that an old saw still holds with modern data.
