the spread: the (data) science of sports

Hello world! Introducing the spread.

Sat 30 November 2013

What's all this?

I'm Trey Causey. I'm a data scientist. I'm a football fan. This site is an attempt to bring the two together, in the hopes of achieving two goals. First, to kickstart the use of methods from data science in the football analytics world. Second, to teach some introductory data science using interesting, substantive, real-world examples.

You'll notice there's not much here yet. Obviously I'm not a designer (if you want to help with that, especially with the header/logo, please get in touch!). I figured the site would never get off the ground if I waited until I had a finished product to roll out. So, I'll just update it as I go. I know it looks terrible on mobile right now. Responsive design implementations coming soon.

The name of the site is a riff on both the spread offense and the spread used in betting. This is not a betting site. It does not offer any advice on betting or endorse sports betting in any way. That being said, the betting world is often a few steps ahead of the game when it comes to analytics and forecasting.

It's a great time to be involved in sports analytics. Baseball has already seen its sabermetric revolution. Basketball is quickly following, especially with the introduction of SportVU and related technologies.

Football has been slower to warm to advanced statistics. Of course, absolutely fantastic work is being done by Brian Burke at Advanced NFL Stats, the Football Outsiders crew, and Chase Stuart at Football Perspective (to name only a few). Yet football analytics remains largely dominated by simple cross-tabs, linear regression, and ad hoc analyses that select on the dependent variable, fail to check model assumptions, eschew out-of-sample testing, and generally don't capitalize on tremendous advances in probabilistic modeling. And if you don't know what any of this means, hopefully I can teach you.

When advanced analysis *is* conducted, it's often behind closed doors. Understandably, teams want to preserve any edge they find. However, this is not only bad for the analytics community, it's bad for the advancement of football analytics. As has been pointed out on the Advanced NFL Stats podcast (I can't remember who, sorry! [Edit: It was Ben Alamar on Episode Six of the podcast, per Dave Collins, the host of said podcast, in the comments.]), without peer review, isolated analysts often have no objective check on the quality of their work.

What's coming?

The first project I'm tackling is an ensemble play-level win probability calculator. An ensemble model combines the outputs of several (sometimes many) models built to forecast the same result, which (usually) gives a more accurate prediction than any single model. The framework is mostly in place and will be posted soon. I hope to have it working in real time before the season is over, but I can't make any promises. If you have experience building Django or Flask apps, I'd love to hear your input. Second up is reconceptualizing the idea of 'field goal range' and devising a new visualization for kick probabilities.
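
To make the ensemble idea concrete, here is a minimal Python sketch. It is not the calculator described above: the three toy models, their coefficients, and the game-state fields (`score_diff`, `seconds_left`, `yardline`) are all made up for illustration. The only point is to show several models forecasting the same thing (the probability that the team with the ball wins) and a simple weighted average combining them.

```python
import math


def logistic(x):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


# Three toy "models". The coefficients are invented for illustration,
# not fitted to any real play-by-play data.
def score_model(state):
    # Only looks at the current point differential.
    return logistic(0.15 * state["score_diff"])


def time_model(state):
    # A lead counts for more as the clock runs down (60-minute game assumed).
    minutes_left = state["seconds_left"] / 60.0
    urgency = 1.0 + (60.0 - minutes_left) / 60.0
    return logistic(0.15 * state["score_diff"] * urgency)


def field_position_model(state):
    # yardline = yards from the offense's own goal line (0 to 100), an assumed convention.
    return logistic(0.15 * state["score_diff"] + 0.01 * (state["yardline"] - 50))


def ensemble_win_probability(state, models, weights=None):
    """Combine the models' predictions with a (weighted) average."""
    predictions = [model(state) for model in models]
    if weights is None:
        weights = [1.0] * len(predictions)
    total_weight = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total_weight


MODELS = [score_model, time_model, field_position_model]

# Hypothetical game state: up by 7, five minutes left, ball at the opponent's 40.
example_state = {"score_diff": 7, "seconds_left": 300, "yardline": 60}
print(ensemble_win_probability(example_state, MODELS))
```

Averaging is only one way to combine models. Weighting each model by how well it performs out of sample, or training another model on top of the individual predictions, are common alternatives.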

Thanks for checking out the spread. Data science and football. Together at last.

This entry was tagged as Miscellaneous
