Sun 02 February 2014
I was making some adjustments to the win probability model and found a great example of the points I discussed in my previous post on making sure that models are usable.
I ran all of the current features through a feature selection algorithm in scikit-learn that keeps only the k best features, with k supplied by the user. Since I only have a small number of features, I just ran the algorithm repeatedly for k = 1, ..., 9. The results were surprising, to say the least.
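Here is a rough sketch of that loop. It assumes the algorithm is scikit-learn's `SelectKBest` with the `f_classif` (ANOVA F-test) scoring function; the post doesn't say which scoring function was used. The feature names other than score differential and down are hypothetical, and the data is synthetic, not the real play-by-play data.

```python
from collections import Counter

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical feature names: only score differential and down come from the post.
FEATURES = ["score_diff", "down", "yards_to_go", "yard_line", "seconds_left",
            "quarter", "home", "timeouts_off", "timeouts_def"]


def selection_counts(X, y, feature_names):
    """Fit SelectKBest for k = 1..n_features and count how often each feature is kept."""
    counts = Counter({name: 0 for name in feature_names})
    for k in range(1, len(feature_names) + 1):
        # f_classif is an assumed scoring function (univariate ANOVA F-test).
        selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
        for name, kept in zip(feature_names, selector.get_support()):
            if kept:
                counts[name] += 1
    return counts


# Synthetic plays (NOT real data): the winner depends mostly on score_diff.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, len(FEATURES)))
y = (3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=2000) > 0).astype(int)

counts = selection_counts(X, y, FEATURES)
for name, n in counts.most_common():
    print(f"{name:>14}: {n}")
```

One thing worth knowing if the scoring function is univariate like `f_classif`: each feature's score is computed once and doesn't depend on k, so the selected sets are nested. A feature's count is then just 10 minus its rank, and a feature selected only once is simply the lowest-scoring one under that test (`selector.scores_` shows the ranking directly).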
The features are on the y-axis and the number of times each feature was selected as one of the k best features is on the x-axis. As you can see, score differential was selected every time, meaning that even if you could use only one feature, it should be score differential (which makes sense).
However, the current down was only selected once!
This really underscores my point. Perhaps down is not that predictive. But can you honestly say you believe down is unimportant to winning? Could you tell a coach that? That being said, it's not clear that including down hurts predictive performance or makes the model worse (more on that soon), but it doesn't seem to be doing much of the heavy lifting in predicting outcomes.
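One way to check whether down actually hurts or helps is to compare cross-validated performance with and without it. This is only a sketch under my own assumptions: a scikit-learn `LogisticRegression` as the model, log loss as the metric, and synthetic data where column 0 stands in for score differential and column 1 for down.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def cv_log_loss(X, y, cv=5):
    """Mean cross-validated log loss of a logistic regression (lower is better)."""
    model = LogisticRegression(max_iter=1000)
    # scikit-learn reports log loss negated, so flip the sign back.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_log_loss")
    return -scores.mean()


# Synthetic data (NOT real): column 0 ~ score differential, column 1 ~ down.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
y = (2.0 * X[:, 0] + rng.normal(size=2000) > 0).astype(int)

with_down = cv_log_loss(X, y)
without_down = cv_log_loss(np.delete(X, 1, axis=1), y)  # drop the "down" column
print(f"with down: {with_down:.4f}  without down: {without_down:.4f}")
```

If the two numbers are essentially the same, the feature isn't hurting the model; it just isn't adding much on its own.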
My initial hunch is that down is relatively unimportant until it is interacted with other features. Only one way to find out!