The Spread: the (data) science of sports

Power ratings (Ranking, part 3)

Sun 29 June 2014

Eigenvalues and eigenvectors

I recently posted some thoughts on what it takes to get started in data science. Interestingly, a claim I thought was among my least controversial raised many questions and a great deal of doubt from readers. "How," they asked, "do you use linear algebra?" I explained that many statistical problems have a compact matrix representation, and many systems of linear equations can be represented and solved using linear algebra. What I should have said was, "to calculate power ratings."

Let's take a break from World Cup excitement and return to the science of ranking and rating NFL teams. We're going to get a little more technical this time and introduce what are often called power ratings and power rankings. Unlike many of the popular 'power ratings' that you see in the sports media, these ratings are so-called because of a mathematical algorithm involving exponents (powers) used in creating them. However, because I'm using a scientific programming language, I'm going to use linear algebra and estimate the ratings directly rather than through approximation. If you're interested in finding out how to do the following in Excel, you should definitely buy the book. What follows is an overview of this highly customizable technique.

This is a more math-intensive post than some, but I hope that reading through it will provide you with some things to think about when you're creating your own metrics.

Strength. The basic tenet of James Keener's method for producing ratings and rankings is as follows: every team has some measure of strength that we can observe. We get to choose what this metric is, as long as it is relational -- that is, it must describe events between two teams, i and j. Our measure of strength could be wins, it could be points scored, it could be number of first downs, passing yards, etc. It all boils down to the idea that s~ij~ is a single nonnegative number (this is important) that describes what happened when team i played team j. Going forward, I'm going to be using points scored as my measure of strength. When team i plays team j and wins 45-30, s~ij~ = 45 and s~ji~ = 30. These are additive, so if teams i and j play again and the score is 10-14, we update our two strength scores to 55 and 44, respectively.
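To make the bookkeeping concrete, here's a minimal sketch of building the strength matrix from game results. The three-team mini-league and its scores are made up for illustration; only the 45-30 and 10-14 games come from the example above.

```python
import numpy as np

# Made-up game results: (team_i, team_j, points_i, points_j)
games = [
    (0, 1, 45, 30),   # team 0 beats team 1, 45-30
    (1, 0, 14, 10),   # rematch: team 1 wins 14-10
    (0, 2, 21, 21),   # team 0 and team 2 tie (illustrative)
]

n_teams = 3
S = np.zeros((n_teams, n_teams))
for i, j, pts_i, pts_j in games:
    S[i, j] += pts_i  # points team i scored against team j
    S[j, i] += pts_j  # points team j scored against team i

# s_01 = 45 + 10 = 55 and s_10 = 30 + 14 = 44, as in the example above
```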

Ratings. Keener proposes that there exists some unknown but knowable vector of ratings r that relates this measure of strength to all other teams (relative strength). Relative strength must take into account not only how good team i is, but how good team j is, how good all the teams that team j has played are, and so forth. Further, this method asserts that there is a proportionality constant lambda that describes the relationship between our observed measure of strength s and our unknown ratings r. Finding r and lambda is the goal.

How we do that involves making a number of decisions -- this is both an upside and a downside to this method. If you like to tweak and experiment, this is a great method for rating and ranking. If you just want to know who's the best team, you have some work ahead of you.

Smoothing. Recall that in a previous post, we used Laplace's Rule of Succession to "smooth" out our estimates. We'll do the same here, meaning that our strength scores will actually be a~ij~ = (points scored by team i + 1) / (points scored by both teams + 2). These new scores, a, can be roughly interpreted as the probability that team i will beat team j in the future. These values are by definition between 0 and 1 and are directly comparable.
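As a sketch, the smoothing step might look like this in NumPy. The three-team strength matrix is made up, and the choice to leave unplayed matchups at zero (rather than smoothing them to 0.5) is my assumption, not something the method prescribes:

```python
import numpy as np

# Made-up raw strength matrix: S[i, j] = points team i scored vs team j
S = np.array([[ 0., 55., 21.],
              [44.,  0.,  0.],
              [21.,  0.,  0.]])

# Laplace smoothing: a_ij = (s_ij + 1) / (s_ij + s_ji + 2).
# Only smooth matchups that were actually played; unplayed pairs stay 0.
played = (S + S.T) > 0
A = np.where(played, (S + 1) / (S + S.T + 2), 0.0)

# a_01 = (55 + 1) / (55 + 44 + 2) = 56/101 ~ 0.554:
# team 0 is a slight favorite over team 1
```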

Skewing. One of the dangers of using points scored is that we can give too much credit to teams that Run Up The Score or to teams that constantly find themselves locked in low-scoring dogfights. We can apply a non-linear transformation function to our strength scores to try to adjust for this. Langville and Meyer call this "skewing," but I prefer "skew-adjusting." You can use pretty much any function here that constrains values between 0 and 1, but an easy one to use is:

h(x) = 1/2 + sgn(x - 1/2) * sqrt(|2x - 1|) / 2

(from "Who's #1")

What does this mean in practice? Here's the function plotted over the interval [0, 1]. The dotted black line is the linear function y = x, whereas the orange line is the above function. The x-axis is the untransformed strength score and the y-axis is the skew-adjusted score. You can see that scores that hover around 0.5 are pushed further apart, and the impact of being close to one of the extremes (0 or 1) is lessened in the transformed function.
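A minimal NumPy version of this skew-adjustment might look like the following (the function name `skew` is mine):

```python
import numpy as np

def skew(x):
    # h(x) = 1/2 + sgn(x - 1/2) * sqrt(|2x - 1|) / 2
    return 0.5 + np.sign(x - 0.5) * np.sqrt(np.abs(2 * x - 1)) / 2

skew(0.5)   # 0.5: even matchups stay even
skew(0.6)   # ~0.724: scores near 0.5 get pushed apart
skew(0.95)  # ~0.974: gains near the extremes are compressed
```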

skew_adjusted_strength

Let's take a look at the effect that this has. Using the 2013 NFL season and points scored (smoothed), this is the distribution of non-zero strength scores.

non_zero_strength

As we'd expect in an "Any Given Sunday" league with lots of parity, most of the scores are centered around 0.5, meaning that most teams have a roughly even chance of beating most other teams. What does the distribution look like after skew-adjusting?

skewed_strength

Whoa! Now almost no one's around 0.5; instead, we have a more uniformly distributed set of scores. If we have reason to believe that the NFL is a highly unequal league, we may consider using this distribution. I don't necessarily think that it is, but it's good to explore your options.

Eigenvalues and eigenvectors. Now let's actually rate and rank some teams. A full explanation of eigenvalues and eigenvectors is well beyond the scope of this post, but they play a vital role across a broad range of statistical and mathematical problems. And they end up being really useful quantities for describing matrices of data. In fact, the eigenvalue that we're interested in finding is the lambda described above -- that proportionality constant that describes the relationship between our observed strength scores and our goal, the ratings scores. This one number, lambda, allows us to transform our matrix of offensive performance into a 32-team eigenvector.

Confusing, I know. Luckily, there are functions in SciPy for finding eigenvalues and eigenvectors. That's just what I've done and all of the code is on GitHub. The tricky part is that eigenvalues can be complex numbers (remember, those numbers that can have both real and imaginary components? Of course you do). For our purposes, we're looking for the largest positive eigenvalue without an imaginary component. It ends up being 6.415 for the 2013 season. What does that mean? It's not important. What is important is that this constant is the same for every team, because strength of schedule is already factored into our calculations.

We can then use this constant to produce our ratings vector r (the eigenvector). We normalize this vector so that the values sum to 1.0 and are directly comparable no matter what measure of strength we use above.
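Putting the eigen-step together, here's a rough sketch using SciPy. The 3x3 smoothed-strength matrix is made up, standing in for the real 32-team matrix on GitHub:

```python
import numpy as np
from scipy.linalg import eig

# Made-up smoothed-strength matrix A (row i = team i's strength vs team j)
A = np.array([[0.00, 0.55, 0.60],
              [0.45, 0.00, 0.52],
              [0.40, 0.48, 0.00]])

vals, vecs = eig(A)

# Keep the largest eigenvalue with (numerically) zero imaginary part --
# this is the lambda described above
real_mask = np.abs(vals.imag) < 1e-10
idx = np.argmax(vals.real[real_mask])
lam = vals.real[real_mask][idx]

# The matching eigenvector is the ratings vector r; normalize it to sum
# to 1 so ratings are comparable regardless of the strength measure
r = vecs[:, real_mask][:, idx].real
r = np.abs(r) / np.abs(r).sum()

ranking = np.argsort(-r)  # team indices, best first
```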

The ratings and rankings.

[table id=14 /]

Well, once again, we see Seattle on top of the rankings. They really did have a historic season. San Francisco and Carolina also figure in the top 3. Cincinnati makes its top-5 debut at number 4, beating out Denver! This seems controversial to me. We also see the Giants third from bottom, the lowest they've been ranked so far.

What happens if we use the skew-adjusted ratings?

[table id=15 /]

According to this, the Bengals were the second-best team in the NFL in 2013! I'm not sure about that. They did go 11-5 and beat the Patriots, but... Anyway. I think the slightly differing opinions that each rating algorithm has provided have underscored something I've talked about a lot on this blog: the need for ensemble models. We'll be returning to that as the rating and ranking series of posts draws to a close.
