the **spread**: the (data) science of sports

Sun 29 June 2014

Eigenvalues and eigenvectors

I recently posted some thoughts on what it takes to get started in data science. Interestingly, one of the claims I thought least controversial raised many questions and a great deal of doubt from readers. "How," they asked, "do you use linear algebra?" I explained that many statistical problems have a compact matrix representation, and many systems of linear equations can be represented and solved using linear algebra. What I should have said was, "to calculate power ratings."

Let's take a break from World Cup excitement and return to the science of ranking and rating NFL teams. We're going to get a little more technical this time and introduce what are often called power ratings and power rankings. Unlike many of the popular 'power ratings' you see in the sports media, these get their name from the mathematical algorithm -- one involving exponents (powers) -- traditionally used to create them. However, because I'm using a scientific programming language, I'm going to use linear algebra and estimate the ratings directly rather than through approximation. If you're interested in finding out how to do the following in Excel, you should definitely buy the book. What follows is an overview of this highly customizable technique.

This is a more math-intensive post than some, but I hope that reading through it will provide you with some things to think about when you're creating your own metrics.

**Strength.** The basic tenet of James Keener's method for producing ratings and rankings is as follows: every team has some measure of *strength* that we can observe. We get to choose what this metric is, as long as it is *relational* -- that is, it must describe events between two teams, *i* and *j*. Our measure of strength could be wins, it could be points scored, it could be number of first downs, passing yards, etc. It all boils down to the idea that *s~ij~* is a single *nonnegative* number (this is important) that describes what happened when team *i* played team *j*. Going forward, I'm going to be using **points scored** as my measure of strength. When team *i* plays team *j* and wins 45-30, *s~ij~* = 45 and *s~ji~* = 30. These are additive, so if teams *i* and *j* play again and the score is 10-14, we update our two strength scores to 55 and 44, respectively.
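As a minimal sketch of this bookkeeping (the team names are placeholders and the scores are the made-up examples from above), accumulating *s~ij~* from game results might look like:

```python
from collections import defaultdict

# Hypothetical game results: (team_i, team_j, points_i, points_j).
games = [
    ("A", "B", 45, 30),  # team A beats team B 45-30
    ("A", "B", 10, 14),  # rematch: team B wins 14-10
]

# s[i][j] is the total points team i has scored against team j.
s = defaultdict(lambda: defaultdict(int))
for i, j, pts_i, pts_j in games:
    s[i][j] += pts_i
    s[j][i] += pts_j

print(s["A"]["B"], s["B"]["A"])  # 55 44
```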

**Ratings.** Keener proposes that there exists some unknown but knowable vector of ratings *r* that relates this measure of strength to all other teams (relative strength). Relative strength must take into account not only how good team *i* is, but how good team *j* is, how good all the teams that team *j* has played are, and so forth. Further, this method asserts that there is a proportionality constant lambda that describes the relationship between our observed measure of strength *s* and our unknown ratings *r*. Finding *r* and lambda is the goal.
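In symbols, and assuming Keener's standard formulation (later sections swap the raw scores *s~ij~* for smoothed scores *a~ij~*), the paragraph above says:

```latex
\lambda\, r_i = \sum_{j} s_{ij}\, r_j
\qquad\text{or, in matrix form,}\qquad
S\,\mathbf{r} = \lambda\,\mathbf{r}
```

That is, the ratings vector *r* is an eigenvector of the strength matrix, and lambda is its eigenvalue.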

How we do that involves making a number of decisions -- this is both an upside and a downside to this method. If you like to tweak and experiment, this is a great method for rating and ranking. If you just want to know who's the best team, you have some work ahead of you.

**Smoothing.** Recall that in a previous post, we used Laplace's Rule of Succession to "smooth" out our estimates. We'll do the same here, meaning that our strength scores will actually be *a~ij~* = (points scored by team *i* + 1) / (points scored by both teams + 2). These new scores, *a*, can be roughly interpreted as the probability that team *i* will beat team *j* in the future. These values are by definition between 0 and 1 and are directly comparable.
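A sketch of the smoothing step (the helper name is mine):

```python
def smooth(points_i, points_j):
    """Laplace-smoothed strength: (points by i + 1) / (points by both + 2)."""
    return (points_i + 1) / (points_i + points_j + 2)

# Using the running example, where team i has scored 55 points to team j's 44:
a_ij = smooth(55, 44)  # 56/101, roughly 0.554
a_ji = smooth(44, 55)  # 45/101, roughly 0.446
```

Note that *a~ij~* + *a~ji~* = 1, so the two smoothed scores really are directly comparable shares.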

**Skewing**. One of the dangers of using points scored is that we can give too much credit to teams that Run Up The Score or teams that constantly find themselves locked in low-scoring dogfights. We can apply a non-linear transformation function to our strength scores to try to adjust for this. Langville and Meyer call this "skewing", but I prefer "skew-adjusting." You can use pretty much any function here that constrains values between 0 and 1, but an easy one to use is:
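I believe the function intended here is the skewing function from Langville and Meyer's *Who's #1?*, h(x) = 1/2 + sgn(x - 1/2) * sqrt(|2x - 1|) / 2, since it matches the plot described below; a Python sketch:

```python
import math

def skew(x):
    """Skew-adjust a strength score in [0, 1]: scores near 0.5 are pushed
    apart, while movement near the extremes (0 or 1) matters less."""
    return 0.5 + 0.5 * math.copysign(math.sqrt(abs(2 * x - 1)), x - 0.5)

skew(0.5)   # 0.5: a perfectly even matchup stays even
skew(0.55)  # about 0.658: a small edge near the middle is amplified
```

Note that h(0) = 0, h(1/2) = 1/2, and h(1) = 1, so skew-adjusted scores stay in [0, 1].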

What does this mean in practice? Here's the function plotted over the interval [0, 1]. The dotted black line is the identity line y = x, whereas the orange line is the above function. The x-axis is the untransformed strength score and the y-axis is the skew-adjusted score. You can see that scores that hover around 0.5 are pushed further apart, and the impact of being close to one of the extremes (0 or 1) is lessened in the transformed function.

Let's take a look at the effect that this has. Using the 2013 NFL season and points scored (smoothed), this is the distribution of non-zero strength scores.

As we'd expect in an "Any Given Sunday" league with lots of parity, most of the scores are centered around 0.5, meaning that most teams have a roughly even chance of beating most other teams. What does this distribution look like after skew-adjusting?

Whoa! Now almost no one's around 0.5; instead we have a more uniformly distributed set of scores. If we have reason to believe that the NFL is a highly unequal league, we may consider using this distribution. I don't necessarily think that it is, but it's good to explore your options.

**Eigenvalues and eigenvectors. **Now let's actually rate and rank some
teams. A full explanation of eigenvalues and eigenvectors is well beyond
the scope of this post, but they play a vital role across a broad range
of statistical and mathematical problems. And they end up being really
useful quantities for describing matrices of data. In fact, the
eigenvalue that we're interested in finding is the lambda described
above -- that proportionality constant that describes the relationship
between our observed strength scores and our goal, the ratings scores.
This one number, lambda, allows us to transform our matrix of offensive
performance into a 32-team eigenvector.

Confusing, I know. Luckily, there are functions in
SciPy for finding eigenvalues and eigenvectors.
That's just what I've done and all of the code is on
GitHub.
The tricky part is that eigenvalues can be complex numbers (remember,
those numbers that can have both real and imaginary components? Of
course you do). For our purposes, we're looking for the largest positive
eigenvalue without an imaginary component. It ends up being 6.415 for
the 2013 season. What does that mean? It's not important. What **is**
important is that this constant is the same for every team, because
strength of schedule is already factored into our calculations.

We can then use this constant to produce our ratings vector *r* (the
eigenvector). We normalize this vector so that the values sum to 1.0 and
are directly comparable no matter what measure of strength we use above.
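Putting those last two steps together with SciPy (the 3x3 matrix here is a toy stand-in for the real 32x32 NFL matrix; for the real thing you'd build it from every game's smoothed, optionally skewed, scores):

```python
import numpy as np
from scipy.linalg import eig

# Toy strength matrix: A[i, j] is team i's smoothed score against team j.
A = np.array([
    [0.0,  0.6,  0.7 ],
    [0.4,  0.0,  0.55],
    [0.3,  0.45, 0.0 ],
])

vals, vecs = eig(A)

# Keep the largest eigenvalue with (numerically) zero imaginary part; by the
# Perron-Frobenius theorem it is real and its eigenvector is nonnegative.
is_real = np.isclose(vals.imag, 0.0)
k = int(np.argmax(np.where(is_real, vals.real, -np.inf)))
lam = vals[k].real

# Normalize the matching eigenvector so the ratings sum to 1.
r = np.abs(vecs[:, k].real)
r = r / r.sum()

ranking = np.argsort(-r)  # team indices, best first
```

The dominant eigenvalue `lam` plays the role of the 6.415 reported above, and `r` is the ratings vector that the tables below are sorted by.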

**The ratings and rankings.**

[table id=14 /]

Well, once again, we see Seattle's on top of the rankings. They really did have a historic season. San Francisco and Carolina also figure in the top 3. Cincinnati makes its top 5 debut at number 4, beating out Denver! This seems controversial to me. We also see the Giants third from bottom, the lowest they've been ranked so far.

What happens if we use the skew-adjusted ratings?

[table id=15 /]

According to this, the Bengals were the second-best team in the NFL in 2013! I'm not sure about that. They did go 11-5 and beat the Patriots, but... Anyway. I think the slightly differing opinions that each rating algorithm has provided have underscored something I've talked about a lot on this blog: the need for ensemble models. We'll be returning to that as the rating and ranking series of posts draws to a close.