the spread the (data) science of sports

Points are great, but what about win percentage? (Ranking, part 2)

Sat 24 May 2014

Colley's Method

Last time I produced a set of rankings and ratings of NFL teams in 2013 using Massey's method, which was fundamentally just a least-squares solution with margin of victory as the outcome variable. That's great, you might say, but what about teams that run up the score in some games but also lose a bunch of games? Or what about teams that win all of their games by a small margin? For that, we turn to Colley's method.

[This is the 2nd part of a series of posts using linear algebra to rate and rank NFL teams using Who's #1?. I definitely encourage you to buy the book and follow along. All of the code for this post can be found on Github.]

Note: Click here if you want to skip past all the gory math details and get to the ratings and what this means for learning how to do data science for sports.

Like Massey, Colley's work was incorporated into the BCS (RIP). Instead of using margin of victory, Colley proposes that win percentage (number of games won divided by the number of games played) is the best measure of team quality. By making a slight modification to the win percentage formula, Colley also contends that strength of schedule (SOS) is factored into the generated ratings.

Instead of just taking wins / games, we'll modify the win percentage formula to be:

(1 + wins) / (2 + games)

This small modification buys us a couple of things. First, it means that win percentage is always defined, even when no games have been played. Without this, teams that have played zero games have an undefined win percentage, and computers hate trying to divide by zero.

Second, it acts as a quasi-Bayesian prior. In the absence of any information, the win percentage of all teams is equal to 0.5. As games are played, we can watch the win percentage move up and down relative to this number towards the team's "true" win percentage. This is also a concept closely related to Laplace's rule of succession and smoothing, topics that often emerge in data science.
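To make the effect of the modification concrete, here is a minimal sketch that compares the raw and modified win percentages for a few records. The function name `colley_win_pct` and the sample records are made up for illustration.

```python
def colley_win_pct(wins: int, games: int) -> float:
    """Modified win percentage: (1 + wins) / (2 + games)."""
    if wins < 0 or games < wins:
        raise ValueError("need 0 <= wins <= games")
    return (1 + wins) / (2 + games)


# Hypothetical records as (wins, games played)
for wins, games in [(0, 0), (1, 1), (12, 16), (16, 16)]:
    # The raw percentage is undefined before any games are played
    raw = wins / games if games else float("nan")
    print(f"{wins:2d}-{games - wins:<2d} raw={raw:.3f} modified={colley_win_pct(wins, games):.3f}")
```

A team with no games sits at exactly 0.5, and a 1-0 team moves to 2/3 rather than jumping straight to 1.0, which is the smoothing behavior described above.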

If you're skeptical that this allows us to factor in SOS, I encourage you to read the book, but for now you'll just have to trust me.
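For the curious, the usual argument can be sketched in a few lines. The symbols are introduced here for illustration: r_i is team i's rating, w_i, l_i, and t_i are its wins, losses, and games played (assuming no ties, so w_i + l_i = t_i), and O_i is its list of opponents, with one entry per game.

```latex
\begin{align*}
r_i &= \frac{1 + w_i}{2 + t_i} \\
w_i &= \frac{w_i - l_i}{2} + \frac{w_i + l_i}{2}
     = \frac{w_i - l_i}{2} + \frac{t_i}{2} \\
\frac{t_i}{2} &= \sum_{j \in O_i} \frac{1}{2} \;\approx\; \sum_{j \in O_i} r_j \\
(2 + t_i)\, r_i - \sum_{j \in O_i} r_j &= 1 + \frac{w_i - l_i}{2}
\end{align*}
```

The third line is where SOS enters: the "average" rating of 0.5 for each opponent is swapped out for that opponent's actual rating. The last line follows by substituting into the first, and it ties every team's rating to the ratings of the teams it played.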

Just as before, what we're fundamentally trying to do is produce some unknown (but real) vector of ratings, which we'll call r, that we can then put into an ordered list to produce rankings. Our setup is going to be very similar to Massey's method, and will be expressed as the following equation:

Cr = b

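The C in that equation is very similar to the M matrix that we set up for Massey's method. Each diagonal entry is 2 plus the number of games that team played, and each off-diagonal entry is the negative of the number of times the two teams met. The b vector is equal to: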

b_i = 1 + .5(wins_i - losses_i)

where wins_i and losses_i are team i's win and loss totals.
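As a rough sketch of that setup (not the code from the Github repo for this post), the snippet below builds C and b from a list of game results. The helper name `build_colley`, the three-team league, and its results are all invented for illustration, and ties are assumed not to occur.

```python
import numpy as np


def build_colley(teams, games):
    """Build Colley's C matrix and b vector.

    teams: list of team names
    games: list of (winner, loser) tuples, one per game (no ties)
    """
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = 2.0 * np.eye(n)  # diagonal starts at 2 (the "2 + games" term)
    b = np.ones(n)       # b starts at 1 (the "1 + ..." term)
    for winner, loser in games:
        w, l = index[winner], index[loser]
        C[w, w] += 1     # one more game played by each team
        C[l, l] += 1
        C[w, l] -= 1     # one more meeting between the two teams
        C[l, w] -= 1
        b[w] += 0.5      # +0.5 per win
        b[l] -= 0.5      # -0.5 per loss
    return C, b


# Hypothetical round-robin: Ants beat Bees and Cats, Bees beat Cats
teams = ["Ants", "Bees", "Cats"]
games = [("Ants", "Bees"), ("Ants", "Cats"), ("Bees", "Cats")]
C, b = build_colley(teams, games)
print(C)
print(b)
```

Each team here played two games, so every diagonal entry is 4, and the 2-0 team ends up with b = 1 + 0.5(2 - 0) = 2.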


All that's left then is to use linear algebra to solve for r to get the team ratings. We do this using numpy's linalg.solve function, and get the following ratings and rankings.

[table id=9 /]

No surprise, Seattle is still number one, but we have a new entrant at number two -- the Panthers. This is quite interesting. Despite having a lower win percentage than the Super Bowl runner-up Broncos, they have a higher rating using Colley's method. Why could this be?

The simplest answer is strength of schedule. Carolina played games against the toughest division in the NFL, the NFC West, and won games against the Rams and 49ers. They also played the Saints and the Patriots. By at least one account, the Panthers had the toughest schedule in all of football in 2013. Yet they still produced 12 wins.

At the bottom of the list, we see Washington. Jacksonville, who were the worst team according to Massey's method, are only the fifth-worst team using Colley's method, better than Washington, Oakland, Cleveland, and Houston. As with the Panthers, this is because of the tough schedule the Jaguars played.
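For anyone who wants to see the solve step in isolation, here is a small sketch using `numpy.linalg.solve` on a made-up three-team league (one team at 2-0, one at 1-1, one at 0-2). The team names and numbers are hypothetical and have nothing to do with the 2013 NFL data.

```python
import numpy as np

# Hypothetical league where every team played the other two once
teams = ["Ants", "Bees", "Cats"]
C = np.array([[ 4.0, -1.0, -1.0],
              [-1.0,  4.0, -1.0],
              [-1.0, -1.0,  4.0]])
b = np.array([2.0, 1.0, 0.0])  # 1 + 0.5 * (wins - losses) for each team

# Solve Cr = b for the ratings vector r
r = np.linalg.solve(C, b)

# Sort from highest rating to lowest to get the rankings
order = np.argsort(-r)
for rank, i in enumerate(order, start=1):
    print(f"{rank}. {teams[i]} {r[i]:.3f}")
```

The ratings come out to roughly 0.7, 0.5, and 0.3, so they still average 0.5, just like the starting point before any games were played.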

Combining Colley and Massey

Most analytically minded fans will tell you that you can't just take wins into account; you have to factor in points. Wins are affected too much by luck. What if we could combine the margin of victory part of Massey's method with the win percentage and SOS components of Colley's method? We can!

All we need to do is swap our outcome vector, replacing the b vector built from wins and losses with the margin of victory vector, p, from the Massey example. This gives the equation:

Cr = p

Solving for r gives the following ratings and rankings:

[table id=10 /]

This pulls Denver back into the #2 spot, followed by San Francisco. Carolina, the big mover, drops down to the #4 spot. Denver's vaunted offensive production in 2013 is just too much and overwhelms Carolina's gains from strength of schedule using "pure" Colley ratings. Unfortunately for the Jaguars, this pulls them back into last place, I'm guessing due to lack of offense.
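The combined system Cr = p can be sketched the same way as the pure Colley version: keep Colley's matrix on the left and put each team's total point differential on the right. The function name `colleyized_massey` and the toy scores below are hypothetical, and ties are again assumed away.

```python
import numpy as np


def colleyized_massey(teams, games):
    """Solve Cr = p, where C is Colley's matrix and p is point differential.

    teams: list of team names
    games: list of (winner, loser, margin) tuples with margin > 0
    """
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = 2.0 * np.eye(n)
    p = np.zeros(n)
    for winner, loser, margin in games:
        w, l = index[winner], index[loser]
        C[w, w] += 1
        C[l, l] += 1
        C[w, l] -= 1
        C[l, w] -= 1
        p[w] += margin   # points gained by the winner
        p[l] -= margin   # the same points lost by the loser
    return np.linalg.solve(C, p)


# Hypothetical results: (winner, loser, margin of victory)
teams = ["Ants", "Bees", "Cats"]
games = [("Ants", "Bees", 3), ("Ants", "Cats", 10), ("Bees", "Cats", 7)]
ratings = colleyized_massey(teams, games)
for team, rating in zip(teams, ratings):
    print(f"{team} {rating:+.2f}")
```

One thing to notice is that these ratings are centered on zero instead of 0.5, because the point differentials across the league always sum to zero.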


There are more rating and ranking methods to come, but you're probably wondering which one is the right one. There isn't one. It all depends on what you're trying to measure and, if you're making decisions, what you're trying to optimize for. Knowing the correct metric for your data science problem is a huge part of doing good work. This can't be overstated. Optimizing for the wrong metrics will burn you in the end.

If we take a pure machine learning perspective, the best rating method is the one that has the best predictive power for some outcome like wins. So, we'll need to validate these rating methods for predictive accuracy. So far, we've only done retrospectives on a season that's already been played. Many would argue it's no secret that the Seahawks were the best team in football and any ranking method that doesn't put them at the top of the pile is a bad one.
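One simple way to score predictive power, sketched here under the assumption that ratings are fit on one set of games and checked against games that were held out, is to count how often the higher-rated team actually won. The function name, ratings, and held-out results below are invented for illustration.

```python
def prediction_accuracy(ratings, held_out_games):
    """Fraction of held-out games won by the higher-rated team.

    ratings: dict mapping team name to rating
    held_out_games: list of (winner, loser) tuples NOT used to fit the ratings
    """
    correct = 0
    decided = 0
    for winner, loser in held_out_games:
        if ratings[winner] == ratings[loser]:
            continue  # equal ratings make no prediction, so skip the game
        decided += 1
        if ratings[winner] > ratings[loser]:
            correct += 1
    return correct / decided if decided else float("nan")


# Hypothetical ratings and held-out results
ratings = {"Ants": 0.7, "Bees": 0.5, "Cats": 0.3}
held_out = [("Ants", "Bees"), ("Cats", "Bees"), ("Ants", "Cats"), ("Bees", "Cats")]
accuracy = prediction_accuracy(ratings, held_out)
print(f"accuracy: {accuracy:.2f}")
```

Here the ratings get three of the four held-out games right. The same function could be pointed at any of the rating methods to compare them on equal footing.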

That brings me to a final point. The fact that these methods have so far produced roughly similar rankings isn't necessarily a bad thing -- it's probably a good thing. Social scientists call this convergent validity. In an ideal world, where we know something about team quality, measures of team quality should roughly correlate with one another. A counterintuitive finding is nice and generates pageviews, but if a finding runs too counter to common sense, there's probably something wrong with it.
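A quick way to put a number on how much two rating methods agree is a rank correlation. The sketch below computes Spearman's correlation with plain numpy, assuming no tied ratings. The two rating vectors are made up and are not the 2013 results.

```python
import numpy as np


def spearman_no_ties(x, y):
    """Spearman rank correlation, assuming no tied values in x or in y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # argsort of argsort turns values into ranks (0 = smallest)
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    # Spearman is the Pearson correlation of the ranks
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Hypothetical ratings for the same four teams under two methods
colley_like = [0.7, 0.5, 0.3, 0.6]
massey_like = [2.6, 0.8, -3.4, -0.5]
rho = spearman_no_ties(colley_like, massey_like)
print(f"rank correlation: {rho:.2f}")
```

A value near 1 means the two methods order teams almost identically. Here the methods swap only the two middle teams, which works out to 0.8.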
