Why Does OPS Work?

I want to show OPS some love. I want to state that not only is OPS intuitive and useful, it is a good and mathematically sound statistic. In short, I want to argue against the idea that it’s “a kludge,” “brute-force addition,” and “deeply flawed at a basic math level.” I decided OPS needed more appreciation after reading chapter 1 of Smart Baseball by Keith Law, which is where these descriptors come from.

Let’s start with the obvious. With a quick scan of a newspaper-style box score, I can tell what sort of game someone had. People get on-base percentage; they get slugging percentage. They get that OPS somehow matches Earl Weaver’s desire for three-run homers. This is something. There’s nothing wrong with more sophisticated measures of batter performance, but something like wRC takes time for the casual fan to learn. The time may not be there, nor the desire, and that doesn’t make the fan foolish. I explained on-base percentage and slugging percentage to my daughter when she was 7 and she knew why Juan Pierre had to be replaced in right field by Andre Ethier, way back when, in the Dodger outfield.

We can also sense intuitively that on-base percentage over-values singles and walks, and that slugging percentage over-values home runs. (I don’t think Earl Weaver objected to bases-clearing doubles.) By adding OBP and SLG together you soften these two tendencies. Walks are now less important than singles. Home runs are still better than singles, but not by so much.

But what about the denominators? They’re different. Well, who cares? 2/5 + 3/4 = 1.150. (A triple and a walk in 5 plate appearances is a good game!) These different denominators give a perfectly acceptable sum. There is no underlying flaw. There are simply some complications.
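
The arithmetic for that box-score line can be checked with exact fractions. This is a minimal sketch, assuming a game with no hit-by-pitch or sacrifices; the function name `game_ops` is made up for illustration.

```python
from fractions import Fraction

def game_ops(singles, doubles, triples, homers, walks, at_bats):
    """Return (OBP, SLG, OPS) for one batting line, ignoring HBP and sacrifices."""
    hits = singles + doubles + triples + homers
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    plate_appearances = at_bats + walks  # assumption: every PA is an AB or a walk
    obp = Fraction(hits + walks, plate_appearances)
    slg = Fraction(total_bases, at_bats)
    return obp, slg, obp + slg

# A triple and a walk in 5 plate appearances (4 at-bats).
obp, slg, total = game_ops(singles=0, doubles=0, triples=1, homers=0, walks=1, at_bats=4)
print(obp, slg, float(total))  # 2/5 3/4 1.15
```

The two fractions keep their own denominators right up to the final sum, which is all that "adding OBP and SLG" ever asks for.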

Compared to what? Many people use tools that estimate performance with so-called linear weights. You try to write down an expression like:
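
A form consistent with the symbols defined in the next paragraph, with M as an assumed name for the quantity being estimated, would be:

```latex
M = \frac{a_S\,S + a_D\,D + a_T\,T + a_{HR}\,HR + a_{BB}\,BB}{PA}
```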

Where S is the number of singles, D is doubles, etc. The different “a” numbers are the linear weights. I call this a rate merit because this measures productivity per plate appearance, PA. There is no a priori reason to choose a linear model; they’re just easier than non-linear ones. Defenders of OPS: do not be intimidated by people saying their math is more rigorous. Linear weights is a model, not the Truth with a capital “T.”

On-base percentage looks like a rate metric. Neglecting strange plate appearances (can we please agree to ignore catcher’s interference for this argument?), on-base percentage is simply the rate merit with a_S = a_D = a_T = a_HR = a_BB = 1.

Slugging is a bit different. Slugging is divided by at-bats, which is really close to being plate appearances less walks. This leaves us with:
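
Written out from the standard definition of slugging, and taking at-bats to be plate appearances less walks as just described (a reconstruction, not necessarily the exact form originally shown), this is:

```latex
SLG = \frac{S + 2D + 3T + 4HR}{AB} \approx \frac{S + 2D + 3T + 4HR}{PA - BB}
```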

The denominators are different, and in the mathy section below I explain how I deal with that. But the key observation is this: walks only happen about 9% of the time. And most players don’t have extreme walk totals. After doing some work to get the denominators the same, you can just add up the fractions if you assume, for the sake of the denominator, that everyone has the same walk rate. You use the correct number of walks in the numerator, but you let the denominator use the league average. I call the league-average walk rate β₀ (in 2018 it was 0.0887 through about September 1). If you use this approximation you get:
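
Carrying out that substitution, with the slugging denominator replaced by plate appearances times one minus the league-average walk rate, gives the following. This is a reconstruction from the definitions in the text, and it reproduces the weight ratios quoted in the next paragraph:

```latex
OPS \approx \frac{(2-\beta_0)\,S + (3-\beta_0)\,D + (4-\beta_0)\,T + (5-\beta_0)\,HR + (1-\beta_0)\,BB}{(1-\beta_0)\,PA}
```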

What we have here is a set of linear weights. We have a rate figure of merit equal to a number of singles plus a number of doubles, etc., all divided by plate appearances. Not only that, we can see that the linear weights make sense. Home runs are roughly 2.5 times more valuable than singles. More precisely, the weight for home runs is 2.57 times that of singles. In wOBA, whose weights underlie wRC, it’s 2.37. OPS under-values walks. Here the weight for a walk is 0.48 times that for a single; wOBA has the value closer to 0.78.
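
Those ratios can be reproduced in a few lines. The sketch below assumes the simplified OPS used in this article (no hit-by-pitch or sacrifices) and a fixed walk rate in the slugging denominator, so each event is worth its on-base credit plus its total bases divided by one minus the league walk rate. The function name `ops_linear_weights` is illustrative.

```python
def ops_linear_weights(beta0):
    """Relative OPS weights per event, scaled so that a single equals 1."""
    bases = {"1B": 1, "2B": 2, "3B": 3, "HR": 4, "BB": 0}
    # Every event here reaches base (OBP credit of 1); hits also add to SLG.
    raw = {event: 1 + n / (1 - beta0) for event, n in bases.items()}
    return {event: weight / raw["1B"] for event, weight in raw.items()}

weights = ops_linear_weights(0.0887)  # 2018 league-average walk rate from the text
for event, weight in weights.items():
    print(f"{event}: {weight:.2f}")
```

Changing the argument shows how little the ratios move for any plausible league walk rate.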

Why does OPS deserve some more love? It’s simple, easy to calculate, and with some pretty sensible mathematical approximations, it can be written to look like the fancier models. The correlation of OPS with run production is cooked into baseball.

The Mathy Part

I took the formula for OPS, simplified by assuming that hit-by-pitch, catcher’s interference, and sacrifices did not exist, and wrote the math to convert OPS to a set of linear weights. The approximations are reasonable, the results are consistent, and the linear weights so obtained are similar to those from more conventional calculations. I tried to be careful with the math and I calculated error terms. I tested my hypothesis on 2018 data from FanGraphs downloaded on 9/2/18.

Starting with the basics:
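
One way to write the two definitions under these simplifications is shown here. Placing δ in the OBP denominator is an assumption, based on the fact that sacrifice bunts and catcher’s interference count as plate appearances but not toward on-base percentage:

```latex
OBP = \frac{S + D + T + HR + BB}{PA - \delta}
\qquad
SLG = \frac{S + 2D + 3T + 4HR}{AB}
```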

Where the δ (delta) is an error term for catcher’s interference and sacrifice bunts.

We will write PA=AB + BB + ε. The error term, ε (epsilon), is every plate appearance that is neither an at-bat nor a walk. We can now write the result for OPS:
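
With AB = PA − BB − ε, and with δ assumed to sit in the OBP denominator, the sum would read as follows (H = S + D + T + HR and TB = S + 2D + 3T + 4HR are shorthand introduced here):

```latex
OPS = \frac{H + BB}{PA - \delta} + \frac{TB}{PA - BB - \varepsilon}
```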

Or, as a single fraction:
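
A reconstruction of that fraction, assuming δ enters through the OBP denominator and ε through the at-bat count, with H = S + D + T + HR and TB = S + 2D + 3T + 4HR as shorthand:

```latex
OPS = \frac{(H + BB)(PA - BB - \varepsilon) + TB\,(PA - \delta)}{(PA - \delta)(PA - BB - \varepsilon)}
```
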
The numerator breaks down into two important and two less important terms:
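
Under the same assumptions about where δ and ε enter, and with H = S + D + T + HR and TB = S + 2D + 3T + 4HR as shorthand, the split would look like this, with the two important terms first and the two small error terms last:

```latex
\underbrace{(H + BB)(PA - BB) + TB \cdot PA}_{\text{important}}
\;-\; \underbrace{\varepsilon\,(H + BB) \;-\; \delta\,TB}_{\text{less important}}
```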

From now on the terms proportional to δ and ε will be dropped.

Here is where the largest approximation needs to be made. By definition, BB = BB% × PA/100, where BB% is as defined on the FanGraphs glossary page. I write BB%/100 = β₀ + β_i. Here, β₀ is an average or typical value for the ensemble and β₀ + β_i is the value for a given batter. Note that a sensible choice of β₀ will make β_i small for most batters. I will use the mean value of BB%. The error this generates will be discussed later. In the detailed evaluation below, β₀ = 0.0887.

Using this we write:
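
In symbols, this follows directly from the definition of the walk rate in the preceding paragraph (a reconstruction):

```latex
BB = (\beta_0 + \beta_i)\,PA
\qquad\Longrightarrow\qquad
PA - BB = PA\,(1 - \beta_0 - \beta_i)
```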

The top two terms in the numerator become:
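
A reconstruction of that step, with H = S + D + T + HR and TB = S + 2D + 3T + 4HR as shorthand, and with the batter-specific term in the walk rate dropped at the end:

```latex
(H + BB)(PA - BB) + TB \cdot PA
  = PA\,\bigl[(1 - \beta_0)(H + BB) + TB\bigr] - \beta_i\,PA\,(H + BB)
  \approx PA\,\bigl[(1 - \beta_0)(H + BB) + TB\bigr]
```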

Using the same approximation for the denominator we obtain PA² (1 − β₀).

Putting it all together:
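
Dividing the approximate numerator by the approximate denominator, and expanding hits and total bases into individual events, should give the following (a reconstruction; it is consistent with the weights tabulated further down):

```latex
OPS \approx \frac{(2-\beta_0)\,S + (3-\beta_0)\,D + (4-\beta_0)\,T + (5-\beta_0)\,HR + (1-\beta_0)\,BB}{(1-\beta_0)\,PA}
```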

Or, setting the co-efficient for singles to 1,
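
Factoring the singles weight out of the linearized expression (a reconstruction under the same simplifications) gives:

```latex
OPS \approx \frac{2-\beta_0}{1-\beta_0}\cdot
\frac{S + \frac{3-\beta_0}{2-\beta_0}\,D + \frac{4-\beta_0}{2-\beta_0}\,T
        + \frac{5-\beta_0}{2-\beta_0}\,HR + \frac{1-\beta_0}{2-\beta_0}\,BB}{PA}
```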

Now you can set your favourite average BB% and go to town. A couple of important qualitative features can be seen: First, OPS undoes the overweighting of home runs in SLG; a 4:1 ratio is now 2.5:1. Secondly, walks are half as important as singles in OPS.

Does this actually work? Well, yes. The 2018 FanGraphs leaderboards were downloaded for qualified hitters on 9/2/18. A custom table was made to count S, D, T, HR, and BB separately, and the approximate (linearized) OPS, computed with β₀ = 0.0887, was compared to OPS as defined. The linear and actual values are clearly closely related.

But the results are not perfect and, as you might expect, the difference between the linearized value and the actual value depends on BB%. The other point is that the linear values are systematically too small, 20 points lower on average.
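
The dependence on BB% can be illustrated with two made-up batting lines. This is a sketch only: the stat lines are hypothetical rather than FanGraphs data, both functions use the simplified OPS (no hit-by-pitch or sacrifices), and it illustrates the walk-rate dependence, not the 20-point average offset.

```python
BETA0 = 0.0887  # league-average walk rate quoted in the article

def exact_ops(s, d, t, hr, bb, pa):
    """Simplified OPS: at-bats are taken to be PA - BB."""
    hits = s + d + t + hr
    total_bases = s + 2 * d + 3 * t + 4 * hr
    return (hits + bb) / pa + total_bases / (pa - bb)

def linear_ops(s, d, t, hr, bb, pa, beta0=BETA0):
    """Same numerators, but the slugging denominator uses the league walk rate."""
    hits = s + d + t + hr
    total_bases = s + 2 * d + 3 * t + 4 * hr
    return (hits + bb) / pa + total_bases / (pa * (1 - beta0))

# Hypothetical 600-PA lines with identical hits but different walk totals.
patient = dict(s=90, d=30, t=3, hr=25, bb=90, pa=600)  # BB% = 15.0
hacker = dict(s=90, d=30, t=3, hr=25, bb=24, pa=600)   # BB% = 4.0

for name, line in (("patient", patient), ("hacker", hacker)):
    gap = linear_ops(**line) - exact_ops(**line)
    print(f"{name}: linear - exact = {gap:+.3f}")
```

In this simplified setting the linearized value comes out low for a batter who walks more than average and high for one who walks less, which is the BB% dependence described above.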

The linear weights calculated from OPS are:

Event             Linear OPS   wOBA    wOBA (S=1)
Single            1.00         0.888   1.00
Double            1.52         1.271   1.43
Triple            2.05         1.616   1.82
Home Run          2.57         2.101   2.37
Walk              0.48         0.69    0.78
Everything else   0.00         …       …

FanGraphs uses wOBA and reports the 2013 weights on its glossary page. Those are shown in the table above as reported by FanGraphs and rescaled to 1 for singles.

OPS does work. It is perfectly justified. With simple approximations it’s converted to a set of linear weights that is reasonable. It doesn’t mean that everyone should use it, but it is time for serious baseball writers to treat this as something useful, intuitive, and widely accepted rather than a happy accident.

Chris Jillings


Saul Freedmam

If this is what I need to enjoy a ball game then it is football and soccer for me!

October 11th, 2018

Ken Maeda
I don’t think anyone says that it is.
October 11th, 2018

Rusty Southwick

Chris, how do we reconcile the missing baserunning component from OPS? Rickey Henderson almost looks mortal. (318th all-time in career OPS)

And yet look at some of these in the top 100 all-time…
27. Lance Berkman
62. Brian Giles
69. Ryan Braun
73. Matt Holliday
76. Prince Fielder
93. Kevin Mitchell

Lots of oddities on the list, high and low, which suggests a hidden bias toward certain types of batters, even beyond the missing baserunning component. What do you think?

October 12th, 2018

Dominic Rivers

OPS is full of ‘S.’ Sure, it often gets it right, but there are other easy-to-understand linear weight stats that are much more reliable. RC/27 for example.

January 21st, 2019