The Nationals are World Series champions and the offseason is here. Before we start looking ahead to the spring, however, it’s time to look back…to the spring.
That’s right, no one who made a preseason win total prediction is getting away without the usual level of accountability. It’s time for BttP’s fifth annual prediction and projection review. Can PECOTA defend its title? Will one of the other projection systems strike back? What if the humans decide to overthrow the computers? Let’s find out who takes home the coveted title of Least Bad At Predicting Baseball, At Least For This Year.
A quick recap of how this works for the new reader. For each of the sets of predictions and projections that featured in our preseason analysis, the mean absolute error (MAE) and root mean squared error (RMSE) have been calculated. MAE is the average absolute difference between the predicted total and the actual, so a miss of five wins counts the same whether it was too high or too low. RMSE is the square root of the average of the squares of all the differences. RMSE gives greater weight to large errors because they are squared, so if you think bigger misses should be punished more heavily, this is the more relevant number.
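To make the difference between the two measures concrete, here is a minimal Python sketch. The two lists of misses are invented for illustration and are not taken from any of the competing sets; the function names `mae` and `rmse` are just labels chosen here.

```python
import math

def mae(errors):
    """Mean absolute error: the average size of the misses, ignoring direction."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error: square each miss, average, then take the root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical misses (predicted wins minus actual wins) for four teams.
steady = [5, -5, 5, -5]   # always off by five
streaky = [0, 0, 0, 20]   # three perfect calls and one huge whiff

print(mae(steady), rmse(steady))    # 5.0 5.0
print(mae(streaky), rmse(streaky))  # 5.0 10.0
```

Both made-up sets have the same MAE, but the single 20-win whiff doubles the RMSE, which is exactly the extra punishment for big misses described above.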
Read the preseason piece for a full breakdown of where all the competitors stood in March, but if you want to get right to the results, here’s a quick reminder of who’s competing for the title:
The Contenders
PECOTA (PEC): The Baseball Prospectus projected win totals based on their in-house projection system.
FanGraphs (FG): The FanGraphs Depth Charts projected totals, which are a combination of the Steamer and ZiPS projection systems, with an additional playing time adjustment applied by FanGraphs staff.
Davenport (Dav): Totals based on Clay Davenport’s projection system, with Clay’s own playing time estimates.
FiveThirtyEight (538): Site projections from FiveThirtyEight.com, based on their Elo rating system.
Banished to the Pen writers (BttP): Predictions from each of our writers from our season preview series.
Effectively Wild guests (EW): Predictions from each of Effectively Wild‘s team preview podcast guests.
Composite (Comp): The average of the six projection/prediction sets above, with the BttP/EW sets adjusted down to add up to 2430 wins so they are not given extra weight.
Public (Pub): The average of all responses to a preseason poll in which I asked people to predict win totals for every team. This has replaced the PECOTA over/under game from previous editions.
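For the curious, the Composite adjustment can be sketched roughly as follows. The article does not spell out how the BttP and EW sets were brought down to 2430 wins, so the proportional scaling below is an assumption (subtracting an equal amount from every team would be another reasonable choice). The three-team league and all of its numbers are invented to keep the example short.

```python
TOTAL_WINS = 2430  # 30 teams x 162 games / 2

def rescale(predictions, total=TOTAL_WINS):
    """Scale a {team: wins} set proportionally so it sums to `total`.

    Assumption: proportional scaling; the actual method is not stated.
    """
    factor = total / sum(predictions.values())
    return {team: wins * factor for team, wins in predictions.items()}

def composite(sets):
    """Average a list of {team: wins} sets team by team."""
    teams = sets[0].keys()
    return {team: sum(s[team] for s in sets) / len(sets) for team in teams}

# Toy three-team league that should total 243 wins (hypothetical numbers).
system = {"A": 90, "B": 81, "C": 72}   # already sums to 243
humans = {"A": 95, "B": 85, "C": 70}   # optimistic: sums to 250

adjusted = rescale(humans, total=243)
blend = composite([system, adjusted])
print(round(sum(adjusted.values()), 6))  # 243.0
print(round(blend["A"], 2))              # 91.17
```

Without the rescaling step, the optimistic human set would drag every team in the blend upward, which is the extra weight the adjustment is meant to remove.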
The Results
Set | MAE | MAE Rank | RMSE | RMSE Rank |
---|---|---|---|---|
Pub | 7.73 | 2 | 8.92 | 1 |
538 | 7.70 | 1 | 9.08 | 2 |
Comp | 7.90 | 3 | 9.28 | 3 |
FG | 8.43 | 7 | 9.68 | 4 |
PEC | 8.30 | 5 | 9.70 | 5 |
EW | 8.40 | 6 | 9.86 | 6 |
BttP | 8.20 | 4 | 10.04 | 7 |
Dav | 9.43 | 8 | 10.70 | 8 |
What a result for the newcomers. 538 makes an impressive debut by taking the MAE victory, but the real turn-up here is the wisdom of the crowds functioning spectacularly to claim the RMSE title. The Public set’s most notable edge over the other competitors came on the Pirates: it was still 7 wins too high, but closer than the projection systems by 2-4 wins and a full 10 better than Effectively Wild preview guest Stephen J. Nesbitt.
Beyond that, the Public set simply did well by not whiffing too big. It was the only set to not have at least one 20-win miss, and those big misses really hurt in RMSE. The Tigers were so bad that Davenport and BttP preview authors AD and Mark Sands both missed by 24 wins, making the Public’s 19 comparatively excellent.
538, meanwhile, nailed the Cubs’ 84-win season exactly and also had the Reds pegged better than anyone else, missing their 75 wins by just two. Exact hits were otherwise slim pickings: the only other one came from the Composite set, which aggregated opinions on the White Sox perfectly.
FanGraphs and PECOTA had an incredibly tight battle, with PECOTA taking MAE but FanGraphs just barely edging it in RMSE. Davenport was, once again, the last-place projection system and indeed a fairly distant last place overall. Like PECOTA and FanGraphs, the two human prediction sets other than the Public split the two measures. Our BttP previewers were better by MAE, but Anthony Fenech’s Tigers pessimism meant that the EW prediction missed by a mere 16 wins, tipping the RMSE calculation in EW’s favor.
The most predictable team this year were, ironically, the Mets. Their 86-win season was missed by just 1.6 wins on average. The aforementioned Tigers were the landmine in 2019, with an average miss of 20.5 wins. Although none of our individual previewers or guests got a total spot-on this year, special mention goes to BttP previewers Scott Brady (Indians), Alex Crisafulli (Cardinals), Andrew Ingrelli (Brewers) and Peter Bloom (Nationals), who all missed their team’s total by just one, and EW guest Barry Svrluga, who did the same with the Nationals in the other direction. Lindsey Adler of The Athletic also deserves a mention for going bold with a 105-win Yankees prediction and missing by just two.
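The per-team averages can be roughly checked from the rounded figures in Table 1. This is a sketch that assumes the average is taken over all eight sets; the dictionary name `MISSES` and the function name are chosen here for illustration.

```python
# Predicted-minus-actual wins for two teams, copied from Table 1 in column
# order: PEC, FG, Dav, 538, BttP, EW, Pub, Comp.
MISSES = {
    "DET": [19, 21, 24, 21, 24, 16, 19, 20],
    "NYM": [1, -1, 1, -1, -2, -2, -3, -1],
}

def average_miss(errors):
    """Average absolute miss across the sets (assumes all eight are included)."""
    return sum(abs(e) for e in errors) / len(errors)

print(average_miss(MISSES["DET"]))  # 20.5
print(average_miss(MISSES["NYM"]))  # 1.5
```

The Tigers figure matches the 20.5 quoted above. The Mets come out at 1.5 rather than 1.6 from the rounded table, which may simply reflect the original calculation using unrounded totals for the averaged sets.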
While 2019 was an easier season to predict than the past two, it still lags comfortably behind 2016, when the RMSE from the top set was just 7.3 and all of the predictors were ahead of this year’s winning RMSE. As the league gets more polarized, that shouldn’t be surprising; 47 or 107 win teams simply don’t show up in projections, and even most human predictions aren’t that extreme. The table below shows how all of the sets compared to the final win totals. Each cell is the set’s predicted total minus the actual, so a positive number means the prediction was too high.
Table 1
Div | Team | Actual | PEC | FG | Dav | 538 | BttP | EW | Pub | Comp |
---|---|---|---|---|---|---|---|---|---|---|
ALW | HOU | 107 | -9 | -11 | -8 | -9 | -11 | -9 | -8 | -10 |
NLW | LAD | 106 | -13 | -13 | -15 | -11 | -11 | -13 | -11 | -14 |
ALE | NYY | 103 | -7 | -6 | -6 | -6 | -6 | 2 | -5 | -6 |
ALC | MIN | 101 | -19 | -19 | -15 | -17 | -15 | -17 | -18 | -18 |
ALW | OAK | 97 | -20 | -15 | -12 | -14 | -13 | -9 | -13 | -15 |
NLE | ATL | 97 | -12 | -13 | -18 | -13 | -5 | -7 | -10 | -12 |
ALE | TBR | 96 | -10 | -12 | -8 | -10 | -9 | -6 | -10 | -10 |
ALC | CLE | 93 | 4 | -1 | -1 | 2 | -1 | -4 | -3 | -1 |
NLE | WAS | 93 | -4 | -3 | -6 | -4 | 1 | -1 | -4 | -4 |
NLC | STL | 91 | -5 | -5 | -8 | -6 | -1 | 5 | -4 | -4 |
NLC | MIL | 89 | -2 | -6 | -9 | -3 | 1 | 2 | -1 | -4 |
NLE | NYM | 86 | 1 | -1 | 1 | -1 | -2 | -2 | -3 | -1 |
NLW | ARI | 85 | -5 | -8 | -13 | -6 | -5 | -13 | -10 | -9 |
ALE | BOS | 84 | 6 | 10 | 9 | 11 | 11 | 15 | 12 | 9 |
NLC | CHC | 84 | -4 | 4 | -3 | 0 | 7 | 8 | 3 | 1 |
NLE | PHI | 81 | 8 | 5 | 4 | 3 | 9 | 7 | 8 | 5 |
ALW | TEX | 78 | -8 | -7 | -8 | -8 | 3 | 6 | -7 | -4 |
NLW | SFG | 77 | -4 | -2 | -11 | -6 | -7 | 4 | -6 | -5 |
NLC | CIN | 75 | 6 | 6 | 5 | 2 | 9 | 6 | 5 | 5 |
ALC | CHW | 72 | -2 | -2 | 5 | -1 | 6 | -2 | 1 | 0 |
ALW | LAA | 72 | 7 | 10 | 13 | 8 | 15 | 11 | 11 | 10 |
NLW | COL | 71 | 13 | 10 | 8 | 11 | 22 | 21 | 14 | 13 |
NLW | SDP | 70 | 11 | 9 | 4 | 5 | 12 | 10 | 9 | 8 |
NLC | PIT | 69 | 11 | 10 | 9 | 10 | 13 | 17 | 7 | 11 |
ALW | SEA | 68 | 4 | 8 | 15 | 11 | 6 | 9 | 7 | 8 |
ALE | TOR | 67 | 7 | 9 | 14 | 8 | 4 | 8 | 8 | 8 |
ALC | KCR | 59 | 14 | 11 | 10 | 11 | 11 | 9 | 7 | 10 |
NLE | MIA | 57 | 10 | 8 | 8 | 7 | 3 | 11 | 5 | 7 |
ALE | BAL | 54 | 4 | 8 | 13 | 6 | -3 | -2 | 3 | 4 |
ALC | DET | 47 | 19 | 21 | 24 | 21 | 24 | 16 | 19 | 20 |
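As a sanity check, the headline numbers for the two category winners can be reproduced from the table above. The sketch below copies the 538 and Pub columns by hand; because the table shows rounded differences (the Public set is an average), exact agreement is not guaranteed for every set, though it works out for these two.

```python
import math

# Predicted-minus-actual wins, copied from the 538 and Pub columns of Table 1
# in table order (HOU first, DET last).
ERRORS = {
    "538": [-9, -11, -6, -17, -14, -13, -10, 2, -4, -6, -3, -1, -6, 11, 0,
            3, -8, -6, 2, -1, 8, 11, 5, 10, 11, 8, 11, 7, 6, 21],
    "Pub": [-8, -11, -5, -18, -13, -10, -10, -3, -4, -4, -1, -3, -10, 12, 3,
            8, -7, -6, 5, 1, 11, 14, 9, 7, 7, 8, 7, 5, 3, 19],
}

def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

for name, errors in ERRORS.items():
    biggest = max(abs(e) for e in errors)
    print(f"{name}: MAE {mae(errors):.2f}, RMSE {rmse(errors):.2f}, "
          f"biggest miss {biggest}")
# 538: MAE 7.70, RMSE 9.08, biggest miss 21
# Pub: MAE 7.73, RMSE 8.92, biggest miss 19
```

The Public set loses MAE by a whisker but avoids the 21-win Tigers miss, and that one squared term is enough to flip the RMSE order.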
Public Predictions
Given that the Public set came away with the RMSE win and only narrowly missed MAE, you might already have guessed that a few people did really well. One respondent came out with an extremely nice 6.90 MAE and an 8.43 RMSE to beat the winning marks by 0.8 and 0.49 respectively. Unfortunately, that respondent did not leave their name, so we’ll just have to call them Nostradamus.
The next-best contestant in MAE did leave their name: congratulations to Scott T. Holland, who came out with a 7.07 MAE. While slightly less of a mystery than Nostradamus, the still rather cryptic Humphrey was just barely pipped in RMSE at 8.44.
That’s not to say there weren’t some truly terrible predictions that fell way behind all of the other sets. Several people had an average error of more than ten wins, and one was even over 11.
People who looked at PECOTA predictions before completing the survey were actually slightly worse in MAE, at 8.49 on average compared to 8.40 for those who did not. They just about flipped that in RMSE, 10.04 to 10.06. There was a slight edge for the four respondents who actually used PECOTA during the survey, who averaged an MAE of 8.37 and an RMSE of 9.95.
As might be expected over a sample of this size, there were a lot more spot-on predictions here, so the special mention goes to those three participants who got three separate team totals correct: Simon G, Alex, and AJP. Full results for all 56 people who took part can be found here.
Ranks
When it came to predicting the order the teams finished in, it was no contest at all. 538 ran away with it, while FanGraphs performed much better in RMSE here by having just two double-digit misses.
Set | MAE | MAE Rank | RMSE | RMSE Rank |
---|---|---|---|---|
538 | 3.90 | 1 | 4.88 | 1 |
FG | 4.43 | 4 | 5.29 | 2 |
Comp | 4.27 | 2 | 5.42 | 3 |
Pub | 4.47 | 5 | 5.51 | 4 |
BttP | 4.47 | 5 | 5.97 | 5 |
PEC | 4.57 | 7 | 5.99 | 6 |
EW | 4.40 | 3 | 6.02 | 7 |
Dav | 4.93 | 8 | 6.44 | 8 |
Here, it was Boston that confounded our predictors the most, with almost every set but PECOTA missing their rank (all too high, of course) by 11 spots. PECOTA did marginally better at 9, while Alex Speier’s streak of impressive predictions came to an end as he missed by 15 wins and 12 ranking places.
The Twins also proved to be tricky, this time in the other direction, as every set but Davenport (-5) and 538 (-7) was out by ten spots or more on the low side. PECOTA really lost out here by being way too low on the A’s, predicting them as just the 21st-ranked team in the preseason only for them to end up fifth.
Div | Team | Rank | PEC | FG | Dav | 538 | BttP | EW | Pub | Comp |
---|---|---|---|---|---|---|---|---|---|---|
ALW | HOU | 1 | 0 | -1 | 0 | 0 | -1 | -2 | 0 | -1 |
NLW | LAD | 2 | -2 | -2 | -3 | -1 | -1 | -3 | -2 | -2 |
ALE | NYY | 3 | 0 | 2 | 1 | 1 | 2 | 2 | 1 | 2 |
ALC | MIN | 4 | -10 | -10 | -5 | -7 | -11 | -12 | -11 | -11 |
ALW | OAK | 5 | -16 | -9 | -5 | -10 | -11 | -8 | -9 | -11 |
NLE | ATL | 5 | -7 | -6 | -14 | -6 | -2 | -5 | -4 | -7 |
ALE | TBR | 7 | -3 | -4 | 1 | 0 | -6 | -3 | -5 | -2 |
ALC | CLE | 8 | 6 | 3 | 4 | 5 | 1 | -4 | 3 | 3 |
NLE | WAS | 8 | 2 | 2 | 1 | 2 | 3 | 2 | 2 | 2 |
NLC | STL | 10 | 0 | 2 | -3 | 1 | 0 | 6 | 1 | 3 |
NLC | MIL | 11 | 3 | -2 | -6 | 4 | 1 | 2 | 3 | 1 |
NLE | NYM | 12 | 4 | 2 | 5 | 3 | -4 | -4 | -3 | -1 |
NLW | ARI | 13 | -4 | -8 | -11 | -5 | -9 | -12 | -8 | -9 |
ALE | BOS | 14 | 9 | 11 | 11 | 11 | 11 | 12 | 11 | 11 |
NLC | CHC | 14 | -3 | 7 | -1 | 3 | 5 | 8 | 5 | 3 |
NLE | PHI | 16 | 10 | 8 | 6 | 5 | 6 | 3 | 10 | 8 |
ALW | TEX | 17 | -9 | -8 | -9 | -9 | -4 | 1 | -8 | -7 |
NLW | SFG | 18 | -5 | -6 | -11 | -6 | -9 | -2 | -7 | -8 |
NLC | CIN | 19 | 4 | 2 | 2 | -2 | 3 | -1 | 1 | 1 |
ALC | CHW | 20 | -6 | -6 | -2 | -4 | -3 | -6 | -4 | -5 |
ALW | LAA | 20 | 0 | 6 | 10 | 3 | 7 | 1 | 5 | 3 |
NLW | COL | 22 | 9 | 5 | 3 | 6 | 16 | 16 | 9 | 8 |
NLW | SDP | 23 | 8 | 4 | 0 | 1 | 4 | 1 | 4 | 3 |
NLC | PIT | 24 | 7 | 5 | 3 | 6 | 5 | 9 | 4 | 5 |
ALW | SEA | 25 | 0 | 3 | 12 | 7 | 1 | 2 | 4 | 4 |
ALE | TOR | 26 | 4 | 4 | 11 | 4 | 1 | 2 | 5 | 3 |
ALC | KCR | 27 | 4 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
NLE | MIA | 28 | 0 | -1 | -2 | -1 | -1 | 1 | -1 | -1 |
ALE | BAL | 29 | -1 | -1 | 1 | -1 | -1 | -1 | -1 | -1 |
ALC | DET | 30 | 1 | 2 | 5 | 2 | 5 | 1 | 3 | 2 |
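The Rank column above treats ties the way a standings table does: the A’s and Braves share fifth, and the Rays are then seventh. A minimal sketch of that ranking follows, using the top seven actual win totals from Table 1. The sign convention for the misses (actual rank minus predicted rank) is inferred from the A’s example in the text rather than stated outright, and how ties among predicted totals were handled is not known.

```python
# Actual 2019 win totals for the top seven teams, taken from Table 1.
# Ranking this subset matches the full table only because no other team
# won more than 96 games.
ACTUAL = {"HOU": 107, "LAD": 106, "NYY": 103, "MIN": 101,
          "OAK": 97, "ATL": 97, "TBR": 96}

def competition_rank(wins):
    """Rank teams by wins, best first; tied teams share the better rank."""
    return {team: 1 + sum(other > w for other in wins.values())
            for team, w in wins.items()}

def rank_error(actual_rank, predicted_rank):
    """Assumed convention: negative means the set ranked the team too low."""
    return actual_rank - predicted_rank

ranks = competition_rank(ACTUAL)
print(ranks["OAK"], ranks["ATL"], ranks["TBR"])  # 5 5 7

# PECOTA had the A's as the 21st-ranked team in the preseason.
print(rank_error(ranks["OAK"], 21))  # -16
```

The -16 matches the PECOTA entry for Oakland in the table above, which is what suggests the sign convention used here.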
So there we have it. 538 makes the case for Elo-based projection systems, while the people prove that, like the automated strike zone, the machines haven’t actually got it all figured out yet.