I’m not going to lie. It has been a weird season. A global pandemic, sixty games, multiple COVID-19 disruptions, rule changes so hastily made that Calvin and Hobbes would be proud…you get the picture.
Normally, this exercise would be conducted following a number of preseason tasks, including a public poll of win totals and a preseason post covering all the predictions and projections that are involved. With all the disruption and confusion, including whether we would even get a baseball season, none of those tasks were carried out this year. I didn’t even know if I would do this evaluation post, but as all the good people involved in predictions and projections still made them, it seemed only fair that I reveal how wildly wrong everyone was.
That does, however, mean that this is not going to be quite the same as usual. For starters, I don’t have a Public projection set; unfortunate, given that the crowd was not only wise but also victorious last year. I also don’t have the Davenport projections included this year, as I didn’t get the usual snapshot right before the season started, so it wouldn’t be fair to compare an out-of-date snapshot with the sets I do have available.
To underscore the profoundly disjointed nature of this season, the Effectively Wild predictions were not all made over the same period. There were 22 predictions spread over 11 pods recorded in February and March, back in that strange time when people were gearing up for a 162-game season. The remaining eight – the Tigers, Yankees, Rays, Orioles, Astros, Mariners, Marlins and Dodgers – weren’t made until July. I had to adjust those original 22 predictions down for a 60-game season, and we should bear in mind that most were made without any knowledge of the altered schedule, COVID-related absences, or more familiar injury issues. I also had to adjust the Giants prediction for both EW and BttP, as Grant Brisbee (in mid-March) and Patty Gallinger O’Connor (in July) both assumed a season that would not last the advertised length. While Grant gave a win-loss record that could be adjusted based on winning percentage, Patty guessed that the Giants would win five games before the season was cancelled. In the absence of knowing how many they would lose, I simply made them a .500 team.
I’ll also note that in the shortest MLB season of all time, this exercise is even more subject to the vagaries of random chance. Making predictions at the best of times is fraught with peril. Doing so for barely a third of a standard season with multiple additional complications for estimating talent – like opt-outs and positive tests – adds layers of difficulty that we could do without. I’ll still consider whether this season was ‘harder’ to predict than normal, but add a few extra grains of salt to any conclusions you might draw.
A quick recap of how this works for the new reader. For each of the sets of predictions and projections, the mean absolute error (MAE) and root mean squared error (RMSE) have been calculated. MAE is the average absolute difference between the predicted and actual win totals, while RMSE is the square root of the average of the squared differences. RMSE gives greater weight to large errors because they are squared, so if you think bigger misses should be punished more heavily, this is the more relevant number.
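For anyone who wants the two measures spelled out, here is a minimal sketch; the win totals below are invented for illustration, not taken from any of the real prediction sets.

```python
def mae(predicted, actual):
    # Mean absolute error: the average size of the misses.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Root mean squared error: squaring the misses first means
    # big misses are punished more heavily than small ones.
    return (sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)) ** 0.5

# Invented three-team example: misses of 3, 2 and 4 wins.
predicted = [33, 28, 35]
actual = [30, 30, 31]
print(mae(predicted, actual))                 # 3.0
print(round(rmse(predicted, actual), 2))      # 3.11
```

Note how the RMSE (3.11) already edges above the MAE (3.0) because the 4-win miss gets extra weight once squared.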
Here’s a breakdown of who’s competing for the title:
PECOTA (PEC): The Baseball Prospectus projected win totals based on their in-house projection system.
FanGraphs (FG): The FanGraphs Depth Charts projected totals, which are a combination of the Steamer and ZiPS projection systems, with an additional playing time adjustment applied by FanGraphs staff.
FiveThirtyEight (538): Site projections from FiveThirtyEight.com, based on their Elo rating system.
Banished to the Pen writers (BttP): Predictions from each of our writers from our season preview series.
Effectively Wild guests (EW): Predictions from each of Effectively Wild‘s team preview podcast guests.
Composite (Comp): The average of the five projection/prediction sets above, with the BttP/EW sets adjusted down to add up to 900 wins so they are not given extra weight.
| Set | MAE | MAE Rank | RMSE | RMSE Rank |
A tie at the top between PECOTA and the EW guests, shocking in the latter case given the above context. However, it should be noted that these differences are incredibly small. In an absolute sense, five wins separated first place from sixth. To pick a team with some dissent as an example, it’s the difference between being very optimistic about the Astros and just moderately so.
PECOTA stood alone in the RMSE competition, winning for the second time in three years. The Composite set is essentially guaranteed to fare well in these comparisons, but it got particularly close to winning this time. It’s not as easy to make direct comparisons in terms of raw wins here because it depends where the misses are. Nonetheless, if PECOTA had suffered one more big miss (say, 34 instead of 33 for the Nationals), the Composite set would have won.
The battle for third went extremely narrowly to 538 over FanGraphs, after their tie in MAE. Again, a single win in the right place would have made the difference, and the same goes for the EW guests and the BFN team. There’s no doubt about which predictions were worst: our previewers here at BttP. Like the EW guests, they don’t get together to agree upon their predictions, so these can be illogically optimistic. Unlike the EW guests, they were so far off that, in MAE terms, we would have done just as well by predicting 30 wins for every team. That’s the most basic bar to clear; mercifully, we did manage it in RMSE.
Level of optimism is something I normally cover in the preseason edition, so let’s take a quick look here. BttP over-predicted the total possible number of wins – 900 – by 48, while EW topped that by 33. That continues our unbroken trend of human predictions being overly optimistic, a tendency we appear collectively unable to escape. (The BFN prediction game not only involves group consultation but also forces us to make all the wins add up to the correct total, so this pitfall is avoided.)
If we pro-rate those numbers out to a 162-game schedule, totalling 2430 wins, both sets of predictions would break the all-time high of 2511 wins predicted. The EW guests come out at 2519, while BttP hit a whopping 2560, an incredible 130 wins over the possible total. They predicted six 100-plus win teams on a pro-rated basis. It’s extremely unlikely that would have happened if they were making typical 162-game predictions. This no doubt accounts for some of the inaccuracy: if we just take the adjusted set used for the Composite projection, it improves BttP by almost 0.4 wins in RMSE. That’s still last, but a much closer last. We can at least put some of this excess down to the unfamiliarity with making 60-game predictions. People might not intuitively think of 33 wins being the same as 90.
That’s less of an excuse for the EW guests, most of whom made their predictions before we could even contemplate a 60-game season. As the series moved towards the extremes, starting from the teams projected at .500 and moving out towards each end, those featured in July were either expected to be very good or very bad, and the guests were largely right. The problem was in February and March, when all but three of the 22 teams were predicted to be .500 or better, the most egregious being the Pirates and Rangers. None of the other sets were that optimistic on what turned out to be two of the worst teams in the league.
With the short schedule, the raw misses were not as big and there were a lot more close predictions than normal. BttP’s Matthew Kilmartin, the San Francisco Chronicle‘s Susan Slusser, and the BFN crew all correctly picked the 36-win A’s. We also nailed the Twins and Rockies on the BFN pod, while two other EW guests were spot on, both writers from The Athletic: Aaron Gleeman (also Minnesota) and Alec Lewis (Royals). The Reds were the easiest team to pick this year with an average miss of just 1.3 wins, while the sub-.500 Astros threw everyone off with an average miss of 7.4 wins.
Can we really determine if this season was more difficult to predict? If we assume that these win percentages had continued over a 162-game season and multiply our numbers by 2.7 (162/60), the answer is yes. We have never seen an MAE in the double digits in the past six years, and all of these results would exceed that mark. The RMSE figures rise into the 12-14 range, again wildly unprecedented. Typically they sit around 9-10, and we even saw marks in the low 7s in 2016, one of the more ‘predictable’ seasons in recent memory.
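The pro-rating above is nothing fancier than multiplying each error measure by the schedule ratio; a quick sketch with an illustrative (not real) 60-game MAE:

```python
# Scale a 60-game error measure to its 162-game equivalent.
ratio = 162 / 60   # 2.7

mae_60 = 4.0       # illustrative 60-game MAE, not one of the real results
mae_162 = mae_60 * ratio
print(mae_162)     # 10.8 -> comfortably into double digits
```

So even a 4-win average miss over 60 games, which sounds modest, pro-rates to a double-digit miss over a full schedule.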
Of course, it’s very unlikely that the season would have panned out exactly this way with a full schedule. For example, the Marlins had a 26-34 Pythagorean record, allowing 41 more runs than they scored. Over a full season, they’d have needed to be much luckier to stay above .500. This method treats them as an 84-win team.
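For reference, the Pythagorean record comes from the classic Bill James estimator based on runs scored and allowed. A minimal sketch, using run totals of 263 scored and 304 allowed, which match the 41-run deficit mentioned above (the exact figures are my assumption):

```python
def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2):
    # Classic Bill James estimator: expected winning percentage from
    # run totals, scaled to the number of games played.
    pct = runs_scored ** exponent / (runs_scored ** exponent + runs_allowed ** exponent)
    return pct * games

# Run totals assumed to match the 41-run deficit cited in the text.
wins = pythagorean_wins(263, 304, 60)
print(round(wins))  # 26, i.e. a 26-34 Pythagorean record over 60 games
```

Variants of the formula use exponents other than 2 (1.83 is common), but the simple squared version is enough to show why a -41 run differential implies a sub-.500 team.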
Here’s the full list of how each set compared to the actual win totals, with deeper red indicating bigger misses:
If wins are a tricky measure to deal with in the abbreviated season, what about how our contenders fared when it came to predicting the overall standings?
| Set | MAE | MAE Rank | RMSE | RMSE Rank |
The Composite takes the MAE victory, by a single rank from the BFN set (which wasn’t included in the Composite). When it came to the rest of the sets, it wasn’t all that close: 538 was 14 behind in third. The Composite perfectly blended opinions to identify six teams in their correct spots. PECOTA slipped the furthest from its win total ranking with some optimism on the Reds and pessimism about the Giants, although it was saved slightly by being rosiest on the Marlins.
It was a resounding success for the BFN method when it came to RMSE. There, the Composite was beaten comfortably into second. The EW guests really suffered, slipping into last place primarily because of those Rangers and Pirates predictions. FanGraphs also fared poorly, missing eight different teams by at least 12 spots in the rankings. It did, however, get much closer to the Padres than any other set, rating them as the seventh-best team in the preseason.
Unlike the win totals, we can draw a direct comparison here to previous seasons. It so happens that 2019 was very predictable in the context of the overall standings, so this season’s results were comfortably worse. The worst set that year had an RMSE of 6.44, and 538 had an excellent 4.88 mark to lead the pack. BFN would have finished a mediocre sixth in 2018 as well. However, this year was not as hard to predict by this measure as 2017, our most difficult year in the six years I have been tracking this. The best RMSE on a rank basis that year was the Composite at 8.67, far behind our last-placed finisher in 2020.
Here’s the same comparison table but for the final ranks instead of win totals:
And that concludes our journey through predictions and projections for another year. Let’s hope that 2021 is a little less tumultuous, and we can go back to being wrong under much more conventional circumstances.