On Projecting Baseball

If you don’t subscribe to Bill James Online, get on it. It sets you back just three bucks a month, and although James doesn’t write articles as often as you dream he will, what he does write is terrifically candid, and typically excellent for its comprehensive approach. On top of which, he answers questions (subscribers can ask any time) and the site offers detailed Baseball Info Solutions fielding data and other numbers well worth the small expense.

James doesn’t do much research himself anymore, or at least he gives his original research to the MLB team that pays him handsomely for the privilege. Instead, he often weighs in with a lengthy meditation on a hot-button issue, and the comments spur debate, sometimes even another article. So it was this month, when, in the wake of the Wil Myers trade, James wrote at length on the fundamental differences between hitting and pitching prospects (“young players” perhaps better frames the debate).

James’ point was that pitchers seem to be inherently more difficult to project or evaluate than batters. He went through a handful of good theories as to why. Subscribe to find them out. They’re not directly material to my point, and I’m not in the business of giving away other people’s proprietary thoughts.

Commenters took James to task. Some were enlightened; some were single-track, relatively thoughtless trolls. The crux of the debate quickly shifted from the difference between hitting and pitching prospects to the difference (if one exists) between the predictive validity of minor- and major-league stats, mostly for batters.

It’s become an open-air wrestling match between James and Tom Tango, two sabermetric titans with tremendous insight, tons of experience, generational minds and egos that occasionally make each of them insufferable. They’re wagging their, well, let’s say spreadsheets, at one another and comparing projections. They’re arguing over two things: whether the fact that a player gets progressively easier to project the longer he has been in MLB proves that minor-league stats are inherently less informative than major-league stats, and what distinction should be drawn between “prospects” and simply players.

Am I missing something?

As part of my so-far silent participation in this public debate, I read an exhaustive study of various projection systems that Tango performed nearly two years ago. With the help of Brian Cartwright (who did the huge majority of the legwork, we ought to note), Tango compared each system’s projected line for players over a four-year period to what those players actually did in each of those seasons. That’s how he evaluated the systems.

Full stop.

Projection systems, if this is how people insist upon rating them, should be junked. They are useless if used in that way. Projections are forecasts, carefully created using a number of variables and based on an established performance level, but they are not to be seen as absolute estimates of true talent or specific predictions. For that matter, you can’t view projections for all players the same way.

Projections are means, presented as they are for the sake of space, simplicity and accessibility. Any analyst worth their salt can provide, upon request, a matrix of possible outcomes for every player they project. That’s what a true projection is. Baseball is wildly unpredictable. You have to embrace that.

Here’s what I mean. I happen to have the Bill James Handbook 2011 on my shelf, so I cracked it open and went looking for two players James’ system projected very similarly for 2011, at least in terms of OBP and slugging. I found, as it turned out, teammates: Starlin Castro (projected OBP/SLG .359/.428) and Kosuke Fukudome (.367/.431). Specifically, the book projected Castro for a .310 average, 39 doubles, eight triples and four homers, with 74 whiffs and 39 walks in 565 at bats. It projected Fukudome for a .263 average, 34 doubles, three triples and 14 homers, and 106 strikeouts against 78 walks in 490 at bats.

Only that’s not what it was saying. No reader should have come away thinking that James or his compatriots believed with conviction that Castro and Fukudome would do what is described above. Those figures are shorthand for probabilistic forecasts.

Castro’s line would naturally have MUCH more volatility. In his 565 at bats, the projection has him putting 487 balls in play (at bats minus strikeouts and homers). Fukudome, by the same simple arithmetic on his stated line, will put only 370 balls in play. It takes something like 900 at bats before the results of the balls a batter puts in play become a stable reflection of some real skill. Walks, homers and strikeouts are much more reliable data, skills that show up and prove real much more quickly. Therefore, Castro offers both more risk and more reward potential. This doesn’t show up in the simple projection, and too few readers grasp it, but it’s unequivocally true.

If you draw Castro’s possibilities curve, with his total offensive value on the x-axis and the frequency of each of those values coming to fruition on the y-axis, his curve would be flat. He would be only marginally more likely to be near the stated average than to be much, much better or much, much worse than that. Fukudome’s curve would be steep, with little chance of collapse or breakout, because what he does is better established by reviewing past history. It’s also worth noting, of course, that because of Fukudome’s lack of power, his prospects for sustained success given his skill set were never great. That’s all before considering playing time/health, which is an important element itself and should be treated as the z-axis of that curve.
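For anyone who wants to poke at that arithmetic, here is a rough Python sketch. It counts balls in play as at bats minus strikeouts minus homers (ignoring sacrifice flies), backs out the batting average on balls in play that each projected line implies, and then estimates how widely each batting average could swing. The class, the function names and especially the 0.020 figure for how unsure we are about a hitter’s true rate on balls in play are my own placeholders for illustration, not anything taken from the Handbook or from James’ system. Strikeouts and homers are held fixed, which is a simplification.

```python
from dataclasses import dataclass
from math import sqrt

# Hypothetical placeholder: standard deviation of our uncertainty about a
# hitter's true average on balls in play. Invented for illustration only.
ASSUMED_BABIP_SD = 0.020


@dataclass
class Projection:
    name: str
    at_bats: int
    average: float
    homers: int
    strikeouts: int

    @property
    def balls_in_play(self) -> int:
        # At bats minus strikeouts minus homers; sacrifice flies are ignored.
        return self.at_bats - self.strikeouts - self.homers

    @property
    def implied_babip(self) -> float:
        # Hits implied by the projected average, minus homers,
        # divided by balls in play.
        hits = round(self.average * self.at_bats)
        return (hits - self.homers) / self.balls_in_play


def average_spread(p: Projection, babip_sd: float = ASSUMED_BABIP_SD) -> float:
    """Standard deviation of batting average when only balls in play vary.

    Hits on balls in play are treated as binomial with an uncertain true
    rate (mean mu, sd babip_sd), so by the law of total variance:
        Var(hits) = n*mu*(1 - mu) + n*(n - 1)*babip_sd**2
    """
    n = p.balls_in_play
    mu = p.implied_babip
    var_hits = n * mu * (1 - mu) + n * (n - 1) * babip_sd ** 2
    return sqrt(var_hits) / p.at_bats


# Projected lines as quoted from the Bill James Handbook 2011.
castro = Projection("Starlin Castro", 565, 0.310, 4, 74)
fukudome = Projection("Kosuke Fukudome", 490, 0.263, 14, 106)

for player in (castro, fukudome):
    print(
        f"{player.name}: {player.balls_in_play} balls in play, "
        f"implied BABIP {player.implied_babip:.3f}, "
        f"batting average spread +/- {average_spread(player):.3f}"
    )
```

Under those assumptions the hitter who puts more balls in play carries the wider spread, but the size of the gap depends almost entirely on the placeholder. Make the uncertainty on balls in play larger and the two curves pull further apart.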

The same goes for pitchers. I tend to side with James in thinking that minor-league data offer more information than most analysts think they do, but I want to add the crucial caveat that is too often overlooked in such discussions: If you’re projecting and predicting baseball, you’d better see the whole possibilities curve, not just the peak at the center, and certainly not the isolated point that jumps out of an equation designed to return that single number.

Matthew Trueblood