There has been a lot of hype about the new MLBAM StatCast system, a player-tracking/raw data machine. With all of this new data will come a need for more data analysis, and most likely, a better way to store and track data. I have manually compiled every piece of StatCast data currently available to the public through the various videos published on MLB.com, demonstrating some of the impressive capabilities of the new system.
The data was compiled from a few 2013-2014 regular season games, the 2014 All-Star Game, and the 2014 playoffs. Below I have added links to downloadable spreadsheets demonstrating a few of the key fields that might be collected for each play in a Major League Baseball game using StatCast. The database that I created for this new StatCast data includes seven tables connected to the Lahman database, which I use to query players’ past statistics. Of those seven tables, four hold information that I predict will become the future talking points of not only front offices and statistical baseball writers, but the casual fan as well. The four tables holding all of the fancy new statistics are the Pitching, Batting, Fielding, and Running tables.
This StatCast database is meant to store every play within each game of a season, using a play ID to connect plays from table to table. Using the player IDs from the Lahman database seemed to me the easiest way to implement the new statistics, since it will be helpful in the future to query stats from both the Lahman files and the new StatCast files. This setup will also let me use SQL counting and rate queries (counts, sums, and averages) to summarize a player’s season and career StatCast statistics.
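To make the linking idea concrete, here is a minimal sketch using Python's built-in sqlite3 module in place of Access. Everything in it is an assumption for illustration: the Plays table, the column names in Batting, and the placeholder player are hypothetical, and only the Master table's playerID, nameFirst, and nameLast columns follow the Lahman naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Master (          -- stand-in for the Lahman player table
    playerID  TEXT PRIMARY KEY,
    nameFirst TEXT,
    nameLast  TEXT
);
CREATE TABLE Plays (           -- hypothetical: one row per play
    playID INTEGER PRIMARY KEY,
    gameID TEXT,
    season INTEGER
);
CREATE TABLE Batting (         -- hypothetical: batted-ball measurements
    playID       INTEGER REFERENCES Plays(playID),
    playerID     TEXT    REFERENCES Master(playerID),
    exitVelocity REAL,         -- mph
    flyTime      REAL          -- seconds
);
""")

# Placeholder rows only; these are not real StatCast records.
conn.execute("INSERT INTO Master VALUES ('doejo01', 'John', 'Doe')")
conn.executemany("INSERT INTO Plays VALUES (?, ?, ?)",
                 [(1, "game-A", 2014), (2, "game-B", 2014)])
conn.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?)",
                 [(1, "doejo01", 100.0, None),
                  (2, "doejo01", 90.0, None)])

# The play ID ties a measurement to its game and season; the player ID
# ties it to the Lahman-style player record.
summary = conn.execute("""
    SELECT m.nameFirst || ' ' || m.nameLast AS player,
           p.season,
           COUNT(b.exitVelocity) AS tracked_balls,   -- counting stat
           AVG(b.exitVelocity)   AS avg_exit_velo    -- rate stat
    FROM Batting b
    JOIN Plays  p ON p.playID   = b.playID
    JOIN Master m ON m.playerID = b.playerID
    GROUP BY b.playerID, p.season
""").fetchall()
print(summary)
```

Dropping the season from the GROUP BY clause would turn the same query into a career summary.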
As you look over the numbers, you will see some stars like Mike Trout, Andrew McCutchen, and Troy Tulowitzki. As I stated before, I was limited to the stats that have been released by MLB from 2013 through 2014, so the data on some of these players is incomplete or nonexistent. This was more of a project about using the data we know can be tracked to create workable tables that can be merged with other databases; in my case, I am merging the new data with the Lahman baseball files. While we have little data to work with now, in the future I will be ready to incorporate lots of play-by-play StatCast stats into my database.
As you can see, there are lots of null values. This is due to the incomplete information available for each play. In theory, all of these fields would be filled if and when StatCast data becomes available to the public.
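Null values matter once you start aggregating. The sketch below, again using sqlite3 as a stand-in for Access, shows how standard SQL aggregates treat them: COUNT(column) and AVG(column) skip nulls, while COUNT(*) counts every row. The Running table layout and its rows are placeholders I made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Running table; column names are assumptions.
conn.execute("""
    CREATE TABLE Running (
        playID     INTEGER,
        playerID   TEXT,
        maxSpeed   REAL,   -- mph
        leadLength REAL    -- feet
    )
""")
# Placeholder rows: each play has only some of its fields filled in.
conn.executemany("INSERT INTO Running VALUES (?, ?, ?, ?)", [
    (1, "runner01", 21.0, None),
    (2, "runner01", None, 12.0),
    (3, "runner01", 19.0, None),
])

total, with_speed, avg_speed, pct_null_lead = conn.execute("""
    SELECT COUNT(*),                 -- every row, nulls included
           COUNT(maxSpeed),          -- only rows where maxSpeed is present
           AVG(maxSpeed),            -- average over the non-null values
           100.0 * SUM(leadLength IS NULL) / COUNT(*)
    FROM Running
""").fetchone()

print(total, with_speed, avg_speed, round(pct_null_lead, 1))
```

Reporting the non-null count next to each average makes it obvious how few plays a number is based on.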
I suggest that you browse each spreadsheet to get a feel for the data:
Batting – Download the full Batting table
Fielding – Download the full Fielding table
Pitching – Download the full Pitching table
Running – Download the full Running table
OK, now that you have played around with the spreadsheets, you might be thinking of unique ways to use these numbers to help evaluate players. Personally, I keep an ongoing brainstorming journal that lists ways in which teams and management can use StatCast to test the overall performance of players. It might be a good idea for a future crowdsourcing post.
Just for fun, let’s see who ranks highest in some of these new statistical categories based on the tiny amount of data we have:
Greatest Exit Velocity (off bat): Eric Hosmer, KC, 106.1 mph
Longest Fly Time: Juan Perez, SFN, 5.01 sec
Shortest Fly Time: Kolten Wong, STL, 0.95 sec
Quickest Acceleration: Anthony Recker, NYN, 4.27 ft/sec²
Greatest Max Speed: Billy Hamilton, CIN and Ruben Tejada, NYN, 23.3 mph
Highest Route Efficiency: Omar Quintanilla, NYN, 100%
Quickest Release: Tony Cruz, STL, 0.37 sec
Fastest Velocity: Andrew McCutchen, PIT, 78.8 mph
Quickest First Step: Travis d’Arnaud, NYN, -1.7 sec
Quickest First Step: Jhonny Peralta, STL, -1.18 sec
Quickest Acceleration: Omar Infante, KC, 9.99 ft/sec²
Greatest Max Speed: Jarrod Dyson, KC, 22.3 mph
Largest Lead Length: Pablo Sandoval, SFN, 17 ft
Largest Secondary Lead Length: Brandon Crawford, SFN, 21 ft
Longest Extension: Yusmeiro Petit, SFN, 92 in
Highest Actual Velocity: Kevin Gausman, BAL, 99.6 mph
Highest Perceived Velocity: Kevin Gausman, BAL, 100.7 mph
Largest Difference between Perceived and Actual Velocity: Francisco Rodriguez, MIL, 2.9 mph
Greatest Spin Rate: Sergio Romo, SFN, 3002 rpm
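Leaderboards like these could come straight out of the database. The sketch below is one possible approach, not the queries actually used for the list above; the table layouts and rows are placeholders. Because the list includes a tie (two players at 23.3 mph), the first query matches on the maximum value instead of taking only the top row, and the second derives the gap between perceived and actual velocity at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and placeholder rows, for illustration only.
conn.execute("CREATE TABLE Running (playID INTEGER, playerID TEXT, maxSpeed REAL)")
conn.execute("""
    CREATE TABLE Pitching (
        playID            INTEGER,
        playerID          TEXT,
        actualVelocity    REAL,   -- mph
        perceivedVelocity REAL    -- mph
    )
""")
conn.executemany("INSERT INTO Running VALUES (?, ?, ?)", [
    (1, "runnerA", 23.3),
    (2, "runnerB", 23.3),
    (3, "runnerC", 22.3),
    (4, "runnerD", None),   # untracked play; MAX() ignores the null
])
conn.executemany("INSERT INTO Pitching VALUES (?, ?, ?, ?)", [
    (5, "pitcherA", 95.0, 97.5),
    (6, "pitcherB", 99.0, 100.0),
])

# Everyone who shares the top speed, so ties are kept.
speed_leaders = conn.execute("""
    SELECT playerID, maxSpeed
    FROM Running
    WHERE maxSpeed = (SELECT MAX(maxSpeed) FROM Running)
    ORDER BY playerID
""").fetchall()

# Derived stat: perceived minus actual velocity, largest gap first.
gap_leader = conn.execute("""
    SELECT playerID, perceivedVelocity - actualVelocity AS gap
    FROM Pitching
    ORDER BY gap DESC
    LIMIT 1
""").fetchone()

print(speed_leaders)
print(gap_leader)
```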
These stats really don’t mean much since they’re only taken from a few plays, but imagine what we could come up with if we had every game’s stats. Also, think about how we could correlate some of this data with other metrics. How does a pitcher’s Spin Rate affect his Fly Ball or Ground Ball rate? How does a player’s Lead Length or First Step affect his Stolen Base percentage? Does a batter’s average Exit Velocity or Launch Angle have any correlation with his BABIP or OPS? No more just eyeballing whether a player is quick out of the box, or if he consistently takes a good route to the ball. This could also help quantify areas that players need to work on. A batter will now know if he needs to work on his acceleration out of the box, and a pitcher will know if his extension is causing him to throw more balls.
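One simple way to start on questions like these is a Pearson correlation between two per-player averages. The function below is a plain-Python sketch of that standard formula; the spin and ground ball numbers are made-up placeholders, not real measurements, and a real study would also need enough plays per player for the averages to mean anything.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)

# Placeholder numbers only: one entry per hypothetical pitcher.
avg_spin_rpm = [2100.0, 2300.0, 2500.0, 2700.0]
ground_ball_rate = [0.40, 0.44, 0.43, 0.50]

r = pearson_r(avg_spin_rpm, ground_ball_rate)
print(f"r = {r:.2f}")
```

A value near +1 or -1 would suggest a strong linear relationship, while a value near 0 would suggest little or none. Either way, correlation alone would not show that spin rate causes the ground balls.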
All of these things will be dealt with as soon as we get more data. I am trying to increase my “First Step” rate by creating an Access database to house the new data before it is available. By no means do I think I have hit the nail on the head with this first attempt to store the new stats, but I at least wanted to get the ball rolling.