I am pretty sure by now we all know what Statcast is, have most likely seen it in use, and (for those of us who have been listening to the BttP Podcasts) are aware of its origins. Before the season, I compiled a database of all public MLB Statcast statistics and posted it here. Since then, a flurry of new data has been released, which warrants an update.

MLBAM is releasing the majority of its Statcast data the same way it released the PitchF/X data, through this pretty little website: http://gd2.mlb.com/components/game/mlb/. Most of the information on this site looks like mumbo-jumbo, but do not worry. There is another, more efficient and prettier website where you can download the data for free: BaseballSavant.com. Daren Willman, creator of the Savant network of websites, has tapped into the MLBAM site and graciously made all of this data easily accessible to everyone.

I took my own stab at integrating the XML files from the MLBAM website into my database. While I was able to connect to the portal and download the data, I ultimately found it much easier to download the stats from Willman’s website.

The Statcast data released by MLBAM is still limited. For the most part, the only new numbers are exit velocity and the occasional launch angle. MLB also posts Statcast videos on its website, which show a few more statistics: additional batted-ball data, pitching information, and fielding and base-running measurements. So, once again, I tediously scraped the data from these videos and coded it chronologically into my database. Long story short, my database now includes four main table clusters:

1. Regular season statistics for each MLB player (2015 only)

2. PitchF/X and Statcast hitting data released from MLBAM website

3. Statcast data scraped from the MLB.com videos (2013, 2014, 2015)

4. Extra pitching info table released to Mr. Ben Lindbergh by MLBAM for his article, Pitches in Radar Gun Are Slower Than They Appear: Identifying Baseball’s Perceived Velocity Kings

Download full database: Statcast Access Database

Download Statcast excel files scraped from videos:

Fielding Table

Base Running Table

Batting Table

Pitching Table

The preceding information was meant to provide everyone with the newly updated, publicly available Statcast data. What follows is a brief attempt at using this data in a possibly useful manner.

As of now, the only Statcast metric with real substance is exit velocity, because it is the one MLBAM releases most widely. The stats I pulled from the MLB.com videos will be useful in a historical sense and fun to take a peek at, but in reality they will not provide much analytical value because of sampling bias: the videos tend to showcase highlights. For example, most of the fielding data comes from plays in which the fielder made a spectacular play, and data from mediocre or poor plays is absent. The same goes for the base-running plays.

The volume of exit velocity readings generated every game is enough to start developing some hypotheses. Analysts around the baseball world have suggested that exit velocity could be a sign of a potential or current power hitter. This seems logical to me, but I wanted to take a deeper look at this notion to see whether certain aspects of a player’s performance can be predicted from his exit velocity.

To do this, I first used my database to merge current-season statistics for each player with his average exit velocity. No advanced stats were used, only standard ones. I then ran simple linear regressions to see which, if any, standard baseball stats could be predicted by a hitter’s average exit velocity. I only included players with at least 100 at-bats and at least 50 recorded exit velocities.
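The merge, the sample cutoffs, and the one-variable regression fit can be sketched in plain Python. This is a minimal illustration, not the exact Access query behind this post: it assumes each table has been exported as a list of records, and the field names (`player_id`, `ab`, `slg`, `hr`, `avg_ev`, `ev_count`) are hypothetical stand-ins for the real column names.

```python
from statistics import mean


def merge_and_filter(season_stats, ev_stats, min_ab=100, min_ev=50):
    """Join season stats to average exit velocity by player, then apply sample cutoffs.

    Field names are hypothetical:
      season_stats rows: 'player_id', 'ab', 'slg', 'hr'
      ev_stats rows:     'player_id', 'avg_ev', 'ev_count'
    """
    ev_by_player = {row["player_id"]: row for row in ev_stats}
    merged = []
    for row in season_stats:
        ev = ev_by_player.get(row["player_id"])
        if ev is None:
            continue  # inner join: skip players with no exit velocity data
        if row["ab"] >= min_ab and ev["ev_count"] >= min_ev:
            merged.append({**row, "avg_ev": ev["avg_ev"], "ev_count": ev["ev_count"]})
    return merged


def fit_simple_ols(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

With the merged rows, the slugging regression would be `fit_simple_ols([r["avg_ev"] for r in rows], [r["slg"] for r in rows])`, and the home run version swaps in `r["hr"]`.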

It turns out that the stats most likely to be predicted by a player’s average exit velocity are indeed the power stats. Home runs had an R-squared of .31, with acceptable p-values and a significant F statistic. Slugging percentage had an R-squared of .29, also with solid p-values and a significant F statistic. To keep the post short, I only included the plot for slugging percentage. The correlation is not stellar, but the trend does show that average exit velocity accounts for a portion (roughly 29 percent) of the variation in slugging percentage among these hitters.
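For a regression with a single predictor, R-squared and the F statistic both follow directly from the fitted line, and F simplifies to R^2 (n - 2) / (1 - R^2). Here is a minimal sketch in plain Python; it is a generic calculation, not the output of whatever stats package produced the numbers above, and the function name is hypothetical.

```python
from statistics import mean


def r_squared_and_f(xs, ys, intercept, slope):
    """Return (R^2, F) for a fitted simple linear regression y = intercept + slope * x."""
    n = len(xs)
    y_bar = mean(ys)
    predicted = [intercept + slope * x for x in xs]
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # residual sum of squares
    sst = sum((y - y_bar) ** 2 for y in ys)                 # total sum of squares
    r2 = 1 - sse / sst
    # One predictor: F = (SSR / 1) / (SSE / (n - 2)), which equals R^2 (n - 2) / (1 - R^2).
    f_stat = float("inf") if sse == 0 else r2 * (n - 2) / (1 - r2)
    return r2, f_stat
```

Read this way, an R-squared of .29 means a straight line on average exit velocity explains about 29 percent of the spread in slugging percentage, leaving the other 71 percent to everything else.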

Figure: Average exit velocity vs. slugging percentage (scatter plot)

https://public.tableau.com/views/SLGEVScatter/SLGEVScatter?:embed=y&:showTabs=y&:display_count=yes

From the regression output, I set up formulas that predict slugging percentage and home runs from average exit velocity. This was more of a fun exercise than an actual attempt to forecast those statistics. However, there are a few players who I feel confident have under-performed or over-performed their current slugging and home run marks and therefore might be candidates for regression.
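In general form, the slugging formula looks like the one below, where EV_i is player i’s average exit velocity and the two beta-hat terms are the fitted intercept and slope from the regression output (left symbolic here, since the fitted values aren't listed in this post). The home run formula has the same shape. The residual e_i measures how far a player’s actual slugging percentage sits from the prediction:

$$
\widehat{\mathrm{SLG}}_i = \hat{\beta}_0 + \hat{\beta}_1 \, \mathrm{EV}_i,
\qquad
e_i = \mathrm{SLG}_i - \widehat{\mathrm{SLG}}_i
$$

A positive e_i means a player is slugging above what his exit velocity predicts; a negative e_i means he is slugging below it.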

The next chart compares some of those players' actual stats with the stats the regression equation calculates. Once again, to avoid cluttering the post, I only included the slugging percentage table. The twenty players listed are the top ten and bottom ten after sorting by residual (actual minus predicted slugging percentage).
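Sorting by residual and pulling both ends of the list takes only a few lines. The sketch below assumes hypothetical record fields (`name`, `avg_ev`, `slg`) and an intercept and slope taken from the regression output; it is an illustration of the ranking step, not the spreadsheet used to build the chart.

```python
def rank_residuals(players, intercept, slope, k=10):
    """Return (top k, bottom k) players by SLG residual = actual - predicted.

    players: list of dicts with hypothetical keys 'name', 'avg_ev', 'slg'.
    If there are fewer than 2k players, the two lists will overlap.
    """
    scored = []
    for p in players:
        predicted = intercept + slope * p["avg_ev"]
        scored.append({**p, "predicted_slg": predicted, "residual": p["slg"] - predicted})
    # Largest positive residuals first: slugging most above what EV predicts.
    scored.sort(key=lambda p: p["residual"], reverse=True)
    top = scored[:k]      # possibly over-performing
    bottom = scored[-k:]  # possibly under-performing
    return top, bottom
```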

Table: Actual vs. predicted slugging percentage for the ten largest and ten smallest residuals

The top ten players, in red, are those who might be over-performing in slugging percentage relative to their current average exit velocity. The bottom ten players, in green, could be due for improvement if they maintain their average exit velocity. Based on this result, I am not as eager to jump on the exit velocity train just yet. From playing baseball, I know there is a lot more that goes into getting hits and generating power than this one stat alone. I did not account for launch angle because the data was very limited, but launch angle could help explain why guys like Harper and Rizzo appear on this list. Their exit velocities are well above average, yet they are not rubbing shoulders with the league leaders in that category, and somehow they are still generating home run power.

I should note that these numbers are slightly out of date, since I used data through June 27th. I am not concerned, though, because the main goal here was simply to give you an idea of how we might use some of this data. If you think other new Statcast statistics are relevant, by all means use the database I included for download and start testing.

Sources: MLB.com, MLBAM XML, BaseballSavant.com

Stephen writes about Major League Baseball at BP Bronx and Banished To The Pen. He also informs readers about college baseball at the blog Underground Baseball. Follow Stephen on Twitter at @steve21shaw


4 Responses to “Updated MLB Statcast Data (July 2015)”

  1. Brett

    Thanks for this. I am trying to do a school project. By any chance do you have the access data for the years since 2015?
