The Effects of Spin Rate on a MLB Pitcher's Performance: A Case Study on the 2015 - 2023 SeasonsΒΆ

For my junior year AP Research class, I chose to do my research project on the correlation that a ball's spin rate has on a variety of different analytics for pitchers for the 2022 MLB season. While I was really proud the work I did on that project, I was also motivated to expand on what I had learned.

I had originally relied on manually importing StatCast stats from Baseball Savant, manipulating data in Excel, and using StatCrunch to report my findings. After completing an online course by the University of Michigan titled "Foundations of Sports Analytics: Data, Representation, and Models in Sports" I realized there was a 'better' way.

Using many of the coding principals from that course along with pybaseball, a Python package for baseball data analysis, I was able to streamline my analysis as well as make it much easiser to expand the analysis across multiple seasons.

SetUpΒΆ

The first thing we need to do is to import packages which allow us to collect,analyze, and visualize data. I've learned that these packages are pretty standard within the data analytics community.

InΒ [1]:
# normal imports
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# disable warnings
warnings.filterwarnings('ignore')

As I mentioned above, pybaseball is a really useful Python package. It allows you to import stats into a dataframe directly from popular sites, including Baseball Savant (which has the StatCast stats that I wanted to use for my analysis). Unfortunately, pybaseball didn't have a built-in function that grabbed all of the columns of data that I had previously used when importing things manually. However, by looking at how the existing pybaseball functions were defined, I was able to create a custom function that did exactly what I needed:

def statcast_pitcher_pitch_spin(year: int, minP: int = 100) -> pd.DataFrame:
    url = f"https://baseballsavant.mlb.com/leaderboard/custom?year={year}&type=pitcher&filter=&sort=4&sortDir=asc&min={minP}&selections=k_percent,bb_percent,p_era,batting_avg,exit_velocity_avg,whiff_percent,groundballs_percent,flyballs_percent,popups_percent,fastball_avg_spin,breaking_avg_spin,n_breaking_formatted,offspeed_avg_spin,n_offspeed_formatted&csv=true"
    res = requests.get(url, timeout=None).content
    data = pd.read_csv(io.StringIO(res.decode('utf-8')))
    data = sanitize_statcast_columns(data)
    return data

After doing that, we can import this new function and use it to create a dataframe with the stats that we want. The years are set from 2015 (the start of the StatCast Era) to 2023 (the current year as of publishing this), and the minimum amount of plate appearances against the pitcher is set to 100. The results of my custom function are listed on the columns below.

InΒ [2]:
# pybaseball
from pybaseball import statcast_pitcher_pitch_spin

data_spin_all = statcast_pitcher_pitch_spin('2023,2022,2021,2020,2019,2018,2017,2016,2015', minP=100)
print(data_spin_all.columns.tolist())
data_spin_all
['last_name', 'first_name', 'player_id', 'year', 'k_percent', 'bb_percent', 'p_era', 'batting_avg', 'exit_velocity_avg', 'whiff_percent', 'groundballs_percent', 'flyballs_percent', 'popups_percent', 'fastball_avg_spin', 'breaking_avg_spin', 'n_breaking_formatted', 'offspeed_avg_spin', 'n_offspeed_formatted']
Out[2]:
last_name first_name player_id year k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent fastball_avg_spin breaking_avg_spin n_breaking_formatted offspeed_avg_spin n_offspeed_formatted
0 Colon Bartolo 112526 2015 16.7 2.9 4.16 0.281 88.9 14.4 44.1 23.2 5.7 2161 2164.0 10.0 1727.0 7.4
1 Hawkins LaTroy 115629 2015 21.0 4.3 3.26 0.286 89.7 17.0 55.4 18.2 5.8 2051 2072.0 16.6 1698.0 8.0
2 Wolf Randy 150116 2015 17.4 9.3 6.23 0.319 89.0 16.2 46.2 17.9 5.1 2032 2176.0 40.2 1669.0 11.1
3 Marquis Jason 150302 2015 17.1 6.5 6.46 0.330 90.4 21.3 48.8 17.1 3.7 1782 1977.0 19.4 1239.0 21.2
4 Burnett A.J. 150359 2015 20.5 7.0 3.18 0.275 89.8 21.1 55.0 14.1 4.0 2009 2023.0 29.4 1678.0 8.8
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4309 Woo Bryan 693433 2023 25.9 7.3 4.75 0.242 87.2 27.8 38.8 28.9 7.9 2179 2358.0 11.8 1713.0 3.6
4310 Elder Bryce 693821 2023 17.7 7.7 3.64 0.245 89.8 23.3 52.9 20.4 4.5 1999 2394.0 36.4 2031.0 12.3
4311 Pfaadt Brandon 694297 2023 20.5 6.6 6.91 0.303 90.4 24.6 34.1 30.1 8.5 2445 2667.0 32.9 1946.0 14.7
4312 Shuster Jared 694363 2023 13.0 11.4 5.00 0.250 89.8 21.3 35.2 29.0 12.4 2136 2204.0 34.2 1492.0 21.7
4313 Hartwig Grant 701643 2023 18.0 11.0 3.86 0.239 87.9 20.8 47.1 18.6 4.3 2155 2430.0 34.4 1792.0 4.8

4314 rows Γ— 18 columns

A huge part of this entire study is that I want to isolate both the fastball's average spin rate and also the secondary pitch's spin rate. However, there are three types of pitches listed on StatCast: fastballs, breaking balls, and offspeed. To simplify things, I wanted to group both breaking balls and offspeed together. When originally looking at them separately, I noticed a high amount of outliers where some pitchers threw a pitch labeled as "offspeed" with a lot more spin than typically seen by anyone else's pitches of that type. Because of this, I used a weighted average formula (breaking % * breaking spin + offspeed % * offspeed spin) to determine the exact average spin rate of a secondary pitch thrown by a pitcher, now labeled as pitch2_avg_spin.

InΒ [3]:
# massage data - change NaN values to zero, formula to calculate secondary pitch average spin based on pitch usage percentages
data_spin_all = data_spin_all.fillna(0)
data_spin_all["pitch2_avg_spin"] = (data_spin_all['n_breaking_formatted'] / (data_spin_all['n_breaking_formatted'] + data_spin_all['n_offspeed_formatted']) * data_spin_all['breaking_avg_spin']) + \
                                   (data_spin_all['n_offspeed_formatted'] / (data_spin_all['n_breaking_formatted'] + data_spin_all['n_offspeed_formatted']) * data_spin_all['offspeed_avg_spin'])

# change NaN to zero again (formula output to handle pitchers with no secondary pitches)
data_spin_all = data_spin_all.fillna(0)
data_spin_all["pitch2_avg_spin"] = data_spin_all["pitch2_avg_spin"].astype('int')
data_spin_all = data_spin_all.drop(columns=['player_id','breaking_avg_spin','offspeed_avg_spin','n_breaking_formatted','n_offspeed_formatted'])
data_spin_all
Out[3]:
last_name first_name year k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent fastball_avg_spin pitch2_avg_spin
0 Colon Bartolo 2015 16.7 2.9 4.16 0.281 88.9 14.4 44.1 23.2 5.7 2161 1978
1 Hawkins LaTroy 2015 21.0 4.3 3.26 0.286 89.7 17.0 55.4 18.2 5.8 2051 1950
2 Wolf Randy 2015 17.4 9.3 6.23 0.319 89.0 16.2 46.2 17.9 5.1 2032 2066
3 Marquis Jason 2015 17.1 6.5 6.46 0.330 90.4 21.3 48.8 17.1 3.7 1782 1591
4 Burnett A.J. 2015 20.5 7.0 3.18 0.275 89.8 21.1 55.0 14.1 4.0 2009 1943
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4309 Woo Bryan 2023 25.9 7.3 4.75 0.242 87.2 27.8 38.8 28.9 7.9 2179 2207
4310 Elder Bryce 2023 17.7 7.7 3.64 0.245 89.8 23.3 52.9 20.4 4.5 1999 2302
4311 Pfaadt Brandon 2023 20.5 6.6 6.91 0.303 90.4 24.6 34.1 30.1 8.5 2445 2444
4312 Shuster Jared 2023 13.0 11.4 5.00 0.250 89.8 21.3 35.2 29.0 12.4 2136 1927
4313 Hartwig Grant 2023 18.0 11.0 3.86 0.239 87.9 20.8 47.1 18.6 4.3 2155 2351

4314 rows Γ— 14 columns

Linear RegressionΒΆ

After attaining the formula, I then needed to run a linear regression test on the entire data-sheet to view the correlation between both Fastball Spin and Pitch 2 Spin and all of the other variables listed above (K%, BB%, etc.). When running a linear regression test, a lot of information is given, but in this case, the only thing that is needed to determine the correlation is the p-value. Therefore, I coded this section so that the output below would only show the necessary p-values, and no other non-essential information would be included. When the p-value is below 0.05, it means that there is a correlation between the two variables. As shown below, there are a vast number of these statistics that have p-values below 0.05, meaning that there is a correlation between a lot of these statistics and the average spin of both fastballs and secondary pitches.

InΒ [4]:
# calculate p-value per year

from scipy.stats import pearsonr

# method to use for 'corr' function to return p-value
# https://stackoverflow.com/questions/52741236/how-to-calculate-p-values-for-pairwise-correlation-of-columns-in-pandas
def pearsonr_pval(x,y):
    return pearsonr(x,y)[1]

# change settings for prettier output of p-value correlations
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# list of years to cycle through for finding specific year p-values
years = [2023,2022,2021,2020,2019,2018,2017,2016,2015]

# all years
with pd.option_context('display.float_format', '{:0.6f}'.format):
    data_spin_pv = data_spin_all.drop(columns=['last_name', 'first_name', 'year'])
    corr = data_spin_pv.corr(method=pearsonr_pval,numeric_only=True)
    print("All Years\n{}\n".format(corr.loc[['fastball_avg_spin','pitch2_avg_spin'], ~corr.columns.isin(['fastball_avg_spin','pitch2_avg_spin'])]))

# individual years
for y in years:
    with pd.option_context('display.float_format', '{:0.6f}'.format):
        data_spin_year = data_spin_all.loc[data_spin_all['year'] == y]
        data_spin_pv = data_spin_year.drop(columns=['last_name', 'first_name', 'year'])
        corr = data_spin_pv.corr(method=pearsonr_pval,numeric_only=True)
        corr2 = corr.loc[['fastball_avg_spin','pitch2_avg_spin'], ~corr.columns.isin(['fastball_avg_spin','pitch2_avg_spin'])]
        print("\n{}: P-Values\n{}\n".format(y,corr.loc[['fastball_avg_spin','pitch2_avg_spin'], ~corr.columns.isin(['fastball_avg_spin','pitch2_avg_spin'])]))
        # just print the statistically significant correlations (< 0.05)
        corr2 = corr2[corr2 < .05].unstack().transpose()\
            .sort_values( ascending=True).dropna()
        print(corr2)
        # try out printing a p-value heatmap to visualize correlations; decided not to use this :) 
        # plt.figure(figsize=(6,4))
        # sns.set(font_scale=.8)
        # sns.heatmap(corr.loc[['fastball_avg_spin','pitch2_avg_spin'], ~corr.columns.isin(['fastball_avg_spin','pitch2_avg_spin'])].transpose(),annot=True, cmap="YlGnBu",annot_kws={"size": 8},fmt='.4f')
        # plt.title('P-Value Heatmap - {}'.format(y))
All Years
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.000000 0.000000     0.000000           0.010929       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000000    0.000000 0.000000     0.000000           0.001198       0.000000             0.100976          0.000138        0.003119


2023: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.018736 0.010116     0.000000           0.030626       0.000000             0.000002          0.000003        0.000000
pitch2_avg_spin     0.013360    0.327086 0.085382     0.014754           0.034736       0.184838             0.983930          0.978393        0.262291

whiff_percent        fastball_avg_spin   0.000000
k_percent            fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000002
flyballs_percent     fastball_avg_spin   0.000003
p_era                fastball_avg_spin   0.010116
k_percent            pitch2_avg_spin     0.013360
batting_avg          pitch2_avg_spin     0.014754
bb_percent           fastball_avg_spin   0.018736
exit_velocity_avg    fastball_avg_spin   0.030626
                     pitch2_avg_spin     0.034736
dtype: float64

2022: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.042327 0.000007     0.000000           0.320380       0.000000             0.000000          0.000001        0.000000
pitch2_avg_spin     0.000007    0.005356 0.001465     0.000002           0.041623       0.000210             0.508716          0.892649        0.022412

k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000001
batting_avg          pitch2_avg_spin     0.000002
p_era                fastball_avg_spin   0.000007
k_percent            pitch2_avg_spin     0.000007
whiff_percent        pitch2_avg_spin     0.000210
p_era                pitch2_avg_spin     0.001465
bb_percent           pitch2_avg_spin     0.005356
popups_percent       pitch2_avg_spin     0.022412
exit_velocity_avg    pitch2_avg_spin     0.041623
bb_percent           fastball_avg_spin   0.042327
dtype: float64

2021: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.000226 0.000235     0.000000           0.234767       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000000    0.001387 0.094188     0.000000           0.032148       0.000000             0.207303          0.652753        0.001030

whiff_percent        fastball_avg_spin   0.000000
k_percent            fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
k_percent            pitch2_avg_spin     0.000000
groundballs_percent  fastball_avg_spin   0.000000
whiff_percent        pitch2_avg_spin     0.000000
flyballs_percent     fastball_avg_spin   0.000000
batting_avg          pitch2_avg_spin     0.000000
bb_percent           fastball_avg_spin   0.000226
p_era                fastball_avg_spin   0.000235
popups_percent       pitch2_avg_spin     0.001030
bb_percent           pitch2_avg_spin     0.001387
exit_velocity_avg    pitch2_avg_spin     0.032148
dtype: float64

2020: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.047310 0.010269     0.000000           0.266034       0.000000             0.000007          0.000281        0.000073
pitch2_avg_spin     0.000148    0.356114 0.000515     0.000172           0.285386       0.071479             0.310271          0.046058        0.542162

k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000007
popups_percent       fastball_avg_spin   0.000073
k_percent            pitch2_avg_spin     0.000148
batting_avg          pitch2_avg_spin     0.000172
flyballs_percent     fastball_avg_spin   0.000281
p_era                pitch2_avg_spin     0.000515
                     fastball_avg_spin   0.010269
flyballs_percent     pitch2_avg_spin     0.046058
bb_percent           fastball_avg_spin   0.047310
dtype: float64

2019: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.016385 0.035657     0.000001           0.005820       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000000    0.001347 0.130905     0.002535           0.242583       0.000465             0.146435          0.061935        0.257985

k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
k_percent            pitch2_avg_spin     0.000000
batting_avg          fastball_avg_spin   0.000001
whiff_percent        pitch2_avg_spin     0.000465
bb_percent           pitch2_avg_spin     0.001347
batting_avg          pitch2_avg_spin     0.002535
exit_velocity_avg    fastball_avg_spin   0.005820
bb_percent           fastball_avg_spin   0.016385
p_era                fastball_avg_spin   0.035657
dtype: float64

2018: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.131973 0.002000     0.000000           0.103212       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000073    0.236726 0.115592     0.003820           0.337130       0.001180             0.461986          0.309744        0.278123

whiff_percent        fastball_avg_spin   0.000000
k_percent            fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
k_percent            pitch2_avg_spin     0.000073
whiff_percent        pitch2_avg_spin     0.001180
p_era                fastball_avg_spin   0.002000
batting_avg          pitch2_avg_spin     0.003820
dtype: float64

2017: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.017546 0.000023     0.000000           0.664940       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000035    0.112367 0.000112     0.000114           0.026909       0.001338             0.583682          0.547332        0.716795

k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000000
p_era                fastball_avg_spin   0.000023
k_percent            pitch2_avg_spin     0.000035
p_era                pitch2_avg_spin     0.000112
batting_avg          pitch2_avg_spin     0.000114
whiff_percent        pitch2_avg_spin     0.001338
bb_percent           fastball_avg_spin   0.017546
exit_velocity_avg    pitch2_avg_spin     0.026909
dtype: float64

2016: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.008719 0.010021     0.000000           0.108696       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000001    0.292678 0.007418     0.000043           0.065793       0.000203             0.002991          0.156506        0.107725

k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
k_percent            pitch2_avg_spin     0.000001
batting_avg          pitch2_avg_spin     0.000043
whiff_percent        pitch2_avg_spin     0.000203
groundballs_percent  pitch2_avg_spin     0.002991
p_era                pitch2_avg_spin     0.007418
bb_percent           fastball_avg_spin   0.008719
p_era                fastball_avg_spin   0.010021
dtype: float64

2015: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.415631 0.000970     0.000000           0.016411       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.007602    0.225946 0.015076     0.004982           0.068573       0.454784             0.393288          0.266614        0.589300

popups_percent       fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
p_era                fastball_avg_spin   0.000970
batting_avg          pitch2_avg_spin     0.004982
k_percent            pitch2_avg_spin     0.007602
p_era                pitch2_avg_spin     0.015076
exit_velocity_avg    fastball_avg_spin   0.016411
dtype: float64

Pretty PicturesΒΆ

After obtaining the p-values, I then wanted to graph these relationships on scatterplots. I overlayed each year ontop of each other using the "hue" function, and then graphed each relationship using the pitch's spin as the x-value, and the differing statistics as the y-value. After that, I wanted to create individual, year-by-year graphs for the statistics I deem to be most important in there being a correlation. This is all layed out below.

InΒ [5]:
custom_palette=sns.color_palette("Paired",9)
sns.set_theme(style="white",palette=custom_palette)
plt.rc('legend',fontsize=25, title_fontsize=25,markerscale=5.0)
pp = sns.pairplot(data=data_spin_all,y_vars=["k_percent", "bb_percent", "p_era", "batting_avg","exit_velocity_avg", "whiff_percent", "groundballs_percent", "flyballs_percent", "popups_percent"],\
                  x_vars=["fastball_avg_spin", "pitch2_avg_spin"], kind='reg',markers='.',hue='year',height=2, aspect=2)
pp = pp.map(plt.scatter)
xlabels,ylabels = [],[]

#handles = pp._legend_data.values()
#labels = pp._legend_data.keys()
#pp.fig.legend(handles=handles, labels=labels)


for ax in pp.axes[-1,:]:
    xlabel = ax.xaxis.get_label_text()
    xlabels.append(xlabel)
for ax in pp.axes[:,0]:
    ylabel = ax.yaxis.get_label_text()
    ylabels.append(ylabel)

for i in range(len(xlabels)):
    for j in range(len(ylabels)):
        pp.axes[j,i].xaxis.set_label_text(xlabels[i],visible=True)
        pp.axes[j,i].yaxis.set_label_text(ylabels[j],visible=True)

for ax in pp.axes.flat:
    ax.tick_params(axis='both', labelleft=True, labelbottom=True)

pp.fig.subplots_adjust(top=.95)
pp.fig.suptitle("All Years, All Stats")
plt.subplots_adjust(wspace=0.3, hspace=0.9)
plt.show()
No description has been provided for this image
InΒ [6]:
# drill down into a relationship across multiple years; this one is 'fastball_avg_spin' & k_percent'
lm=sns.lmplot(x='fastball_avg_spin', y='k_percent', data=data_spin_all, col='year',col_wrap=3,height=2.5,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & K%", fontsize=14)
Out[6]:
Text(0.5, 0.98, 'Fastball Spin & K%')
No description has been provided for this image
InΒ [7]:
# 'pitch2_avg_spin' & 'k_percent'
lm = sns.lmplot(x='pitch2_avg_spin', y='k_percent', data=data_spin_all, col='year',col_wrap=3,height=2.5,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & K%", fontsize=14)
Out[7]:
Text(0.5, 0.98, 'Pitch-2 Spin & K%')
No description has been provided for this image
InΒ [8]:
# 'fastball_avg_spin' & 'p_era'
lm = sns.lmplot(x='fastball_avg_spin', y='p_era', data=data_spin_all, col='year',col_wrap=3,height=2.5,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & ERA", fontsize=14)
Out[8]:
Text(0.5, 0.98, 'Fastball Spin & ERA')
No description has been provided for this image
InΒ [9]:
# 'pitch2_avg_spin' & 'p_era'
lm = sns.lmplot(x='pitch2_avg_spin', y='p_era', data=data_spin_all, col='year',col_wrap=3,height=2.5,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & ERA", fontsize=14)
Out[9]:
Text(0.5, 0.98, 'Pitch-2 Spin & ERA')
No description has been provided for this image
InΒ [10]:
# 'fastball_avg_spin' & 'batting_avg'
lm = sns.lmplot(x='fastball_avg_spin', y='batting_avg', data=data_spin_all, col='year',col_wrap=3,height=2.5,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & Batting Average Against", fontsize=14)
Out[10]:
Text(0.5, 0.98, 'Fastball Spin & Batting Average Against')
No description has been provided for this image
InΒ [11]:
# 'pitch2_avg_spin' & 'batting_avg'
lm = sns.lmplot(x='pitch2_avg_spin', y='batting_avg', data=data_spin_all, col='year',col_wrap=3,height=2.5,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & Batting Average Against", fontsize=14)
Out[11]:
Text(0.5, 0.98, 'Pitch-2 Spin & Batting Average Against')
No description has been provided for this image
InΒ [12]:
# 'fastball_avg_spin' & 'whiff_percent'
lm = sns.lmplot(x='fastball_avg_spin', y='whiff_percent', data=data_spin_all, col='year',col_wrap=3,height=2.5,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & Whiff %", fontsize=14)
Out[12]:
Text(0.5, 0.98, 'Fastball Spin & Whiff %')
No description has been provided for this image
InΒ [13]:
# 'pitch2_avg_spin' & 'whiff_percent'
lm = sns.lmplot(x='pitch2_avg_spin', y='whiff_percent', data=data_spin_all, col='year',col_wrap=3,height=2.5,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & Whiff %", fontsize=14)
Out[13]:
Text(0.5, 0.98, 'Pitch-2 Spin & Whiff %')
No description has been provided for this image

ConclusionΒΆ

In conclusion, based on the p-values and subsequent graphs throughout the years 2015-2023, there is a correlation between the change in both pitch type's spin and the change in opponent batting average, K%, BB%, ERA, Whiff Rate, flyball rate, and popup rate. With groundball rate, there is only a correlation with the fastball's average spin, and not Pitch 2. There is a gap here, however - the data that's available isn't as granular as I would like. Not much data exists for correlating spin rate and pitch-by-pitch analysis. The data shown here isn't linked to the pitch type, but instead linked to the pitcher. Moreover, the concept of spin rate is very new and will continue to develop and deepen over time, which will certainly help improve these findings. While there are limitations here, the correlation still is valid and shows that there is a connection between the change in spin rate and the performance of a pitcher.