The general manager of a Major League Baseball team would like to develop a regression model to predict the number of wins a starting pitcher will have during the season. The Excel file MLB pitchers.xlsx provides the following data on a random sample of starting pitchers from a recent season:
• Average walks and hits per inning pitched (WHIP)
• Average strikeouts per nine innings (K/9)
• Average strikeout-to-walk ratio (K/BB)
• Earned run average (ERA): the average number of earned runs allowed per nine innings pitched
• Average pitches per plate appearance (P/PA)
• Average pitches per inning (P/IP)
• Ground-ball-to-fly-ball ratio (G/F): pitchers with higher G/F ratios tend to cause batters to hit the ball on the ground rather than in the air
• Run support average (RS): the average number of runs scored by the pitcher's team per start
• Right-handed or left-handed pitcher (R/L)
a. Check for the presence of multicollinearity among the independent variables. If it is present, take the necessary steps to eliminate it.
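One common way to check for multicollinearity is the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. The sketch below is a minimal NumPy version, not the textbook's Excel procedure. The function names `vif` and `drop_high_vif` are hypothetical. The cutoff of 5 is an assumed rule of thumb, since some texts use 10. The demo runs on simulated data, not on MLB pitchers.xlsx.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows, k predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination
    from regressing column j on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def drop_high_vif(X, names, threshold=5.0):
    """Remove the predictor with the largest VIF until every VIF <= threshold.

    threshold=5.0 is an assumed rule of thumb, not a value from the exercise.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# Demo on simulated data (NOT the MLB data): c is nearly a linear combination of a and b.
rng = np.random.default_rng(0)
a = rng.normal(size=40)
b = rng.normal(size=40)
c = a + b + rng.normal(scale=0.05, size=40)
X_demo = np.column_stack([a, b, c])
print("VIFs:", np.round(vif(X_demo), 1))
X_kept, kept = drop_high_vif(X_demo, ["a", "b", "c"])
print("kept:", kept)
```

The procedure drops one variable at a time because removing a single collinear predictor often lowers the VIFs of the predictors that remain.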
b. Use best subsets regression to construct a model that predicts the average number of wins for a pitcher from the independent variables remaining after part a.
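Best subsets regression fits every combination of the candidate predictors and compares the fits, usually by adjusted r² and Mallows' Cp. The sketch below is a minimal NumPy version. The names `best_subsets` and `_sse` are hypothetical, and the demo uses simulated data. Cp is computed as SSE_subset / MSE_full − n + 2(p + 1), where p is the number of predictors in the subset. Subsets with Cp close to p + 1 and a high adjusted r² are the usual candidates.

```python
import itertools
import numpy as np

def _sse(X, y):
    """Sum of squared residuals from an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def best_subsets(X, y, names):
    """Fit every non-empty subset of predictors.

    Returns a list of (predictor names, adjusted r^2, Mallows' Cp), sorted by adjusted r^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    sst = float(np.sum((y - y.mean()) ** 2))
    mse_full = _sse(X, y) / (n - k - 1)  # MSE of the model with all k predictors
    results = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            sse = _sse(X[:, list(cols)], y)
            r2 = 1.0 - sse / sst
            adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - size - 1)
            cp = sse / mse_full - n + 2 * (size + 1)  # Mallows' Cp
            results.append(([names[c] for c in cols], adj_r2, cp))
    results.sort(key=lambda row: row[1], reverse=True)
    return results

# Demo on simulated data (NOT the MLB data): only x1 and x3 affect y.
rng = np.random.default_rng(1)
n = 40
X_demo = rng.normal(size=(n, 3))
y_demo = 2.0 + 1.5 * X_demo[:, 0] + 0.8 * X_demo[:, 2] + rng.normal(scale=0.5, size=n)
table = best_subsets(X_demo, y_demo, ["x1", "x2", "x3"])
for subset, adj_r2, cp in table[:3]:
    print(subset, round(adj_r2, 3), round(cp, 2))
```

Exhaustive search is practical here because the number of subsets, 2^k − 1, stays small for the handful of predictors left after part a.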
c. Interpret the meaning of the regression coefficients from part b.
d. Construct a 99% confidence interval for the regression coefficient of the run support variable from part b. Be sure to interpret the meaning of this confidence interval.
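The confidence interval for a slope has the form b_j ± t_{α/2} · S_{b_j}, with n − k − 1 degrees of freedom. The standard errors S_{b_j} are the square roots of the diagonal of MSE · (AᵀA)⁻¹, where A is the design matrix with an intercept column. The sketch below is a minimal NumPy version that uses simulated data, not the MLB data. `coef_confidence_intervals` is a hypothetical name. The two-tailed critical value is passed in as a parameter, for example from Excel's `T.INV.2T(0.01, df)` or SciPy's `scipy.stats.t.ppf(0.995, df)`.

```python
import numpy as np

def coef_confidence_intervals(X, y, t_crit):
    """Return (coefficients, standard errors, lower, upper) for an OLS fit with intercept.

    t_crit is the two-tailed critical value t_{alpha/2, n-k-1}; index 0 is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    mse = resid @ resid / (n - k - 1)
    se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))
    return coef, se, coef - t_crit * se, coef + t_crit * se

# Demo on simulated data (NOT the MLB data): one predictor, n = 40, so df = 38.
rng = np.random.default_rng(2)
x = rng.normal(5.0, 1.0, size=40)
y = 1.0 + 0.9 * x + rng.normal(0.0, 1.0, size=40)
coef, se, lo, hi = coef_confidence_intervals(x.reshape(-1, 1), y, t_crit=2.712)  # approx. t_{0.005, 38}
print(f"slope {coef[1]:.3f}, 99% CI ({lo[1]:.3f}, {hi[1]:.3f})")
```

If the resulting interval for the run support coefficient excludes zero, run support is a significant predictor at the 0.01 level, given the other variables in the model.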
e. Predict the average number of wins for a left-handed pitcher who averages 1.2 walks and hits per inning, 7.1 strikeouts per nine innings, 3.8 pitches per plate appearance, and 15.2 pitches per inning; who has a ground-ball-to-fly-ball ratio of 0.8, a strikeout-to-walk ratio of 2.5, and an earned run average of 3.6; and whose team averages 5.3 runs per game during his starts.
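The prediction is ŷ = b₀ + Σ b_j x_j, where the sum runs only over the predictors that survive parts a and b. The sketch below stores the pitcher's values from this part in a dictionary and applies whatever coefficients the final model produces. The dictionary keys and the function name `predict_wins` are hypothetical. Coding R/L as 1 for left-handed is also an assumption. It must match the coding used for the R/L column when the model was fitted.

```python
# Predictor values for the pitcher described in part e.
# ASSUMPTION: R/L is coded 1 = left-handed, 0 = right-handed; use the coding from the fitted model.
new_pitcher = {
    "WHIP": 1.2, "K/9": 7.1, "P/PA": 3.8, "P/IP": 15.2,
    "G/F": 0.8, "K/BB": 2.5, "ERA": 3.6, "RS": 5.3, "R/L": 1,
}

def predict_wins(intercept, coefficients, observation):
    """Return b0 + sum(b_j * x_j) over the predictors in the final model.

    `coefficients` maps predictor names to fitted slopes. Predictors removed in
    parts a and b are absent from it, so their values in `observation` are ignored.
    """
    return intercept + sum(b * observation[name] for name, b in coefficients.items())
```

In practice, call `predict_wins(b0, {"RS": b_rs, ...}, new_pitcher)` with the intercept and slopes from the model chosen in part b.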
f. Perform a residual analysis to verify that the conditions for the regression model are met for the model developed in part b.
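A residual analysis usually plots the residuals against the fitted values and against each predictor, and checks normality with a normal probability plot. The sketch below computes the quantities behind those plots using NumPy and the standard library. `residual_diagnostics` is a hypothetical name. The normal scores use Blom's plotting positions, (i − 0.375)/(n + 0.25). The "standardized" residuals here are simply residuals divided by the standard error of the estimate, so they ignore leverage. The demo uses simulated data.

```python
import numpy as np
from statistics import NormalDist

def residual_diagnostics(X, y):
    """Return fitted values, residuals, scaled residuals, and the normal-plot correlation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    resid = y - fitted
    s = np.sqrt(resid @ resid / (n - k - 1))  # standard error of the estimate
    std_resid = resid / s                     # scaled residuals (leverage ignored)
    # Expected normal scores (Blom positions) paired with the sorted residuals;
    # a correlation near 1 means the normal probability plot is close to a straight line.
    scores = np.array([NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)])
    normal_r = float(np.corrcoef(scores, np.sort(resid))[0, 1])
    return fitted, resid, std_resid, normal_r

# Demo on simulated data (NOT the MLB data).
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(40, 2))
y_demo = 1.0 + X_demo @ np.array([0.7, -0.4]) + rng.normal(scale=0.5, size=40)
fitted, resid, std_resid, normal_r = residual_diagnostics(X_demo, y_demo)
print("normal-plot correlation:", round(normal_r, 3))
print("points with |scaled residual| > 2:", int(np.sum(np.abs(std_resid) > 2)))
```

The numbers only supplement the plots. Patterns such as curvature or a funnel shape in residuals versus fitted values are easier to judge visually, for example by scatter-plotting `fitted` against `resid`.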
g. The general manager would like to add a new starting pitcher to his team’s roster. Using the results of this model, should he pursue a pitcher who has a high strikeout-to-walk ratio or one who has a high ground-ball-to-fly-ball ratio? Explain your choice.