These data include the engine size or displacement (in liters) and horsepower (HP) of 318 vehicles sold in the United States in 2011. Fit a multiple regression with the log10 of the combined mileage rating as the response and the log10 of the horsepower of the engine (HP), the log10 of the weight of the car, and the log10 of the engine displacement as explanatory variables.
(a) Does it seem natural to find correlation among these explanatory variables, either on a log scale or in the original units?
(b) How will collinearity on the log scale affect the standard error of the slope of the log of displacement in the multiple regression?
(c) Describe the effects of collinearity on the three estimated coefficients. Which coefficients are most/least influenced by collinearity?
(d) We can see the effects of collinearity by constructing a plot that shows the slope of the multiple regression. To do this, we have to remove the effect of two of the explanatory variables from the other variables. Here’s how to make a so-called partial regression leverage plot for these data. First, regress Log10 MPG on Log10 HP and Log10 Weight and save the residuals. Second, regress Log10 Displacement on Log10 HP and Log10 Weight and save these residuals. Now, make a scatterplot of the residuals from the regression of Log10 MPG on the two explanatory variables on the residuals from the regression of Log10 Displacement on the other two explanatory variables. Fit the simple regression for this scatterplot, and compare the slope in this fit to the partial slope for Log10 Displacement in the multiple regression. Are they different?
(e) Compare the scatterplot of Log10 MPG on Log10 Displacement to the partial regression plot constructed in part (d). What has changed?
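The two-step residual procedure in part (d) can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the arrays below are synthetic stand-ins generated at random, not the 318-vehicle data, and the helper `ols` and all variable names are hypothetical. To work the exercise itself, replace the synthetic arrays with the log10 columns from the actual data set.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, residuals)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, y - A @ beta

# Synthetic, deliberately collinear stand-in data (assumption: NOT the real cars data).
rng = np.random.default_rng(0)
n = 318
log_wt = rng.normal(0.55, 0.08, n)
log_hp = 2.3 + 0.9 * (log_wt - 0.55) + rng.normal(0, 0.08, n)
log_disp = 0.5 + 0.8 * (log_hp - 2.3) + 0.6 * (log_wt - 0.55) + rng.normal(0, 0.05, n)
log_mpg = 1.9 - 0.25 * log_hp - 0.4 * log_wt - 0.1 * log_disp + rng.normal(0, 0.03, n)

others = np.column_stack([log_hp, log_wt])

# Multiple regression on all three explanatory variables.
beta_full, _ = ols(np.column_stack([log_hp, log_wt, log_disp]), log_mpg)
partial_slope = beta_full[3]  # coefficient of log10 displacement

# Step 1: residuals of log10 MPG after removing log10 HP and log10 weight.
_, res_y = ols(others, log_mpg)
# Step 2: residuals of log10 displacement after removing the same two variables.
_, res_x = ols(others, log_disp)
# Step 3: simple regression of the first residuals on the second.
beta_plot, _ = ols(res_x, res_y)
plot_slope = beta_plot[1]

print("partial slope in multiple regression:", partial_slope)
print("slope in partial regression plot:   ", plot_slope)
print("SD of log10 displacement:", log_disp.std(), "SD of its residuals:", res_x.std())
```

Plotting `res_y` against `res_x` gives the partial regression plot. The last printed line compares the spread of the explanatory variable before and after adjustment, which is the quantity to look at when comparing the two scatterplots in part (e).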