Insurance companies base their premiums on many factors, but nearly all of those factors are variables that predict life expectancy. Life expectancy varies from place to place. Here’s a regression that models Life Expectancy in terms of other demographic variables. The variables are Murder rate per 100,000, HighSchool Graduation rate in %, Income per capita in dollars, Illiteracy rate per 1000, and Life Expectancy in years.
a) The state with the highest leverage and largest Cook’s Distance is Alaska. It is plotted with an x in the residuals plot. Here are a scatterplot of the residuals, a normal probability plot of the leverage values, and a histogram of Cook’s distance values. What evidence do you have from these diagnostic plots that Alaska might be an influential point?
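For readers who want to see where leverage and Cook's distance values like those in part a) come from, here is a minimal sketch in Python. It does not use the state data from this exercise: the 50 rows are synthetic, the function name `influence_measures` is hypothetical, and the last row is deliberately made unusual so that it plays the role of a point like Alaska.

```python
import numpy as np

def influence_measures(X, y):
    """Return (leverage, residuals, cooks_distance) for a least-squares fit with an intercept."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])      # design matrix with intercept column
    p = Xd.shape[1]                            # number of fitted coefficients
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    # Leverage is the diagonal of the hat matrix H = X (X'X)^{-1} X'.
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    leverage = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)
    s2 = resid @ resid / (n - p)               # residual variance estimate
    # Cook's distance combines residual size and leverage for each case.
    cooks = resid**2 / (p * s2) * leverage / (1 - leverage) ** 2
    return leverage, resid, cooks

# Synthetic data (an assumption for illustration, NOT the state data).
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 4))
X[-1] = 4.0                                    # last case is far from the others in x
y = 70 + X @ np.array([-0.3, 0.5, 0.1, -0.2]) + rng.normal(scale=0.5, size=n)
y[-1] += 5.0                                   # and it does not follow the same pattern

leverage, resid, cooks = influence_measures(X, y)
print("case with highest leverage:", int(np.argmax(leverage)))
print("case with largest Cook's distance:", int(np.argmax(cooks)))
```

A case that stands apart on both the leverage display and the Cook's distance display, as the last row does here, is a candidate for being influential. The leverages always sum to the number of coefficients, which is a quick check on the calculation.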
Here’s another regression, this time with a dummy variable for Alaska added to the model.
b) What does the coefficient for the dummy variable for Alaska mean? Is there evidence that Alaska is an outlier in this model?
c) Which model would you prefer for understanding or predicting Life Expectancy? Explain.
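The effect of a dummy variable for a single case, as in part b), can be checked numerically. The sketch below again uses synthetic data rather than the state data, and the helper `ols` is a hypothetical name. It illustrates a general fact about least squares: an indicator for one case forces that case's residual to zero, the other coefficients match a fit that omits the case, and the dummy's coefficient is the difference between the case's actual value and the value predicted from the other cases.

```python
import numpy as np

def ols(Xd, y):
    """Least-squares coefficients, standard errors, and residuals for design matrix Xd."""
    n, p = Xd.shape
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    s2 = resid @ resid / (n - p)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    return beta, se, resid

# Synthetic data (an assumption for illustration, NOT the state data).
rng = np.random.default_rng(1)
n = 50
X = rng.normal(size=(n, 4))
X[-1] = 4.0                                    # last case plays the unusual point
y = 70 + X @ np.array([-0.3, 0.5, 0.1, -0.2]) + rng.normal(scale=0.5, size=n)
y[-1] += 5.0

Xd = np.column_stack([np.ones(n), X])
dummy = np.zeros(n)
dummy[-1] = 1.0                                # 1 for the unusual case, 0 otherwise
Xd_dummy = np.column_stack([Xd, dummy])

beta_d, se_d, resid_d = ols(Xd_dummy, y)       # model with the dummy variable
beta_wo, _, _ = ols(Xd[:-1], y[:-1])           # model fit without the unusual case
pred_wo = Xd[-1] @ beta_wo                     # its prediction for the omitted case

dummy_coef = beta_d[-1]
t_ratio = dummy_coef / se_d[-1]                # tests whether the case fits the pattern
print("dummy coefficient:", dummy_coef)
print("actual minus prediction from the other cases:", y[-1] - pred_wo)
print("t-ratio for the dummy:", t_ratio)
```

A large t-ratio for the dummy is evidence that the case does not fit the pattern set by the rest of the data, which is one way to judge whether it is an outlier.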

