Question:
10. UScereal is a built-in data frame in R’s package MASS with n = 65 rows on 11 variables. Three of the variables (“mfr” for manufacturer, “shelf” for the display shelf with three categories counting from the floor, and “vitamins”) are categorical, and the others are quantitative.
Type library(MASS); ?UScereal for a full description.
In this exercise we will apply variable selection methods to determine the best model for predicting “calories,” the number of calories in one portion, using the seven quantitative predictors. Use uscer=UScereal[, -c(1, 9, 11)] to generate the data frame uscer without the three categorical variables.
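As a quick check of the data preparation step, the following minimal sketch builds uscer and inspects what remains. It assumes only that the MASS package is installed; the column positions 1, 9, and 11 are taken from the exercise text.

```r
library(MASS)  # provides the UScereal data frame

# Drop columns 1 (mfr), 9 (shelf) and 11 (vitamins), as the exercise instructs
uscer <- UScereal[, -c(1, 9, 11)]

dim(uscer)    # number of rows and of remaining columns
names(uscer)  # the response "calories" plus the quantitative predictors
str(uscer)    # confirm that every remaining column is numeric
```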
(a) Perform criterion-based variable selection with each of the criteria Cp, adjusted R², and BIC, and give the model selected by each criterion. (Hint. Use library(leaps); vs.out=regsubsets(calories ~ ., nbest=3, data=uscer), and then use (12.4.15) and (12.4.16) to create plots like that in Figure 12-10 in order to select the best model according to each criterion.)
(b) Use Cook’s D to determine if there are any influential observations. (Hint. Use cer.out=lm(calories ~ ., data=uscer); plot(cer.out, which=4) to construct a plot like Figure 12-12.)
(c) Create a new data frame, usc, by removing the influential observations; perform criterion-based variable selection with each of the criteria Cp, adjusted R², and BIC; and give the model selected by each criterion. Does the final model seem reasonable? (Hint. See (12.4.22).)
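One possible way to organize parts (a) through (c) in R is sketched below. It is not the textbook's solution: the helper name best_by_criterion and the 4/n cutoff for Cook's D are assumptions made for illustration (the exercise itself asks for a visual judgment from the plot), and the numeric summaries from summary() are used here alongside the plots. The leaps package must be installed.

```r
library(MASS)
library(leaps)

uscer <- UScereal[, -c(1, 9, 11)]

# Hypothetical helper: predictors of the best model under each criterion
best_by_criterion <- function(dat) {
  vs.out <- regsubsets(calories ~ ., nbest = 3, data = dat)
  s <- summary(vs.out)
  # Smaller is better for Cp and BIC; larger is better for adjusted R^2
  pick <- list(Cp    = which.min(s$cp),
               adjR2 = which.max(s$adjr2),
               BIC   = which.min(s$bic))
  # Row i of s$which flags the terms in model i; column 1 is the intercept
  lapply(pick, function(i) names(which(s$which[i, -1])))
}

# (a) Selection on the full data, with the plots suggested by the hint
vs.out <- regsubsets(calories ~ ., nbest = 3, data = uscer)
plot(vs.out, scale = "Cp")
plot(vs.out, scale = "adjr2")
plot(vs.out, scale = "bic")
sel_full <- best_by_criterion(uscer)

# (b) Cook's distance from the full linear model
cer.out <- lm(calories ~ ., data = uscer)
plot(cer.out, which = 4)
d <- cooks.distance(cer.out)
cutoff <- 4 / nrow(uscer)       # ASSUMPTION: rule of thumb, not from the text
names(d)[d > cutoff]            # cereals flagged under this cutoff

# (c) Drop the flagged rows and repeat the selection
usc <- uscer[d <= cutoff, ]     # logical indexing is safe if nothing is flagged
sel_reduced <- best_by_criterion(usc)

sel_full
sel_reduced
```

Comparing sel_full with sel_reduced shows whether the chosen predictors change once the flagged cereals are removed. A different cutoff, or a choice made by eye from the Cook's D plot, may flag a different set of observations.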
