Question: In this exercise, you will use RStudio to generate simulated data and then use these data to perform best subset selection.

1. Use the following code to generate a predictor X of length n = 100, as well as a noise vector ε of length n = 100.

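```r
set.seed(1)          # fix the random seed so the results are reproducible
x <- rnorm(100)      # predictor X: 100 draws from N(0, 1)
eps <- rnorm(100)    # noise vector: 100 draws from N(0, 1)
```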
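A quick reproducibility check, added as an illustration rather than as part of the exercise: because `set.seed(1)` fixes R's random-number generator, re-seeding and drawing again gives exactly the same `x`. The name `x_again` is hypothetical.

```r
# Simulate the predictor and the noise exactly as in step 1
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100)

# Both vectors should have n = 100 entries
stopifnot(length(x) == 100, length(eps) == 100)

# Re-seeding reproduces the same draws (x_again is an illustrative name)
set.seed(1)
x_again <- rnorm(100)
stopifnot(identical(x, x_again))

# Sample moments should be roughly those of a standard normal
print(c(mean = mean(x), sd = sd(x)))
```

Without the `set.seed()` call, every run would produce different data, and the model chosen in step 3 could change from run to run.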

2. Generate a response vector Y of length n = 100 according to the model

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \varepsilon,$$

where $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ are constants of your choice.

Sample code (replace b0, b1, b2, and b3 with values of your choice):

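```r
# Example coefficients; replace them with values of your choice
b0 <- 2
b1 <- 3
b2 <- -1
b3 <- 0.5

# Cubic response in x plus the noise term
y <- b0 + b1 * x + b2 * x^2 + b3 * x^3 + eps
```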
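A sketch, assuming the sample constants above (2, 3, -1, 0.5): fitting ordinary least squares to the true cubic terms with `lm()` should return estimates close to b0 to b3, which gives a benchmark for what best subset selection ought to recover. The names `fit.true` and `comparison` are illustrative.

```r
# Regenerate the data from steps 1 and 2
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100)
b0 <- 2; b1 <- 3; b2 <- -1; b3 <- 0.5
y <- b0 + b1 * x + b2 * x^2 + b3 * x^3 + eps

# Fit the correctly specified cubic model by ordinary least squares
fit.true <- lm(y ~ x + I(x^2) + I(x^3))

# Side-by-side comparison of the true and estimated coefficients
comparison <- cbind(true = c(b0, b1, b2, b3), estimate = coef(fit.true))
print(round(comparison, 3))
```

If you chose different constants, substitute them in the `b0` to `b3` line; the estimates should then land near your values instead.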

3. Use the regsubsets() function from the leaps package to perform best subset selection in order to choose the best model containing the predictors $X, X^2, \ldots, X^{10}$. What is the best model obtained according to $C_p$, BIC, and adjusted $R^2$? Show some plots to provide evidence for your answer, and report the coefficients of the best model obtained. Note that you will need to use the data.frame() function to create a single data set containing both X and Y (sample code is provided below).
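For reference, these are the standard textbook forms of the three criteria for a least-squares model with $d$ predictors, where RSS is the residual sum of squares, TSS is the total sum of squares, and $\hat{\sigma}^2$ is an estimate of the noise variance (typically taken from the full model). Smaller $C_p$ and BIC are better; larger adjusted $R^2$ is better. The `summary()` of a regsubsets fit may report these on a different scale or up to additive constants, so compare where each curve reaches its optimum rather than the raw values.

$$C_p = \frac{1}{n}\left(\mathrm{RSS} + 2d\,\hat{\sigma}^2\right)$$

$$\mathrm{BIC} = \frac{1}{n}\left(\mathrm{RSS} + \log(n)\, d\,\hat{\sigma}^2\right)$$

$$\text{Adjusted } R^2 = 1 - \frac{\mathrm{RSS}/(n - d - 1)}{\mathrm{TSS}/(n - 1)}$$

Because $\log(n) > 2$ whenever $n > 7$, BIC penalizes each additional predictor more heavily than $C_p$ here ($n = 100$), so it tends to select smaller models.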

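```r
# Install leaps only if it is missing, then load it
if (!requireNamespace("leaps", quietly = TRUE)) install.packages("leaps")
library(leaps)

# Combine the response and the predictor into a single data frame
data.full <- data.frame(y = y, x = x)

# Best subset selection over x, x^2, ..., x^10 (nvmax = 10 allows models with up to 10 predictors)
regfit.full <- regsubsets(y ~ x + I(x^2) + I(x^3) + I(x^4) + I(x^5) + I(x^6) + I(x^7) + I(x^8) + I(x^9) + I(x^10), data = data.full, nvmax = 10)
reg.summary <- summary(regfit.full)

# Plot each criterion against model size; the red dot marks the optimum
par(mfrow = c(2, 2))
plot(reg.summary$cp, xlab = "Number of variables", ylab = "C_p", type = "l")
points(which.min(reg.summary$cp), reg.summary$cp[which.min(reg.summary$cp)], col = "red", cex = 2, pch = 20)
plot(reg.summary$bic, xlab = "Number of variables", ylab = "BIC", type = "l")
points(which.min(reg.summary$bic), reg.summary$bic[which.min(reg.summary$bic)], col = "red", cex = 2, pch = 20)
plot(reg.summary$adjr2, xlab = "Number of variables", ylab = "Adjusted R^2", type = "l")
points(which.max(reg.summary$adjr2), reg.summary$adjr2[which.max(reg.summary$adjr2)], col = "red", cex = 2, pch = 20)

# Coefficients of the best model under each criterion
coef(regfit.full, which.min(reg.summary$cp))
coef(regfit.full, which.min(reg.summary$bic))
coef(regfit.full, which.max(reg.summary$adjr2))
```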
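A more compact sketch of step 3, assuming leaps is already installed and the sample constants from step 2 are used. It swaps the ten `I(x^k)` terms for `poly(x, 10, raw = TRUE)`, which generates the same raw powers of x, uses the `plot()` method that leaps provides for regsubsets objects (`scale = "Cp"`, `"bic"`, `"adjr2"`), and prints the chosen model size and coefficients under each criterion. The names `regfit.poly`, `best.size`, and `crit` are illustrative.

```r
# Assumes the leaps package is already installed
library(leaps)

# Regenerate the simulated data (sample constants from step 2)
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100)
y <- 2 + 3 * x - 1 * x^2 + 0.5 * x^3 + eps
data.full <- data.frame(y = y, x = x)

# poly(x, 10, raw = TRUE) builds the raw power columns x, x^2, ..., x^10
regfit.poly <- regsubsets(y ~ poly(x, 10, raw = TRUE), data = data.full, nvmax = 10)
reg.summary <- summary(regfit.poly)

# Model size chosen by each criterion
best.size <- c(
  Cp    = unname(which.min(reg.summary$cp)),
  BIC   = unname(which.min(reg.summary$bic)),
  AdjR2 = unname(which.max(reg.summary$adjr2))
)
print(best.size)

# Built-in plots: each row is a model ranked by the criterion,
# and filled cells mark the terms that model includes
par(mfrow = c(1, 3))
plot(regfit.poly, scale = "Cp")
plot(regfit.poly, scale = "bic")
plot(regfit.poly, scale = "adjr2")

# Coefficients of the model picked by each criterion
for (crit in names(best.size)) {
  cat("\nBest model by", crit, "\n")
  print(coef(regfit.poly, best.size[[crit]]))
}
```

With `raw = TRUE` the coefficients are on the scale of the original powers of x, so they can be compared directly with b0 to b3; the default orthogonal polynomials (`raw = FALSE`) would produce coefficients that are not directly comparable.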
