Consider the College data from the ISLR package. These are described on page 54 of the textbook.
We would like to predict the number of applications received using the other variables.
Problem statements:
All data will be taken as training data.
a) Fit a linear model using least squares and report the LOOCV estimate of the test error.
b) Fit a tree to the data. Summarize the results. Unless the number of terminal nodes is large,
display the tree graphically and explicitly describe the regions corresponding to the terminal
nodes that provide a partition of the predictor space (i.e., provide expressions for the regions
R_1, ..., R_J). Report its MSE.
c) Use LOOCV to determine whether pruning is helpful and determine the optimal size for the
pruned tree. Compare the pruned and un-pruned trees. Report MSE for the pruned tree. Which
predictors seem to be the most important?
d) Use a bagging approach to analyze the data with B = 500 and B = 1000. Compute the MSE.
Which predictors seem to be the most important?
e) Repeat (d) with a random forest approach with B = 500 and B = 1000, and m ≈ p/3.
f) Compare the results from the various methods. Which method would you recommend?
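
As a rough illustration of what the LOOCV estimate in part (a) measures, here is a minimal sketch in plain Python. It is not a solution to the problem: the data are synthetic stand-ins rather than the College data, and every function and variable name (solve_linear_system, fit_least_squares, loocv_mse, and so on) is hypothetical. Each observation is held out in turn, the least-squares model is refit on the remaining n - 1 observations, and the squared prediction error on the held-out observation is averaged over all n fits.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(rows, y):
    """Return coefficients (intercept first) from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Prediction for one observation: intercept plus weighted predictors."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def loocv_mse(rows, y):
    """Average squared error when each observation is held out in turn."""
    n = len(y)
    total = 0.0
    for i in range(n):
        train_rows = rows[:i] + rows[i + 1:]
        train_y = y[:i] + y[i + 1:]
        beta = fit_least_squares(train_rows, train_y)
        total += (y[i] - predict(beta, rows[i])) ** 2
    return total / n


# Synthetic stand-in data (an assumption for illustration, NOT the College data).
rng = random.Random(0)
rows = [[rng.uniform(0, 10), rng.uniform(0, 5)] for _ in range(40)]
y = [3.0 + 2.0 * a - 1.5 * b + rng.gauss(0, 1) for a, b in rows]

beta_full = fit_least_squares(rows, y)
train_mse = sum((t - predict(beta_full, r)) ** 2 for r, t in zip(rows, y)) / len(y)
cv_mse = loocv_mse(rows, y)
print(f"training MSE: {train_mse:.3f}")
print(f"LOOCV MSE:    {cv_mse:.3f}")
```

For least squares, the LOOCV estimate is never smaller than the training MSE, which is why it is the more honest figure to compare against the tree-based methods in parts (b) through (e). The brute-force loop is used here for clarity; the same hold-one-out idea carries over to choosing the pruned tree size in part (c).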
