Part 1

The College data set is available in the ISLR library. Load the College data into the R environment by loading the ISLR library.

Description of the College data set in the ISLR library:

Statistics for a large number of US Colleges from the 1995 issue of US News and World Report.

A data frame with 777 observations on the following 18 variables.

  • Private: A factor with levels No and Yes indicating private or public university
  • Apps: Number of applications received
  • Accept: Number of applications accepted
  • Enroll: Number of new students enrolled
  • Top10perc: Pct. new students from top 10% of H.S. class
  • Top25perc: Pct. new students from top 25% of H.S. class
  • F.Undergrad: Number of full-time undergraduates
  • P.Undergrad: Number of part-time undergraduates
  • Outstate: Out-of-state tuition
  • Room.Board: Room and board costs
  • Books: Estimated book costs
  • Personal: Estimated personal spending
  • PhD: Pct. of faculty with Ph.D.s
  • Terminal: Pct. of faculty with terminal degree
  • S.F.Ratio: Student/faculty ratio
  • perc.alumni: Pct. alumni who donate
  • Expend: Instructional expenditure per student
  • Grad.Rate: Graduation rate

We will predict the number of applications received (Apps) using all other variables in the College data set, fit LASSO and regression tree models, and compare their performance (test MSE).
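The setup described above could be sketched roughly as follows. The details of Part 1 are not shown in this question, so the random seed, the 70/30 train/test split, and the use of cv.glmnet() from the glmnet package to choose the LASSO penalty are all assumptions made for illustration, not the assignment's prescribed settings.

```r
library(ISLR)    # provides the College data frame
library(glmnet)  # LASSO via glmnet() / cv.glmnet()

data(College)
set.seed(1)  # assumed seed, only for reproducibility

# Assumed 70/30 train/test split (the split from Part 1(a) is not given here)
n <- nrow(College)
train_idx <- sample(n, size = floor(0.7 * n))
train <- College[train_idx, ]
test  <- College[-train_idx, ]

# glmnet needs a numeric matrix; model.matrix() expands the factor Private
# into a dummy column, and [, -1] drops the intercept column
x_train <- model.matrix(Apps ~ ., data = train)[, -1]
x_test  <- model.matrix(Apps ~ ., data = test)[, -1]

# alpha = 1 selects the LASSO penalty; lambda is chosen by cross-validation
cv_lasso <- cv.glmnet(x_train, train$Apps, alpha = 1)

# Test MSE at the lambda with the lowest cross-validated error
lasso_pred <- predict(cv_lasso, newx = x_test, s = "lambda.min")
lasso_mse  <- mean((test$Apps - lasso_pred)^2)
lasso_mse
```

The value of lasso_mse depends on the seed and the split, so it is worth keeping the same train and test objects for the tree models so that the test MSEs are comparable.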

Part 2

Regression Tree

Predict the number of applications received Apps using all other variables in the College data set based on a Regression Tree:

Perform the following tasks using the training and test data sets that you created in Part 1(a).

  (a) Fit a Regression Tree to the training data, with Apps as the response and all other variables as predictors. Use the summary() function to produce summary statistics about the tree. Note how many terminal nodes the tree has.
  (b) Type in the name of the tree object in order to get a detailed text output.
  (c) Create a plot of the tree. (Hint: use the plot() and text() functions)
  (d) Now apply the cross-validation function cv.tree() to the training data set to see whether pruning the tree will improve performance (that is, to determine the optimal tree size).
  (e) Produce a plot with tree size on the x-axis and cross-validated error (deviance) on the y-axis. (Hint: use the plot() function)
  (f) Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation in parts (d) and (e). If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with eight terminal nodes.
  (g) Compute the test errors (test MSE) of the pruned and unpruned trees.
  (h) Compare the two test errors from part (g) (pruned and unpruned trees) with the test MSE obtained using LASSO regression in Part 1(d).

Note: Part 2(h) requires an explanation. Provide your answer in the R script at the end of the program. Parts 2(a) - 2(g) do not require any explanation. Your grade will be based on the execution of the code.
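A minimal sketch of how the regression tree tasks might be carried out with the tree package is given below. The train/test split from Part 1(a) is not shown in this question, so the seed and the 70/30 split here are assumptions. The names tree_fit, cv_fit, pruned_fit and the helper mse() are illustrative, not part of the assignment.

```r
library(ISLR)  # College data
library(tree)  # tree(), cv.tree(), prune.tree()

data(College)
set.seed(1)  # assumed seed
train_idx <- sample(nrow(College), size = floor(0.7 * nrow(College)))  # assumed split
train <- College[train_idx, ]
test  <- College[-train_idx, ]

# Fit the tree and summarise it (summary reports the number of terminal nodes)
tree_fit <- tree(Apps ~ ., data = train)
summary(tree_fit)

# Detailed text output of the splits
tree_fit

# Plot the tree with split labels
plot(tree_fit)
text(tree_fit, pretty = 0)

# Cross-validation over the cost-complexity pruning sequence
cv_fit <- cv.tree(tree_fit)

# Tree size against cross-validated deviance
plot(cv_fit$size, cv_fit$dev, type = "b",
     xlab = "Tree size (terminal nodes)",
     ylab = "Cross-validated deviance")

# Prune to the size with the lowest CV deviance; if that is the full
# tree (no pruning selected), fall back to eight terminal nodes
n_leaves  <- sum(tree_fit$frame$var == "<leaf>")
best_size <- cv_fit$size[which.min(cv_fit$dev)]
if (best_size >= n_leaves) best_size <- 8
pruned_fit <- prune.tree(tree_fit, best = best_size)

# Test MSE for the unpruned and pruned trees
mse <- function(actual, predicted) mean((actual - predicted)^2)
unpruned_mse <- mse(test$Apps, predict(tree_fit, newdata = test))
pruned_mse   <- mse(test$Apps, predict(pruned_fit, newdata = test))
c(unpruned = unpruned_mse, pruned = pruned_mse)
```

For the final comparison, these two values would be set against the LASSO test MSE from Part 1(d), computed on the same test set. Which model comes out ahead depends on the split, so the written explanation should be based on the numbers the script actually produces.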
