Question: Part 1 (100 points). The College data set is available in the ISLR library. Load the College data into the R environment by loading the ISLR library.
Description of College data set available at ISLR Library:
Statistics for a large number of US Colleges from the 1995 issue of US News and World Report.
A data frame with 777 observations on the following 18 variables.
Private: A factor with levels No and Yes indicating private or public university
Apps: Number of applications received
Accept: Number of applications accepted
Enroll: Number of new students enrolled
Top10perc: Pct. new students from top 10% of H.S. class
Top25perc: Pct. new students from top 25% of H.S. class
F.Undergrad: Number of fulltime undergraduates
P.Undergrad: Number of parttime undergraduates
Outstate: Out-of-state tuition
Room.Board: Room and board costs
Books: Estimated book costs
Personal: Estimated personal spending
PhD: Pct. of faculty with Ph.D.'s
Terminal: Pct. of faculty with terminal degree
S.F.Ratio: Student/faculty ratio
perc.alumni: Pct. alumni who donate
Expend: Instructional expenditure per student
Grad.Rate: Graduation rate
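As a quick check on the description above, the data can be loaded and inspected along these lines. This is only a minimal sketch and assumes the ISLR package is already installed (for example via install.packages("ISLR")).

```r
# Assumption: the ISLR package is installed.
library(ISLR)

data(College)          # load the College data frame into the workspace
dim(College)           # the description above states 777 rows and 18 columns
str(College)           # Private is a factor; the other columns are numeric
sum(is.na(College))    # count missing values before modelling
```

If dim() does not report 777 observations and 18 variables, the installed version of the package may differ from the one described in the question.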
We will predict the number of applications received (Apps) from all other variables in the College data set by fitting a LASSO model.
PERFORM LASSO MODEL:
Predict the number of applications received (Apps) from all other variables in the College data set, using the LASSO model for variable selection:
a. Split the data set randomly into a training set and a test set. (10 points)
b. Fit a Lasso model on the training set using the glmnet() function. (10 points)
c. Perform cross-validation on the training set to choose the best lambda. (10 points)
d. Using the best lambda obtained in part (c), compute predicted values on the test set (using the predict() function) and compute the test MSE. (20 points)
e. Compare the Lasso test MSE with the test MSE of the null model (lambda = infinity) and the test MSE of the least squares regression model (lambda = 0). Provide a brief discussion comparing the three test MSEs. (20 points)
f. Now fit the Lasso model on the entire data set, obtain the Lasso coefficients using the best lambda from part (c), and report the number of non-zero coefficient estimates. (15 points)
g. Now use the predictors selected by the Lasso in part (f) to fit a linear regression model and report the summary of the linear model. (15 points)
Hint: You can refer to the program for "P2_LASSO_HittersData_OVERVIEW" as a guideline for the assignment.
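One possible way to work through parts (a) to (g) is sketched below. It is not taken from the referenced "P2_LASSO_HittersData_OVERVIEW" program, which is not shown here. The seed, the 50/50 split, the use of lambda.min, and all object names (x, y, train, test, best_lam, and so on) are assumptions made for illustration. The ISLR and glmnet packages are assumed to be installed.

```r
library(ISLR)    # College data
library(glmnet)  # glmnet() and cv.glmnet()

data(College)
set.seed(1)  # arbitrary seed, used only so the split is reproducible

# glmnet() needs a numeric matrix. model.matrix() turns the Private factor
# into a dummy column; the intercept column is dropped because glmnet adds one.
x <- model.matrix(Apps ~ ., data = College)[, -1]
y <- College$Apps

# (a) Random split; the 50/50 ratio is an assumption
train <- sample(seq_len(nrow(x)), size = floor(nrow(x) / 2))
test  <- setdiff(seq_len(nrow(x)), train)

# (b) Lasso path on the training set (alpha = 1 selects the lasso penalty)
lasso_fit <- glmnet(x[train, ], y[train], alpha = 1)

# (c) Cross-validation on the training set (10 folds by default)
cv_fit   <- cv.glmnet(x[train, ], y[train], alpha = 1)
best_lam <- cv_fit$lambda.min

# (d) Test MSE at the chosen lambda
lasso_pred <- predict(lasso_fit, s = best_lam, newx = x[test, ])
mse_lasso  <- mean((lasso_pred - y[test])^2)

# (e) Null model (lambda = infinity): every prediction is the training mean
mse_null <- mean((mean(y[train]) - y[test])^2)
#     Least squares (lambda = 0): ordinary lm() fit on the training rows
ols_fit  <- lm(Apps ~ ., data = College[train, ])
ols_pred <- predict(ols_fit, newdata = College[test, ])
mse_ols  <- mean((ols_pred - y[test])^2)
print(c(null = mse_null, least_squares = mse_ols, lasso = mse_lasso))

# (f) Refit on the full data and extract coefficients at best_lam
full_fit   <- glmnet(x, y, alpha = 1)
lasso_coef <- as.matrix(coef(full_fit, s = best_lam))[, 1]
selected   <- setdiff(names(lasso_coef)[lasso_coef != 0], "(Intercept)")
length(selected)  # number of non-zero coefficients, intercept excluded

# (g) Linear regression on the predictors the lasso kept
lm_data <- data.frame(Apps = y, x[, selected, drop = FALSE])
lm_fit  <- lm(Apps ~ ., data = lm_data)
summary(lm_fit)
```

The exact MSE values and the number of selected predictors depend on the seed and the split, so they should be reported from an actual run. Using cv_fit$lambda.1se instead of lambda.min would typically give a sparser model at the cost of a somewhat higher cross-validated error.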
