Question:

Import the data set using pd.read_excel and report the shape (rows, columns) of data.
Remove the variables INSTNM and Appl from the dataset. Split the data set into a training set (60%) and a validation set (40%), using random_state=5555. Keep the variable UNITID, but because it is only an identifier, it should NOT be used in any analysis or modeling.
Provide the descriptive statistics of the variable Ug_enter, the total number of entering students at undergraduate level.
If we are to predict the variable Ug_enter using other variables with linear regression, provide at least two appropriate EDAs prior to modeling.
Drop missing values using the .dropna() function. Report the size of the remaining training and validation data sets.
Fit a linear regression model on the training set to predict Ug_enter. Display the model's coefficients and the Mean Squared Error (MSE) of the validation set.
Perform 5-fold CV and show the CV error, i.e., the average MSE.
Perform LOOCV and show the CV error.
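
The tasks above could be approached roughly as in the sketch below. The question does not give the Excel file name or the full list of columns, so the sketch builds a small synthetic DataFrame in place of the pd.read_excel call. The columns Tuition and Faculty, the file name in the comment, and all generated values are hypothetical stand-ins, not the real data. The sketch also assumes that the descriptive statistics are computed on the full data and that both cross-validation runs use the cleaned training set, since the question leaves these choices open. The two EDA steps shown are numeric (correlations with the target and missing-value counts); a histogram of Ug_enter or scatter plots against each predictor would be common alternatives.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import (KFold, LeaveOneOut, cross_val_score,
                                     train_test_split)

# With the real file this would be something like:
#   df = pd.read_excel("colleges.xlsx")   # hypothetical file name
# Synthetic stand-in data so the sketch runs on its own.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "UNITID": np.arange(100000, 100000 + n),
    "INSTNM": [f"College {i}" for i in range(n)],
    "Appl": rng.integers(500, 20000, n).astype(float),
    "Tuition": rng.normal(20000, 5000, n),  # hypothetical predictor
    "Faculty": rng.normal(300, 80, n),      # hypothetical predictor
})
df["Ug_enter"] = (0.05 * df["Tuition"] + 3.0 * df["Faculty"]
                  + rng.normal(0, 100, n))
# Inject some missing values so that dropna() has an effect.
df.loc[rng.choice(n, 15, replace=False), "Tuition"] = np.nan
print("Shape (rows, columns):", df.shape)

# Remove INSTNM and Appl, then split 60% / 40%.
df = df.drop(columns=["INSTNM", "Appl"])
train, valid = train_test_split(df, train_size=0.6, random_state=5555)

# Descriptive statistics of the target (here on the full data).
print(df["Ug_enter"].describe())

# EDA 1: correlation of each variable with the target (UNITID excluded).
print(train.drop(columns="UNITID").corr()["Ug_enter"])
# EDA 2: number of missing values per column.
print(train.isna().sum())

# Drop rows with missing values and report the remaining sizes.
train = train.dropna()
valid = valid.dropna()
print("Training size:", train.shape, "Validation size:", valid.shape)

# UNITID is kept in the data but excluded from the predictors.
predictors = [c for c in train.columns if c not in ("UNITID", "Ug_enter")]
X_train, y_train = train[predictors], train["Ug_enter"]

# Fit on the training set and evaluate on the validation set.
model = LinearRegression().fit(X_train, y_train)
coefs = pd.Series(model.coef_, index=predictors)
valid_mse = mean_squared_error(valid["Ug_enter"],
                               model.predict(valid[predictors]))
print("Intercept:", model.intercept_)
print(coefs)
print("Validation MSE:", valid_mse)


def cv_mse(cv):
    """Average MSE across the folds produced by the splitter `cv`."""
    scores = cross_val_score(LinearRegression(), X_train, y_train,
                             scoring="neg_mean_squared_error", cv=cv)
    return -scores.mean()  # scores are negated MSEs, so flip the sign


kfold_mse = cv_mse(KFold(n_splits=5, shuffle=True, random_state=5555))
loocv_mse = cv_mse(LeaveOneOut())
print("5-fold CV error (average MSE):", kfold_mse)
print("LOOCV error (average MSE):", loocv_mse)
```

With the real data, only the block that builds df needs to be replaced by the pd.read_excel call; the rest should carry over as long as the remaining columns are numeric. LOOCV fits one model per training row, so it can be noticeably slower than 5-fold CV on a large data set.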
