Question: Import the data set using pd.read_excel and report the shape (rows, columns) of the data.
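A minimal sketch of this first step. It assumes the data sit on the workbook's first sheet and that an Excel engine such as openpyxl is installed. The file name in the usage line is hypothetical.

```python
import pandas as pd

def load_data(path) -> pd.DataFrame:
    """Read the Excel file (first sheet by default) and report its shape."""
    data = pd.read_excel(path)
    rows, cols = data.shape  # DataFrame.shape is a (rows, columns) tuple
    print(f"shape: {rows} rows, {cols} columns")
    return data

# Usage (hypothetical file name):
# data = load_data("colleges.xlsx")
```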
Remove the variables INSTNM and Appl from the data set. Split the data set into a training set and a validation set, setting random_state. Keep the variable UNITID, but because it is only an ID it should NOT be used in any analysis or modeling.
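One way to do the column removal and the split, using scikit-learn's `train_test_split`. The question does not state the validation fraction or the random_state value, so both are left as parameters in this sketch. UNITID stays in the data and is only excluded later, when the predictors are selected.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def prepare_split(data: pd.DataFrame, test_size: float, random_state: int):
    """Drop INSTNM and Appl, then split the rows into training and validation sets.

    UNITID is deliberately kept in both sets; it must simply be left out of
    the predictors when a model is fitted.
    """
    reduced = data.drop(columns=["INSTNM", "Appl"])
    train, valid = train_test_split(
        reduced, test_size=test_size, random_state=random_state
    )
    return train, valid

# Usage: fill in the split fraction and seed the assignment specifies.
# train, valid = prepare_split(data, test_size=..., random_state=...)
```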
Provide the descriptive statistics of the variable Ugenter, the total number of entering students at the undergraduate level.
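pandas' `Series.describe` gives the usual summary (count, mean, standard deviation, min, quartiles, max). The question does not say whether the full data or the training set should be summarized, so this sketch works on whichever DataFrame is passed in. It also appends the sample skewness, which can help when deciding whether a transformation of Ugenter is worth considering.

```python
import pandas as pd

def describe_ugenter(df: pd.DataFrame) -> pd.Series:
    """Descriptive statistics of Ugenter, plus its sample skewness."""
    stats = df["Ugenter"].describe()    # count, mean, std, min, 25%, 50%, 75%, max
    stats["skew"] = df["Ugenter"].skew()  # adds one more labelled entry
    return stats

# Usage:
# print(describe_ugenter(train))
```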
If we are to predict the variable Ugenter from the other variables with linear regression, provide at least two appropriate exploratory data analyses (EDA) prior to modeling.
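Two EDA checks that suit a linear-regression target are sketched below as one possible choice, not the only valid pair: a ranking of numeric predictors by their Pearson correlation with Ugenter, and a count of missing values per column. UNITID is excluded from the correlations because it is only an identifier.

```python
import pandas as pd

def eda_summaries(df: pd.DataFrame, target: str = "Ugenter", id_col: str = "UNITID"):
    """Two numeric EDA checks to run before a linear regression on `target`.

    1. Pearson correlation of every numeric predictor with the target, ordered
       by absolute value (a rough screen for linear relationships).
    2. Count of missing values per column (relevant to the later dropna step).
    """
    numeric = df.select_dtypes(include="number").drop(columns=[id_col], errors="ignore")
    corr = numeric.corr()[target].drop(target)            # correlations with the target only
    corr = corr.reindex(corr.abs().sort_values(ascending=False).index)
    missing = df.isna().sum()                             # missing values per column
    return corr, missing
```

Graphical versions of the same ideas, such as `df["Ugenter"].hist()` or scatter plots of Ugenter against the most correlated predictors, need matplotlib installed.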
Drop missing values using the dropna function. Report the sizes of the remaining training and validation data sets.
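A short sketch of this step, applying `DataFrame.dropna` to each set separately so that rows never move between training and validation. "Size" is reported here as the (rows, columns) shape, which is an interpretation of the wording.

```python
import pandas as pd

def drop_missing(train: pd.DataFrame, valid: pd.DataFrame):
    """Drop rows containing any missing value from each set and report the new sizes."""
    train_clean = train.dropna()
    valid_clean = valid.dropna()
    print("training set:", train_clean.shape, "validation set:", valid_clean.shape)
    return train_clean, valid_clean
```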
Fit a linear regression model on the training set to predict Ugenter. Display the model's coefficients and the Mean Squared Error (MSE) on the validation set.
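A sketch using scikit-learn's `LinearRegression` and `mean_squared_error`. It assumes every remaining column other than Ugenter and UNITID is numeric; any text or categorical columns left after dropping INSTNM would need encoding first (for example with `pd.get_dummies`). The helper `split_xy` is introduced here for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def split_xy(df: pd.DataFrame, target: str = "Ugenter", id_col: str = "UNITID"):
    """Separate predictors from the target; the ID column is never used as a feature."""
    X = df.drop(columns=[target, id_col])
    y = df[target]
    return X, y

def fit_and_validate(train: pd.DataFrame, valid: pd.DataFrame):
    """Fit OLS on the training set and return the model, its coefficients and the validation MSE."""
    X_train, y_train = split_xy(train)
    X_valid, y_valid = split_xy(valid)
    model = LinearRegression().fit(X_train, y_train)
    coefs = pd.Series(model.coef_, index=X_train.columns)  # one slope per predictor
    mse = mean_squared_error(y_valid, model.predict(X_valid))
    return model, coefs, mse
```

The intercept is available separately as `model.intercept_`.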
Perform k-fold CV and show the CV error, i.e., the average MSE.
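A sketch of k-fold cross-validation with `KFold` and `cross_val_score`. The number of folds is not given in the question, so `k` is a parameter; running it on the cleaned training predictors and target is an assumption about what the assignment intends. scikit-learn reports negated MSE for this scoring name, so the sign is flipped before averaging.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

def kfold_cv_mse(X, y, k: int, random_state=None) -> float:
    """Average validation MSE of a linear regression over k shuffled folds."""
    cv = KFold(n_splits=k, shuffle=True, random_state=random_state)
    scores = cross_val_score(LinearRegression(), X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    return float(np.mean(-scores))  # scores are negative MSEs, one per fold

# Usage (k and the seed come from the assignment):
# print(kfold_cv_mse(X_train, y_train, k=..., random_state=...))
```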
Perform leave-one-out cross-validation (LOOCV) and show the CV error.
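LOOCV is k-fold CV with one observation per fold, so the same `cross_val_score` call works with `LeaveOneOut()`. For least-squares regression there is also a well-known closed form, CV(n) = (1/n) * sum(((y_i - yhat_i) / (1 - h_i))^2), where h_i is the i-th leverage (diagonal of the hat matrix). The sketch includes both so the shortcut can serve as a check; it assumes numeric predictors and a model with an intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

def loocv_mse(X, y) -> float:
    """LOOCV error: mean of the n single-observation validation MSEs."""
    scores = cross_val_score(LinearRegression(), X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    return float(np.mean(-scores))

def loocv_mse_shortcut(X, y) -> float:
    """Closed-form LOOCV for OLS with an intercept, using leverages h_i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # design matrix with intercept column
    H = A @ np.linalg.pinv(A)                  # hat matrix: projection onto col(A)
    resid = y - H @ y                          # ordinary full-data residuals
    h = np.diag(H)                             # leverages
    return float(np.mean((resid / (1 - h)) ** 2))
```

The loop-based version refits n models, while the shortcut needs only one fit, which matters when the training set is large.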
