Question:
Consider the following data description for the data given in the CSV file HWDataBO:
Input Variables:
Hours Studied: The total number of hours spent studying by each student.
Previous Scores: The scores obtained by students in previous tests.
Extracurricular Activities: Whether the student participates in extracurricular activities (Yes or No).
Sleep Hours: The average number of hours of sleep the student had per day.
Sample Question Papers Practiced: The number of sample question papers the student practiced.
Output Variable:
Performance Index: A measure of the overall performance of each student. The performance index represents the student's academic performance and has been rounded to the nearest integer. The index ranges from to with higher values indicating better performance.
Solve all the following questions using Python.
Use Pandas, Seaborn, Sklearn, etc. libraries for all the analysis.
B: points Read the data and display the first records of the HWDataBO data. Identify the number of rows and columns. Does any column have missing data? Report any inconsistencies. Display the statistical summaries of all the columns.
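As a rough sketch of this first step, the snippet below loads a table with pandas and runs the usual checks. The inline CSV text and its values are invented stand-ins (the real HWDataBO file is not reproduced here), and the lowercase "no" and the empty Sleep Hours cell are planted on purpose so the checks have something to find.

```python
import io
import pandas as pd

# Hypothetical stand-in for the HWDataBO CSV; all values below are made up.
csv_text = """Hours Studied,Previous Scores,Extracurricular Activities,Sleep Hours,Sample Question Papers Practiced,Performance Index
7,99,Yes,9,1,91
4,82,No,4,2,65
8,51,Yes,7,2,45
5,52,Yes,5,2,36
7,75,No,,5,66
3,78,no,9,6,61
"""
# With the real file, pass its path to pd.read_csv instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())                      # first records
n_rows, n_cols = df.shape             # number of rows and columns
print(n_rows, n_cols)

missing = df.isnull().sum()           # count of missing values per column
print(missing)

# Inconsistent category spellings show up as extra unique labels.
labels = df["Extracurricular Activities"].unique()
print(labels)

print(df.describe(include="all"))     # statistical summaries of all columns
```

On real data, `df.dtypes` and `df.duplicated().sum()` are two more inexpensive checks for inconsistencies.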
B: points Normalization. For each column in HWDataBO, apply the standard scaler, such that the mean is zero and standard deviation is one. Display the statistical summaries of all the columns.
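One possible way to do the scaling is sketched below with sklearn's `StandardScaler`. The data frame is synthetic (generated with a fixed seed, not the real HWDataBO values), and mapping Yes/No to 1/0 before scaling is an assumption, since the scaler only accepts numeric columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for HWDataBO; the value ranges and the formula are assumptions.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Hours Studied": rng.integers(1, 10, n),
    "Previous Scores": rng.integers(40, 100, n),
    "Extracurricular Activities": rng.choice(["Yes", "No"], n),
    "Sleep Hours": rng.integers(4, 10, n),
    "Sample Question Papers Practiced": rng.integers(0, 10, n),
})
df["Performance Index"] = (
    2.8 * df["Hours Studied"] + df["Previous Scores"] - 30 + rng.normal(0, 2, n)
).round()

# Encode the Yes/No column numerically so it can be scaled (assumed encoding).
df["Extracurricular Activities"] = df["Extracurricular Activities"].map({"Yes": 1, "No": 0})

scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(scaled.describe())
```

Note that `StandardScaler` divides by the population standard deviation (ddof=0), while `describe()` reports the sample standard deviation (ddof=1), so the printed std is slightly above 1.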
B: points Cross Normalization. For each column in HWDataBN, apply the standard scaler fitted (learned) from the HWDataBO data. Display the statistical summaries of all the columns in the HWDataBN data.
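The key point of cross normalization is that the scaler is fitted once on the training file and only applied to the second file. A minimal sketch, assuming synthetic stand-ins for both files (the helper `make_data` is hypothetical and the Yes/No column is assumed to be already encoded as 1/0):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def make_data(n, seed):
    """Hypothetical generator producing a frame shaped like the homework data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Hours Studied": rng.integers(1, 10, n),
        "Previous Scores": rng.integers(40, 100, n),
        "Extracurricular Activities": rng.integers(0, 2, n),  # assumed Yes=1, No=0
        "Sleep Hours": rng.integers(4, 10, n),
        "Sample Question Papers Practiced": rng.integers(0, 10, n),
    })
    df["Performance Index"] = (
        2.8 * df["Hours Studied"] + df["Previous Scores"] - 30 + rng.normal(0, 2, n)
    ).round()
    return df

train = make_data(300, seed=0)   # stand-in for HWDataBO
test = make_data(100, seed=1)    # stand-in for HWDataBN

scaler = StandardScaler().fit(train)          # statistics learned from the training file only
test_scaled = pd.DataFrame(scaler.transform(test), columns=test.columns)
print(test_scaled.describe())
```

Because the means and standard deviations come from the training file, the scaled second file will generally have means close to, but not exactly, zero.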
B: points OLS Regression (Formula). The hypothesis is that the input variables are linearly related to Performance Index. Use the following formula to calculate the OLS coefficient estimates of all HWDataBO data.
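The formula the question refers to did not survive in this text. The sketch below assumes it is the standard closed-form normal equation, beta_hat = (X^T X)^(-1) X^T y, with a column of ones added for the intercept. The inputs and coefficients are synthetic and only illustrate the computation.

```python
import numpy as np

# Synthetic standardized inputs and made-up true coefficients (assumptions).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
true_beta = np.array([7.4, 17.6, 0.3, 0.8, 0.5])
y = 55.0 + X @ true_beta + rng.normal(0, 2, n)

# Design matrix with a leading column of ones for the intercept.
Xd = np.column_stack([np.ones(n), X])

# Normal equation: solve (Xd^T Xd) beta = Xd^T y instead of forming the inverse explicitly.
beta_hat = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)
print(beta_hat)   # first entry is the intercept, the rest are the slopes
```

Using `np.linalg.solve` rather than `np.linalg.inv` gives the same estimate and is numerically more stable.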
B: points OLS Regression (sklearn). The hypothesis is that the input variables are linearly related to Performance Index. Do the following:
a Use the sklearn library to calculate the OLS coefficient estimates of all HWDataBO data.
b Compare the coefficients obtained in Part B with the above coefficients. Report any differences between the coefficients from Parts B and B
c Using the above OLS coefficient estimates, calculate the MSE for data given in HWDataBN.
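Parts a to c could be approached roughly as follows. The data is synthetic (the hypothetical helper `make_xy` stands in for the scaled HWDataBO and HWDataBN files), and the closed-form estimate is assumed to be the normal equation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def make_xy(n, seed):
    """Hypothetical generator for inputs X and target y (not the real data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    y = 55.0 + X @ np.array([7.4, 17.6, 0.3, 0.8, 0.5]) + rng.normal(0, 2, n)
    return X, y

X_train, y_train = make_xy(300, seed=0)   # stand-in for HWDataBO
X_test, y_test = make_xy(100, seed=1)     # stand-in for HWDataBN

# a) OLS with sklearn
ols = LinearRegression().fit(X_train, y_train)
print(ols.intercept_, ols.coef_)

# b) Compare with the closed-form (normal equation) estimate
Xd = np.column_stack([np.ones(len(X_train)), X_train])
beta_formula = np.linalg.solve(Xd.T @ Xd, Xd.T @ y_train)
max_diff = np.max(np.abs(np.r_[ols.intercept_, ols.coef_] - beta_formula))
print(max_diff)   # expected to be at floating-point noise level

# c) MSE on the held-out file
mse_test = mean_squared_error(y_test, ols.predict(X_test))
print(mse_test)
```

Both approaches solve the same least-squares problem, so any difference between the two coefficient vectors should be numerical round-off only.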
B: points Ridge Regression. It may be possible that some input variables are not independent. Thus, the coefficients need regularization (penalization). Do the following:
a Do the Ridge analysis, taking all HWDataBO data as the training data. Use fold cross validation, and pick the best value of alpha from Print the best alpha and the resulting regression coefficients.
b Using the above coefficient estimates, calculate the MSE for data given in HWDataBN.
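A sketch of the Ridge step with `RidgeCV` follows. The number of folds and the alpha candidates are missing from the question text, so `cv=5` and the `np.logspace` grid below are placeholders to be replaced with the assignment's values. The data is synthetic, with two inputs made deliberately correlated.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error

def make_xy(n, seed):
    """Hypothetical generator with two strongly correlated inputs (not the real data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]   # induce dependence between inputs
    y = 55.0 + X @ np.array([7.4, 17.6, 0.3, 0.8, 0.5]) + rng.normal(0, 2, n)
    return X, y

X_train, y_train = make_xy(300, seed=0)   # stand-in for HWDataBO
X_test, y_test = make_xy(100, seed=1)     # stand-in for HWDataBN

alphas = np.logspace(-3, 3, 13)           # assumed candidate grid
ridge = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)   # assumed 5 folds

print(ridge.alpha_)                       # best alpha found by cross validation
print(ridge.intercept_, ridge.coef_)      # resulting coefficients

mse_test = mean_squared_error(y_test, ridge.predict(X_test))
print(mse_test)
```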
B: points LASSO Regression. It may be possible that not all input variables are helpful in predicting Performance Index. Thus, the coefficients need selection (penalization). Do the following:
a Do the LASSO analysis, taking all HWDataBO data as the training data. Use fold cross validation and pick the best value of alpha from Print the best alpha and the resulting regression coefficients.
b Using the above coefficient estimates, calculate the MSE for data given in HWDataBN.
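The LASSO step can be sketched the same way with `LassoCV`. As with Ridge, the fold count and alpha candidates are not recoverable from the question text, so `cv=5` and the grid below are assumptions. The synthetic data gives two inputs a true coefficient of zero so that selection has something to remove.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error

def make_xy(n, seed):
    """Hypothetical generator where two inputs are irrelevant (not the real data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    y = 55.0 + X @ np.array([7.4, 17.6, 0.0, 0.8, 0.0]) + rng.normal(0, 2, n)
    return X, y

X_train, y_train = make_xy(300, seed=0)   # stand-in for HWDataBO
X_test, y_test = make_xy(100, seed=1)     # stand-in for HWDataBN

alphas = np.logspace(-3, 1, 20)           # assumed candidate grid
lasso = LassoCV(alphas=alphas, cv=5).fit(X_train, y_train)   # assumed 5 folds

print(lasso.alpha_)                       # best alpha found by cross validation
print(lasso.intercept_, lasso.coef_)      # coefficients; some may be exactly zero
n_zero = int(np.sum(lasso.coef_ == 0))    # number of inputs dropped by the penalty
print(n_zero)

mse_test = mean_squared_error(y_test, lasso.predict(X_test))
print(mse_test)
```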
B: points Regression Analysis. Compare and comment on the coefficients of the three models from Parts B B B. Compare the performance of the OLS model against Ridge and LASSO models on the testing data.
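For the comparison, one convenient layout is a single table of coefficients with one column per model, plus the test MSE of each. The sketch below uses synthetic data, an assumed 5-fold cross validation, and assumed alpha grids; none of the numbers it prints are results for the real files.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV
from sklearn.metrics import mean_squared_error

names = ["Hours Studied", "Previous Scores", "Extracurricular Activities",
         "Sleep Hours", "Sample Question Papers Practiced"]

def make_xy(n, seed):
    """Hypothetical generator for inputs X and target y (not the real data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    y = 55.0 + X @ np.array([7.4, 17.6, 0.0, 0.8, 0.5]) + rng.normal(0, 2, n)
    return X, y

X_train, y_train = make_xy(300, seed=0)   # stand-in for HWDataBO
X_test, y_test = make_xy(100, seed=1)     # stand-in for HWDataBN

models = {
    "OLS": LinearRegression(),
    "Ridge": RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5),   # assumed grid and folds
    "LASSO": LassoCV(alphas=np.logspace(-3, 1, 20), cv=5),   # assumed grid and folds
}

coef_table = pd.DataFrame(index=names)
test_mse = {}
for label, model in models.items():
    model.fit(X_train, y_train)
    coef_table[label] = model.coef_       # one column of coefficients per model
    test_mse[label] = mean_squared_error(y_test, model.predict(X_test))

print(coef_table.round(3))
print(pd.Series(test_mse).round(3))
```

Reading the table row by row shows how much Ridge shrinks each coefficient relative to OLS and which ones LASSO sets to zero. The MSE series gives the performance comparison on the testing data.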
