Question:

Consider the following description of the data given in the CSV file HW3DataBO:
Input Variables:
Hours Studied: The total number of hours spent studying by each student.
Previous Scores: The scores obtained by students in previous tests.
Extracurricular Activities: Whether the student participates in extracurricular activities (Yes (1) or No (0)).
Sleep Hours: The average number of hours of sleep the student had per day.
Sample Question Papers Practiced: The number of sample question papers the student practiced.
Output Variable:
Performance Index: A measure of the overall performance of each student. The performance index represents the student's academic performance and has been rounded to the nearest integer. The index ranges from 10 to 100, with higher values indicating better performance.
Solve all the following questions using Python.
Use libraries such as pandas, seaborn, and scikit-learn (sklearn) for all the analysis.
B-1: (5 points) Read the data and display the first 10 records of the HW3DataBO data. Identify the number of rows and columns. Does any column have missing data? Report any inconsistencies. Display the statistical summaries of all the columns.
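A minimal sketch of the B-1 steps with pandas. The real file is not available here, so the block reads a tiny made-up CSV from memory; the file name "HW3DataBO.csv" and the three sample rows are assumptions for illustration only, while the column names follow the data description above.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file. With the actual data, use:
#   df = pd.read_csv("HW3DataBO.csv")   # file name assumed
# The rows below are made up and are not from the assignment data.
sample_csv = io.StringIO(
    "Hours Studied,Previous Scores,Extracurricular Activities,Sleep Hours,"
    "Sample Question Papers Practiced,Performance Index\n"
    "7,99,1,9,1,91\n"
    "4,82,0,4,2,65\n"
    "8,51,1,,2,45\n"
)
df = pd.read_csv(sample_csv)

print(df.head(10))                    # first 10 records (fewer here: the sample is tiny)
n_rows, n_cols = df.shape             # number of rows and columns
print(n_rows, n_cols)
missing_per_column = df.isna().sum()  # count of missing values in each column
print(missing_per_column)
print(df.dtypes)                      # an unexpected dtype can hint at inconsistent entries
print(df.describe(include="all"))     # statistical summaries of all columns
```

Beyond missing values, inconsistencies could also be checked against the description itself, for example values of Extracurricular Activities other than 0 or 1, or a Performance Index outside the stated 10 to 100 range.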
B-2: (5 points) Normalization. For each column in HW3DataBO, apply the standard scaler, such that the mean is zero and standard deviation is one. Display the statistical summaries of all the columns.
B-3: (5 points) Cross Normalization. For each column in HW3DataBN, apply the standard scaler fitted (learned) from HW3DataBO data. Display the statistical summaries of all the columns in HW3DataBN data.
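One possible way to approach B-2 and B-3 with scikit-learn's StandardScaler is sketched below. The two data frames are synthetic stand-ins (assumptions) for HW3DataBO and HW3DataBN, and only two of the columns are used to keep the output short.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
cols = ["Hours Studied", "Sleep Hours"]   # subset of the columns, for brevity

# Synthetic stand-ins for HW3DataBO (df_bo) and HW3DataBN (df_bn)
df_bo = pd.DataFrame(rng.uniform(1, 9, size=(200, 2)), columns=cols)
df_bn = pd.DataFrame(rng.uniform(1, 9, size=(50, 2)), columns=cols)

scaler = StandardScaler()
# B-2: learn mean and standard deviation from BO, then scale BO
bo_scaled = pd.DataFrame(scaler.fit_transform(df_bo), columns=cols)
# B-3: reuse the BO mean and standard deviation on BN (transform only, no refit)
bn_scaled = pd.DataFrame(scaler.transform(df_bn), columns=cols)

print(bo_scaled.describe())
print(bn_scaled.describe())
```

StandardScaler divides by the population standard deviation (ddof=0), whereas describe() reports the sample standard deviation (ddof=1), so the std shown for the scaled BO data is slightly above 1. The scaled BN columns generally do not have mean exactly 0 or standard deviation exactly 1, because the scaling parameters come from BO.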
B-4: (5 points) OLS Regression (Formula). The hypothesis is that the 5 input variables are linearly related to Performance Index. Use the following formula to calculate the OLS coefficient estimates of all HW3DataBO data.
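The formula referred to in B-4 does not appear in this copy of the question. Assuming it is the standard closed-form OLS estimate, beta_hat = (X^T X)^(-1) X^T y with a column of ones added to X for the intercept, a sketch on synthetic data could look like this. The data and the "true" coefficients are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))                      # synthetic stand-in for the 5 inputs
true_beta = np.array([2.0, 1.0, 0.5, -0.3, 0.2])   # made-up coefficients
y = 55.0 + X @ true_beta + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the first coefficient is the intercept
X1 = np.column_stack([np.ones(len(X)), X])

# Solve (X^T X) beta = X^T y; solving the linear system is numerically
# safer than forming the inverse explicitly, and gives the same estimate
beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)

print("intercept:", beta_hat[0])
print("coefficients:", beta_hat[1:])
```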
B-5: (6 points) OLS Regression (sklearn). The hypothesis is that the 5 input variables are linearly related to Performance Index. Do the following:
a. Use the sklearn library to calculate the OLS coefficient estimates of all HW3DataBO data.
b. Compare the coefficients obtained in Part B-4 with the above coefficients. Report any differences between the coefficients from Parts B-4 and B-5.
c. Using the above OLS coefficient estimates, calculate the MSE for data given in HW3DataBN.
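The three parts of B-5 can be sketched as follows, under the assumption that HW3DataBO serves as training data and HW3DataBN as testing data. Both sets are replaced here by synthetic data, and make_data is a hypothetical helper that exists only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
beta = np.array([2.0, 1.0, 0.5, -0.3, 0.2])   # made-up coefficients for the synthetic data

def make_data(n):
    """Hypothetical helper: n synthetic rows with 5 inputs and a linear response."""
    X = rng.normal(size=(n, 5))
    y = 55.0 + X @ beta + rng.normal(scale=0.5, size=n)
    return X, y

X_bo, y_bo = make_data(200)   # stand-in for HW3DataBO (training)
X_bn, y_bn = make_data(100)   # stand-in for HW3DataBN (testing)

# a. OLS with sklearn
ols = LinearRegression().fit(X_bo, y_bo)

# b. Compare with the closed-form estimate (intercept first, then slopes)
X1 = np.column_stack([np.ones(len(X_bo)), X_bo])
beta_formula = np.linalg.solve(X1.T @ X1, X1.T @ y_bo)
max_diff = np.max(np.abs(np.r_[ols.intercept_, ols.coef_] - beta_formula))
print("largest absolute difference:", max_diff)

# c. MSE on the held-out data
mse_bn = mean_squared_error(y_bn, ols.predict(X_bn))
print("test MSE:", mse_bn)
```

The two sets of coefficients should agree up to floating-point error, since both solve the same least-squares problem.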
B-6: (6 points) Ridge Regression. It may be possible that some input variables are not independent. Thus, the coefficients need regularization (penalization). Do the following:
a. Do the Ridge analysis, taking all HW3DataBO data as the training data. Use 8-fold cross-validation, and pick the best value of alpha from 0.01, 0.1, 10, 20, 30. Print the best alpha and the resulting regression coefficients.
b. Using the above coefficient estimates, calculate the MSE for data given in HW3DataBN.
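A sketch of B-6 using GridSearchCV over Ridge with the alpha values listed above. The data is synthetic, make_data is a hypothetical helper, and scoring by negative mean squared error with shuffled folds is a choice made for this illustration, not something the question prescribes.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(3)
beta = np.array([2.0, 1.0, 0.5, -0.3, 0.2])   # made-up coefficients for the synthetic data

def make_data(n):
    """Hypothetical helper: n synthetic rows with 5 inputs and a linear response."""
    X = rng.normal(size=(n, 5))
    y = 55.0 + X @ beta + rng.normal(scale=0.5, size=n)
    return X, y

X_bo, y_bo = make_data(200)   # stand-in for HW3DataBO (training)
X_bn, y_bn = make_data(100)   # stand-in for HW3DataBN (testing)

alphas = [0.01, 0.1, 10, 20, 30]
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": alphas},
    cv=KFold(n_splits=8, shuffle=True, random_state=0),   # 8-fold cross-validation
    scoring="neg_mean_squared_error",
)
search.fit(X_bo, y_bo)

best_alpha = search.best_params_["alpha"]
ridge = search.best_estimator_   # refit on all training data with the best alpha
print("best alpha:", best_alpha)
print("coefficients:", ridge.coef_, "intercept:", ridge.intercept_)

mse_bn = mean_squared_error(y_bn, ridge.predict(X_bn))
print("test MSE:", mse_bn)
```

Because the ridge penalty depends on the scale of each input, fitting on standardized inputs is a common choice.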
B-7: (6 points) LASSO Regression. It may be possible that not all input variables are helpful in predicting Performance Index. Thus, the coefficients need selection (penalization). Do the following:
a. Do the LASSO analysis, taking all HW3DataBO data as the training data. Use 8-fold cross-validation and pick the best value of alpha from 0.1, 1, 10, 100, 1000. Print the best alpha and the resulting regression coefficients.
b. Using the above coefficient estimates, calculate the MSE for data given in HW3DataBN.
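For B-7, one option is scikit-learn's LassoCV, which runs the cross-validation over a given list of alphas internally. The sketch below uses synthetic data and a hypothetical make_data helper; an integer cv gives unshuffled K-fold splits, which is an assumption about how the folds should be formed.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)
beta = np.array([2.0, 1.0, 0.5, -0.3, 0.2])   # made-up coefficients for the synthetic data

def make_data(n):
    """Hypothetical helper: n synthetic rows with 5 inputs and a linear response."""
    X = rng.normal(size=(n, 5))
    y = 55.0 + X @ beta + rng.normal(scale=0.5, size=n)
    return X, y

X_bo, y_bo = make_data(200)   # stand-in for HW3DataBO (training)
X_bn, y_bn = make_data(100)   # stand-in for HW3DataBN (testing)

alphas = [0.1, 1, 10, 100, 1000]
lasso = LassoCV(alphas=alphas, cv=8).fit(X_bo, y_bo)   # 8-fold cross-validation

print("best alpha:", lasso.alpha_)
print("coefficients:", lasso.coef_, "intercept:", lasso.intercept_)

# Coefficients shrunk exactly to zero correspond to inputs the model dropped
n_nonzero = int(np.sum(lasso.coef_ != 0))
print("non-zero coefficients:", n_nonzero)

mse_bn = mean_squared_error(y_bn, lasso.predict(X_bn))
print("test MSE:", mse_bn)
```

With large alpha values, LASSO can set every coefficient to zero, in which case the model predicts the training mean of the response for all inputs.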
B-8: (4 points) Regression Analysis. Compare and comment on the coefficients of the three models from Parts B-5, B-6, B-7. Compare the performance of the OLS model against Ridge and LASSO models on the testing data.