Question:
This task requires estimating several models to identify the one that best predicts default from the available data. Create a new variable from trainData called "y" that takes the value 1 if the column "loan status" has the value "Charged Off" and 0 otherwise. All variables provided to you other than the loan status are features, or "predictors". Consider whether you would like to transform your variables; for example, consider converting some of the categorical variables into continuous variables.

1.1 Linear Regression Model
Fit a linear regression model to trainData, with y as the outcome variable and all the predictors.
(a) What is the Mean Squared Error for the training data?
(b) What is the Mean Squared Error for the testing data?

1.2 Ridge Regression Model
Fit a ridge regression model to trainData, with y as the outcome variable and the predictors. Explore all values of the hyperparameter (lambda) from 0.01 to 100 in increments of 0.01.
(a) What is the Mean Squared Error for the "best" model of this class for the training data?
(b) What is the Mean Squared Error for the "best" model of this class for the test data?

1.3 Lasso Regression Model
Fit a LASSO model to trainData, with y as the outcome variable and the predictors. Explore the same values of the hyperparameter (lambda) as for the ridge regression.
(a) What is the Mean Squared Error for the "best" model of this class for the training data?
(b) What is the Mean Squared Error for the "best" model of this class for the test data?

1.4 Random Forest
Fit a randomForest model to trainData, with y as the outcome variable and the predictors. Explore and fit the best model of this class. Please explain any estimation assumptions you make.
(a) What is the Mean Squared Error for the "best" model of this class for the training data?
(b) What is the Mean Squared Error for the "best" model of this class for the test data?
(c) How important are the variables in predicting default?
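
As a rough illustration of the target definition and the ridge search in 1.2, the following minimal sketch uses only the Python standard library. Everything about the data is an assumption: the rows are synthetic, the features "rate" and "dti" and the status "Fully Paid" are made up, and lambda is chosen on a validation slice of the training rows, which is only one possible reading of "best". The helper names (`make_target`, `ridge_beta`, and so on) are hypothetical, not part of any library.

```python
import random

def make_target(rows, status_key="loan status", positive="Charged Off"):
    """y = 1 where the loan status equals `positive`, else 0."""
    return [1 if row[status_key] == positive else 0 for row in rows]

def mse(y_true, y_pred):
    """Mean squared error: the average squared prediction error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gram(X, y):
    """Precompute X'X and X'y so each lambda only needs one small solve."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return XtX, Xty

def ridge_beta(XtX, Xty, lam):
    """Solve (X'X + lam*I) beta = X'y; the intercept (column 0) is not penalised."""
    A = [row[:] for row in XtX]
    for i in range(1, len(A)):
        A[i][i] += lam
    return solve(A, Xty)

def predict(X, beta):
    """Linear prediction: dot product of each row with the coefficients."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

# Synthetic stand-in for trainData/testData; "rate" and "dti" are made-up features.
random.seed(0)

def fake_rows(n):
    rows = []
    for _ in range(n):
        rate, dti = random.uniform(5, 25), random.uniform(0, 40)
        charged_off = random.random() < 0.02 * rate + 0.005 * dti
        rows.append({"rate": rate, "dti": dti,
                     "loan status": "Charged Off" if charged_off else "Fully Paid"})
    return rows

def design(rows):
    """Design matrix with a leading column of ones for the intercept."""
    return [[1.0, r["rate"], r["dti"]] for r in rows]

train_rows, test_rows = fake_rows(200), fake_rows(100)
X_train, y_train = design(train_rows), make_target(train_rows)
X_test, y_test = design(test_rows), make_target(test_rows)

# Assumption: choose lambda on a validation slice of the training data.
X_fit, y_fit = X_train[:150], y_train[:150]
X_val, y_val = X_train[150:], y_train[150:]
XtX_fit, Xty_fit = gram(X_fit, y_fit)
lambdas = [k / 100 for k in range(1, 10001)]  # 0.01, 0.02, ..., 100.00

def val_error(lam):
    return mse(y_val, predict(X_val, ridge_beta(XtX_fit, Xty_fit, lam)))

best_lambda = min(lambdas, key=val_error)

# Refit on all training rows with the chosen lambda, then report both errors.
beta = ridge_beta(*gram(X_train, y_train), best_lambda)
train_mse = mse(y_train, predict(X_train, beta))
test_mse = mse(y_test, predict(X_test, beta))
print(best_lambda, train_mse, test_mse)
```

Calling `ridge_beta` with `lam=0` gives the ordinary least-squares fit asked for in 1.1. The features here are left unscaled to keep the sketch short; in practice predictors are usually standardised before a penalty is applied, and the LASSO in 1.3 needs an iterative solver rather than a closed-form solve.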

1.5 Neural Network
Select a Neural Network model and follow the same process as in the previous models.
(a) What is the Accuracy of the model for the training data?
(b) What is the Accuracy of the model for the testing data?
(c) Explain why you chose the particular Neural Network model.

1.6 Evaluation
Compare and contrast the predictive power of all approaches and identify the best model to predict default from the given data.

Task 2 (50%)
You are required to combine all the work done for Task 1 and submit a report on predicting default for borrowers from the data platform. Also discuss how the variables are correlated with the "loan status". Identify the 10 most correlated and the 10 least correlated variables. Use all the information you generated for Task 1 to write a report that presents your best model. Pay attention to explaining why it is the best model, and present its performance compared to the other models on hand.
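
The correlation ranking in Task 2 and the accuracy measure in 1.5 can be sketched as follows, again with the standard library only. This is one possible reading, not the required method: it assumes "most correlated" means largest absolute Pearson correlation with y, and that predictions are turned into classes with a 0.5 threshold. The feature names "a", "b", "c" and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def rank_by_correlation(features, y, k=10):
    """Return the k most and k least correlated feature names (by |r|) and all r values."""
    corr = {name: pearson(values, y) for name, values in features.items()}
    ordered = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)
    most = ordered[:k]
    least = ordered[-k:][::-1]  # weakest first
    return most, least, corr

def accuracy(y_true, scores, threshold=0.5):
    """Share of observations whose thresholded score matches the true class."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)

# Toy data: y is the 0/1 default indicator, features are numeric columns.
y = [1, 0, 1, 0, 1, 0]
features = {
    "a": [0.9, 0.1, 0.8, 0.2, 0.7, 0.3],  # strongly related to y
    "b": [1, 1, 2, 2, 3, 3],              # unrelated to y
    "c": [3, 1, 2, 2, 1, 2],              # weakly related to y
}
most, least, corr = rank_by_correlation(features, y, k=1)
print(most, least)
print(accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

With fewer than 2k features the two lists overlap, so on the real data it is worth checking how many numeric predictors remain after any encoding of categorical variables.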