Question: Recall the dataset given in Homework 2, that includes 14 features representing the clinical conditions of 500 ICU patients and the

Recall the dataset given in Homework 2, which includes 14 features representing the clinical conditions of 500 ICU patients and the target variable death, indicating whether the patient died in the ICU (=1) or was discharged alive (=0). Using the same dataset, now try regularized logistic regression (both L1 and L2 penalties and different C values), a KNN classifier (different numbers of neighbors that you believe to be reasonable), random forests (different numbers of trees and different numbers of features to consider at each split, of your selection), and a gradient boosting classifier (different numbers of trees and learning rates of your selection). BE CAREFUL: the best model should be selected using cross-validation, so you should never evaluate methods on the test set during model selection. Also, the standardization must be done inside cross-validation so that you do not end up with data snooping (recall the pipe approach discussed in class).
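One way the model-selection step could be set up is sketched below. It is not the course's reference solution: the Homework 2 data is not available here, so `make_classification` generates a placeholder with the same shape (500 rows, 14 features), and the 80/20 split, the grid values, and the names `pipe`, `param_grid`, and `search` are all assumptions made for illustration. The point it shows is that the scaler sits inside the pipeline, so it is refit on the training part of every fold and never sees the validation part.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data (assumption): stands in for the 500-patient, 14-feature ICU set.
X, y = make_classification(n_samples=500, n_features=14, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The scaler is a pipeline step, so each CV fold standardizes with
# statistics computed from that fold's training portion only.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# One dict per model family; the "clf" key swaps the final estimator.
# All grid values are illustrative, not recommended settings.
param_grid = [
    {
        "clf": [LogisticRegression(solver="liblinear")],  # supports L1 and L2
        "clf__penalty": ["l1", "l2"],
        "clf__C": [0.01, 0.1, 1, 10],
    },
    {"clf": [KNeighborsClassifier()], "clf__n_neighbors": [3, 7, 15]},
    {
        "clf": [RandomForestClassifier(random_state=0)],
        "clf__n_estimators": [50, 100],
        "clf__max_features": [2, 4],
    },
    {
        "clf": [GradientBoostingClassifier(random_state=0)],
        "clf__n_estimators": [50, 100],
        "clf__learning_rate": [0.05, 0.1],
    },
]

# 5-fold cross-validation on the training split only; the test split is untouched.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting all four model families in one grid keeps the comparison on identical folds; running a separate search per model would work equally well as long as the test split stays out of it.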
Once you decide on the final method and the set of best parameters, refit your model on the standardized training set and evaluate its performance (accuracy) on the standardized test set. Also provide the test confusion matrix and the test ROC-AUC score of the best model.
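The final evaluation could look roughly like the sketch below. It again uses placeholder data from `make_classification`, and it assumes, purely for illustration, that an L2-penalized logistic regression with `C=1.0` was the cross-validated winner; substitute whichever model and parameters the search actually selected. Fitting the pipeline on the training split standardizes the training data, and the same training-set statistics are then applied to the test split at prediction time.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data (assumption): stands in for the ICU dataset.
X, y = make_classification(n_samples=500, n_features=14, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical winner of model selection; replace with the chosen model.
final_model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))

# Refit on the full training split; the scaler learns mean/std from it only.
final_model.fit(X_train, y_train)

# Hard labels for accuracy and the confusion matrix,
# class-1 probabilities for ROC-AUC.
y_pred = final_model.predict(X_test)
y_score = final_model.predict_proba(X_test)[:, 1]

test_accuracy = accuracy_score(y_test, y_pred)
test_cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
test_auc = roc_auc_score(y_test, y_score)

print("Test accuracy:", round(test_accuracy, 3))
print("Test confusion matrix:")
print(test_cm)
print("Test ROC-AUC:", round(test_auc, 3))
```

ROC-AUC is computed from predicted probabilities, not from the hard 0/1 predictions, because it measures how well the model ranks patients across all thresholds.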
