Question:

You will use classification algorithms to perform binary classification for disease prediction. The dataset contains the medical records of 10000 patients, of which 9000 records are labeled negative and 1000 are labeled positive. Each medical record consists of 20 features corresponding to medical examination items.

(d) Considering the Decision Tree, Random Forest, SVM, and naive Bayes classification algorithms, which algorithm do you prefer to use? Explain why the naive Bayes algorithm is not preferable.

(e) The given dataset is imbalanced. You need to consider a sampling method in order to improve the classification performance. Describe the sampling method you prefer to use and explain why.

(f) This is an imbalanced classification problem, so accuracy may not be the best metric. Explain why. Which other evaluation metrics do you prefer to consider? Give the formulas of these metrics.

(g) Two classification models can give test results with the same accuracy while one model gives better disease prediction than the other. Create confusion matrices with specific values to illustrate this case.

(h) Describe in detail how to solve this classification problem using an ensemble method. Compare the bagging and boosting methods.

(i) Describe how to use the cross-validation method to improve the performance.
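As a starting point for parts (f) and (g), the sketch below computes accuracy, precision, recall, and F1 from the four cells of a confusion matrix, using the standard definitions accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall). The test-set size (1000 records, 100 positive and 900 negative, mirroring the 1:9 class ratio), the class name `ConfusionMatrix`, and the cell values for `model_a` and `model_b` are all illustrative assumptions, not part of the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; 'positive' means disease present."""
    tp: int  # diseased patients predicted positive
    fn: int  # diseased patients predicted negative (missed cases)
    fp: int  # healthy patients predicted positive (false alarms)
    tn: int  # healthy patients predicted negative

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total

    def precision(self) -> float:
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    def recall(self) -> float:
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical test set: 100 positive and 900 negative records.
# Model A almost always predicts "negative"; model B finds most positives.
model_a = ConfusionMatrix(tp=10, fn=90, fp=0, tn=900)
model_b = ConfusionMatrix(tp=80, fn=20, fp=70, tn=830)

for name, cm in (("A", model_a), ("B", model_b)):
    print(
        f"model {name}: accuracy={cm.accuracy():.2f} "
        f"precision={cm.precision():.2f} "
        f"recall={cm.recall():.2f} f1={cm.f1():.2f}"
    )
```

With these assumed values both models classify 910 of the 1000 records correctly, so their accuracy is identical at 0.91, yet model A detects only 10 of the 100 diseased patients (recall 0.10) while model B detects 80 (recall 0.80). This is why recall, precision, and F1 are usually more informative than accuracy when the positive class is rare.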
