Question: From the Tan et al. text, Exercise 3.12. The learning objective is to show understanding of classifier performance analysis. Consider a labeled data set containing 100 data instances, which is randomly partitioned into two sets A and B, each containing 50 instances. We use A as the training set to learn two decision trees, T10 with 10 leaf nodes and T100 with 100 leaf nodes. The accuracies of the two decision trees on data sets A and B are shown in the table below.
| Data Set | Accuracy of T10 | Accuracy of T100 |
| A | 0.86 | 0.97 |
| B | 0.84 | 0.77 |
Based on the accuracies shown above, which classification model would you expect to have better performance on unseen instances?
Now suppose you have tested T10 and T100 on the entire data set (A + B) and found that the classification accuracy of T10 on A + B is 0.85, whereas the classification accuracy of T100 on A + B is 0.87. Based on this new information and your observations from the table, which classification model would you finally choose for classification?
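As a rough check on the numbers in the question, the sketch below recomputes the accuracy on A + B from the per-set accuracies in the table. It assumes only what the question states: A and B are disjoint and hold 50 instances each, so the combined accuracy is the size-weighted mean of the two. The helper name `combined_accuracy` and the dictionary layout are hypothetical, chosen for illustration.

```python
def combined_accuracy(acc_a, acc_b, n_a=50, n_b=50):
    """Size-weighted mean accuracy over the union of two disjoint sets."""
    return (acc_a * n_a + acc_b * n_b) / (n_a + n_b)


# Per-set accuracies from the table: (accuracy on A, accuracy on B).
accuracies = {"T10": (0.86, 0.84), "T100": (0.97, 0.77)}

# Accuracy on A + B, rounded to two decimals to match the question.
combined = {
    name: round(combined_accuracy(acc_a, acc_b), 2)
    for name, (acc_a, acc_b) in accuracies.items()
}

# Gap between training accuracy (A) and held-out accuracy (B).
gap = {
    name: round(acc_a - acc_b, 2)
    for name, (acc_a, acc_b) in accuracies.items()
}

print(combined)  # {'T10': 0.85, 'T100': 0.87}
print(gap)       # {'T10': 0.02, 'T100': 0.2}
```

The recomputed values match the 0.85 and 0.87 given in the question, which suggests the A + B figures are derivable from the table and partly reflect training accuracy on A. Under that reading, the accuracy on B remains the only estimate in the question that is based on instances not used for training.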
