Question: Consider a labeled data set containing 100 data instances which are randomly partitioned into two sets A and B, each containing 50 instances. We use

Consider a labeled data set containing 100 data instances which are randomly partitioned into two sets A and B, each containing 50 instances. We use A as the training set to learn two decision trees T10with 10 leaf nodes and T100 with 100 leaf nodes. The accuracies of the two decision trees on data sets A and B are shown below

Data set T10 T100

A 0.86 0.97

B 0.81 0.77

(a) Based on the accuracies shown in the table above, which classification model would you expect to have better performance on unseen instances?

(b) Now you've tested T10 and T100 on the entire dataset (A + B) and found that the classification accuracy of T10 on the data set (A + B) is 0.85, whereas the classification accuracy of T100 on the data set (A + B) is 0.87. Based on this new information and your observations from the table above, which classification model would you finally choose for classification?

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Mathematics Questions!

Your question 1. 50 pts. From Tan et al. text Exercise 3.3. Learning objective understand informational entropy. Consider the table below for a binary classification problem. Instance A1 A2 A3 Target...

Consider a labeled data set containing 100 data instances, which is randomly partitioned into two sets A and B, each containing 50 instances. We use A as the training set to learn two decision trees,...

12. Consider a labeled data set containing 100 data instances, which is randomly partitioned into two sets A and B, each containing 50 instances. We use A as the training set to learn two decision...

Please use python to write this program. Part A. k Nearest Neighbor (kNN) Supervised Learner (40 points) Write a program that performs supervised classification using the kN N algorithm which assigns...

PLEASE READ ALL THE LOCKED COMMENTS CAREFULLY BEFORE SUBMITTING ANY ANSWERS. N IF YOU DO NOT READ THESE COMMENTS, YOU ARE LIKELY TO OVERLOOK AN IMPORTANT DETAIL W OF THE REQUIRED SUBMISSION STYLE. 4...

Data Processing Lab 1 My Solutions > Please reference the Data Processing Lab 1 file for a description of the deliverables. Note that your script must import the data from the original text file...

PSYC 354 HOMEWORK 6 Percentiles and Hypothesis Testing with Z-Tests When submitting this file, be sure the filename includes your full name, course and section. Example: HW6_JohnDoe_354B01 Be sure...

3.11 Exis 191 Table 3.7. Comparing the test c ry of decision trees Trand T1 Accuracy Data Set TOT 0.86 0.97 0.77 0.84 12. Consider a labeled data set containing 100 data instances, which is randomly...

Logan's basis in his partnership interest is $20,000 when he receives a pro rata non liquidating distribution from the partnership of $22,000 cash and inventory with a basis of $2,000 and fair market...

(A) What Is The Normal Melting Point? K The Normal Boiling Point? K (B) At What Temperature And Pressure Do The Three Phases Coexist In Equilibrium? T = K, P = Atm (C) Is Fluorine A Solid, A Liquid...

A company is most likely to gain from international diversification when:Economic conditions in the countries it invests in tend to move independentlyAll countries involved experience high economic...

Your company has exposure in multiple currencies. Following table provides you projections for the forthcoming financial year. Work out a comprehensive forex risk management policy for your company....