Consider the decision trees shown in Figures (a) and (b). For each approach described below, you...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
Consider the decision trees shown in Figures (a) and (b). For each approach described below, you need to compute the generalization errors for both trees and decide which tree is better. The training and validation data sets are shown in Figures (c) and (d), respectively. + 0 B A A 0 C B 1 0 1 0 + + B 0 + (a) Decision Tree A (b) Decision Tree B #123 0 3 OOOD A 0 B00 C Class 0 0 + 11 0 0 + 12 0 1 0 - 13 4567 0 1 1 14 1 0 0 + 15 #12345 AO A 0 BO B 0 00 C Class 0 + 0 0 1 - 0 1 0 - 0 1 1 - 0 1 1 + 1 0 1 + 16 1 0 0 - 1 0 1 - 17 1 0 0 8 1 1 0 - 18 1 0 1 - 90 1 1 0 - 19 1 1 0 - 10 1 1 1 + 20 1 1 1 + (c) Training data (c) Validation data (a) Optimistic approach (assumes generalization error is given by the training error). (b) Reduced error pruning approach (generalization error is computed using the validation set shown in Figure (d)), which means generalization error = validation error. (c) Pessimistic error approach, where generalization error = training error + omega * complexity of the tree. Consider omega = 1.0 and complexity of the tree = number of leaves / total number of training points. (d) minimum description length (MDL) approach. The total description length of a tree is given by: Cost(tree, data)=Cost(tree) + Cost(data tree); Each internal node of the tree is encoded by the ID of the splitting attribute. If there are m attributes, the cost of encoding each attribute is log2m bits. Each leaf node is encoded using the ID of the class it is associated with. If there are k classes, the cost of encoding a class is log2 k bits. Cost(tree) is the cost of encoding all the nodes in the tree. To simplify the computation, you can assume that the total cost of the tree is obtained by adding up the costs of encoding each internal node and each leaf node. Cost(data tree) is encoded using the classification errors the tree commits on the training set. Each error is encoded by log2 n bits, where n is the total number of training examples. Consider the decision trees shown in Figures (a) and (b). For each approach described below, you need to compute the generalization errors for both trees and decide which tree is better. The training and validation data sets are shown in Figures (c) and (d), respectively. + 0 B A A 0 C B 1 0 1 0 + + B 0 + (a) Decision Tree A (b) Decision Tree B #123 0 3 OOOD A 0 B00 C Class 0 0 + 11 0 0 + 12 0 1 0 - 13 4567 0 1 1 14 1 0 0 + 15 #12345 AO A 0 BO B 0 00 C Class 0 + 0 0 1 - 0 1 0 - 0 1 1 - 0 1 1 + 1 0 1 + 16 1 0 0 - 1 0 1 - 17 1 0 0 8 1 1 0 - 18 1 0 1 - 90 1 1 0 - 19 1 1 0 - 10 1 1 1 + 20 1 1 1 + (c) Training data (c) Validation data (a) Optimistic approach (assumes generalization error is given by the training error). (b) Reduced error pruning approach (generalization error is computed using the validation set shown in Figure (d)), which means generalization error = validation error. (c) Pessimistic error approach, where generalization error = training error + omega * complexity of the tree. Consider omega = 1.0 and complexity of the tree = number of leaves / total number of training points. (d) minimum description length (MDL) approach. The total description length of a tree is given by: Cost(tree, data)=Cost(tree) + Cost(data tree); Each internal node of the tree is encoded by the ID of the splitting attribute. If there are m attributes, the cost of encoding each attribute is log2m bits. Each leaf node is encoded using the ID of the class it is associated with. If there are k classes, the cost of encoding a class is log2 k bits. Cost(tree) is the cost of encoding all the nodes in the tree. To simplify the computation, you can assume that the total cost of the tree is obtained by adding up the costs of encoding each internal node and each leaf node. Cost(data tree) is encoded using the classification errors the tree commits on the training set. Each error is encoded by log2 n bits, where n is the total number of training examples.
Expert Answer:
Answer rating: 100% (QA)
To compute the generalization errors for both Decision Tree A and Decision Tree B and decide which tree is better we will apply various approaches as described starting with the optimistic approach an... View the full answer
Related Book For
Introduction to Data Mining
ISBN: 978-0321321367
1st edition
Authors: Pang Ning Tan, Michael Steinbach, Vipin Kumar
Posted Date:
Students also viewed these algorithms questions
-
Consider the decision trees shown in Figure 4.3. Assume they are generated from a data set that contains 16 binary attributes and 3 classes, C1, C2, and C3. Compute the total description length of...
-
Q1. You have identified a market opportunity for home media players that would cater for older members of the population. Many older people have difficulty in understanding the operating principles...
-
State whether or not each of the following events would result in a liability being recognised in the accounts at 30 June. 1. Taxes for the year ended 30 June, which are not payable until October. 2....
-
The December 31, 2015, balance sheet of Schism, Inc., showed $130,000 in the common stock account and $2,332,000 in the additional paid-in surplus account. The December 31, 2016, balance sheet showed...
-
Krystal is a U.S.-based company which manufactures, sells, and installs water purification equipment. On April 11th the company sold a system to the City of Nagasaki, Japan, for installation in...
-
What types of remedies an aggrieved party may seek from the court?
-
Sues Gallery sells original paintings by local artists. All sales occur in the store. Sometimes customers purchase more than one painting. Individual customers must pay for purchases in full at the...
-
How does the dynamic interplay between transformational and transactional leadership styles influence organizational adaptability in complex business environments ?
-
Your client, Mr. X began a self-employed, unincorporated coffee business 123 Ltd. Mr. X would like some assistance in preparing his 2020 tax return. You have been provided with the following...
-
In an RLC series circuit driven by an AC source, the amplitude of the potential difference across the inductor is \(29.0 \mathrm{~V}\) and that across the capacitor is \(13.0 \mathrm{~V}\). If the...
-
A circuit contains a \(1.00 \times 10^{3}-\Omega\) resistor and a \(5.00 \times 10^{2}-\mathrm{mH}\) inductor in series. What is the impedance of the circuit when a \(1.00-\mathrm{kHz} \mathrm{AC}\)...
-
In an \(R C\) series circuit consisting of a \(20.0-\Omega\) resistor, a \(300-\mu \mathrm{F}\) capacitor, and an AC source, the source frequency is \(150 \mathrm{~s}^{-1}\). If the current has its...
-
An \(R L C\) series circuit has \(X_{C}=X_{L}=R\). When the circuit is operating at resonance, the current amplitude is 1. 0 A. When the angular frequency of the current is doubled, what is the new...
-
What are the main assumptions of the standard economic model?
-
Beginning around 1880 until the eve of the First World War (1914), Social Darwinism was directly primarily towards, immigrants from the following nations 1) Southern and Eastern Europeans who were...
-
What is the ideal number of children to have? This question was asked on the Sullivan Statistics Survey I. Draw a dot plot of the variable Children from theSullivanStatsSurveyI data set at...
-
Distinguish between noise and outliers. Be sure to consider the following questions. (a) Is noise ever interesting or desirable? Outliers? (b) Can noise objects be outliers? (c) Are noise objects...
-
Traditional agglomerative hierarchical clustering routines merge two clusters at each step. Does it seem likely that such an approach accurately captures the (nested) cluster structure of a set of...
-
Consider the one-dimensional data set shown in Table 5.4. (a) Classify the data point x = 5.0 according to its 1-, 3-, 5-, and 9-nearest neighbors (using majority vote). (b) Repeat the previous...
-
Estimate the fugacity of iso-butane at 15 atm and 87C using the compressibility factor correlation. BP Z=1+- > given that the second virial coefficient, B=-4.28 10 4 m/mol. RT
-
What is thermodynamic diagram? How can it be categorized? What is its importance? How is the thermodynamic diagram constructed?
-
Show that (a) \(\left(\frac{\partial H}{\partial P} ight)_{T}=\left[\frac{\partial(V / T)}{\partial(1 / T)} ight]_{P}\) (b) \(\frac{\partial}{\partial T}\left(\frac{\Delta G}{T} ight)=-\frac{\Delta...
Study smarter with the SolutionInn App