Question: 1 Splitting Heuristic for Decision Trees (20 pts)

Recall that the ID3 algorithm iteratively grows a decision tree from the root downwards. On each iteration, the algorithm replaces one leaf node with an internal node that splits the data based on one decision attribute (or feature). In particular, the ID3 algorithm chooses the split that reduces the entropy the most, but there are other choices. For example, since our goal in the end is to have the lowest error, why not instead choose the split that reduces error the most? In this problem, we will explore one reason why reducing entropy is a better criterion.

Consider the following simple setting. Suppose each example is described by n boolean features: X = (X_1, ..., X_n), where X_i ∈ {0, 1}, and where n ≥ 4. Furthermore, the target function to be learned is f : X → Y, where Y = X_1 ∨ X_2 ∨ X_3. That is, Y = 1 if X_1 = 1 or X_2 = 1 or X_3 = 1, and Y = 0 otherwise. Suppose that your training data contains all of the 2^n possible examples, each labeled by f. For example, when n = 4, the data set would be:

X1 X2 X3 X4 | Y      X1 X2 X3 X4 | Y
 0  0  0  0 | 0       1  0  0  0 | 1
 0  0  0  1 | 0       1  0  0  1 | 1
 0  0  1  0 | 1       1  0  1  0 | 1
 0  0  1  1 | 1       1  0  1  1 | 1
 0  1  0  0 | 1       1  1  0  0 | 1
 0  1  0  1 | 1       1  1  0  1 | 1
 0  1  1  0 | 1       1  1  1  0 | 1
 0  1  1  1 | 1       1  1  1  1 | 1

(a) How many mistakes does the best 1-leaf decision tree make over the 2^n training examples? (The 1-leaf decision tree does not split the data even once. Make sure you answer for the general case when n ≥ 4.)

(b) Is there a split that reduces the number of mistakes by at least one? (That is, is there a decision tree with 1 internal node with fewer mistakes than your answer to part (a)?) Why or why not? (Note that, as in lecture, you should restrict your attention to splits that consider a single attribute.)

(c) What is the entropy of the output label Y for the 1-leaf decision tree (no splits at all)?

(d) Is there a split that reduces the entropy of the output Y by a non-zero amount? If so, what is it, and what is the resulting conditional entropy of Y given this split? (Again, as in lecture, you should restrict your attention to splits that consider a single attribute.)
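To see concretely how the two splitting criteria behave on this data set, here is a small, self-contained Python sketch (not part of the original problem) that enumerates all 2^n examples for n = 4, measures the training error of the best 1-leaf tree, and then reports, for each single-feature split, both the number of mistakes and the conditional entropy H(Y | X_i). The variable names and print format are illustrative choices, not anything specified by the problem.

```python
from itertools import product
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a collection of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

n = 4  # the problem assumes n >= 4; n = 4 keeps the table small
# Every assignment of the n boolean features, labeled by f: Y = X1 v X2 v X3.
data = [(x, x[0] | x[1] | x[2]) for x in product((0, 1), repeat=n)]
labels = [y for _, y in data]

# Best 1-leaf tree: predict the majority label for every example.
majority = int(sum(labels) * 2 >= len(labels))
mistakes_no_split = sum(y != majority for _, y in data)
print(f"1-leaf tree: {mistakes_no_split} mistakes, H(Y) = {entropy(labels):.4f} bits")

# Compare the two criteria for every single-feature split.
for i in range(n):
    left = [y for x, y in data if x[i] == 0]
    right = [y for x, y in data if x[i] == 1]
    # Error criterion: each branch predicts its own majority label.
    mistakes = sum(min(branch.count(0), branch.count(1)) for branch in (left, right))
    # Entropy criterion: weighted conditional entropy H(Y | X_i).
    cond_h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(data)
    print(f"split on X{i + 1}: {mistakes} mistakes, H(Y | X{i + 1}) = {cond_h:.4f} bits")
```

Running it for n = 4 makes it easy to compare the two criteria side by side; the same enumeration generalizes to larger n, though the table grows as 2^n.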
