You have a decision tree algorithm and you are trying to figure out which attribute is the best to test on first. You are using the information gain metric.

You are given a set of 128 examples, with 64 positively labeled and 64 negatively labeled.

There are three attributes: Homeowner (H), In Debt (ID), and Rich (R).

For 64 examples, Homeowner is true. Of the Homeowner=true examples, 3/4 are positive and 1/4 are negative.

For 96 examples, In Debt is true. Of the In Debt=true examples, 1/2 are positive and 1/2 are negative.

For 32 examples, Rich is true. Of the Rich=true examples, 3/4 are positive and 1/4 are negative.

You must show all mathematical calculations/steps to get full points for each of subparts (a) through (d) below. Just writing the final answer in a subpart (correct or not) will get zero points.

a) What is the entropy of the initial set of examples?

b) What is the information gain of splitting on the Homeowner attribute as the root node?

c) What is the information gain of splitting on the In Debt attribute as the root node?

d) What is the information gain of splitting on the Rich attribute as the root node?

e) Which attribute do you split on?
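
The arithmetic behind subparts (a) through (e) can be checked with a short Python sketch. It is not a substitute for the written steps the question asks for. The function names `entropy` and `information_gain` are illustrative, not from the question. The counts for each attribute=false branch are not stated in the problem; they are derived here by subtracting the attribute=true counts from the overall 64 positive / 64 negative totals.

```python
from math import log2


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # a class with zero examples contributes 0 to the entropy
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(parent, branches):
    """Parent entropy minus the size-weighted average entropy of the branches.

    `parent` is a (pos, neg) pair; `branches` is a list of (pos, neg) pairs.
    """
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in branches)
    return entropy(*parent) - remainder


PARENT = (64, 64)  # 64 positive, 64 negative

# (pos, neg) for the attribute=true branch, then the attribute=false branch.
# The false-branch counts are derived (assumption): totals minus true-branch counts.
SPLITS = {
    "Homeowner": [(48, 16), (16, 48)],  # 64 true: 3/4 positive
    "In Debt": [(48, 48), (16, 16)],    # 96 true: 1/2 positive
    "Rich": [(24, 8), (40, 56)],        # 32 true: 3/4 positive
}

gains = {name: information_gain(PARENT, branches) for name, branches in SPLITS.items()}
best = max(gains, key=gains.get)

print(f"initial entropy: {entropy(*PARENT):.4f}")
for name, gain in gains.items():
    print(f"gain({name}): {gain:.4f}")
print(f"split on: {best}")
```

Under these derived counts, the initial entropy is 1 bit, and the gains come out to roughly 0.189 for Homeowner, 0 for In Debt, and 0.062 for Rich, so Homeowner gives the largest information gain.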
