Question: Information Gain Exercise

The information gain is based on the decrease in entropy after a dataset is split on an attribute (i.e., a variable). Constructing a decision tree comes down to finding the attribute that returns the highest information gain (i.e., the one that produces the most homogeneous branches). To determine which attribute should be used to split the data, we need to calculate the information gain for each candidate attribute. Answer the following questions:

Entire population / all customers (140 cases)

                   Make a Purchase
                   No     Yes    | Total
All customers      90     50     | 140

1. What is the entropy of the parent node (i.e., the entire population)?

Attribute 1: Customers' marital status

                   Make a Purchase
                   No     Yes    | Total
Married            60     20     | 80
Single             30     30     | 60
Total              90     50     | 140

Attribute 2: Customers' income level (High, Medium, Low); counts: 50, 40, 40 | 20, 50; Total 140

2. What are the entropies of the child nodes?
3. What are the information gains?
4. In conclusion, which variable (marital status or income level) is more informative and should be used to split the data?
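A minimal Python sketch of the calculations behind questions 1 to 3, assuming the standard Shannon entropy with base-2 logarithms, H(S) = -Σ p_i log2(p_i), and information gain defined as IG = H(parent) - Σ (|S_v| / |S|) H(S_v). The counts used are the parent node (90 No / 50 Yes) and the marital-status split (Married: 60 No / 20 Yes; Single: 30 No / 30 Yes). The helper names `entropy` and `information_gain` are illustrative choices, not part of the exercise.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a node, given its class counts."""
    total = sum(counts)
    # Skip empty classes: 0 * log2(0) is taken as 0 by convention.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted


# Class counts are (No, Yes) for "Make a Purchase".
parent = (90, 50)
marital = {"Married": (60, 20), "Single": (30, 30)}

if __name__ == "__main__":
    print(f"H(parent) = {entropy(parent):.4f}")
    for status, counts in marital.items():
        print(f"H({status}) = {entropy(counts):.4f}")
    print(f"IG(marital status) = {information_gain(parent, marital.values()):.4f}")
```

Run as a script, this prints H(parent) = 0.9403, H(Married) = 0.8113, H(Single) = 1.0000 and IG(marital status) = 0.0481. For question 4, the same `information_gain(parent, ...)` call would be applied to the income-level counts (one (No, Yes) pair each for High, Medium and Low), and the attribute with the larger gain is the better split.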
