Question: Information Gain Exercise

Information Gain Exercise

The information gain is based on the decrease in entropy after a dataset is split on an attribute (i.e., a variable). Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches). To determine which attribute should be used to split the data, we need to calculate the information gain. Answer the questions below.

Entire population / all customers (140 cases)

            Make a Purchase
            No      Yes     Total
All         90      50      140

1. What is the entropy of the parent node (i.e., the entire population)?

Attribute 1: Customers' marital status

            Make a Purchase
            No      Yes     Total
Married     60      20      80
Single      30      30      60
Total       90      50      140

Attribute 2: Customers' income level

Table of Make a Purchase by income level (High, Medium, Low), 140 cases in total. Cell values as extracted, in order: 50, 40, 40, 20, 50, 140.

2. What are the entropies of the child nodes?
3. What are the information gains?
4. In conclusion, which variable (marital status or income level) is more informative (and should be used to split the data)?
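The calculation behind questions 1 to 3 can be sketched in Python. This is a minimal sketch rather than a verified solution: the helper names entropy and information_gain are illustrative, and the marital-status counts assume the table reads as (No, Yes) = (60, 20) for one group and (30, 30) for the other. The income-level split is left out because its per-level counts are not fully recoverable from the question text.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a node, given its class counts."""
    total = sum(counts)
    # H = sum over classes of p * log2(1 / p); empty classes contribute 0.
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Class counts are (No, Yes) purchases, taken from the question.
parent = (90, 50)

# Assumed reading of the marital-status table: one group of 80 and one of 60.
marital_children = [(60, 20), (30, 30)]

parent_entropy = entropy(parent)
child_entropies = [entropy(child) for child in marital_children]
gain_marital = information_gain(parent, marital_children)

print(f"Parent entropy:      {parent_entropy:.4f}")
print(f"Child entropies:     {[round(h, 4) for h in child_entropies]}")
print(f"Gain (marital):      {gain_marital:.4f}")
```

Under these assumed counts the sketch gives roughly 0.940 bits for the parent node, 0.811 and 1.000 bits for the two marital-status children, and an information gain of about 0.048 bits. Calling information_gain again with the three income-level count pairs and comparing the two gains would answer question 4: the attribute with the larger gain is the more informative split.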
