Question: Consider the dataset shown in Table 1 for a binary classification problem



Consider the dataset shown in Table 1 for a binary classification problem.

Table 1. Columns: Customer ID, Housing Type, Gender, Marital Status, Class. Row text as extracted, with cell boundaries lost:
11 Married Single House Female Married Female Single Male Married Hostel Male Single - House Female Married Apartment Female Single 18 8 8 5 5 8 8 5 5 8 8 5 5 8 Apartment Male Married House Male Single Hostel Female Married Hostel Female Single House Male Married Hostel Male Single Hostel Female Married Apartment Female Single

d. [2.5 points] Compute the Gain Ratio for splitting over each of the four attributes. Which attribute provides the highest Gain Ratio?

e. [2 points] For splitting at the root node, would you choose the attribute that provides the maximum IG or the attribute that provides the maximum Gain Ratio? Briefly explain your choice.

f. [3 points] Consider the following 3 decision trees:

[Figure: three decision trees, two of them captioned Tree 1 and Tree 2. Node labels recovered: Marital Status (branches Married, Single), Customer ID, Gender, and Housing Type (branches Apartment, House, Hostel) followed by three Gender nodes.]

Compute the difference between the entropy of the overall data and the weighted entropy of the leaves for each of the three trees. Based on these differences, which tree would you choose for performing classification? Is the attribute chosen at the root of this tree the same as the attribute chosen for splitting in (e)? Briefly comment on the nature of your results and on the properties of the impurity measure used while constructing decision trees.
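The quantities asked for in parts (d) and (e) can be sketched in a few lines of Python. This is a minimal illustration, not a solution to the question: the four rows in `toy_rows` are hypothetical (they are not Table 1), and the function and key names (`entropy`, `info_gain`, `gain_ratio`, `id`, `gender`, `cls`) are chosen here for the example. It assumes the usual definitions: information gain is the parent entropy minus the size-weighted entropy of the children, and Gain Ratio divides that gain by the split information, which is the entropy of the attribute's own value distribution.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(values)
    return sum(-(c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(rows, attr, target):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    parent = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


def gain_ratio(rows, attr, target):
    """Information gain divided by split information (entropy of the attribute)."""
    split_info = entropy([r[attr] for r in rows])
    return info_gain(rows, attr, target) / split_info if split_info > 0 else 0.0


# Hypothetical toy data, NOT the rows of Table 1.
toy_rows = [
    {"id": 1, "gender": "F", "cls": "C0"},
    {"id": 2, "gender": "F", "cls": "C0"},
    {"id": 3, "gender": "M", "cls": "C1"},
    {"id": 4, "gender": "M", "cls": "C1"},
]

for attr in ("id", "gender"):
    print(attr, info_gain(toy_rows, attr, "cls"), gain_ratio(toy_rows, attr, "cls"))
# id 1.0 0.5
# gender 1.0 1.0
```

On this toy data both attributes reach the same information gain, but the unique-valued `id` attribute has a split information of 2 bits and so gets half the Gain Ratio of `gender`. That is the effect part (e) asks about: an identifier-like attribute produces pure leaves without generalizing, and Gain Ratio penalizes its many-way split.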
