Question:

Suppose that you are developing a decision tree where the root node (parent node) has 50 data points, with 20 corresponding to the "Yes" class and 30 corresponding to the "No" class of the target variable. You split the tree based on an input variable (which has two categories) and create two branches, A and B (see the figure below). In this split, Branch B has 30 data points, of which 18 belong to the "Yes" class and 12 belong to the "No" class of the target variable. Calculate the impurity measures of nodes A and B.

a) Node A total count:
b) Node A "Yes" count:
c) Node A "No" count:
d) Gini index of the parent node:
e) Gini index of node A:
f) Gini index of node B:
g) Information gain in Gini index:
h) Entropy of the parent node: (Hint: You can find the log value of a number by using "=LOG(Number, Base)" in Excel. For example, if you want to find the value of log2(0.5), enter "=LOG(0.5,2)" in Excel, which will give you -1.)
i) Entropy of node A:
j) Entropy of node B:
k) Information gain in Entropy:
l) Misclassification error of node A:
m) Misclassification error of node B:
n) Information gain in Misclassification error:
Step by Step Solution
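The calculations asked for in parts a) through n) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the page's official solution. It assumes Branch B holds 30 data points (its class counts, 18 "Yes" and 12 "No", sum to 30), so Node A receives the remaining 20. It also assumes the usual textbook definitions: Gini = 1 - sum(p_i^2), entropy = -sum(p_i * log2(p_i)), misclassification error = 1 - max(p_i), and "information gain" = parent impurity minus the size-weighted average impurity of the children. The function and variable names are illustrative only.

```python
from math import log2


def gini(counts):
    """Gini index: 1 - sum of squared class proportions."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def entropy(counts):
    """Entropy in bits; classes with zero count contribute nothing."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)


def misclassification(counts):
    """Misclassification error: 1 - proportion of the majority class."""
    n = sum(counts)
    return 1.0 - max(counts) / n


def gain(measure, parent, children):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * measure(child) for child in children)
    return measure(parent) - weighted


# Class counts are given as (Yes, No).
parent = (20, 30)
node_b = (18, 12)  # assumption: Branch B has 18 + 12 = 30 data points
# Node A gets whatever the parent has that did not go to Branch B.
node_a = tuple(p - b for p, b in zip(parent, node_b))

print("Node A total / Yes / No:", sum(node_a), node_a[0], node_a[1])

for name, measure in [
    ("Gini", gini),
    ("Entropy", entropy),
    ("Misclassification", misclassification),
]:
    print(
        f"{name}: parent={measure(parent):.4f}, "
        f"A={measure(node_a):.4f}, B={measure(node_b):.4f}, "
        f"gain={gain(measure, parent, [node_a, node_b]):.4f}"
    )
```

Under these assumptions, Node A has 20 data points (2 "Yes", 18 "No"). The Gini index is 0.48 for the parent, 0.18 for A, and 0.48 for B, giving a gain of 0.12. The entropy is about 0.971 for the parent, 0.469 for A, and 0.971 for B, giving a gain of about 0.201. The misclassification error is 0.4 for the parent, 0.1 for A, and 0.4 for B, giving a gain of 0.12. The same entropy values can be checked in Excel with the =LOG(Number, 2) hint from part h).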
