Question: Please solve Problem 3: Decision Trees. Problem 2. [30 points] In this problem, you will investigate building a decision tree
Decision Trees
Problem 2. [30 points] In this problem, you will investigate building a decision tree for a binary
classification problem. The training data is given in the table below, with 16 instances that will be used
to learn a decision tree for predicting whether a mushroom is edible or not based on its attributes
(Color, Size, and Shape). Please note that the label set is the binary set {Yes, No}.
Which attribute would the algorithm choose to use for the root of the tree? Show the details of your
calculations. Recall from lectures that if we let $S$ denote the data set at the current node, $A$ denote
the feature with values $v \in V$, $H$ denote the entropy function, and $S_v$ denote the subset of $S$ for
which the feature $A$ has the value $v$, then the gain of a split along the feature $A$, denoted
$\mathrm{InfoGain}(S, A)$, is computed as:
$$\mathrm{InfoGain}(S, A) = H(S) - \sum_{v \in V} \frac{|S_v|}{|S|} \, H(S_v)$$
That is, we take the entropy before the split and subtract the entropy of each new node after
splitting, with each node's entropy weighted by the fraction of the samples that fall into that node.
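The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the function names `entropy` and `info_gain`, the dict-per-row encoding, and the toy data are all assumptions made for the example, and entropy is measured in bits (log base 2).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    """InfoGain(S, A): H(S) minus the size-weighted entropies of each subset S_v."""
    subsets = {}
    for row, label in zip(rows, labels):
        # Group the labels by the value v that this row has for the feature.
        subsets.setdefault(row[feature], []).append(label)
    n = len(labels)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical toy data (not the mushroom table): feature "A" separates the labels perfectly.
toy_rows = [{"A": "x"}, {"A": "x"}, {"A": "y"}, {"A": "y"}]
toy_labels = ["Yes", "Yes", "No", "No"]

print(entropy(toy_labels))                   # 1.0 (two equally likely classes)
print(info_gain(toy_rows, toy_labels, "A"))  # 1.0 (both children are pure)
```

A feature that leaves the class mix unchanged in every child would give a gain of zero, which is why the root is chosen as the feature with the largest gain.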
Draw the full decision tree that would be learned for this data. Assume no pruning, and that you stop
splitting a leaf node when all samples in the node belong to the same class, i.e., there is no information
gain in splitting the node.
Instance  Color   Size   Shape      Edible
D1        Yellow  Small  Round      Yes
D2        Yellow  Small  Round      No
D3        Green   Small  Irregular  Yes
D4        Green   Large  Irregular  No
D5        Yellow  Large  Round      Yes
D6        Yellow  Small  Round      Yes
D7        Yellow  Small  Round      Yes
D8        Yellow  Small  Round      Yes
D9        Green   Small  Round      No
D10       Yellow  Large  Round      No
D11       Yellow  Large  Round      Yes
D12       Yellow  Large  Round      No
D13       Yellow  Large  Round      No
D14       Yellow  Large  Round      No
D15       Yellow  Small  Irregular  Yes
D16       Yellow  Large  Irregular  Yes
Table: Mushroom data with 16 instances, three categorical features, and binary labels.
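As a way to check hand calculations, the sketch below transcribes the rows of the table in order and evaluates the information gain of each attribute at the root, then grows a tree greedily. It is a sketch under stated assumptions rather than the expected written solution: the tuple encoding, the helper names (`split`, `build`), and the leaf representation (a dict of label counts) are choices made for this example. It also assumes a node becomes a leaf when no attributes remain, because some rows share identical attributes but carry different labels (for example, two Yellow/Small/Round rows labeled Yes and No), so not every leaf can be pure.

```python
from collections import Counter
from math import log2

FEATURES = ("Color", "Size", "Shape")
# Rows transcribed from the table above, in order: (Color, Size, Shape, Edible).
DATA = [
    ("Yellow", "Small", "Round", "Yes"), ("Yellow", "Small", "Round", "No"),
    ("Green", "Small", "Irregular", "Yes"), ("Green", "Large", "Irregular", "No"),
    ("Yellow", "Large", "Round", "Yes"), ("Yellow", "Small", "Round", "Yes"),
    ("Yellow", "Small", "Round", "Yes"), ("Yellow", "Small", "Round", "Yes"),
    ("Green", "Small", "Round", "No"), ("Yellow", "Large", "Round", "No"),
    ("Yellow", "Large", "Round", "Yes"), ("Yellow", "Large", "Round", "No"),
    ("Yellow", "Large", "Round", "No"), ("Yellow", "Large", "Round", "No"),
    ("Yellow", "Small", "Irregular", "Yes"), ("Yellow", "Large", "Irregular", "Yes"),
]

def entropy(rows):
    """Entropy in bits of the Edible label (last column) over the given rows."""
    n = len(rows)
    counts = Counter(r[-1] for r in rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def split(rows, i):
    """Partition rows by the value of feature column i."""
    parts = {}
    for r in rows:
        parts.setdefault(r[i], []).append(r)
    return parts

def info_gain(rows, i):
    """H(S) minus the size-weighted entropy of each subset S_v for feature column i."""
    n = len(rows)
    return entropy(rows) - sum(len(p) / n * entropy(p) for p in split(rows, i).values())

def build(rows, feats):
    """Greedy tree: a leaf is a dict of label counts; an internal node is (feature, {value: subtree})."""
    counts = dict(Counter(r[-1] for r in rows))
    if len(counts) == 1 or not feats:       # pure node, or no attributes left (assumption)
        return counts
    best = max(feats, key=lambda i: info_gain(rows, i))
    if info_gain(rows, best) <= 1e-12:      # no split improves on the current node
        return counts
    rest = [i for i in feats if i != best]
    return (FEATURES[best], {v: build(p, rest) for v, p in split(rows, best).items()})

root_gains = {FEATURES[i]: round(info_gain(DATA, i), 4) for i in range(3)}
print(root_gains)  # approximately Color 0.0355, Size 0.1058, Shape 0.0359
tree = build(DATA, [0, 1, 2])
print(tree[0])     # Size
print(tree)
```

Under these assumptions the largest gain at the root belongs to Size. The printed nested structure can then be compared against a hand-drawn tree, branch by branch.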
Problem. Handling real-valued numerical features is quite different from handling categorical
features when splitting nodes. This problem discusses a simple way to choose good thresholds for
splitting on numerical features. Specifically, when there is a numerical feature in the data, one option
would be to treat all numeric values of the feature as discrete, i.e., to proceed exactly as we do with
categorical data. What problems may arise when we use a tree derived this way to classify an unseen example?
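One way to see the issue concretely is the small sketch below. Everything in it is hypothetical and chosen only for illustration: the numeric feature (imagined as a cap diameter in cm), the four training pairs, and the function names. It contrasts a one-branch-per-value split with a single threshold split, where candidate thresholds are taken as midpoints between consecutive sorted values (a common convention, assumed here rather than taken from the problem statement).

```python
# Hypothetical numeric feature values with hypothetical labels.
train = [(2.1, "Yes"), (3.4, "Yes"), (5.0, "No"), (6.7, "No")]

# Treating each numeric value as a category gives one branch per distinct value.
branches = {x: y for x, y in train}

def predict_discrete(x):
    """Follow the branch for value x; return None when no branch matches."""
    return branches.get(x)

def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

def predict_threshold(x, t=4.2):
    """Binary split on x <= t; t = 4.2 lies between the training values 3.4 and 5.0."""
    return "Yes" if x <= t else "No"

print(predict_discrete(3.4))   # Yes  (value seen in training)
print(predict_discrete(3.5))   # None (unseen value: no branch to follow)
print(candidate_thresholds([x for x, _ in train]))
print(predict_threshold(3.5))  # Yes  (the threshold split still covers it)
```

In this toy setting the per-value split also produces as many branches as there are distinct training values, each holding a single sample, which suggests the tree is memorizing the data rather than generalizing.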
