Question:

Here is a dataset:
Position Level   Performance   Previous Promotion   Promotion
Entry            High          No                   Yes
Mid-level        High          Yes                  Yes
Entry            High          No                   Yes
Entry            Low           No                   Yes
Entry            Low           No                   No
Mid-level        Low           Yes                  No
Mid-level        High          No                   Yes
Mid-level        Low           Yes                  No
Entry            Low           Yes                  Yes
I calculated the overall Gini index for this:
Promotion
Proportion (No)=3/9=0.33
Proportion (Yes)=6/9=0.67
Gini Index =0.33(1-0.33)+0.67(1-0.67)
Gini Index =0.33(0.67)+0.67(0.33)
Gini Index =0.2211+0.2211
Gini Index =0.4422
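As a cross-check on the arithmetic above, here is a minimal Python sketch of the same formula, the sum of p(1 - p) over the classes. The function name `gini` and the list `promotion` are illustrative names, not part of the original question; the list is the Promotion column of the nine rows, in order.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: sum over classes of p * (1 - p)."""
    n = len(labels)
    counts = Counter(labels)
    return sum((c / n) * (1 - c / n) for c in counts.values())


# Promotion column of the nine rows above, in the order they are listed.
promotion = ["Yes", "Yes", "Yes", "Yes", "No", "No", "Yes", "No", "Yes"]

overall = gini(promotion)
print(f"Overall Gini index: {overall:.4f}")
```

With unrounded proportions (3/9 and 6/9) the value is 4/9, about 0.4444. The 0.4422 above comes from rounding the proportions to 0.33 and 0.67 before multiplying.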
To make the first split, I calculated the weighted-average Gini index of the 2 subregions R1/R2 for each feature:
Weighted average for Position Level =0.3404
Weighted average for Performance =0.3244
Weighted average for Previous Promotion =0.3404
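The weighted averages can be reproduced with a short sketch like the one below. It assumes each feature is split into its two observed values (R1 and R2) and that each subregion's Gini index is weighted by its share of the rows. The names `ROWS`, `FEATURES`, and `weighted_gini` are hypothetical helpers introduced here for illustration.

```python
from collections import Counter

# The nine rows: (Position Level, Performance, Previous Promotion, Promotion).
ROWS = [
    ("Entry", "High", "No", "Yes"),
    ("Mid-level", "High", "Yes", "Yes"),
    ("Entry", "High", "No", "Yes"),
    ("Entry", "Low", "No", "Yes"),
    ("Entry", "Low", "No", "No"),
    ("Mid-level", "Low", "Yes", "No"),
    ("Mid-level", "High", "No", "Yes"),
    ("Mid-level", "Low", "Yes", "No"),
    ("Entry", "Low", "Yes", "Yes"),
]
# Feature name -> column index in each row.
FEATURES = {"Position Level": 0, "Performance": 1, "Previous Promotion": 2}


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def weighted_gini(rows, col):
    """Size-weighted average Gini index of the subregions made by splitting on col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])  # collect Promotion labels
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


scores = {name: weighted_gini(ROWS, col) for name, col in FEATURES.items()}
for name, score in scores.items():
    print(f"Weighted average for {name} = {score:.4f}")
```

On the nine rows as listed, this sketch gives about 0.4000 for Position Level, 0.2667 for Performance, and 0.4000 for Previous Promotion. Performance is still the lowest, but the values do not match the hand-calculated figures, so those may be worth rechecking.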
So the split should be on Performance, which has the lowest weighted average. For the next split, I calculated the Gini indices again:
Weighted Previous Promotion =0.3564
Weighted Position =0.3670
So the next split is on Previous Promotion. That leaves one more split, and the Gini calculation is:
Weighted Position Level =0.3298.
The overall Gini index before splitting (0.4422) is greater than the 0.3298 obtained after 3 splits into 4 regions, meaning the splits made an improvement. Is that correct?
Also, how can I draw the decision tree in a tree-based format? That is, draw the decision tree and each split, and indicate which prediction we would make for each region? Please help me. Thanks.
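One way to get a tree-shaped drawing is to print the splits as an indented outline, with the predicted class at each leaf. The sketch below is only an illustration under stated assumptions: it splits each node greedily on the unused feature with the lowest weighted Gini index, stops when a region is pure or no features remain, and predicts the majority class in each region. The helper names (`split`, `weighted_gini`, `render`) are hypothetical, and because it re-evaluates the features inside each region separately, its split order may differ from a hand calculation done another way.

```python
from collections import Counter

FEATURES = ["Position Level", "Performance", "Previous Promotion"]
# The nine rows: three feature values followed by the Promotion label.
ROWS = [
    ("Entry", "High", "No", "Yes"),
    ("Mid-level", "High", "Yes", "Yes"),
    ("Entry", "High", "No", "Yes"),
    ("Entry", "Low", "No", "Yes"),
    ("Entry", "Low", "No", "No"),
    ("Mid-level", "Low", "Yes", "No"),
    ("Mid-level", "High", "No", "Yes"),
    ("Mid-level", "Low", "Yes", "No"),
    ("Entry", "Low", "Yes", "Yes"),
]


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def split(rows, col):
    """Group rows by their value in column col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    return groups


def weighted_gini(rows, col):
    """Size-weighted average Gini index after splitting rows on column col."""
    return sum(
        len(g) / len(rows) * gini([r[-1] for r in g])
        for g in split(rows, col).values()
    )


def render(rows, cols, indent=""):
    """Return the tree as indented text lines, one leaf line per region."""
    counts = Counter(r[-1] for r in rows)
    if len(counts) == 1 or not cols:  # pure region, or nothing left to split on
        summary = ", ".join(f"{k}: {v}" for k, v in sorted(counts.items()))
        return [f"{indent}predict {counts.most_common(1)[0][0]} ({summary})"]
    best = min(cols, key=lambda c: weighted_gini(rows, c))
    remaining = [c for c in cols if c != best]
    lines = []
    for value, group in split(rows, best).items():
        lines.append(f"{indent}{FEATURES[best]} = {value}")
        lines.extend(render(group, remaining, indent + "    "))
    return lines


tree_lines = render(ROWS, [0, 1, 2])
print("\n".join(tree_lines))
```

Each leaf line shows the prediction for that region together with the class counts behind it. Note that the two rows with Entry, Low, No have different outcomes (one Yes, one No), so no split on these features can separate them, and the prediction in that region is a tie that has to be broken arbitrarily.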
