Question:

There are four descriptive features and one target feature in this dataset, as follows:
AGE, a continuous feature listing the age of the individual;
EDUCATION, a categorical feature listing the highest education award achieved by the individual (high school, bachelors, doctorate);
MARITAL STATUS (never married, married, divorced);
OCCUPATION (transport = works in the transportation industry; professional = doctor, lawyer, or similar; agriculture = works in the agricultural industry; armed forces = is a member of the armed forces); and
ANNUAL INCOME, the target feature with 3 levels (<25K, 25K-50K, >50K).

(a) Calculate the entropy for this dataset.
(b) Calculate the Gini index for this dataset.
(c) In building a decision tree, the easiest way to handle a continuous feature is to define a threshold around which splits will be made. What would be the optimal threshold to split the continuous AGE feature (use information gain based on entropy as the feature selection measure)?
(d) Calculate information gain (based on entropy) for the EDUCATION, MARITAL STATUS, and OCCUPATION features.
(e) Calculate the information gain ratio (based on entropy) for the EDUCATION, MARITAL STATUS, and OCCUPATION features.
(f) Calculate information gain using the Gini index for the EDUCATION, MARITAL STATUS, and OCCUPATION features.
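
The quantities asked for in parts (a) to (f) can be sketched in a few lines of Python. This is a minimal sketch, not a worked answer: the dataset table is not reproduced in the question text, so the toy values at the bottom are hypothetical placeholders and all function names are illustrative. It assumes the usual definitions: gain ratio as information gain divided by the entropy of the feature's own values, and candidate AGE thresholds taken as midpoints between adjacent sorted values whose target levels differ.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def remainder(values, labels, impurity):
    """Weighted impurity of the partitions a feature induces on the labels."""
    n = len(labels)
    parts = {}
    for v, y in zip(values, labels):
        parts.setdefault(v, []).append(y)
    return sum(len(p) / n * impurity(p) for p in parts.values())


def information_gain(values, labels, impurity=entropy):
    """Impurity of the whole set minus the remainder after splitting."""
    return impurity(labels) - remainder(values, labels, impurity)


def gain_ratio(values, labels):
    """Entropy-based information gain divided by the split information."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0
    return information_gain(values, labels) / split_info


def best_threshold(numbers, labels):
    """Return (threshold, gain) with the highest entropy-based gain.

    Candidates are midpoints between adjacent sorted values whose labels differ.
    """
    pairs = sorted(zip(numbers, labels))
    best = (None, -1.0)
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if y0 != y1 and x0 != x1:
            t = (x0 + x1) / 2
            gain = information_gain([x >= t for x in numbers], labels)
            if gain > best[1]:
                best = (t, gain)
    return best


# Hypothetical toy data, NOT the dataset from the question.
toy_income = ["<25K", "<25K", ">50K", ">50K"]
toy_education = ["high school", "high school", "doctorate", "doctorate"]
toy_age = [20, 30, 40, 50]

print(entropy(toy_income), gini(toy_income))
print(information_gain(toy_education, toy_income))
print(gain_ratio(toy_education, toy_income))
print(information_gain(toy_education, toy_income, impurity=gini))
print(best_threshold(toy_age, toy_income))
```

To apply this to the actual exercise, replace the toy lists with the columns of the dataset table and call the same functions once per feature.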
