Question: 2 Application of Decision Tree on Real - Word Data - set [ 2 5 pts ] In this task, you will build a decision
Application of Decision Tree on RealWord Dataset pts
In this task, you will build a decision tree classifier using a realworld data set called CensusIncome Data Set, available publicly for downloading at Dataset. This data set contains weighted census data extracted from the and Current Population Surveys conducted by the US Census Bureau. The data contains demographic and employmentrelated variables.
Basic statistics for this data set are provided below.
Number of instances data
Duplicate or conflicting instances:
Number of instances in test
Duplicate or conflicting instances:
Class probabilities for incomeprojected.test file
Probability for the label :
Probability for the label :
Majority accuracy: on value
Number of attributes continuous: nominal:
Information about data file :
distinct values for attribute #age continuous
distinct values for attribute #class of worker nominal
distinct values for attribute #detailed industry recode nominal
distinct values for attribute #detailed occupation recode nominal
distinct values for attribute #education nominal
distinct values for attribute #wage per hour continuous
distinct values for attribute #enroll in edu inst last wk nominal
distinct values for attribute #marital stat nominal
distinct values for attribute #major industry code nominal
distinct values for attribute #major occupation code nominal
distinct values for attribute #race nominal
distinct values for attribute #Hispanic origin nominal
distinct values for attribute #sex nominal
distinct values for attribute #member of a labor union nominal
distinct values for attribute #reason for unemployment nominal
distinct values for attribute #full or parttime employment stat nominal
distinct values for attribute #capital gains continuous
distinct values for attribute #capital losses continuous
distinct values for attribute #dividends from stocks continuous
distinct values for attribute #tax filer stat nominal
distinct values for attribute #region of previous residence nominal
distinct values for attribute #state of previous residence nominal
distinct values for attribute #detailed household and family stat nominal
distinct values for attribute #detailed household summary in the household nominal
distinct values for attribute #migration codechange in MSA nominal
distinct values for attribute #migration codechange in reg nominal
distinct values for attribute #migration codemove within reg nominal
distinct values for attribute #live in this house one year ago nominal
distinct values for attribute #migration prev res in sunbelt nominal
distinct values for attribute #num persons worked for the employer continuous
distinct values for attribute #family members under nominal
distinct values for attribute #country of birth father nominal
distinct values for attribute #country of birth mother nominal
distinct values for attribute #country of birth self nominal
distinct values for attribute #citizenship nominal
distinct values for attribute #own business or selfemployed nominal
distinct values for attribute #fill inc questionnaire for veteran's admin nominal
distinct values for attribute #veterans benefits nominal
distinct values for attribute #weeks worked in year continuous
distinct values for attribute #year nominal
Classes:
One instance per line with commadelimited fields. There are instances in the data file and in the test file.
The data was split into traintest in approximately proportions using MineSet's MIndUtil minesettomlc Below are your tasks:
a pts Train a decision tree classifier using the data file. You CAN NOT use any decision tree library functions to do it ie you must construct the tree from scratch. You also CAN NOT touch the test file in this part. Vary the cutoff depth from to and report the training accuracy for each cutoff depth Based on your results, select an optimal
b pts Using the trained classifier with optimal cutoff depth classify the instances from the test file and report the testing accuracy the portion of testing instances classified correctly
c pts Do you see any overfitting issues for this experiment? Report
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
