Question:

In this task, you will load a real-world data set called Census-Income Data Set, available
publicly for downloading at Dataset. This data set contains weighted census data extracted
from the 1994 and 1995 Current Population Surveys conducted by the U.S. Census Bureau.
The data contains 41 demographic and employment-related variables. Basic statistics for
this data set are provided below.
Number of instances in data = 199523
Duplicate or conflicting instances: 46716
Number of instances in test = 99762
Duplicate or conflicting instances: 20936
Class probabilities for income-projected.test file
Probability for the label -50000: 93.80%
Probability for the label 50000+: 6.20%
Majority accuracy: 93.80% on value -50000
Number of attributes = 40 (continuous: 7, nominal: 33)
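The class probabilities above allow a quick sanity check before touching the files. The sketch below is only arithmetic on the figures quoted in this question: the majority-class accuracy is the larger of the two class probabilities, and multiplying the test-set size by the 50000+ probability gives a rough expected count. Because the percentages are rounded to two decimals, the estimate may be off by a few instances and does not replace an actual count from the test file.

```python
# Figures quoted in the statistics above.
n_test = 99762          # number of instances in the test file
p_low = 93.80 / 100     # probability of label -50000
p_high = 6.20 / 100     # probability of label 50000+

# A classifier that always predicts the most common label is right
# this fraction of the time.
majority_accuracy = max(p_low, p_high)

# Rough estimate only: the published percentages are rounded.
approx_high_count = round(n_test * p_high)

print(f"majority accuracy ~ {majority_accuracy:.2%}")
print(f"approximate number of 50000+ rows in test ~ {approx_high_count}")
```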
Information about .data file :
91 distinct values for attribute #0(age) continuous
9 distinct values for attribute #1(class of worker) nominal
52 distinct values for attribute #2(detailed industry recode) nominal
47 distinct values for attribute #3(detailed occupation recode) nominal
17 distinct values for attribute #4(education) nominal
1240 distinct values for attribute #5(wage per hour) continuous
3 distinct values for attribute #6(enroll in edu inst last wk) nominal
7 distinct values for attribute #7(marital stat) nominal
24 distinct values for attribute #8(major industry code) nominal
15 distinct values for attribute #9(major occupation code) nominal
5 distinct values for attribute #10(race) nominal
10 distinct values for attribute #11(hispanic origin) nominal
2 distinct values for attribute #12(sex) nominal
3 distinct values for attribute #13(member of a labor union) nominal
6 distinct values for attribute #14(reason for unemployment) nominal
8 distinct values for attribute #15(full or part time employment stat) nominal
132 distinct values for attribute #16(capital gains) continuous
113 distinct values for attribute #17(capital losses) continuous
1478 distinct values for attribute #18(dividends from stocks) continuous
6 distinct values for attribute #19(tax filer stat) nominal
6 distinct values for attribute #20(region of previous residence) nominal
51 distinct values for attribute #21(state of previous residence) nominal
38 distinct values for attribute #22(detailed household and family stat) nominal
8 distinct values for attribute #23(detailed household summary in household) nominal
10 distinct values for attribute #24(migration code-change in msa) nominal
9 distinct values for attribute #25(migration code-change in reg) nominal
10 distinct values for attribute #26(migration code-move within reg) nominal
3 distinct values for attribute #27(live in this house 1 year ago) nominal
4 distinct values for attribute #28(migration prev res in sunbelt) nominal
7 distinct values for attribute #29(num persons worked for employer) continuous
5 distinct values for attribute #30(family members under 18) nominal
43 distinct values for attribute #31(country of birth father) nominal
43 distinct values for attribute #32(country of birth mother) nominal
43 distinct values for attribute #33(country of birth self) nominal
5 distinct values for attribute #34(citizenship) nominal
3 distinct values for attribute #35(own business or self employed) nominal
3 distinct values for attribute #36(fill inc questionnaire for veterans admin) nominal
3 distinct values for attribute #37(veterans benefits) nominal
53 distinct values for attribute #38(weeks worked in year) continuous
2 distinct values for attribute #39(year) nominal
classes: -50000, 50000+.
One instance per line with comma-delimited fields. There are 199,523 instances in the data
file and 99,762 in the test file.
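Since each instance is one comma-delimited line, the files can be read with Python's standard `csv` module. The following is a minimal sketch, not a definitive loader: the function name `read_rows` and the tiny in-memory sample are made up for illustration, and the sample rows are shortened far below the real number of fields. It assumes that fields may be followed by a space after each comma, which `skipinitialspace=True` handles.

```python
import csv
import io

def read_rows(text_stream):
    """Yield one list of whitespace-stripped fields per non-empty line."""
    reader = csv.reader(text_stream, skipinitialspace=True)
    for row in reader:
        if row:  # csv.reader returns [] for blank lines
            yield [field.strip() for field in row]

# Made-up, shortened sample standing in for the real file contents.
sample = io.StringIO(
    "73, Not in universe, 0, - 50000.\n"
    "\n"
    "58, Self-employed-not incorporated, 4, 50000+.\n"
)
rows = list(read_rows(sample))
print(len(rows), rows[0][0], rows[1][-1])
```

For the real data, the same function can be given an open file handle, for example `open(path, newline="")`, where `path` is wherever the downloaded file was saved.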
The data was split into train/test in approximately 2/3, 1/3 proportions using MineSet's
MIndUtil mineset-to-mlc. Below are your tasks:
(a) Based on the training data, how many people have an income of more than 50K per
year?
(b) Based on the testing data, how many people have an income of more than 50K per
year?
(c) Based on the testing data, how many people are Asian or Pacific Islander?
(d) Based on the training data, what is the average age of people with more than 50K
income per year?
(e) Based on the testing data, what is the average age of people with more than 50K income
per year?
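Tasks (a) through (e) all reduce to filtering rows and then counting or averaging. The sketch below shows one way to do that on rows already parsed into lists of strings. It takes the column positions from the attribute listing in the question (age is attribute #0, race is attribute #10) and assumes the class label is the last field on each line. It also assumes the label may appear with extra spaces or a trailing period, so it normalizes before comparing. The helper names, the `make_row` function, and the four sample rows are hypothetical, and the exact spelling of the race value should be checked against the downloaded file.

```python
AGE_COL = 0      # attribute #0 (age), per the listing in the question
RACE_COL = 10    # attribute #10 (race), per the listing in the question
LABEL_COL = -1   # assumption: the class label is the last field

def normalize_label(raw):
    """Drop spaces and a trailing period so '- 50000.' matches '-50000'."""
    return raw.replace(" ", "").rstrip(".")

def is_high_income(row):
    return normalize_label(row[LABEL_COL]) == "50000+"

def count_high_income(rows):
    """Tasks (a) and (b): number of rows labelled 50000+."""
    return sum(1 for row in rows if is_high_income(row))

def count_race(rows, value):
    """Task (c): number of rows whose race field equals `value`."""
    return sum(1 for row in rows if row[RACE_COL].strip() == value)

def mean_age_high_income(rows):
    """Tasks (d) and (e): average age among rows labelled 50000+."""
    ages = [int(row[AGE_COL]) for row in rows if is_high_income(row)]
    return sum(ages) / len(ages) if ages else float("nan")

def make_row(age, race, label):
    """Build a made-up row with filler in the columns not used here."""
    return [str(age)] + ["?"] * 9 + [race, label]

# Hypothetical sample, not real census records.
sample_rows = [
    make_row(40, "White", "50000+."),
    make_row(60, "Asian or Pacific Islander", "- 50000."),
    make_row(50, "Asian or Pacific Islander", "50000+"),
    make_row(20, "Black", "-50000"),
]

print(count_high_income(sample_rows))
print(count_race(sample_rows, "Asian or Pacific Islander"))
print(mean_age_high_income(sample_rows))
```

Running the same three functions once on the rows of the training file and once on the rows of the test file would give the five requested figures.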
