The following question pertains to Python.
Write a Python script that reads a file named adult.data line by line, extracts the education attribute from each line, and writes it to an output file named adult.ed.data.
The output file should contain this data:
Bachelors
Bachelors
HS-grad
11th
Bachelors
Masters
.
.
The script should also compute the frequency of each education value in the dataset file. That is, how many people have a Bachelors degree, how many people are HS-grad, etc.?
Hint: Use a dictionary to store the distinct values and their counts as key-value pairs. Update the dictionary's keys and values as you read the dataset line by line. Print the key-value pairs of the final dictionary.
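To illustrate the hint, here is a minimal sketch of dictionary-based counting. It runs on the six sample values from the expected output rather than on the real file, and it uses dict.get with a default of 0 as a compact variant of the explicit membership check; the list name values is only an assumption for the demo.

```python
# Sample education values taken from the expected output shown above.
values = ["Bachelors", "Bachelors", "HS-grad", "11th", "Bachelors", "Masters"]

ed_count = {}
for ed in values:
    # get() returns 0 when ed has not been seen yet, so the first
    # occurrence stores 1 and later occurrences add 1 to the count.
    ed_count[ed] = ed_count.get(ed, 0) + 1

# Print each distinct value with its count.
for ed, count in ed_count.items():
    print(ed, count)
```

For these six values, Bachelors ends up with a count of 3 and every other value with a count of 1.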
The algorithm for the code should look like this:
- Open data input file in reading mode, call it in_fp
- Open data output file in writing mode, call it out_fp
- Read the first line in the input file and ignore it. This can be done by calling in_fp.readline()
- Create an empty dictionary: ed_count = {}
- For each line in the input file:
- Split the line based on the comma separator using the split function and store the results in an array A:
A = line.split(",")
- Extract the education column from the array A. Education is in the fourth column of the data file. So, the education entry in A is found at index 3. That is, A[3] is the education level for the current person whose information is stored in the line variable.
To extract the education, create a variable ed = A[3]
- Check if ed exists in the dictionary ed_count.
If ed is in ed_count, then set ed_count[ed] = ed_count[ed] + 1
Else set ed_count[ed] = 1
- After the for loop, print the keys and values in the ed_count dictionary.
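The steps above can be sketched as the function below. This is one possible reading of the algorithm, not a definitive solution. The function name count_education and its parameters are assumptions made for illustration. The sketch also assumes that fields may be separated by a comma followed by a space, so it strips whitespace from the education value, and that blank or short lines should be skipped. Writing ed to out_fp inside the loop follows the task statement, although the algorithm does not list it as a separate step.

```python
def count_education(in_path="adult.data", out_path="adult.ed.data"):
    """Write each record's education value to out_path and return the counts."""
    ed_count = {}
    # "with" closes both files automatically, even if an error occurs.
    with open(in_path, "r") as in_fp, open(out_path, "w") as out_fp:
        in_fp.readline()  # read the first line and ignore it
        for line in in_fp:
            A = line.split(",")
            if len(A) < 4:
                continue  # assumption: skip blank or malformed lines
            ed = A[3].strip()  # assumption: trim spaces around the field
            out_fp.write(ed + "\n")  # one education value per output line
            if ed in ed_count:
                ed_count[ed] = ed_count[ed] + 1
            else:
                ed_count[ed] = 1
    # After the loop, print the keys and values of the dictionary.
    for ed, count in ed_count.items():
        print(ed, count)
    return ed_count
```

Calling count_education() with no arguments reads adult.data from the current directory and writes adult.ed.data next to it. If the copy of the dataset in use has no header line, the in_fp.readline() call would discard the first record, so it is worth checking the file before relying on that step.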
