Question: Your final project will be two submissions: a Python file in .py or .ipynb format and a paper in .pdf format explaining your available data with visuals, your process to clean and process the data, and the results. The attached rubric will guide you on grading criteria. Please use VS Code.
Python File:
Should submit working code
Well commented or documented on what you did or were trying to do
Bring in associated data using pandas
Examine data as needed to understand what's available
If applicable, make changes, joins or manipulate data
Complete task outlined in Project use case
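As a minimal sketch of the "bring in" and "examine" steps above, the block below loads a small inline sample with pandas and prints a few summaries. The sample rows, the column name default_map, and the helper load_addresses are assumptions made for illustration; the real addresses.csv may look different.

```python
import io

import pandas as pd

# Hypothetical sample standing in for addresses.csv (the real contents are not shown).
SAMPLE_CSV = """id,street,city,state,zip,country,latitude,longitude,default_map
1,12 Main St,East Providence,RI,02914,US,41.81,-71.37,t
2,12 Main St,"East Providence, RI",RI,02914,US,41.81,-71.37,f
3,5 Elm Ave,Boston,MA,02108,US,42.36,-71.06,t
"""


def load_addresses(source):
    """Read the addresses table, keeping zip as a string so leading zeros survive."""
    return pd.read_csv(source, dtype={"zip": str})


addresses = load_addresses(io.StringIO(SAMPLE_CSV))

# Quick look at what is available: column types, missing values, and value spread.
print(addresses.dtypes)
print(addresses.isna().sum())
print(addresses["city"].value_counts())
```

With the real file, the same helper could be called as load_addresses("addresses.csv"). Reading zip as a string matters because a numeric read would turn 02914 into 2914.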
Paper sections should include:
Introduction
Data Available
Process
Results
Improvements: what you would do differently with additional data
Include citations if needed
Minimum length pages
Data Cleansing Final Project Instructions:
There is a table in a database called addresses. Each row in the table holds an address and has the columns:
id: the unique ID for the row
street: which should hold the street for the address
city: the city
state: the state
zip: a string representing the zip code
country: the country
latitude: a floating point number representing the latitude for the address
longitude: a floating point number representing the longitude for the address
default map: a boolean
The columns latitude, longitude and default map can be ignored. Not all the columns in the table are correct. For example, instead of saying East Providence (a city in RI), the city column for certain addresses may read East Providence,
In addition, many of the addresses are repeats.
This table is provided to you as a csv file.
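One possible way to correct a city value like the one described above is sketched below. It assumes the problem is stray whitespace, a trailing comma, or a state code appended to the city; that rule and the helper name clean_city are illustrative guesses, not the assignment's definition of "correct", and the real data may need other rules.

```python
import pandas as pd


def clean_city(city, state):
    """Strip whitespace, a trailing comma, and a trailing state code from a city value."""
    if pd.isna(city):
        return city
    text = str(city).strip().rstrip(",").strip()
    if isinstance(state, str) and state.strip():
        parts = [part.strip() for part in text.split(",")]
        # Drop the last comma-separated piece only when it repeats the row's state.
        if len(parts) > 1 and parts[-1].upper() == state.strip().upper():
            text = ", ".join(parts[:-1])
    return text


# Hypothetical rows showing the kinds of values the rule is meant to handle.
df = pd.DataFrame(
    {
        "city": ["East Providence, RI", " East Providence,", "Boston"],
        "state": ["RI", "RI", "MA"],
    }
)
df["city"] = [clean_city(c, s) for c, s in zip(df["city"], df["state"])]
print(df)
```

Comparing against the row's own state column keeps the rule conservative: a city name that legitimately contains a comma is left alone unless its last piece matches the state.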
Your task is to:
Write Python code to create a csv called correctedaddresses.csv where each address in the addresses table is corrected, i.e. all issues in the columns street, city, state, zip and country are corrected. The columns for this csv should be:
id: the unique ID for the row
street: which should hold the street for the address
city: the city
state: the state
zip: a string representing the zip code
country: the country
Write Python code to create a csv called correcteduniqueaddresses.csv, which will contain only the unique addresses from correctedaddresses.csv.
Write Python code to create a csv called linkertocorrecteduniqueaddresses.csv, which contains the columns:
originalid: the id for each row in the addresses.csv file
cuid: the id for the corresponding corrected unique address in the correcteduniqueaddresses.csv table
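The unique-address and linker files described above could be derived roughly as follows. This sketch assumes the corrected table is already in a DataFrame, that two addresses are "the same" when all five address columns match exactly, and that cuid is a new 1-based key; build_outputs and the sample rows are hypothetical.

```python
import pandas as pd

ADDRESS_COLUMNS = ["street", "city", "state", "zip", "country"]


def build_outputs(corrected):
    """Return (unique_addresses, linker) built from a corrected addresses frame."""
    # One row per distinct address, in order of first appearance.
    unique = corrected[ADDRESS_COLUMNS].drop_duplicates().reset_index(drop=True)
    # Assumption: the unique table gets its own 1-based id, used as cuid in the linker.
    unique.insert(0, "id", list(range(1, len(unique) + 1)))
    # Map every original row to the id of its matching unique address.
    lookup = unique.rename(columns={"id": "cuid"})
    merged = corrected.merge(lookup, on=ADDRESS_COLUMNS, how="left")
    linker = merged[["id", "cuid"]].rename(columns={"id": "originalid"})
    return unique, linker


# Hypothetical corrected data: rows 1 and 2 are the same address.
corrected = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "street": ["12 Main St", "12 Main St", "5 Elm Ave"],
        "city": ["East Providence", "East Providence", "Boston"],
        "state": ["RI", "RI", "MA"],
        "zip": ["02914", "02914", "02108"],
        "country": ["US", "US", "US"],
    }
)
unique, linker = build_outputs(corrected)
print(unique)
print(linker)
```

Each frame can then be written with to_csv, for example unique.to_csv("correcteduniqueaddresses.csv", index=False). Exact matching only finds duplicates after the correction step has made equivalent addresses identical, so the quality of the linker depends on the cleaning that precedes it.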
Load the csvs into tables in a Postgres database or spreadsheet called normalizedaddresses, and then create a dump of the Postgres database using the pg_dump command.
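A rough sketch of the load-and-dump step follows. It uses pandas DataFrame.to_sql to write the tables and builds a pg_dump argument list without running it. To stay runnable without a server, the demo writes to an in-memory SQLite connection as a stand-in; for Postgres, a SQLAlchemy engine would be passed as con instead. The helper names, the sample frames, and the dump file name normalizedaddresses.sql are assumptions.

```python
import sqlite3

import pandas as pd


def load_tables(frames, con):
    """Write each DataFrame in `frames` to a table named after its dictionary key."""
    for table_name, frame in frames.items():
        frame.to_sql(table_name, con, if_exists="replace", index=False)


def pg_dump_command(dbname, outfile):
    """Build a pg_dump argument list; it could be run with subprocess.run(cmd, check=True)."""
    return ["pg_dump", "--dbname", dbname, "--file", outfile]


# Hypothetical frames standing in for the csv contents.
frames = {
    "correcteduniqueaddresses": pd.DataFrame(
        {"id": [1, 2], "city": ["East Providence", "Boston"]}
    ),
    "linkertocorrecteduniqueaddresses": pd.DataFrame(
        {"originalid": [1, 2, 3], "cuid": [1, 1, 2]}
    ),
}

# Stand-in connection: in-memory SQLite, used here only so the sketch runs anywhere.
con = sqlite3.connect(":memory:")
load_tables(frames, con)
row_count = con.execute(
    "SELECT COUNT(*) FROM linkertocorrecteduniqueaddresses"
).fetchone()[0]
print(row_count)

dump_cmd = pg_dump_command("normalizedaddresses", "normalizedaddresses.sql")
print(" ".join(dump_cmd))
```

Against a real Postgres server, connection details such as host and user would also be needed, either as additional pg_dump options or through the usual PG* environment variables.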
Please submit back:
the code developed and documented and
the data dump file resulting from the script
