Your final project consists of two submissions: a Python file (.py or .ipynb) and a paper in .pdf format that explains the data available (with visuals), your process for cleaning and processing the data, and the results. The attached rubric will guide you on the grading criteria. Please use VS Code.
Python File:
Should submit working code
Well commented or documented on what you did or were trying to do
Bring in associated data using pandas
Examine data as needed to understand what's available
If applicable, make changes, joins or manipulate data
Complete task outlined in Project use case 1-3
Paper sections should include:
Introduction
Data Available
Process
Results
Improvements (what would you do differently with additional data)
Include citations if needed
Minimum length 2-4 pages
Data Cleansing Final Project Instructions:
There is a table in a database called addresses. Each row in the table holds an address and has the columns:
id: the unique ID for the row
street: which should hold the street for the address
city: the city
state: the state
zip: a string representing the zip code
country: the country
latitude: a floating point number representing the latitude for the address
longitude: a floating point number representing the longitude for the address
default map: a boolean
The columns latitude, longitude and default map can be ignored. Not all the values in the table are correct. For example, instead of East Providence (a city in RI), the city column for certain addresses may read "East Providence," with a trailing comma.
In addition, many of the addresses are repeats.
This table is provided to you as a csv file.
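A minimal sketch of bringing the file in with pandas and looking for the kind of problem described above. The sample rows are hypothetical stand-ins for addresses.csv (the real contents are not shown here), and the helper name load_addresses is an illustration, not part of the assignment.

```python
import io
import pandas as pd

# Hypothetical sample standing in for addresses.csv.
SAMPLE_CSV = """id,street,city,state,zip,country,latitude,longitude,default map
1,12 Main St,"East Providence,",RI,02914,US,41.81,-71.37,true
2,12 Main St,East Providence,RI,02914,US,41.81,-71.37,false
3,5 Oak Ave,Boston,MA,02108,US,42.36,-71.06,true
"""

def load_addresses(source):
    """Read the addresses CSV as text so zip codes keep their leading zeros."""
    df = pd.read_csv(source, dtype=str, keep_default_na=False)
    # latitude, longitude and default map can be ignored per the instructions.
    return df.drop(columns=["latitude", "longitude", "default map"], errors="ignore")

addresses = load_addresses(io.StringIO(SAMPLE_CSV))
print(addresses.head())

# Rows whose city ends in punctuation, one symptom named in the instructions.
suspect_city = addresses[addresses["city"].str.contains(r"[,.;]\s*$", regex=True)]
print(suspect_city)
```

For the real file, pass the path to addresses.csv instead of the in-memory sample. Reading every column as a string is a design choice made here so that zip codes such as 02914 are not turned into numbers.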
Your task is to write python code to:
1. Create a csv called corrected_addresses.csv in which each address from the addresses table is corrected, i.e. all issues in the columns street, city, state, zip and country are fixed. The columns for this csv should be:
id: the unique ID for the row
street: which should hold the street for the address
city: the city
state: the state
zip: a string representing the zip code
country: the country
2. Create a csv called corrected_unique_addresses.csv which contains only the unique addresses from corrected_addresses.csv
3. Create a csv called linker_to_corrected_unique_addresses.csv which contains 2 columns:
original_id: the id for each row in the addresses.csv file
cu_id: the id for the corresponding corrected unique address in the corrected_unique_addresses.csv table
4. Load the 3 csvs into tables in a Postgres database (or spreadsheet) called normalized_addresses, then create a dump of the Postgres database using the pg_dump command.
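One possible shape for tasks 1 to 3, sketched with pandas. Several things here are assumptions rather than requirements: the cleaning rule (collapse whitespace, strip stray commas and semicolons) covers only the trailing-comma example given above, the unique table gets a fresh sequential id, and the names clean_text, build_outputs and write_outputs are hypothetical. Real data will likely need more rules.

```python
import pandas as pd

ADDRESS_COLS = ["street", "city", "state", "zip", "country"]

def clean_text(value):
    """Collapse repeated whitespace and strip stray commas/semicolons at the ends."""
    text = " ".join(str(value).split())
    return text.strip(" ,;")

def build_outputs(addresses):
    """Return (corrected, unique, linker) data frames for tasks 1, 2 and 3."""
    # Task 1: same rows and ids as the input, with each address column cleaned.
    corrected = addresses[["id"] + ADDRESS_COLS].copy()
    for col in ADDRESS_COLS:
        corrected[col] = corrected[col].map(clean_text)

    # Task 2: first occurrence of each corrected address, with a new id (assumption).
    unique = corrected.drop_duplicates(subset=ADDRESS_COLS)[ADDRESS_COLS].reset_index(drop=True)
    unique.insert(0, "id", list(range(1, len(unique) + 1)))

    # Task 3: map every original id to the id of its unique address.
    linker = corrected.merge(
        unique.rename(columns={"id": "cu_id"}), on=ADDRESS_COLS, how="left"
    )
    linker = linker.rename(columns={"id": "original_id"})[["original_id", "cu_id"]]
    return corrected, unique, linker

def write_outputs(corrected, unique, linker, out_dir="."):
    """Write the three csv files named in the task list."""
    corrected.to_csv(f"{out_dir}/corrected_addresses.csv", index=False)
    unique.to_csv(f"{out_dir}/corrected_unique_addresses.csv", index=False)
    linker.to_csv(f"{out_dir}/linker_to_corrected_unique_addresses.csv", index=False)

# Hypothetical sample rows; rows 1 and 2 are the same address written differently.
sample = pd.DataFrame({
    "id": ["1", "2", "3"],
    "street": ["12 Main St", "12  Main St ", "5 Oak Ave"],
    "city": ["East Providence,", "East Providence", "Boston"],
    "state": ["RI", "RI", "MA"],
    "zip": ["02914", "02914", "02108"],
    "country": ["US", "US", "US"],
})
corrected, unique, linker = build_outputs(sample)
print(linker)
```

Deduplicating on the cleaned columns rather than the raw ones is what lets rows that differ only by stray punctuation or spacing collapse into a single unique address.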
Please submit back:
the code developed (and documented) and
the data dump file resulting from the script
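For task 4, a hedged sketch that only builds the SQL script and the command lines, without running anything. It assumes the Postgres client tools createdb, psql and pg_dump are installed, that the three csv files sit in the working directory, and that every column is stored as TEXT for simplicity. The file names load.sql and normalized_addresses.sql and both function names are hypothetical.

```python
TABLES = {
    "corrected_addresses": ["id", "street", "city", "state", "zip", "country"],
    "corrected_unique_addresses": ["id", "street", "city", "state", "zip", "country"],
    "linker_to_corrected_unique_addresses": ["original_id", "cu_id"],
}

def build_load_script(tables=TABLES):
    """Return psql script text that creates each table and loads <table>.csv into it."""
    lines = []
    for table, columns in tables.items():
        # Assumption: all columns are TEXT; tighten the types as the data allows.
        column_defs = ", ".join(f"{col} TEXT" for col in columns)
        lines.append(f"CREATE TABLE {table} ({column_defs});")
        # \copy is a psql meta-command that reads the file on the client side.
        lines.append(f"\\copy {table} FROM '{table}.csv' WITH (FORMAT csv, HEADER true)")
    return "\n".join(lines) + "\n"

def build_commands(database="normalized_addresses",
                   script="load.sql",
                   dump="normalized_addresses.sql"):
    """Return the shell commands, as argument lists, in the order they would run."""
    return [
        ["createdb", database],
        ["psql", "-d", database, "-f", script],
        ["pg_dump", "-d", database, "-f", dump],
    ]

load_script = build_load_script()
commands = build_commands()
print(load_script)
for command in commands:
    print(" ".join(command))
```

The script text can be saved to load.sql and the printed commands run in a terminal, or passed one at a time to subprocess.run. Connection options such as user and host are left out and would need to be added for a non-default setup.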
