Question: Your final project will be two submissions: a Python file in .py or .ipynb format and a paper in .pdf format explaining your data available with visuals, your process to clean and process the data, and then the results. The attached rubric will guide you on grading criteria. Please use VS Code.

Python File:

Should submit working code

Well commented or documented on what you did or were trying to do

Bring in associated data using pandas

Examine data as needed to understand what's available

If applicable, make changes, joins or manipulate data

Complete the tasks outlined in the Project use case (1 - 3)
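The "bring in" and "examine" requirements above can be sketched as follows. This is a minimal sketch, not the expected solution: `SAMPLE_CSV` and `load_addresses` are hypothetical names, and the two sample rows are made up to stand in for the real addresses.csv file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real addresses.csv export.
SAMPLE_CSV = """id,street,city,state,zip,country
1,12 Main St,"East Providence,",RI,02914,USA
2,12 Main St,East Providence,RI,02914,USA
"""


def load_addresses(source):
    """Read the addresses CSV, keeping every column as text.

    dtype=str preserves leading zeros in zip codes such as "02914";
    keep_default_na=False stops pandas from turning empty cells into NaN.
    """
    return pd.read_csv(source, dtype=str, keep_default_na=False)


df = load_addresses(io.StringIO(SAMPLE_CSV))

# Quick look at what is available before changing anything.
print(df.head())
print(df.dtypes)
print(df["city"].value_counts())
```

With the real file, `load_addresses("addresses.csv")` would replace the `io.StringIO` call. Counting the distinct values per column is a cheap way to spot variants such as a city name with a trailing comma.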

Paper sections should include:

Introduction

Data Available

Process

Results

Improvements (what would you do differently with additional data)

Include citations if needed

Minimum length: 2 - 4 pages

Data Cleansing Final Project Instructions:

There is a table in a database called addresses. Each row in the table holds an address and has the columns:

id: the unique ID for the row

street: which should hold the street for the address

city: the city

state: the state

zip: a string representing the zip code

country: the country

latitude: a floating point number representing the latitude for the address

longitude: a floating point number representing the longitude for the address

default map: a boolean

The columns latitude, longitude and default map can be ignored. Not all the columns in the table are correct. For example, instead of saying East Providence (a city in RI), the city column for certain addresses may read "East Providence,".

In addition, many of the addresses are repeats.

This table is provided to you as a csv file.

Your task is to write Python code to:

1. Write Python code to create a csv called corrected_addresses.csv where each address in the addresses table is corrected, i.e. all issues in the columns street, city, state, zip, country are corrected. The columns for this csv should be:

id: the unique ID for the row

street: which should hold the street for the address

city: the city

state: the state

zip: a string representing the zip code

country: the country
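One possible way to approach step 1 is sketched below. The only defect the question names is a trailing comma in the city column, so the cleaning rule here (collapse repeated whitespace, strip stray spaces, commas and semicolons at either end) is an assumption; the real data may need more rules. `clean_text`, `correct_addresses` and the sample rows are hypothetical.

```python
import pandas as pd

ADDRESS_COLUMNS = ["street", "city", "state", "zip", "country"]


def clean_text(series: pd.Series) -> pd.Series:
    """Collapse runs of whitespace, then strip spaces, commas and semicolons
    from both ends. Periods are left alone so "St." stays intact."""
    return (
        series.astype(str)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip(" ,;")
    )


def correct_addresses(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep id plus the five address columns and clean each address column."""
    corrected = raw[["id"] + ADDRESS_COLUMNS].copy()
    for column in ADDRESS_COLUMNS:
        corrected[column] = clean_text(corrected[column])
    return corrected


# Made-up rows; latitude and longitude are dropped because they can be ignored.
raw = pd.DataFrame({
    "id": ["1", "2"],
    "street": ["12  Main St", "12 Main St"],
    "city": ["East Providence,", " East Providence"],
    "state": ["RI", "RI "],
    "zip": ["02914", "02914"],
    "country": ["USA", "USA"],
    "latitude": [41.81, 41.81],
    "longitude": [-71.37, -71.37],
})
corrected = correct_addresses(raw)

# With no path, to_csv returns the CSV text, which is handy for a preview.
print(corrected.to_csv(index=False))
```

Calling `corrected.to_csv("corrected_addresses.csv", index=False)` would write the file itself; `index=False` keeps the pandas row index out of the output so only the six required columns appear.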

2. Write Python code to create a csv called corrected_unique_addresses.csv which will only contain unique addresses from corrected_addresses.csv
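Step 2 could be sketched with `drop_duplicates`, as below. The question does not say which id a unique address should carry, so keeping the id of the first occurrence is an assumption. `unique_addresses` and the sample rows are hypothetical.

```python
import pandas as pd

ADDRESS_COLUMNS = ["street", "city", "state", "zip", "country"]


def unique_addresses(corrected: pd.DataFrame) -> pd.DataFrame:
    """Keep the first row for each distinct address.

    Rows are compared on the address columns only, because every row has
    its own id and comparing ids would make nothing a duplicate.
    """
    return (
        corrected.drop_duplicates(subset=ADDRESS_COLUMNS, keep="first")
        .reset_index(drop=True)
    )


# Made-up corrected rows: ids 1 and 2 are the same address.
corrected = pd.DataFrame({
    "id": ["1", "2", "3"],
    "street": ["12 Main St", "12 Main St", "5 Oak Ave"],
    "city": ["East Providence", "East Providence", "Providence"],
    "state": ["RI", "RI", "RI"],
    "zip": ["02914", "02914", "02903"],
    "country": ["USA", "USA", "USA"],
})
unique = unique_addresses(corrected)
print(unique)
```

This comparison is exact, so "Main St" and "MAIN ST" would still count as different addresses unless the correction step has already normalized case.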

3. Write Python code to create a csv called linker__corrected_unique_addresses.csv which contains 2 columns:

original_id: the id for each row in the addresses.csv file

_id: the id for the corresponding corrected unique address in the corrected_unique_addresses.csv table
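A linker table like the one in step 3 can be built by merging the corrected rows back onto the unique rows using the address columns as the key. In this sketch the second column's name is a placeholder: the question shows it only as `_id`, so `unique_id` is an assumption, as are `build_linker` and the sample rows.

```python
import pandas as pd

ADDRESS_COLUMNS = ["street", "city", "state", "zip", "country"]


def build_linker(corrected, unique, unique_id_column="unique_id"):
    """Map every original id to the id of its matching unique address."""
    lookup = unique.rename(columns={"id": unique_id_column})
    linked = corrected.merge(
        lookup,
        on=ADDRESS_COLUMNS,
        how="left",              # keep every original row, in order
        validate="many_to_one",  # fail loudly if `unique` has duplicates
    )
    linked = linked.rename(columns={"id": "original_id"})
    return linked[["original_id", unique_id_column]]


# Made-up corrected rows: ids 1 and 2 are the same address.
corrected = pd.DataFrame({
    "id": ["1", "2", "3"],
    "street": ["12 Main St", "12 Main St", "5 Oak Ave"],
    "city": ["East Providence", "East Providence", "Providence"],
    "state": ["RI", "RI", "RI"],
    "zip": ["02914", "02914", "02903"],
    "country": ["USA", "USA", "USA"],
})
unique = corrected.drop_duplicates(subset=ADDRESS_COLUMNS, keep="first")
linker = build_linker(corrected, unique)
print(linker)
```

The `validate="many_to_one"` argument makes pandas raise an error if the unique table still contains a repeated address, which would otherwise silently multiply rows in the linker.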

4. Load the 3 csvs into tables in a postgres database or spreadsheet called normalized_addresses. Then create a dump of the postgres database using the pg_dump command.
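Step 4 might be sketched as below, assuming the tables are written with pandas `to_sql`. The helper names `load_tables` and `pg_dump_command`, the table name and the dump file name are hypothetical. The dry run uses an in-memory SQLite database so it works without a PostgreSQL server; for the real load, a SQLAlchemy engine connected to the normalized_addresses database would be passed instead (that setup is an assumption and is not shown).

```python
import sqlite3

import pandas as pd


def load_tables(frames, connection):
    """Write each DataFrame to a table named after its dictionary key.

    `connection` may be a SQLAlchemy engine for PostgreSQL; pandas also
    accepts a sqlite3 connection, which is used for the dry run below.
    """
    for table_name, frame in frames.items():
        frame.to_sql(table_name, connection, if_exists="replace", index=False)


def pg_dump_command(database, outfile):
    """Build the pg_dump call as an argument list for subprocess.run."""
    return ["pg_dump", f"--dbname={database}", f"--file={outfile}"]


# Dry run against an in-memory SQLite database with one made-up table.
frames = {
    "corrected_unique_addresses": pd.DataFrame(
        {"id": ["1"], "city": ["East Providence"]}
    ),
}
connection = sqlite3.connect(":memory:")
load_tables(frames, connection)
row_count = connection.execute(
    "SELECT COUNT(*) FROM corrected_unique_addresses"
).fetchone()[0]
print(row_count)

# The command is only printed here, not executed.
print(pg_dump_command("normalized_addresses", "normalized_addresses.sql"))
```

Once the three tables are in PostgreSQL, passing the command list to `subprocess.run(..., check=True)` would produce the dump file, provided `pg_dump` is on the PATH and the connection credentials are available to it.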

Please submit back:

the code developed (and documented) and

the data dump file resulting from the script
