Question:
Objective
The project involves working in pairs to acquire, clean, and integrate datasets related to housing trends in a U.S. region or state. The goal is to create a unified, clean, and analysis-ready dataset while demonstrating data wrangling proficiency.
Project Overview
You are part of a data preparation team tasked with creating a clean dataset for analyzing housing trends. Select a U.S. region or state (e.g., California, Texas, New York City) and focus on acquiring and integrating datasets from various sources related to your chosen area.
Project Steps
Data Acquisition
Choose a Region or State:
Select a region or state to focus on. Indicate your selection on the course discussion thread, including your group number.
Examples: California, Texas, New York City, Seattle, Austin.
Acquire Data from Multiple Sources:
Excel Files: Import neighborhood demographic data.
Source: Census Data
CSV Files: Import housing prices and rental costs.
Sources:
Zillow Housing Data
Kaggle Datasets
Web Scraping: Collect rental listings and descriptions from real estate sites (e.g., Craigslist, Apartments.com).
API Access: Retrieve recent housing market trends.
API Documentation: Zillow API
PDF Files: Extract data from government housing policy reports.
Example Source: HUD Reports
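As a rough illustration of the web scraping step, the sketch below parses rental listings out of an HTML snippet using only the standard library. The markup, the `listing`, `title`, and `price` class names, and the `ListingParser` and `parse_listings` names are all hypothetical; a real site will use different tags, so the selectors would need adjusting after inspecting the actual page.

```python
from html.parser import HTMLParser

# Hypothetical markup: real sites use different tags and class names.
SAMPLE_HTML = """
<ul>
  <li class="listing"><span class="title">2BR near downtown</span><span class="price">$1,850</span></li>
  <li class="listing"><span class="title">Studio with parking</span><span class="price">$1,200</span></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collect the title and price text of each <li class="listing">."""

    def __init__(self):
        super().__init__()
        self.listings = []     # finished records
        self._current = None   # record being built, or None outside a listing
        self._field = None     # class of the <span> currently open

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "listing":
            self._current = {}
        elif tag == "span" and self._current is not None:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a title or price span of a listing.
        if self._current is not None and self._field in ("title", "price"):
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.listings.append(self._current)
            self._current = None


def parse_listings(html):
    parser = ListingParser()
    parser.feed(html)
    return parser.listings


listings = parse_listings(SAMPLE_HTML)
```

In practice the HTML string would come from a downloaded page rather than a constant, and the price strings would still need converting to numbers during cleaning.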
Data Cleaning and Integration
Data Cleaning:
Standardize column names and formats.
Handle missing data appropriately.
Convert numerical data to consistent formats.
Resolve inconsistencies in datetime formats.
Remove duplicates.
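The cleaning steps above might look like the following minimal sketch, written with only the standard library so it runs anywhere (pandas would be the more typical choice in a notebook). The column names `zip_code`, `median_rent`, and `listing_date`, the list of accepted date formats, and the choice to keep missing values as `None` are assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumed input formats; extend this tuple to match the real data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d")


def standardize_name(name):
    """' Median Rent ' -> 'median_rent'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def parse_number(text):
    """'$2,900' -> 2900.0; blank or non-numeric text -> None."""
    cleaned = re.sub(r"[^0-9.\-]", "", text or "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows):
    seen, cleaned = set(), []
    for row in rows:
        rec = {standardize_name(k): v for k, v in row.items()}
        rec["median_rent"] = parse_number(rec.get("median_rent"))
        rec["listing_date"] = parse_date(rec.get("listing_date") or "")
        zip_code = (rec.get("zip_code") or "").strip()
        # Restore leading zeros that spreadsheets often drop from zip codes.
        rec["zip_code"] = zip_code.zfill(5) if zip_code else None
        key = (rec["zip_code"], rec["listing_date"], rec["median_rent"])
        if key not in seen:  # drop rows that are identical after cleaning
            seen.add(key)
            cleaned.append(rec)
    return cleaned


raw = [
    {"Zip Code": "2139", " Median Rent ": "$2,900", "Listing Date": "03/15/2024"},
    {"Zip Code": "02139", "Median Rent": "2900", "Listing Date": "2024-03-15"},
    {"Zip Code": "02140", "Median Rent": "", "Listing Date": "2024/03/16"},
]
cleaned = clean_rows(raw)
```

Here the first two rows collapse into one because they match once names, numbers, and dates are normalized. Whether a missing rent should stay empty, be imputed, or cause the row to be dropped depends on the analysis and is left open.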
Data Integration:
Merge datasets using common keys like zip code or neighborhood.
Add calculated fields (e.g., price-to-income ratio).
Validate the merged dataset to ensure accuracy and completeness.
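One way to picture the integration steps is a left join on zip code followed by a derived ratio and a few sanity checks. This is a sketch under stated assumptions: the field names `median_home_value` and `median_income`, the helper names `merge_on_zip` and `validate`, and the sample figures are all made up for illustration.

```python
def merge_on_zip(prices, demographics):
    """Left-join price rows to demographic rows on 'zip_code'."""
    # If a zip code appears twice in demographics, the last row wins.
    demo_by_zip = {row["zip_code"]: row for row in demographics}
    merged = []
    for price in prices:
        demo = demo_by_zip.get(price["zip_code"], {})
        row = {**demo, **price}
        value = row.get("median_home_value")
        income = row.get("median_income")
        # The ratio is only defined when both inputs are present and non-zero.
        row["price_to_income"] = round(value / income, 2) if value and income else None
        merged.append(row)
    return merged


def validate(merged, prices):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if len(merged) != len(prices):
        problems.append("row count changed during merge")
    zips = [row["zip_code"] for row in merged]
    if len(zips) != len(set(zips)):
        problems.append("duplicate zip codes")
    missing = [row["zip_code"] for row in merged if row["price_to_income"] is None]
    if missing:
        problems.append(f"no ratio for: {', '.join(missing)}")
    return problems


# Made-up sample values, not real market data.
prices = [
    {"zip_code": "02139", "median_home_value": 900000},
    {"zip_code": "02140", "median_home_value": 750000},
]
demographics = [{"zip_code": "02139", "median_income": 120000}]

merged = merge_on_zip(prices, demographics)
problems = validate(merged, prices)
```

A left join keeps every price row even when demographics are missing, so gaps surface in the validation report instead of silently disappearing from the dataset.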
Deliverables
Clean Dataset:
A single CSV file with merged, cleaned data for your region.
Python Code:
Documented Python scripts or Jupyter Notebooks for:
Acquiring data (Excel, CSV, PDF, scraping, API)
Cleaning data (handling missing values, standardizing)
Integrating data (merging and validating)
Summary:
A brief description of the region, the datasets used, and the cleaning and integration process.
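For the clean dataset deliverable, the merged rows can be written to a single CSV file with the standard `csv` module. The sketch below writes to an in-memory buffer so it runs without touching disk; the `write_dataset` name and the sample rows are hypothetical.

```python
import csv
import io


def write_dataset(rows, fileobj):
    """Write dict rows as CSV with a stable, alphabetically sorted header."""
    fieldnames = sorted({key for row in rows for key in row})
    # restval fills in an empty cell when a row lacks one of the columns.
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_dataset(
    [{"zip_code": "02139", "median_rent": 2900.0}, {"zip_code": "02140"}],
    buffer,
)
csv_text = buffer.getvalue()
```

To produce an actual file, the same function can be given a handle opened with `open(path, "w", newline="")`, where `path` is whatever output name the group chooses.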
Skills Applied
Python Basics: File handling and scripting.
File Processing: Reading and writing Excel, CSV and PDF files.
Web Scraping: Using Python libraries to collect online data.
API Access: Retrieving structured data from APIs.
Data Cleaning: Standardizing and handling inconsistencies.
Data Integration: Creating a cohesive dataset.
Starter Resources
Census Demographics Data
Zillow Housing Data
Kaggle Datasets
HUD Reports
Zillow API Documentation
"Can you provide the full, tested code and also tell me where to download each dataset so that it fits the code you provide?"
