Question:
import pandas as pd
import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt
# Read data files
house_prices_df = pd.read_excel('MeanHousePricesClean.xlsx')
crime_df = pd.read_excel('CrimeClean.xlsx')
population_df = pd.read_excel('PopulationClean.xlsx')
area_df = pd.read_excel('SuburbAreas.xlsx')
# Step B: Clean and prepare data
def prepare_data(df, columns):
    df = df.dropna(subset=columns)  # Remove rows with missing values in key columns
    return df
# Rename columns for consistency
house_prices_df = house_prices_df.rename(columns={'Year': 'year'})
crime_df = crime_df.rename(columns={
    'Year': 'year',
    'Crime rate per population': 'crime_rate',
    'Local Government Area': 'local_government_area'
})
population_df = population_df.rename(columns={'Year': 'year'})
area_df = area_df.rename(columns={'Property': 'local_government_area'})
# Clean the area DataFrame to remove non-relevant rows
area_df = area_df[area_df['local_government_area'] != 'Area (sq Km)']
# Step C: Analysis functions
def analyze_correlation(df, col1, col2):
    df = df.dropna(subset=[col1, col2])
    if len(df) < 2:
        return np.nan
    correlation, _ = pearsonr(df[col1], df[col2])
    return correlation
# Reshape house_prices_df to long format
house_prices_long = house_prices_df.melt(id_vars=['year'], var_name='local_government_area', value_name='mean_house_price')
# Reshape population_df to long format
population_long = population_df.melt(id_vars=['year'], var_name='local_government_area', value_name='population')
# Reshape area_df to long format
area_long = area_df.melt(id_vars=['local_government_area'], var_name='year', value_name='area')
# Merge the datasets on 'year' and 'local_government_area'
merged_df = pd.merge(crime_df, house_prices_long, on=['year', 'local_government_area'], how='inner')
merged_df = pd.merge(merged_df, population_long, on=['year', 'local_government_area'], how='inner')
merged_df = pd.merge(merged_df, area_long, on='local_government_area', how='inner')
# Calculate population density
merged_df['population_density'] = merged_df['population'] / merged_df['area']
# Step D: Prepare the data by cleaning
merged_df = prepare_data(merged_df, ['mean_house_price', 'crime_rate', 'population_density'])
# Step E: Perform correlation analysis
house_price_population_corr = analyze_correlation(merged_df, 'mean_house_price', 'population_density')
crime_house_price_corr = analyze_correlation(merged_df, 'crime_rate', 'mean_house_price')
crime_population_density_corr = analyze_correlation(merged_df, 'crime_rate', 'population_density')
# Step F: Print the results
print(f"Correlation between house prices and population density: {house_price_population_corr}")
print(f"Correlation between crime rate and house prices: {crime_house_price_corr}")
print(f"Correlation between crime rate and population density: {crime_population_density_corr}")
# Plotting for visual analysis
plt.figure(figsize=(10, 6))
plt.scatter(merged_df['population_density'], merged_df['mean_house_price'])
plt.title('House Price vs Population Density')
plt.xlabel('Population Density (people per square km)')
plt.ylabel('Mean House Price')
plt.grid(True)
plt.show()
plt.figure(figsize=(10, 6))
plt.scatter(merged_df['mean_house_price'], merged_df['crime_rate'])
plt.title('Crime Rate vs House Price')
plt.xlabel('Mean House Price')
plt.ylabel('Crime Rate (per population)')
plt.grid(True)
plt.show()
plt.figure(figsize=(10, 6))
plt.scatter(merged_df['population_density'], merged_df['crime_rate'])
plt.title('Crime Rate vs Population Density')
plt.xlabel('Population Density (people per square km)')
plt.ylabel('Crime Rate (per population)')
plt.grid(True)
plt.show()
The Python code above is run in Google Colab, and every correlation it prints is NaN. Please modify and correct the code, and add more correlations if that helps me get a meaningful result. I have access to all of the Excel sheets. Something is wrong, because the output stays NaN no matter how I edit the code, so I will explain what each data sheet looks like. I suspect the sheets are not aligning in the merge step.
MeanHousePricesClean.xlsx:
The first row contains years and area names.
From the second row onwards, each row represents a year, and the values in each column represent mean house prices for the corresponding area.
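A wide sheet like this one only merges cleanly once it is melted to one row per year and area and its keys are normalised. The following is a minimal sketch under assumptions: the small DataFrame stands in for the real sheet, the helper name `wide_to_long` is hypothetical, and upper-casing the area names is just one possible convention for making the keys comparable across sheets.

```python
import pandas as pd

# Hypothetical stand-in for MeanHousePricesClean.xlsx:
# one row per year, one column per area.
house_prices_df = pd.DataFrame({
    "Year": [2019, 2020],
    "Alpha ": [500000.0, 520000.0],  # header with a stray trailing space
    "Beta": [300000.0, 310000.0],
})

def wide_to_long(df, value_name):
    """Melt a year-by-area table into (year, local_government_area, value) rows."""
    long_df = df.rename(columns={"Year": "year"}).melt(
        id_vars=["year"],
        var_name="local_government_area",
        value_name=value_name,
    )
    # Normalise both merge keys so they compare equal across sheets.
    long_df["year"] = pd.to_numeric(long_df["year"], errors="coerce").astype("Int64")
    long_df["local_government_area"] = (
        long_df["local_government_area"].astype(str).str.strip().str.upper()
    )
    return long_df

house_prices_long = wide_to_long(house_prices_df, "mean_house_price")
print(house_prices_long)
```

The same helper could be applied to any other sheet that shares this year-by-area layout.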
CrimeClean.xlsx:
The first row contains years, local government area names, incidents recorded, and crime rate per population.
From the second row onwards, each row represents data for a specific year and area, including incidents recorded and crime rate.
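Because this sheet is already in long form, the main risk is that its keys do not match the melted tables exactly (for example, different capitalisation or stray spaces in area names). A possible diagnostic, sketched with made-up rows, is an outer merge with `indicator=True`, which labels each row as matched or unmatched. The sample data and the upper-casing convention are assumptions, not the real sheets.

```python
import pandas as pd

# Hypothetical rows standing in for CrimeClean.xlsx.
crime_df = pd.DataFrame({
    "Year": [2019, 2020],
    "Local Government Area": ["alpha", "Beta "],
    "Crime rate per population": [5.1, 4.8],
})
# Hypothetical rows standing in for the melted house-price table.
house_prices_long = pd.DataFrame({
    "year": [2019, 2020],
    "local_government_area": ["ALPHA", "BETA"],
    "mean_house_price": [500000.0, 310000.0],
})

crime_df = crime_df.rename(columns={
    "Year": "year",
    "Local Government Area": "local_government_area",
    "Crime rate per population": "crime_rate",
})
# Apply the same key normalisation as on the other table.
crime_df["local_government_area"] = (
    crime_df["local_government_area"].astype(str).str.strip().str.upper()
)

# An outer merge with indicator=True adds a "_merge" column
# with the values "both", "left_only" or "right_only".
check = pd.merge(
    crime_df,
    house_prices_long,
    on=["year", "local_government_area"],
    how="outer",
    indicator=True,
)
matched = int((check["_merge"] == "both").sum())
print(check[["year", "local_government_area", "_merge"]])
print("rows matched on both sides:", matched)
```

If `matched` is zero on the real data, the inner merges in the original code would return an empty table, which would explain NaN correlations.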
PopulationClean.xlsx:
Similar to MeanHousePricesClean.xlsx, the first row contains years and area names.
From the second row onwards, each row represents a year.
SuburbAreas.xlsx:
The first row contains property names.
The second row contains the corresponding area in square kilometers.
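This sheet is shaped differently from the others, so it likely needs its own reshaping step. The sketch below assumes the sheet has a label column named `Property` whose single data row is labelled `Area (sq Km)`, with the remaining headers being area names; that layout is a guess based on the description and the original code. Under that assumption, filtering out the `Area (sq Km)` row would discard the only data row, so the sketch keeps that row and turns the headers into a key column instead.

```python
import pandas as pd

# Hypothetical stand-in for SuburbAreas.xlsx under the assumed layout.
area_df = pd.DataFrame({
    "Property": ["Area (sq Km)"],
    "Alpha": [10.0],
    "Beta": [25.0],
})

# Take the single row of areas; its index holds the area names.
area_long = (
    area_df.drop(columns=["Property"])
    .iloc[0]
    .rename("area")
    .rename_axis("local_government_area")
    .reset_index()
)
area_long["local_government_area"] = (
    area_long["local_government_area"].astype(str).str.strip().str.upper()
)
area_long["area"] = pd.to_numeric(area_long["area"], errors="coerce")

# Hypothetical melted population rows, used to show the density calculation.
population_long = pd.DataFrame({
    "year": [2019, 2019],
    "local_government_area": ["ALPHA", "BETA"],
    "population": [20000, 75000],
})
merged = population_long.merge(area_long, on="local_government_area", how="inner")
merged["population_density"] = merged["population"] / merged["area"]
print(merged)
```

Since areas do not vary by year here, the merge uses only the area name as its key.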
