

Question: import pandas as pd import numpy as np from scipy.stats import pearsonr import matplotlib.pyplot as plt # Read data files house_prices_df

import pandas as pd
import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt

# Read data files
house_prices_df = pd.read_excel("MeanHousePricesClean-1.xlsx")
crime_df = pd.read_excel("CrimeClean-1-1.xlsx")
population_df = pd.read_excel("PopulationClean.xlsx")
area_df = pd.read_excel("SuburbAreas-1.xlsx")

# Step B: Clean and prepare data
def prepare_data(df, columns):
    df = df.dropna(subset=columns)  # Remove rows with missing values in key columns
    return df

# Rename columns for consistency
house_prices_df = house_prices_df.rename(columns={'Year': 'year'})
crime_df = crime_df.rename(columns={
    'Year': 'year',
    'Crime rate per 100,000 population': 'crime_rate',
    'Local Government Area': 'local_government_area'
})
population_df = population_df.rename(columns={'Year': 'year'})
area_df = area_df.rename(columns={'Property': 'local_government_area'})

# Clean the area DataFrame to remove non-relevant rows
area_df = area_df[area_df['local_government_area'] != 'Area sq Km']

# Step C: Analysis functions
def analyze_correlation(df, col1, col2):
    df = df.dropna(subset=[col1, col2])
    if len(df) < 2:
        return np.nan
    correlation, _ = pearsonr(df[col1], df[col2])
    return correlation

# Reshape house_prices_df to long format
house_prices_long = house_prices_df.melt(id_vars=['year'], var_name='local_government_area', value_name='mean_house_price')

# Reshape population_df to long format
population_long = population_df.melt(id_vars=['year'], var_name='local_government_area', value_name='population')

# Reshape area_df to long format
area_long = area_df.melt(id_vars=['local_government_area'], var_name='year', value_name='area')

# Merge the datasets on 'year' and 'local_government_area'
merged_df = pd.merge(crime_df, house_prices_long, on=['year', 'local_government_area'], how='inner')
merged_df = pd.merge(merged_df, population_long, on=['year', 'local_government_area'], how='inner')
merged_df = pd.merge(merged_df, area_long, on='local_government_area', how='inner')

# Calculate population density
merged_df['population_density'] = merged_df['population'] / merged_df['area']

# Step D: Prepare the data by cleaning
merged_df = prepare_data(merged_df, ['mean_house_price', 'crime_rate', 'population_density'])

# Step E: Perform correlation analysis
house_price_population_corr = analyze_correlation(merged_df, 'mean_house_price', 'population_density')
crime_house_price_corr = analyze_correlation(merged_df, 'crime_rate', 'mean_house_price')
crime_population_density_corr = analyze_correlation(merged_df, 'crime_rate', 'population_density')

# Step F: Print the results
print(f"Correlation between house prices and population density: {house_price_population_corr}")
print(f"Correlation between crime rate and house prices: {crime_house_price_corr}")
print(f"Correlation between crime rate and population density: {crime_population_density_corr}")

# Plotting for visual analysis
plt.figure(figsize=(10, 6))
plt.scatter(merged_df['population_density'], merged_df['mean_house_price'])
plt.title('House Price vs Population Density')
plt.xlabel('Population Density (people per square km)')
plt.ylabel('Mean House Price')
plt.grid(True)
plt.show()

plt.figure(figsize=(10, 6))
plt.scatter(merged_df['mean_house_price'], merged_df['crime_rate'])
plt.title('Crime Rate vs House Price')
plt.xlabel('Mean House Price')
plt.ylabel('Crime Rate (per 100,000 population)')
plt.grid(True)
plt.show()

plt.figure(figsize=(10, 6))
plt.scatter(merged_df['population_density'], merged_df['crime_rate'])
plt.title('Crime Rate vs Population Density')
plt.xlabel('Population Density (people per square km)')
plt.ylabel('Crime Rate (per 100,000 population)')
plt.grid(True)
plt.show()

The above Python code is run in Google Colab, and every correlation comes out as NaN. Please modify and correct the code, and add some more correlations if that helps to achieve my output. I have access to all 4 Excel sheets of data. Something is wrong, since it keeps producing NaN: I get no correlation no matter how I edit the code, so I will explain what each data sheet looks like. I suspect the data is not aligning in the merges in the code.

MeanHousePricesClean-1 (1).xlsx:

The first row contains years and area names.

From the second row onwards, each row represents a year, and the values in each column are the mean house prices for the corresponding area.

CrimeClean-1-1.xlsx:

The first row contains years, local government area names, incidents recorded, and crime rate per 100,000 population.

From the second row onwards, each row represents data for a specific year and area, including incidents recorded and crime rate.

PopulationClean.xlsx:

Similar to MeanHousePricesClean-1 (1).xlsx, the first row contains years and area names.

From the second row onwards, each row represents a year.

SuburbAreas-1.xlsx:

The first row contains property names.

The second row contains the corresponding area in square kilometers.
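One possible cause, assuming the sheets are laid out exactly as described above: if SuburbAreas has a header row starting with 'Property' and a single data row labelled 'Area sq Km', then the filter `area_df['local_government_area'] != 'Area sq Km'` drops the only row. `area_long` is then empty, the final inner merge returns zero rows, and `analyze_correlation` returns `np.nan` through its `len(df) < 2` branch. Mismatched merge keys (years stored as text, stray spaces or different capitalisation in area names) would have the same effect. The sketch below is not a verified fix for the real files. It uses small made-up frames in place of the four `pd.read_excel` calls, and the helper names `clean_keys`, `wide_to_long` and `area_to_long` are hypothetical. It reshapes the area sheet by reading its first data row, normalises the keys before merging, checks the key overlap with `indicator=True`, and computes a full correlation matrix so that extra pairs (for example raw population) come for free.

```python
import pandas as pd

KEYS = ["year", "local_government_area"]

def clean_keys(df):
    """Make merge keys comparable: integer years, trimmed lower-case names."""
    df = df.copy()
    df["local_government_area"] = (
        df["local_government_area"].astype(str).str.strip().str.lower()
    )
    if "year" in df.columns:
        df["year"] = pd.to_numeric(df["year"], errors="coerce")
        df = df.dropna(subset=["year"])
        df["year"] = df["year"].astype(int)
    return df

def wide_to_long(df, value_name):
    """Melt a year-by-area sheet into one row per (year, area)."""
    long = df.rename(columns={"Year": "year"}).melt(
        id_vars=["year"], var_name="local_government_area", value_name=value_name
    )
    long[value_name] = pd.to_numeric(long[value_name], errors="coerce")
    return clean_keys(long)

def area_to_long(df):
    """Turn the single 'Area sq Km' row into one row per area (no year)."""
    label_col = df.columns[0]  # assumed to be the 'Property' label column
    row = df.drop(columns=label_col).iloc[0]
    out = row.rename("area").rename_axis("local_government_area").reset_index()
    out["area"] = pd.to_numeric(out["area"], errors="coerce")
    return clean_keys(out)

# Made-up stand-ins for the four Excel sheets (layout assumed from the question).
house = pd.DataFrame({"Year": [2019, 2020, 2021],
                      "Alpha": [500.0, 520.0, 560.0],
                      "Beta ": [300.0, 310.0, 330.0]})
population = pd.DataFrame({"Year": [2019, 2020, 2021],
                           "Alpha": [1000, 1100, 1200],
                           "Beta ": [4000, 4100, 4300]})
area = pd.DataFrame({"Property": ["Area sq Km"], "Alpha": [10.0], "Beta ": [20.0]})
crime = pd.DataFrame({
    "Year": ["2019", "2020", "2021"] * 2,  # stored as text on purpose
    "Local Government Area": ["ALPHA"] * 3 + ["Beta"] * 3,
    "Incidents Recorded": [9, 10, 10, 60, 61, 60],
    "Crime rate per 100,000 population": [900.0, 880.0, 850.0, 1500.0, 1490.0, 1400.0],
})

house_long = wide_to_long(house, "mean_house_price")
pop_long = wide_to_long(population, "population")
area_long = area_to_long(area)
crime_long = clean_keys(crime.rename(columns={
    "Year": "year",
    "Local Government Area": "local_government_area",
    "Crime rate per 100,000 population": "crime_rate",
}))

# Diagnostic: how many key pairs exist on both sides of the first merge?
check = crime_long.merge(house_long, on=KEYS, how="outer", indicator=True)
print(check["_merge"].value_counts())

merged = (crime_long.merge(house_long, on=KEYS, how="inner")
                    .merge(pop_long, on=KEYS, how="inner")
                    .merge(area_long, on="local_government_area", how="inner"))
merged["population_density"] = merged["population"] / merged["area"]

cols = ["mean_house_price", "crime_rate", "population", "population_density"]
pearson = merged[cols].corr(method="pearson")
spearman = merged[cols].corr(method="spearman")  # rank-based alternative
print(len(merged), "rows after merging")
print(pearson.round(3))
```

With the real data, the made-up frames would be replaced by the `pd.read_excel` calls from the question. If the diagnostic shows mostly `left_only` or `right_only` rows, printing `crime_long['local_government_area'].unique()` next to the melted area names should reveal how the spellings differ.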
