Question: Please help. You are required to implement a simple linear regression from scratch. Before we start, we import the packages that will be used in the project:
In [3]:
import numpy as np
import numpy.linalg as la
import pandas as pd
Now we input a dataset. We use "slr04.csv" in this program. This dataset contains the national unemployment rates for adult males and females. In this problem, we use the unemployment rate for adult males to predict the unemployment rate for adult females.
"slr04.csv" Dataset
X Y
X=national unemployment rate for adult males
Y=national unemployment rate for adult females
2.9 4
6.7 7.4
4.9 5
7.9 7.2
9.8 7.9
6.9 6.1
6.1 6
6.2 5.8
6 5.2
5.1 4.2
4.7 4
4.4 4.4
5.8 5.2
We need to read the file into a pandas DataFrame and convert its columns to NumPy arrays:
In [ ]:
filename = "slr04.csv"
df = pd.read_csv(filename, skiprows=[1,2])
df.head()
In [ ]:
X = df['X'].values
Y = df['Y'].values
print(X[:5], Y[:5]) #show the first few lines.
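If slr04.csv is not at hand, the same two arrays can be typed in directly from the table in the question. This is only a sketch that bypasses the CSV file; it assumes the 13 rows listed above are the complete dataset.

```python
import numpy as np

# Values copied from the "slr04.csv" table in the question.
# X = national unemployment rate for adult males
# Y = national unemployment rate for adult females
X = np.array([2.9, 6.7, 4.9, 7.9, 9.8, 6.9, 6.1, 6.2, 6.0, 5.1, 4.7, 4.4, 5.8])
Y = np.array([4.0, 7.4, 5.0, 7.2, 7.9, 6.1, 6.0, 5.8, 5.2, 4.2, 4.0, 4.4, 5.2])

print(X[:5], Y[:5])  # show the first few values
```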
You are required to finish the following code to calculate the covariance of two vectors X and Y. You may need to use the .mean() member function to calculate the means of X and Y. You also need to calculate the inner product of two vectors.
In [ ]:
def s_xy(X, Y):
    """This function calculates the covariance of X and Y."""
    n = X.shape[0]
    # Begin your code here
    Xbar = 0.0
    Ybar = 0.0
    result = 0.0
    # END code
    return result
In [ ]:
print(s_xy(X,X))
print(s_xy(X,Y))
If your code is correct, the above code should generate outputs close to 2.9410257342487385 and 2.0426282316794815.
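One possible way to fill in the function is sketched here. It assumes the sample covariance with an n - 1 divisor, that is, s_xy = sum((x_i - Xbar) * (y_i - Ybar)) / (n - 1), which is the convention that reproduces the expected outputs for this dataset. The arrays are typed in from the table in the question so that the block runs on its own.

```python
import numpy as np

# Values copied from the "slr04.csv" table in the question.
X = np.array([2.9, 6.7, 4.9, 7.9, 9.8, 6.9, 6.1, 6.2, 6.0, 5.1, 4.7, 4.4, 5.8])
Y = np.array([4.0, 7.4, 5.0, 7.2, 7.9, 6.1, 6.0, 5.8, 5.2, 4.2, 4.0, 4.4, 5.2])

def s_xy(X, Y):
    """Sample covariance of X and Y (n - 1 divisor, an assumption)."""
    n = X.shape[0]
    Xbar = X.mean()
    Ybar = Y.mean()
    # Inner product of the two mean-centered vectors, divided by n - 1.
    result = np.dot(X - Xbar, Y - Ybar) / (n - 1)
    return result

print(s_xy(X, X))  # sample variance of X
print(s_xy(X, Y))  # sample covariance of X and Y
```

The printed values should be close to the expected outputs; the trailing digits may differ slightly depending on floating-point precision.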
Note that $s_x^2$ is just $s_{xx}$, so we do not have to define $s_x^2$ separately.
Now we can calculate $b_1$ and $b_0$.
In [ ]:
def linear_reg(X, Y):
    n = X.shape[0]
    sxy = s_xy(X, Y)
    sx2 = s_xy(X, X)
    sy2 = s_xy(Y, Y)
    # Begin your code here:
    b1 = 0.0
    b0 = 0.0
    r = 0.0
    # END your code
    return b0, b1, r
In [ ]:
print(linear_reg(X,Y))
The above code should generate a result close to (1.4341107664655013, 0.6945291936390535, 0.9062646842045528).
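A sketch of one way to complete linear_reg follows. It uses the standard least-squares formulas: slope b1 = sxy / sx2, intercept b0 = Ybar - b1 * Xbar, and correlation coefficient r = sxy / sqrt(sx2 * sy2). The s_xy helper here is an assumed implementation (sample covariance with an n - 1 divisor), and the arrays are typed in from the table in the question so that the block runs on its own.

```python
import numpy as np

# Values copied from the "slr04.csv" table in the question.
X = np.array([2.9, 6.7, 4.9, 7.9, 9.8, 6.9, 6.1, 6.2, 6.0, 5.1, 4.7, 4.4, 5.8])
Y = np.array([4.0, 7.4, 5.0, 7.2, 7.9, 6.1, 6.0, 5.8, 5.2, 4.2, 4.0, 4.4, 5.2])

def s_xy(X, Y):
    """Sample covariance of X and Y (assumed n - 1 divisor)."""
    n = X.shape[0]
    return np.dot(X - X.mean(), Y - Y.mean()) / (n - 1)

def linear_reg(X, Y):
    """Return intercept b0, slope b1, and correlation coefficient r."""
    sxy = s_xy(X, Y)
    sx2 = s_xy(X, X)
    sy2 = s_xy(Y, Y)
    b1 = sxy / sx2                   # slope
    b0 = Y.mean() - b1 * X.mean()    # intercept: the line passes through the means
    r = sxy / np.sqrt(sx2 * sy2)     # Pearson correlation coefficient
    return b0, b1, r

print(linear_reg(X, Y))
```

Because the n - 1 factors cancel in both b1 and r, the same three numbers come out whether the covariance is divided by n or by n - 1.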
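Note that we can also use the scipy.stats.linregress function to do the linear regression. You can compare the results. Keep in mind that linregress returns the slope first and the intercept second, which is the reverse of the order returned by linear_reg.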
In [ ]:
import scipy.stats as stats
In [ ]:
s, i, r, p, stderr = stats.linregress(X,Y)
print(s,i,r,p,stderr)
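If SciPy is not installed, a similar cross-check can be sketched with NumPy alone. This swaps in np.polyfit and np.corrcoef in place of linregress, so it is an alternative check rather than the comparison the notebook asks for. The arrays are typed in from the table in the question.

```python
import numpy as np

# Values copied from the "slr04.csv" table in the question.
X = np.array([2.9, 6.7, 4.9, 7.9, 9.8, 6.9, 6.1, 6.2, 6.0, 5.1, 4.7, 4.4, 5.8])
Y = np.array([4.0, 7.4, 5.0, 7.2, 7.9, 6.1, 6.0, 5.8, 5.2, 4.2, 4.0, 4.4, 5.2])

# A degree-1 polynomial fit returns the coefficients highest power first:
# [slope, intercept].
slope, intercept = np.polyfit(X, Y, 1)

# The off-diagonal entry of the 2x2 correlation matrix is Pearson's r.
r = np.corrcoef(X, Y)[0, 1]

print(slope, intercept, r)
```

The slope, intercept, and r printed here should be close to the b1, b0, and r values quoted for linear_reg.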
