Question: Based on the data we've collected, we would like to conduct a regression analysis and predict the median market value of the houses.

In [18]:

 
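# Data handling and numerics
import pandas as pd
import numpy as np
# Plotting
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('ggplot')
# Statistics and regression modelling
import scipy.stats
import statsmodels.api as sm
# Suppress library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')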

In [19]:

import scipy.stats as stats
from statsmodels.stats.outliers_influence import variance_inflation_factor
import datetime

In [20]:

 
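# Load the Boston housing data set from the course's hosted CSV file
boston_url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ST0151EN-SkillsNetwork/labs/boston_housing.csv'
boston_df = pd.read_csv(boston_url)
# Drop the unnamed index column that was saved along with the data
boston_df = boston_df.drop(columns=['Unnamed: 0'])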

Question 1: Build the Regression Model

1. Split the data into train (80%) and test (20%) sets. Set the random seed to 2600 and show the shape of the two sets (5 points)

2. Define the dependent variable and the independent variables in the train set (5 points)

3. Get the VIF of the independent variables (10 points)

4. From the VIF output, can you tell whether there is a multicollinearity problem among the predictors? If yes, which predictors are involved? (5 points)

5. Build the regression model with the train set (10 points)

6. Which predictor is the least significant? (5 points)

7. What is the impact of one additional unit of weighted distance to the five Boston employment centres on the median market value of owner-occupied homes? (10 points)

Question 2: Residual Analysis

1. Create the residual plot using the train set (4 plots in 1) (10 points)

2. Drop all insignificant predictors at the 0.05 level from the full model and build a reduced regression model (10 points)

Question 3: Model Performance Evaluation

1. Predict the value of Y in the test set using the full and reduced models (10 points)

2. Compute the RMSE of the full and reduced models using the test set (10 points)

3. Based on the adjusted R-squared and RMSE, which model has better performance? (5 points)
