Question 1:
# partition the data into training (60%) and validation (40%) sets
predictors = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'LSTAT']
outcome = 'MEDV'

# partition the data
# Create a dataframe called X with the columns in the predictors[] list above
# Make sure to turn text columns (categorical) values into dummy variable columns
# MISSING 1 line of code

# Create a dataframe (technically a Series) called y containing the outcome column
# MISSING 1 line of code

# Split the data into 40/60 validation and training datasets with a random state of 1
# MISSING 1 line of code
print('Training set:', train_X.shape, 'Validation set:', valid_X.shape)
output:
Training set: (303, 12) Validation set: (203, 12)
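One possible way to fill in the three missing lines, given only as a sketch: it assumes the BostonHousing data has already been read into a pandas DataFrame named housing_df (a name not shown in the question) and uses pandas get_dummies together with scikit-learn's train_test_split.

import pandas as pd
from sklearn.model_selection import train_test_split

# housing_df is assumed to be the already-loaded BostonHousing DataFrame
# X: the predictor columns, with categorical (text) columns turned into dummy variables
X = pd.get_dummies(housing_df[predictors], drop_first=True)

# y: the outcome column as a Series
y = housing_df[outcome]

# 60% training / 40% validation split with random_state=1
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.4, random_state=1)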
Question 2:
# backward elimination
def train_model(variables):
    model = LinearRegression()
    model.fit(train_X[variables], train_y)
    return model

def score_model(model, variables):
    return AIC_score(train_y, model.predict(train_X[variables]), model)
# Run the backward_elimination function
# MISSING 1 line of code
print("Best Subset:", best_variables)
output:
Variables: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, LSTAT
Start: score=1807.23
Step: score=1805.30, remove AGE
Step: score=1803.57, remove INDUS
Step: score=1803.57, remove None
Best Subset: ['CRIM', 'ZN', 'CHAS', 'NOX', 'RM', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'LSTAT']
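One possible completion of the missing line, assuming backward_elimination is the helper from the dmba package (the same package AIC_score is typically imported from in this exercise); it returns the fitted model together with the list of retained variables.

from dmba import backward_elimination  # assumed source of the helper

# run backward elimination over all training predictors, printing each step
best_model, best_variables = backward_elimination(train_X.columns, train_model, score_model, verbose=True)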
Question 3:

# forward selection
# The initial model is the constant model - this requires special handling in train_model and score_model
# Write the train_model function (starting with "def")
# MISSING 6 lines of code
def .......

# Write the score_model function (starting with "def")
# MISSING 4 lines of code
def .....

# Run the forward_selection function
# MISSING 1 line of code
print("Best Subset:", best_variables)
output:
Variables: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, LSTAT
Start: score=2191.75, constant
Step: score=1934.91, add LSTAT
Step: score=1874.18, add RM
Step: score=1842.54, add PTRATIO
Step: score=1837.69, add CHAS
Step: score=1835.00, add NOX
Step: score=1817.90, add DIS
Step: score=1811.82, add ZN
Step: score=1810.16, add CRIM
Step: score=1808.01, add RAD
Step: score=1803.57, add TAX
Step: score=1803.57, add None
Best Subset: ['LSTAT', 'RM', 'PTRATIO', 'CHAS', 'NOX', 'DIS', 'ZN', 'CRIM', 'RAD', 'TAX']
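A sketch of how the two missing functions and the selection call could look, assuming forward_selection and AIC_score come from the dmba package: the empty variable list stands for the constant model, which is handled as a special case and scored against the training mean.

from dmba import forward_selection, AIC_score  # assumed source of the helpers
from sklearn.linear_model import LinearRegression

def train_model(variables):
    # the constant model has no predictors to fit
    if len(variables) == 0:
        return None
    model = LinearRegression()
    model.fit(train_X[variables], train_y)
    return model

def score_model(model, variables):
    # score the constant model by predicting the training mean (one parameter)
    if len(variables) == 0:
        return AIC_score(train_y, [train_y.mean()] * len(train_y), model, df=1)
    return AIC_score(train_y, model.predict(train_X[variables]), model)

# run forward selection over all training predictors, printing each step
best_model, best_variables = forward_selection(train_X.columns, train_model, score_model, verbose=True)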
Question 4:

# stepwise (both) method
# Run the stepwise_selection function
# MISSING 1 line of code
print("Best Subset:", best_variables)
output:
Variables: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, LSTAT
Start: score=2191.75, constant
Step: score=1934.91, add LSTAT
Step: score=1874.18, add RM
Step: score=1842.54, add PTRATIO
Step: score=1837.69, add CHAS
Step: score=1835.00, add NOX
Step: score=1817.90, add DIS
Step: score=1811.82, add ZN
Step: score=1810.16, add CRIM
Step: score=1808.01, add RAD
Step: score=1803.57, add TAX
Step: score=1803.57, unchanged None
Best Subset: ['LSTAT', 'RM', 'PTRATIO', 'CHAS', 'NOX', 'DIS', 'ZN', 'CRIM', 'RAD', 'TAX']
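The single missing line, as a sketch that assumes stepwise_selection comes from the dmba package and reuses the train_model and score_model functions defined for forward selection.

from dmba import stepwise_selection  # assumed source of the helper

# run stepwise (both-directions) selection, printing each step
best_model, best_variables = stepwise_selection(train_X.columns, train_model, score_model, verbose=True)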
Question 5:

# Re-run the regression, but this time fit the model with the best subset of variables from the reductions above

# Define the outcome and predictor variables
outcome = 'MEDV'
predictors = ['LSTAT', 'RM', 'PTRATIO', 'CHAS', 'NOX', 'DIS', 'ZN', 'CRIM', 'RAD', 'TAX']

# Create a dataframe called X containing the new predictor columns
# MISSING 1 line of code

# Create a dataframe (Series) called y containing the outcome column
# MISSING 1 line of code

# fit the regression model y on X
# MISSING 2 lines of code

# print the intercept
# MISSING 1 line of code

# print the predictor column names and the coefficients
# MISSING 1 line of code

# print performance measures (training set)
print(" Model performance on training data:")
# MISSING 1 line of code

# predict prices in validation set, print first few predicted/actual values and residuals
# MISSING 1 line of code
result = pd.DataFrame({'Predicted': house_lm_pred, 'Actual': valid_y, 'Residual': valid_y - house_lm_pred})
# print performance measures (validation set)
print(" Model performance on validation data:")
# MISSING 1 line of code
output:
intercept  38.95615649828231

  Predictor  coefficient
0     LSTAT    -0.514444
1        RM     3.480964
2   PTRATIO    -0.804964
3      CHAS     2.359986
4       NOX   -17.866926
5       DIS    -1.438596
6        ZN     0.066221
7      CRIM    -0.114137
8       RAD     0.262455
9       TAX    -0.011166
Model performance on training data:
Regression statistics
                      Mean Error (ME) : -0.0000
       Root Mean Squared Error (RMSE) : 4.5615
            Mean Absolute Error (MAE) : 3.1662
          Mean Percentage Error (MPE) : -3.4181
Mean Absolute Percentage Error (MAPE) : 16.4898
Model performance on validation data:
Regression statistics
                      Mean Error (ME) : -0.0393
       Root Mean Squared Error (RMSE) : 5.0771
            Mean Absolute Error (MAE) : 3.5746
          Mean Percentage Error (MPE) : -5.1561
Mean Absolute Percentage Error (MAPE) : 16.9733
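A sketch of how the missing lines in Question 5 could be filled in. It assumes the same housing_df DataFrame as in Question 1, that the 60/40 split is re-created with random_state=1 so the training and validation rows match the earlier partition (the question does not show how the earlier split is reused), and that regressionSummary comes from the dmba package.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from dmba import regressionSummary  # assumed source of the helper

# housing_df is assumed to be the already-loaded BostonHousing DataFrame
# X: the reduced predictor set; y: the outcome column
X = pd.get_dummies(housing_df[predictors], drop_first=True)
y = housing_df[outcome]

# re-create the 60/40 partition with random_state=1 (assumption: rows match the earlier split)
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.4, random_state=1)

# fit the regression model on the training data
house_lm = LinearRegression()
house_lm.fit(train_X, train_y)

# intercept and coefficients
print('intercept', house_lm.intercept_)
print(pd.DataFrame({'Predictor': X.columns, 'coefficient': house_lm.coef_}))

# performance measures on the training set
print(" Model performance on training data:")
regressionSummary(train_y, house_lm.predict(train_X))

# predicted prices, actual values, and residuals for the validation set
house_lm_pred = house_lm.predict(valid_X)
result = pd.DataFrame({'Predicted': house_lm_pred, 'Actual': valid_y,
                       'Residual': valid_y - house_lm_pred})
print(result.head())

# performance measures on the validation set
print(" Model performance on validation data:")
regressionSummary(valid_y, house_lm_pred)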
