Question 1:

# partition the data into training (60%) and validation (40%) sets
predictors = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'LSTAT']
outcome = 'MEDV'

# partition the data
# Create a dataframe called X with the columns in the predictors[] list above
# Make sure to turn text (categorical) column values into dummy variable columns
# MISSING 1 line of code

# Create a dataframe (technically a Series) called y containing the outcome column
# MISSING 1 line of code

# Split the data into 40/60 validation and training datasets with a random state of 1
# MISSING 1 line of code

print('Training set:', train_X.shape, 'Validation set:', valid_X.shape)

output: Training set: (303, 12) Validation set: (203, 12)
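One way the three missing lines might be filled in is sketched below. The housing file itself is not part of the post, so the block builds a synthetic 506-row stand-in (the name `housing_df` and its random values are assumptions); in the exercise the frame would come from the course's data file. The calls used are `pd.get_dummies` and scikit-learn's `train_test_split`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

predictors = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS',
              'RAD', 'TAX', 'PTRATIO', 'LSTAT']
outcome = 'MEDV'

# Assumption: synthetic stand-in for the real housing data (506 rows).
rng = np.random.default_rng(1)
housing_df = pd.DataFrame(rng.normal(size=(506, len(predictors))),
                          columns=predictors)
housing_df[outcome] = rng.normal(loc=22, scale=9, size=506)

# X: the predictor columns; any text columns become dummy variables
X = pd.get_dummies(housing_df[predictors], drop_first=True)

# y: the outcome column as a Series
y = housing_df[outcome]

# 40% validation / 60% training, reproducible via random_state=1
train_X, valid_X, train_y, valid_y = train_test_split(
    X, y, test_size=0.4, random_state=1)

print('Training set:', train_X.shape, 'Validation set:', valid_X.shape)
```

With 506 rows and `test_size=0.4`, the split gives 303 training rows and 203 validation rows, which agrees with the shapes in the expected output.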

Question 2:

# backward elimination

def train_model(variables):
    model = LinearRegression()
    model.fit(train_X[variables], train_y)
    return model

def score_model(model, variables):
    return AIC_score(train_y, model.predict(train_X[variables]), model)

# Run the backward_elimination function
# MISSING 1 line of code

print("Best Subset:", best_variables)
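The missing call can be sketched as below. The names `backward_elimination` and `AIC_score` match helpers in the `dmba` Python package, which is presumably where the exercise imports them from. To keep the sketch runnable on its own, both are replaced here by simplified local stand-ins, and the training data is synthetic (columns `x1`, `x2`, `n1`, `n2` are hypothetical). Treat it as an illustration of the call shape, not as the package's implementation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumption: synthetic training data; only x1 and x2 drive the outcome.
rng = np.random.default_rng(1)
train_X = pd.DataFrame(rng.normal(size=(200, 4)),
                       columns=['x1', 'x2', 'n1', 'n2'])
train_y = 3 * train_X['x1'] - 2 * train_X['x2'] + rng.normal(scale=0.5, size=200)

def AIC_score(y_true, y_pred, model=None, df=None):
    """Stand-in Gaussian AIC; p counts coefficients plus the intercept."""
    n = len(y_true)
    p = len(model.coef_) + 1 if df is None else df
    sse = np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
    return n * np.log(sse / n) + n * (1 + np.log(2 * np.pi)) + 2 * (p + 1)

def backward_elimination(variables, train_model, score_model, verbose=True):
    """Stand-in: drop, one at a time, the variable whose removal lowers the score most."""
    current = list(variables)
    best_score = score_model(train_model(current), current)
    while len(current) > 1:
        trials = [[v for v in current if v != drop] for drop in current]
        scores = [score_model(train_model(t), t) for t in trials]
        i = int(np.argmin(scores))
        if scores[i] >= best_score:
            break  # no removal improves the score
        if verbose:
            print(f'Step: score={scores[i]:.2f}, remove {current[i]}')
        best_score, current = scores[i], trials[i]
    return train_model(current), current

def train_model(variables):
    model = LinearRegression()
    model.fit(train_X[variables], train_y)
    return model

def score_model(model, variables):
    return AIC_score(train_y, model.predict(train_X[variables]), model)

# The missing line: start from all predictors and eliminate backwards
best_model, best_variables = backward_elimination(
    train_X.columns, train_model, score_model, verbose=True)

print('Best Subset:', best_variables)
```

The search returns both the fitted model and the list of surviving variables, which is why the result is unpacked into `best_model` and `best_variables`.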

output:

Variables: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, LSTAT
Start: score=1807.23
Step: score=1805.30, remove AGE
Step: score=1803.57, remove INDUS
Step: score=1803.57, remove None
Best Subset: ['CRIM', 'ZN', 'CHAS', 'NOX', 'RM', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'LSTAT']

Question 3:

# forward selection
# The initial model is the constant model - this requires special handling in train_model and score_model

# Write the train_model function (starting with "def")
# MISSING 6 lines of code
def .......

# Write the score_model function (starting with "def")
# MISSING 4 lines of code
def .....

# Run the forward_selection function
# MISSING 1 line of code

print("Best Subset:", best_variables)
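A possible shape for the two functions and the final call is sketched below. The key point is the constant model: with an empty variable list there is nothing to fit, so `train_model` returns `None` and `score_model` scores a prediction equal to the training mean with one degree of freedom. As assumptions, `forward_selection` and `AIC_score` are simplified local stand-ins for the course helpers of the same name, and the data is synthetic with hypothetical column names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumption: synthetic training data; only x1 and x2 drive the outcome.
rng = np.random.default_rng(1)
train_X = pd.DataFrame(rng.normal(size=(200, 4)),
                       columns=['x1', 'x2', 'n1', 'n2'])
train_y = 3 * train_X['x1'] - 2 * train_X['x2'] + rng.normal(scale=0.5, size=200)

def AIC_score(y_true, y_pred, model=None, df=None):
    """Stand-in Gaussian AIC; df overrides the parameter count when given."""
    n = len(y_true)
    p = len(model.coef_) + 1 if df is None else df
    sse = np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
    return n * np.log(sse / n) + n * (1 + np.log(2 * np.pi)) + 2 * (p + 1)

def forward_selection(variables, train_model, score_model, verbose=True):
    """Stand-in: add, one at a time, the variable that lowers the score most."""
    selected, remaining = [], list(variables)
    best_score = score_model(train_model(selected), selected)
    if verbose:
        print(f'Start: score={best_score:.2f}, constant')
    while remaining:
        trials = [selected + [v] for v in remaining]
        scores = [score_model(train_model(t), t) for t in trials]
        i = int(np.argmin(scores))
        if scores[i] >= best_score:
            break  # no addition improves the score
        if verbose:
            print(f'Step: score={scores[i]:.2f}, add {remaining[i]}')
        best_score, selected = scores[i], trials[i]
        remaining.pop(i)
    return train_model(selected), selected

def train_model(variables):
    # Constant model: nothing to fit, signalled by None
    if len(variables) == 0:
        return None
    model = LinearRegression()
    model.fit(train_X[variables], train_y)
    return model

def score_model(model, variables):
    # Constant model: predict the training mean for every row, df=1
    if len(variables) == 0:
        return AIC_score(train_y, [train_y.mean()] * len(train_y), model, df=1)
    return AIC_score(train_y, model.predict(train_X[variables]), model)

# The missing line: grow the model from the constant model
best_model, best_variables = forward_selection(
    train_X.columns, train_model, score_model, verbose=True)

print('Best Subset:', best_variables)
```

Counting the `def` lines, `train_model` is six lines and `score_model` is four (ignoring comments), which lines up with the line counts the exercise asks for.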

output:

Variables: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, LSTAT
Start: score=2191.75, constant
Step: score=1934.91, add LSTAT
Step: score=1874.18, add RM
Step: score=1842.54, add PTRATIO
Step: score=1837.69, add CHAS
Step: score=1835.00, add NOX
Step: score=1817.90, add DIS
Step: score=1811.82, add ZN
Step: score=1810.16, add CRIM
Step: score=1808.01, add RAD
Step: score=1803.57, add TAX
Step: score=1803.57, add None
Best Subset: ['LSTAT', 'RM', 'PTRATIO', 'CHAS', 'NOX', 'DIS', 'ZN', 'CRIM', 'RAD', 'TAX']

Question 4:

# stepwise (both) method

# Run the stepwise_selection function
# MISSING 1 line of code

print("Best Subset:", best_variables)
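The stepwise call has the same shape as the forward one: it reuses the `train_model` and `score_model` functions that handle the constant model, and at each step it considers both adding and removing a variable. The sketch below is an illustration under assumptions: `stepwise_selection` and `AIC_score` are simplified local stand-ins for the course helpers, and the data is synthetic with hypothetical column names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumption: synthetic training data; only x1 and x2 drive the outcome.
rng = np.random.default_rng(1)
train_X = pd.DataFrame(rng.normal(size=(200, 4)),
                       columns=['x1', 'x2', 'n1', 'n2'])
train_y = 3 * train_X['x1'] - 2 * train_X['x2'] + rng.normal(scale=0.5, size=200)

def AIC_score(y_true, y_pred, model=None, df=None):
    """Stand-in Gaussian AIC; df overrides the parameter count when given."""
    n = len(y_true)
    p = len(model.coef_) + 1 if df is None else df
    sse = np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
    return n * np.log(sse / n) + n * (1 + np.log(2 * np.pi)) + 2 * (p + 1)

def stepwise_selection(variables, train_model, score_model, verbose=True):
    """Stand-in: at each step try every single add or remove, keep the best."""
    included = []
    best_score = score_model(train_model(included), included)
    while True:
        trials = []
        for v in variables:
            if v in included:
                trials.append(('remove', v, [x for x in included if x != v]))
            else:
                trials.append(('add', v, included + [v]))
        scored = [(score_model(train_model(t), t), action, v, t)
                  for action, v, t in trials]
        score, action, v, subset = min(scored, key=lambda s: s[0])
        if score >= best_score:
            break  # neither adding nor removing improves the score
        if verbose:
            print(f'Step: score={score:.2f}, {action} {v}')
        best_score, included = score, subset
    return train_model(included), included

def train_model(variables):
    if len(variables) == 0:
        return None  # constant model
    model = LinearRegression()
    model.fit(train_X[variables], train_y)
    return model

def score_model(model, variables):
    if len(variables) == 0:
        return AIC_score(train_y, [train_y.mean()] * len(train_y), model, df=1)
    return AIC_score(train_y, model.predict(train_X[variables]), model)

# The missing line: stepwise search in both directions
best_model, best_variables = stepwise_selection(
    train_X.columns, train_model, score_model, verbose=True)

print('Best Subset:', best_variables)
```

In the post's output the stepwise run never removes a variable, so it ends on the same subset as forward selection.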

output:

Variables: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, LSTAT
Start: score=2191.75, constant
Step: score=1934.91, add LSTAT
Step: score=1874.18, add RM
Step: score=1842.54, add PTRATIO
Step: score=1837.69, add CHAS
Step: score=1835.00, add NOX
Step: score=1817.90, add DIS
Step: score=1811.82, add ZN
Step: score=1810.16, add CRIM
Step: score=1808.01, add RAD
Step: score=1803.57, add TAX
Step: score=1803.57, unchanged None
Best Subset: ['LSTAT', 'RM', 'PTRATIO', 'CHAS', 'NOX', 'DIS', 'ZN', 'CRIM', 'RAD', 'TAX']

Question 5:

# Re-run the Regression but this time fit the model with best subset variables from the
# subset reductions from above

# Define the outcome and predictor variables
outcome = 'MEDV'
predictors = ['LSTAT', 'RM', 'PTRATIO', 'CHAS', 'NOX', 'DIS', 'ZN', 'CRIM', 'RAD', 'TAX']

# Create a dataframe called X containing the new predictor columns
# MISSING 1 line of code

# Create a dataframe (Series) called y containing the outcome column.
# MISSING 1 line of code

# fit the regression model y on X
# MISSING 2 lines of code

# print the intercept
# MISSING 1 line of code

# print the predictor column names and the coefficients
# MISSING 1 line of code

# print performance measures (training set)
print(" Model performance on training data:")
# MISSING 1 line of code

# predict prices in validation set, print first few predicted/actual values and residuals
# MISSING 1 line of code

result = pd.DataFrame({'Predicted': house_lm_pred, 'Actual': valid_y, 'Residual': valid_y - house_lm_pred})

# print performance measures (validation set)
print(" Model performance on validation data:")
# MISSING 1 line of code

output:

intercept 38.95615649828231
  Predictor  coefficient
0     LSTAT    -0.514444
1        RM     3.480964
2   PTRATIO    -0.804964
3      CHAS     2.359986
4       NOX   -17.866926
5       DIS    -1.438596
6        ZN     0.066221
7      CRIM    -0.114137
8       RAD     0.262455
9       TAX    -0.011166

Model performance on training data:

Regression statistics

Mean Error (ME) : -0.0000
Root Mean Squared Error (RMSE) : 4.5615
Mean Absolute Error (MAE) : 3.1662
Mean Percentage Error (MPE) : -3.4181
Mean Absolute Percentage Error (MAPE) : 16.4898

Model performance on validation data:

Regression statistics

Mean Error (ME) : -0.0393
Root Mean Squared Error (RMSE) : 5.0771
Mean Absolute Error (MAE) : 3.5746
Mean Percentage Error (MPE) : -5.1561
Mean Absolute Percentage Error (MAPE) : 16.9733
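The missing lines for the refit can be sketched as follows. Because the expected output reports training and validation performance separately, this sketch assumes `X` and `y` are taken from the training partition. Other assumptions: the data is a synthetic stand-in (`housing_df` is a hypothetical name), and `regression_summary` is a hypothetical local helper that prints the same five measures as the output above; the exercise presumably uses a course-provided summary function instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

outcome = 'MEDV'
predictors = ['LSTAT', 'RM', 'PTRATIO', 'CHAS', 'NOX', 'DIS', 'ZN', 'CRIM', 'RAD', 'TAX']

# Assumption: synthetic stand-in for the housing data and its 60/40 split.
rng = np.random.default_rng(1)
housing_df = pd.DataFrame(rng.normal(size=(506, len(predictors))), columns=predictors)
housing_df[outcome] = (22 + 3 * housing_df['RM'] - 2 * housing_df['LSTAT']
                       + rng.normal(size=506))
train_df, valid_df = train_test_split(housing_df, test_size=0.4, random_state=1)
train_X, train_y = train_df[predictors], train_df[outcome]
valid_X, valid_y = valid_df[predictors], valid_df[outcome]

def regression_summary(y_true, y_pred):
    """Hypothetical helper: print and return ME, RMSE, MAE, MPE, MAPE."""
    resid = np.asarray(y_true) - np.asarray(y_pred)
    pct = 100 * resid / np.asarray(y_true)
    stats = {'ME': resid.mean(),
             'RMSE': np.sqrt((resid ** 2).mean()),
             'MAE': np.abs(resid).mean(),
             'MPE': pct.mean(),
             'MAPE': np.abs(pct).mean()}
    for name, value in stats.items():
        print(f'{name} : {value:.4f}')
    return stats

# X and y restricted to the best-subset predictors (training partition)
X = train_X[predictors]
y = train_y

# fit the regression model y on X
house_lm = LinearRegression()
house_lm.fit(X, y)

# intercept, then predictor names alongside their coefficients
print('intercept', house_lm.intercept_)
print(pd.DataFrame({'Predictor': X.columns, 'coefficient': house_lm.coef_}))

print('Model performance on training data:')
train_stats = regression_summary(y, house_lm.predict(X))

# predict on the validation set and tabulate residuals
house_lm_pred = house_lm.predict(valid_X[predictors])
result = pd.DataFrame({'Predicted': house_lm_pred, 'Actual': valid_y,
                       'Residual': valid_y - house_lm_pred})
print(result.head())

print('Model performance on validation data:')
valid_stats = regression_summary(valid_y, house_lm_pred)
```

A least-squares fit with an intercept has zero mean residual on the data it was fitted to, which is why the training ME in the post's output is -0.0000 while the validation ME is small but nonzero.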
