Assume we have a 10-variable regression problem with a training data set and a testing data set. We run the following three regression methods on the training data: best subsets, forward selection (forward stepwise), and backward elimination (backward stepwise). For each method we keep the chosen models with 5 and 7 variables (a total of 6 models).

(a) Which 5-variable model will have the lowest Residual Sum of Squares (RSS) for the training data? Briefly explain your answer.

(b) Which 5-variable model will have the lowest prediction RSS, i.e., the lowest RSS when the model fit on the training data is used to predict the testing data? Briefly explain your answer.

(c) For which of the methods are we guaranteed that the 5 variables in the 5-variable model are a subset of the 7 variables in the 7-variable model?
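
The setup above can be explored numerically. The following is a minimal sketch, not a worked solution from the original page: the synthetic data, the seed, the coefficient vector, and every function name are assumptions made for illustration. It runs all three methods on a 10-variable training set and keeps the 5- and 7-variable models. Because best subsets searches every 5-variable combination, its training RSS cannot exceed that of either stepwise choice, and the forward and backward paths are nested by construction. Testing RSS carries no such guarantee, so the code only reports it.

```python
import itertools
import numpy as np

def design(X, cols):
    # Intercept column plus the selected predictor columns.
    return np.column_stack([np.ones(X.shape[0]), X[:, list(cols)]])

def fit_rss(X_fit, y_fit, cols, X_eval, y_eval):
    # Least-squares fit on (X_fit, y_fit); RSS evaluated on (X_eval, y_eval).
    beta, *_ = np.linalg.lstsq(design(X_fit, cols), y_fit, rcond=None)
    resid = y_eval - design(X_eval, cols) @ beta
    return float(resid @ resid)

def best_subset(X, y, k):
    # Exhaustive search over all k-variable models, by training RSS.
    combos = itertools.combinations(range(X.shape[1]), k)
    return min(combos, key=lambda c: fit_rss(X, y, c, X, y))

def forward_path(X, y):
    # Greedily add the variable that lowers training RSS the most.
    p, chosen, path = X.shape[1], [], {}
    while len(chosen) < p:
        rest = [j for j in range(p) if j not in chosen]
        add = min(rest, key=lambda j: fit_rss(X, y, chosen + [j], X, y))
        chosen = chosen + [add]
        path[len(chosen)] = tuple(sorted(chosen))
    return path

def backward_path(X, y):
    # Start from the full model; drop the variable whose removal hurts least.
    chosen = list(range(X.shape[1]))
    path = {len(chosen): tuple(chosen)}
    while len(chosen) > 1:
        drop = min(
            chosen,
            key=lambda j: fit_rss(X, y, [c for c in chosen if c != j], X, y),
        )
        chosen = [c for c in chosen if c != drop]
        path[len(chosen)] = tuple(chosen)
    return path

# Synthetic data (assumed for illustration): correlated predictors, sparse signal.
rng = np.random.default_rng(0)
p = 10
mix = rng.normal(size=(p, p))
X_all = rng.normal(size=(120, p)) @ mix
beta_true = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 1.0, 0.0, 0.5, 0.0, 0.0])
y_all = X_all @ beta_true + rng.normal(scale=2.0, size=120)
X_tr, y_tr = X_all[:60], y_all[:60]
X_te, y_te = X_all[60:], y_all[60:]

fwd, bwd = forward_path(X_tr, y_tr), backward_path(X_tr, y_tr)
models = {}
for k in (5, 7):
    models[("best", k)] = best_subset(X_tr, y_tr, k)
    models[("forward", k)] = fwd[k]
    models[("backward", k)] = bwd[k]

train_rss = {m: fit_rss(X_tr, y_tr, c, X_tr, y_tr) for m, c in models.items()}
test_rss = {m: fit_rss(X_tr, y_tr, c, X_te, y_te) for m, c in models.items()}

for key in sorted(models):
    print(key, models[key], round(train_rss[key], 2), round(test_rss[key], 2))
```

On any data set, the printed training RSS for the best-subsets model is no larger than for the stepwise models of the same size. Which model has the lowest testing RSS depends on the particular data and can differ from run to run if the seed is changed.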
