Question: using rstudio 4. Using the Boston data set of library MASS, fit classification models in order to predict whether a given suburb has a house

 using rstudio 4. Using the Boston data set of library MASS,

using rstudio

4. Using the Boston data set of library MASS, fit classification models in order to predict whether a given suburb has a house value (medu) above or below the median (median(medu)). In particular, explore various KNN models. (a) Create a medv01 variable, where __fo, medu Smedian(medv), 1, medu > median(medu) Add it as a column in Boston data frame, while disposing of the original medu variable (via Boston$medu = NULL). (b) Decide on three values of K and three subsets of predictors including full set of all 13 predictors). Totally your judgment call. When picking predictor subsets, you could use Your own logical considerations (which variables appear most important to predict the price), Correlation matrix (to determine if there are groups of correlated variables, and you could just retain one of them in the model, dropping the rest). (c) What stable method was introduced in the class in order to compare predictive quality of different models? Proceed to use this method & obtain test error estimates of all 3 x 3 = 9 models. Which model (the combination of K & predictor subset) won? (d) Are all the variables in Boston data set on the same scale? If not, how do we deal with it? (e) Proceed to apply scaling to the predictors in Boston data, and repeat part (c) for the standardized data set. Did the results (the test errors & the winning model) change? 4. Using the Boston data set of library MASS, fit classification models in order to predict whether a given suburb has a house value (medu) above or below the median (median(medu)). In particular, explore various KNN models. (a) Create a medv01 variable, where __fo, medu Smedian(medv), 1, medu > median(medu) Add it as a column in Boston data frame, while disposing of the original medu variable (via Boston$medu = NULL). (b) Decide on three values of K and three subsets of predictors including full set of all 13 predictors). Totally your judgment call. When picking predictor subsets, you could use Your own logical considerations (which variables appear most important to predict the price), Correlation matrix (to determine if there are groups of correlated variables, and you could just retain one of them in the model, dropping the rest). (c) What stable method was introduced in the class in order to compare predictive quality of different models? Proceed to use this method & obtain test error estimates of all 3 x 3 = 9 models. Which model (the combination of K & predictor subset) won? (d) Are all the variables in Boston data set on the same scale? If not, how do we deal with it? (e) Proceed to apply scaling to the predictors in Boston data, and repeat part (c) for the standardized data set. Did the results (the test errors & the winning model) change

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!