Question:
With Undecided as the target (or response) variable, construct a series of k-nearest neighbor classifiers as directed by the following parts (a), (b), and (c). Evaluate a range of values of k and standardize the input variables to adjust for the different magnitudes of the variables.
a. Use only the continuous variables (Age, HouseholdSize, Income, and Education) as input variables. For a default cutoff value of 0.5, what value of k minimizes the overall error rate on a static validation set or through a 10-fold cross-validation procedure?
b. Use all eight variables as input variables. For a default cutoff value of 0.5, what value of k minimizes the overall error rate on a static validation set or through a 10-fold cross-validation procedure?
c. Generally, caution is recommended when combining continuous input variables and categorical input variables, as the concept of distance differs for these two types of variables. Compare the overall error rates of the models from parts (a) and (b). For these data, does combining variable types degrade performance?
Please answer in Excel and show all work.
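For anyone who wants to cross-check a spreadsheet answer, the procedure the question describes (standardize the inputs, classify with k nearest neighbors at a 0.5 cutoff, pick k by 10-fold cross-validation) can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not the textbook's or any tool's method: the function names are hypothetical, Undecided is assumed to be coded 1/0, the standard deviation is the population version (like Excel's STDEV.P), a neighbor share of exactly 0.5 is assumed to count as class 1, and any categorical inputs in part (b) are assumed to be already coded as numbers.

```python
import math
import random

def standardize(train, test):
    """z-score both sets using the training means and standard deviations."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for a constant column.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

    return scale(train), scale(test)

def knn_predict(train_X, train_y, x, k, cutoff=0.5):
    """Predict 1 when the share of 1s among the k nearest rows meets the cutoff."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    share = sum(train_y[i] for i in nearest) / k
    return 1 if share >= cutoff else 0  # assumption: a tie at the cutoff counts as 1

def cv_error(X, y, k, folds=10, seed=0):
    """Overall error rate for one k, estimated by k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    errors = 0
    for f in range(folds):
        test_idx = idx[f::folds]          # every folds-th shuffled row is held out
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        tr_X, te_X = standardize([X[i] for i in train_idx],
                                 [X[i] for i in test_idx])
        tr_y = [y[i] for i in train_idx]
        for x, i in zip(te_X, test_idx):
            errors += knn_predict(tr_X, tr_y, x, k) != y[i]
    return errors / len(X)

def best_k(X, y, k_values):
    """Return the k with the lowest cross-validated error, plus all error rates."""
    errs = {k: cv_error(X, y, k) for k in k_values}
    return min(errs, key=errs.get), errs
```

With X as a list of rows (the four continuous variables for part (a), all eight for part (b)) and y as the matching list of 1/0 Undecided labels, `best_k(X, y, range(1, 21))` returns the error-minimizing k and the error rate for each k tried; comparing the two minimum error rates addresses part (c). The range 1 to 20 is only an example, since the question does not say which values of k to try.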
