Question: Suppose you are developing a classification procedure where you have 1,200 candidate predictors (features) and only 200 observations of the class labels. To reduce the number of candidate predictors and focus on the more promising subset, you select the 200 predictors with the largest absolute correlation with the observed class labels. You then fit various classification models using this subset of highly correlated predictors, and you would like to select the single model that is expected to perform best on a test sample. You decide to use 10-fold cross-validation to estimate the test-set performance of the candidate models on the subset of highly correlated predictors. Do you expect these 10-fold cross-validation estimates to be valid? Explain your answer.
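One way to probe the question empirically is a small simulation. The sketch below is only an illustration under stated assumptions, not part of the original problem: the data are pure noise (labels generated independently of the predictors, so the true accuracy of any classifier is 50%), a simple nearest-centroid rule stands in for the "various classification models", and the helper names (`top_k_by_correlation`, `nearest_centroid_predict`, `cv_accuracy`) are hypothetical. It compares screening the 200 predictors once on all labels, as in the question, with repeating the screening inside each training fold.

```python
import numpy as np

def top_k_by_correlation(X, y, k):
    """Indices of the k columns of X with the largest |correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(corr))[:k]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class (0 or 1) with the closer training centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_accuracy(X, y, k, n_folds=10, screen_inside=True, seed=0):
    """n_folds-fold CV accuracy; screening is done per fold or once on all data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cols_all = top_k_by_correlation(X, y, k)  # uses every label, including held-out ones
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        if screen_inside:
            # Screening sees only the training fold's labels.
            cols = top_k_by_correlation(X[train_idx], y[train_idx], k)
        else:
            cols = cols_all
        pred = nearest_centroid_predict(
            X[train_idx][:, cols], y[train_idx], X[test_idx][:, cols]
        )
        correct += (pred == y[test_idx]).sum()
    return correct / len(y)

# Assumed setup: 200 observations, 1,200 noise predictors, labels independent of X.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 1200))
y = rng.permutation(np.repeat([0, 1], 100))

acc_outside = cv_accuracy(X, y, k=200, screen_inside=False)
acc_inside = cv_accuracy(X, y, k=200, screen_inside=True)
print(f"screening before CV: {acc_outside:.2f}")
print(f"screening inside CV: {acc_inside:.2f}")
```

Because the labels here carry no information about the predictors, an honest estimate should sit near 50%. If the estimate with screening done before cross-validation comes out well above that while the per-fold version does not, the gap reflects label information from the held-out folds leaking in through the screening step.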
