Question: 2. For this question, suppose you have a larger version of the Weather dataset from Question 1, which contains approximately one years worth of data.
2. For this question, suppose you have a larger version of the Weather dataset from Question 1, which contains approximately one years worth of data. The goal of this question will be to evaluate the performance of the nearest neighbor classifier and K-nearest neighbor classifiers discussed in Question 1 in the context of this version of the Weather problem. (a) Briefly explain (in 2-3 sentences, your own words) the concept of overfitting. (1) (b) Briefly explain (in 2-3 sentences, your own words) why it would not be appropriate to evaluate the performance of the nearest neighbor classifiers by simply measuring the accuracy of the models on the training dataset. Be sure to use the concept of overfitting in your answer. (1) (c) Briefly explain (in 2-3 sentences, your own words) the concept of a hold-out set, and how it can help to address the problem you described in your answer to Question 2b. (1) (d) Suppose we train a K-nearest neighbor classifier on the training data for the Weather problem with different several different values of the number of nearest neighbors, K = {1, 3, 5, 7, 9, 11}. We select the best value of K which performs the best on the hold-out set. Going forward, we plan to use the model with the best value of K to predict the class labels (Play) on each new day from now on. Briefly explain (in 2-3 sentences, your own words) why the accuracy of the model, estimated based on the hold-out set, may be an over-estimate of the models true accuracy on future days. (1) (e) Briefly explain (in 2-3 sentences, your own words), the concept of a validation set, and how it can help to address the problem you described in your answer to Question 2d. (1) (f) Briefly explain (in 2-3 sentences, your own words) a possible use-case/application of clustering methods such as K-means to the Weather problem. In your answer, be sure to explain what the output of the clustering method would be used for. Feel free to make any assumptions you want to about the broader context of the scenario and who is using the model (e.g. which particular sport is being played and whether it is professional or amateur), though it would be best to explain any assumptions you make.
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
