Problem Statement: k-nearest neighbor classification for the Iris data set.
The Iris data set has three species: Setosa, Versicolor, and Virginica. Each species has 50 data points, and we call each species a class. Use the first 40 data points of each class as training samples and the remaining 10 data points of each class as test (target) data points.
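The split above can be sketched in Python. The assignment itself asks for R; Python is used here only to illustrate the logic. This is a minimal sketch assuming the data are given as a list of `(features, label)` pairs in their original file order; `split_per_class` is a hypothetical helper name.

```python
from collections import defaultdict


def split_per_class(rows, n_train=40):
    """Split (features, label) rows into train/test lists, per class.

    The first n_train rows of each class (in their original order) become
    training samples; every later row of that class becomes a test point.
    """
    seen = defaultdict(int)  # number of rows seen so far for each label
    train, test = [], []
    for features, label in rows:
        if seen[label] < n_train:
            train.append((features, label))
        else:
            test.append((features, label))
        seen[label] += 1
    return train, test
```

For the 150-row Iris data (50 rows per species) this yields 120 training points and 30 test points.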
Use the k-Nearest Neighbor (k-NN) algorithm to classify these test (target) data points into their proper species (classes).
Use three different values of k: 1, 3, and 9.
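A minimal Python sketch of the k-NN vote (the assignment asks for R; this only illustrates the logic). `knn_predict` is a hypothetical helper that defaults to Euclidean distance via the standard library's `math.dist` (Python 3.8+). The tie-breaking rule, which matters for k = 3 and k = 9 with three classes, is an assumption: the tied class whose member is nearest to the query point wins.

```python
import math
from collections import Counter


def knn_predict(train, x, k, dist=math.dist):
    """Predict the label of point x by majority vote of its k nearest neighbors.

    train: list of (features, label) pairs.
    dist:  distance function; Euclidean (math.dist) by default.
    Vote ties go to the tied class with the member closest to x
    (an assumption; other tie rules are possible).
    """
    # Sort training points by distance to x (sorted() is stable, so equal
    # distances keep their original order) and keep the k closest.
    neighbors = sorted(train, key=lambda row: dist(row[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    # Walk the neighbors nearest-first; the first label with the top count wins.
    for _, label in neighbors:
        if votes[label] == best:
            return label
```

With two small clusters, a point next to the two-member cluster is labeled by that cluster for k = 1 and k = 3, but for k = 5 the three-member cluster outvotes it, which is why the results can change with k.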
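For each value of k, use different distance metrics (n = 1, 2, ∞), given by the following general distance measure, the L_n norm:

$$L_n(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{d} |x_i - y_i|^n \right)^{1/n}, \qquad L_\infty(\mathbf{x}, \mathbf{y}) = \max_{1 \le i \le d} |x_i - y_i|$$

where $\mathbf{x} = (x_1, \dots, x_d)$ and $\mathbf{y} = (y_1, \dots, y_d)$ are two data points and $d$ is the number of dimensions (features) of each data point. For Iris, $d = 4$: sepal length, sepal width, petal length, and petal width.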
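The L_n distance translates directly into code. A minimal Python sketch (Python is used only for illustration; `minkowski` is a hypothetical name), treating `math.inf` as the n = ∞ case:

```python
import math


def minkowski(x, y, n):
    """L_n distance between equal-length points x and y.

    n = 1 gives the Manhattan distance, n = 2 the Euclidean distance, and
    n = math.inf the maximum (Chebyshev) distance, the limit of L_n as n grows.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions d")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(n):
        return max(diffs)
    return sum(t ** n for t in diffs) ** (1.0 / n)


print(minkowski((0, 0), (3, 4), 1))         # 7.0
print(minkowski((0, 0), (3, 4), 2))         # 5.0
print(minkowski((0, 0), (3, 4), math.inf))  # 4
```

The three values show that the same pair of points can be ranked differently by each metric, so the nearest neighbors, and hence the error counts, may differ across the n columns.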
Submit:
1. The R code of your solution (written in RStudio).
2. The following table, filled in with the number of misclassified test data points in each scenario. Each column is one (k, n) pair, and each row is the genuine species of the 10 test data points.
| Genuine species of the 10 test data points | (k, n) = (1, 1) | (k, n) = (1, 2) | (k, n) = (1, ∞) | (k, n) = (3, 1) | (k, n) = (3, 2) | (k, n) = (3, ∞) | (k, n) = (9, 1) | (k, n) = (9, 2) | (k, n) = (9, ∞) |
|---|---|---|---|---|---|---|---|---|---|
| Setosa |   |   |   |   |   |   |   |   |   |
| Versicolor |   |   |   |   |   |   |   |   |   |
| Virginica |   |   |   |   |   |   |   |   |   |
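Filling in the table amounts to running all nine (k, n) scenarios and counting errors per genuine species. A self-contained Python sketch (the assignment asks for R; Python here only illustrates the loop). All function names are hypothetical, and vote ties go to the tied class with the nearest member, which is an assumption; a different tie rule could change the counts for k = 3 and k = 9.

```python
import math
from collections import Counter


def minkowski(x, y, n):
    """L_n distance; n = math.inf gives the maximum (Chebyshev) distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    return max(diffs) if math.isinf(n) else sum(t ** n for t in diffs) ** (1.0 / n)


def knn_predict(train, x, k, n):
    """Majority vote of the k nearest training points under the L_n distance.

    Vote ties go to the tied class with the nearest member (an assumption).
    """
    neighbors = sorted(train, key=lambda row: minkowski(row[0], x, n))[:k]
    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    return next(label for _, label in neighbors if votes[label] == best)


def error_table(train, test, ks=(1, 3, 9), ns=(1, 2, math.inf)):
    """Count misclassified test points for every (k, n) scenario.

    Returns {(k, n): Counter mapping genuine label -> number of errors}.
    """
    table = {}
    for k in ks:
        for n in ns:
            errors = Counter()
            for features, genuine in test:
                if knn_predict(train, features, k, n) != genuine:
                    errors[genuine] += 1
            table[(k, n)] = errors
    return table


def print_table(table, labels):
    """Print one row per genuine species and one column per (k, n) pair."""
    cols = list(table)
    header = (f"({k}, {'inf' if math.isinf(n) else n})" for k, n in cols)
    print("species", *header, sep="\t")
    for label in labels:
        # A Counter returns 0 for labels with no errors.
        print(label, *(table[c][label] for c in cols), sep="\t")
```

With `train` and `test` built from the Iris data as described in the problem statement, `print_table(error_table(train, test), ["setosa", "versicolor", "virginica"])` prints one row per species and one column per scenario; the counts come from running it, so none are given here. In R, the built-in `iris` data frame holds the same 150 rows; note that `class::knn` uses Euclidean distance only and breaks vote ties at random, so the n = 1 and n = ∞ columns need a custom distance.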
