Question: 1. R programming: The first task is to write the code to implement the K-Nearest Neighbors, or KNN, model from scratch.

1. R programming: The first task is to write the code to implement the K-Nearest Neighbors, or KNN, model from
scratch. We will do this in steps:
Write a function called euclidean_distance that calculates the Euclidean distance between two vectors.
There are two input arguments for this function: vector 1 (vec1) and vector 2 (vec2). The output of
this function is a numeric value, the Euclidean distance (euclDist).
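
One possible sketch of this first step is shown here. The names vec1, vec2 and euclDist come from the task statement; the length check is an added assumption, since the task does not say how mismatched vectors should be handled.

```r
# Euclidean distance between two numeric vectors of equal length.
euclidean_distance <- function(vec1, vec2) {
  # Assumption: mismatched lengths are an error instead of being recycled.
  stopifnot(is.numeric(vec1), is.numeric(vec2), length(vec1) == length(vec2))
  # Square root of the sum of squared coordinate differences.
  euclDist <- sqrt(sum((vec1 - vec2)^2))
  euclDist
}

# Usage: the classic 3-4-5 right triangle.
euclidean_distance(c(0, 0), c(3, 4))  # 5
```
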
Write a function called manhattan_distance that calculates the Manhattan distance between two
vectors. There are two input arguments for this function: vector 1 (vec1) and vector 2 (vec2). The
output of this function is a numeric value, the Manhattan distance (manhDist).
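
A minimal sketch of the Manhattan version could look like the following. As before, the argument and output names follow the task statement, and the input checks are an assumption added for safety.

```r
# Manhattan (L1) distance between two numeric vectors of equal length.
manhattan_distance <- function(vec1, vec2) {
  # Assumption: mismatched lengths are an error instead of being recycled.
  stopifnot(is.numeric(vec1), is.numeric(vec2), length(vec1) == length(vec2))
  # Sum of absolute coordinate differences.
  manhDist <- sum(abs(vec1 - vec2))
  manhDist
}

# Usage: |1 - 4| + |2 - 0| + |3 - 3| = 5
manhattan_distance(c(1, 2, 3), c(4, 0, 3))  # 5
```
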
Write a function called euclidean_distance_all that calculates the Euclidean distance between a
vector and all the row vectors in an input data matrix. There are two input arguments for this
function: a vector (vec1) and an input data matrix (mat1_X). The output for this function is a vector
(output_euclDistVec) which is of the same length as the number of rows in mat1_X. This function
must use the function euclidean_distance you previously wrote.
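
The row-wise version might be sketched as below. A compact euclidean_distance is repeated only so the block runs on its own. Looping with vapply and coercing the input with as.matrix are illustrative choices, not something the task prescribes.

```r
# Repeated here so the block is self-contained.
euclidean_distance <- function(vec1, vec2) sqrt(sum((vec1 - vec2)^2))

# Euclidean distance from vec1 to every row of mat1_X.
euclidean_distance_all <- function(vec1, mat1_X) {
  # Assumption: mat1_X is numeric and has one column per element of vec1.
  mat1_X <- as.matrix(mat1_X)
  stopifnot(length(vec1) == ncol(mat1_X))
  # Call euclidean_distance once per row; the result has nrow(mat1_X) entries.
  output_euclDistVec <- vapply(
    seq_len(nrow(mat1_X)),
    function(i) euclidean_distance(vec1, mat1_X[i, ]),
    numeric(1)
  )
  output_euclDistVec
}

# Usage on a small made-up matrix.
X <- rbind(c(0, 0), c(3, 4), c(6, 8))
euclidean_distance_all(c(0, 0), X)  # 0 5 10
```
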
Write a function called manhattan_distance_all that calculates the Manhattan distance between
a vector and all the row vectors in an input data matrix. There are two input arguments for this
function: a vector (vec1) and an input data matrix (mat1_X). The output for this function is a vector
(output_manhattanDistVec) which is of the same length as the number of rows in mat1_X. This function
must use the function manhattan_distance you previously wrote.
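
The Manhattan counterpart can follow the same pattern. Again, this is only a sketch: the helper is repeated so the block runs by itself, and the use of vapply and as.matrix is an assumption about style.

```r
# Repeated here so the block is self-contained.
manhattan_distance <- function(vec1, vec2) sum(abs(vec1 - vec2))

# Manhattan distance from vec1 to every row of mat1_X.
manhattan_distance_all <- function(vec1, mat1_X) {
  # Assumption: mat1_X is numeric and has one column per element of vec1.
  mat1_X <- as.matrix(mat1_X)
  stopifnot(length(vec1) == ncol(mat1_X))
  # Call manhattan_distance once per row; the result has nrow(mat1_X) entries.
  output_manhattanDistVec <- vapply(
    seq_len(nrow(mat1_X)),
    function(i) manhattan_distance(vec1, mat1_X[i, ]),
    numeric(1)
  )
  output_manhattanDistVec
}

# Usage on a small made-up matrix.
X <- rbind(c(1, 1), c(2, 3), c(4, 5))
manhattan_distance_all(c(1, 1), X)  # 0 3 7
```
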
Write a function called my_KNN that compares a vector to a matrix and finds its K nearest neighbors.
There are five input arguments for this function: vector 1 (vec1), the input data matrix (mat1_X), the
class labels corresponding to each row of the matrix (mat1_Y), the number of nearest neighbors you are
interested in finding (K), and a Boolean argument specifying whether we are using the Euclidean distance
(euclDistUsed). The argument K should be a positive integer. If the argument euclDistUsed = TRUE,
then use the Euclidean distance. Otherwise, use the Manhattan distance. The output of this function
is a list of length 2 (output_knnMajorityVote). The first element in the output list should be a vector
of length K containing the class labels of the closest neighbors. The second element in the output list
should be the majority vote of the K class labels in the first element of the list. The function must use
the functions euclidean_distance and manhattan_distance you previously wrote.
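
Putting the pieces together, my_KNN could be sketched as follows. The two distance helpers are repeated so the block runs on its own. Several details are assumptions the task leaves open: labels are returned as character strings, ties in distance are broken by row order, ties in the vote go to the alphabetically first label, and K may not exceed the number of rows. The toy data in the usage example is made up for illustration.

```r
# Repeated here so the block is self-contained.
euclidean_distance <- function(vec1, vec2) sqrt(sum((vec1 - vec2)^2))
manhattan_distance <- function(vec1, vec2) sum(abs(vec1 - vec2))

my_KNN <- function(vec1, mat1_X, mat1_Y, K, euclDistUsed) {
  mat1_X <- as.matrix(mat1_X)
  # Assumption: invalid inputs stop with an error.
  stopifnot(
    length(vec1) == ncol(mat1_X),
    length(mat1_Y) == nrow(mat1_X),
    K >= 1, K == round(K), K <= nrow(mat1_X)
  )
  # Pick the metric: Euclidean if euclDistUsed is TRUE, Manhattan otherwise.
  dist_fun <- if (isTRUE(euclDistUsed)) euclidean_distance else manhattan_distance
  dists <- vapply(
    seq_len(nrow(mat1_X)),
    function(i) dist_fun(vec1, mat1_X[i, ]),
    numeric(1)
  )
  # Indices of the K smallest distances (ties keep their original row order).
  nearest_idx <- order(dists)[seq_len(K)]
  nearest_labels <- as.character(mat1_Y[nearest_idx])
  # Majority vote; which.max returns the first maximum in table order.
  votes <- table(nearest_labels)
  majority <- names(votes)[which.max(votes)]
  output_knnMajorityVote <- list(nearest_labels, majority)
  output_knnMajorityVote
}

# Usage on made-up data: three points near the origin, two far away.
X <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(6, 5))
Y <- c("a", "a", "a", "b", "b")
res <- my_KNN(c(0.2, 0.1), X, Y, K = 3, euclDistUsed = TRUE)
res[[1]]  # "a" "a" "a"
res[[2]]  # "a"
```

For the actual exercise, the same call would be made with the first 100 rows of the data as mat1_X and mat1_Y, the 123rd row as vec1, and K = 10, once with euclDistUsed = TRUE and once with FALSE.
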
Apply this function to predict the label of the 123rd observation using the first 100 observations as your input
training data matrix. Use K = 10. What is the predicted label when you use the Euclidean distance? What is
the predicted label when you use the Manhattan distance? Are these predictions correct?
