Question: Implement an iterative k-means algorithm in Spark, in Python, to cluster a set of points read from a file. Do not use the KMeans implementation in Spark's MLlib. Set the number of cluster centers to k.
Follow this pattern:
Randomly assign a centroid to each of the k clusters.
Calculate the distance from every observation to each of the k centroids.
Assign each observation to its closest centroid.
Find the new location of each centroid by taking the mean of all the observations in its cluster.
Repeat these steps until the centroids no longer change position.
Note: You need a threshold variable to decide when the k-means calculation is done: stop when the amount the centroid locations change between iterations is less than this variable. Set the variable to a suitably small value.
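The steps above can be sketched in PySpark roughly as follows. This is a sketch, not a definitive solution: it assumes PySpark is installed, that the input file (a hypothetical `points.txt`) holds one point per line as whitespace-separated numbers, and it uses an assumed threshold `epsilon` for the convergence variable described in the note.

```python
def parse_point(line):
    """Parse a line like '1.0 2.0' into a tuple of floats."""
    return tuple(float(x) for x in line.split())

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def closest_centroid(point, centroids):
    """Index of the centroid nearest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def kmeans_rdd(points, k, epsilon=1e-4, seed=42):
    """Iterate until the total centroid movement drops below epsilon.

    `points` is an RDD of float tuples; returns a list of k centroids.
    """
    # Step 1: pick k random points as the initial centroids.
    centroids = points.takeSample(False, k, seed)
    movement = float("inf")
    while movement > epsilon:
        # Steps 2-3: pair each point with the index of its closest centroid,
        # carrying (point, 1) so sums and counts can be reduced together.
        assigned = points.map(
            lambda p: (closest_centroid(p, centroids), (p, 1)))
        # Step 4: per cluster, sum the points coordinate-wise and count them,
        # then divide to get the mean (the new centroid location).
        stats = assigned.reduceByKey(
            lambda a, b: (tuple(x + y for x, y in zip(a[0], b[0])),
                          a[1] + b[1]))
        new_by_idx = stats.mapValues(
            lambda s: tuple(x / s[1] for x in s[0])).collectAsMap()
        # An empty cluster keeps its old centroid.
        new_centroids = [new_by_idx.get(i, centroids[i]) for i in range(k)]
        # Step 5: measure how far the centroids moved this iteration.
        movement = sum(squared_distance(c, n)
                       for c, n in zip(centroids, new_centroids))
        centroids = new_centroids
    return centroids

def main(path="points.txt", k=3):
    # Imported here so the pure-Python helpers above work without Spark.
    from pyspark import SparkContext
    sc = SparkContext(appName="kmeans-sketch")
    points = sc.textFile(path).map(parse_point).cache()
    print(kmeans_rdd(points, k))
    sc.stop()
```

The driver reads the file into an RDD, and each loop iteration is a `map` (assignment) followed by a `reduceByKey` (recomputing means), which matches the pattern listed above; only the centroid list is collected back to the driver each round.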
Example of input file and RDD: (not provided with the question)
