Question:

Implement an iterative algorithm (k-means) in Spark to calculate k-means for
a set of points stored in a file. Write the k-means algorithm in Python. Do not use the K-means implementation in Spark's MLlib to solve the problem. Set the number of center points to k = 5.
Follow this pattern:
1. Randomly assign a centroid to each of the k clusters (k = 5).
2. Calculate the distance from every observation to each of the k centroids.
3. Assign each observation to its closest centroid.
4. Find the new location of each centroid by taking the mean of all the observations in its cluster.
5. Repeat steps 2-4 until the centroids do not change position.
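
The distance, assignment, and mean-update steps in the pattern above can be sketched as a single pass over the RDD. This is a minimal sketch, not a definitive solution: the function names (squared_distance, closest_centroid, kmeans_step) are made up for illustration, points are assumed to be 2-D (x, y) tuples, and a cluster that receives no points is assumed to keep its old centroid. Only the standard RDD methods map, reduceByKey, and collectAsMap are used.

```python
def squared_distance(p, q):
    # Squared Euclidean distance; the square root is not needed to rank centroids.
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def closest_centroid(point, centroids):
    # Index of the centroid nearest to the point.
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans_step(points, centroids):
    """One k-means iteration over an RDD of (x, y) tuples.

    Returns a new list of centroids in the same order as the input list.
    """
    sums = (points
            # key each point by its nearest centroid, carrying (x, y, count)
            .map(lambda p: (closest_centroid(p, centroids), (p[0], p[1], 1)))
            # add up coordinates and counts per cluster
            .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1], a[2] + b[2]))
            .collectAsMap())
    updated = list(centroids)
    for i, (sx, sy, n) in sums.items():
        updated[i] = (sx / n, sy / n)  # mean of the cluster
    # Assumption: clusters with no assigned points keep their previous centroid.
    return updated
```

With a real RDD named points (a hypothetical name), one way to pick the five starting centroids is points.takeSample(False, 5), which draws five distinct observations at random; kmeans_step(points, centroids) is then called repeatedly.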
Note: You need a variable to decide when the k-means calculation is done. Stop
when the amount by which the locations of the means change between iterations is less than the variable. Set
the variable to 0.1.
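
The stopping rule in the note can be sketched as a small driver loop. The question does not say how "the amount the locations change" is measured, so this sketch assumes the sum of the Euclidean distances each centroid moved; the maximum per-centroid move would be an equally reasonable reading. The names total_shift and iterate_until_stable, and the max_iter safety cap, are hypothetical. The step argument is any function that maps a list of centroids to the updated list.

```python
import math


def total_shift(old, new):
    # Assumption: "amount of change" = sum of distances moved by all centroids.
    return sum(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(old, new))


def iterate_until_stable(step, centroids, threshold=0.1, max_iter=100):
    """Apply step repeatedly until the centroids move less than threshold.

    Returns (final_centroids, iterations_run). max_iter is a safety cap
    that the question does not ask for.
    """
    for iteration in range(1, max_iter + 1):
        updated = step(centroids)
        if total_shift(centroids, updated) < threshold:
            return updated, iteration
        centroids = updated
    return centroids, max_iter
```

For example, if one_pass(points, centroids) is a hypothetical function performing a single assignment-and-update pass, the loop could be driven with iterate_until_stable(lambda c: one_pass(points, c), initial_centroids).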
Example of input file (an RDD):
[(7869,8696),(8676,-4746),(9484,112526),(-1827,5958),(987,900087),(18127,9383),(298,272),(91716,2827),(12625,92827)........]
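
If the file holds the points as text in the bracketed form shown above, one possible way to turn it into (x, y) tuples is a regular expression. This is only a sketch under that assumption; the question does not specify the exact file layout, and parse_points is a hypothetical helper name.

```python
import re

# Matches "(x,y)" pairs with optional sign, decimals, and whitespace.
_PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_points(text):
    # Return every "(x,y)" pair found in the text as a tuple of floats.
    return [(float(x), float(y)) for x, y in _PAIR.findall(text)]
```

With a SparkContext sc and a placeholder path, sc.textFile("points.txt").flatMap(parse_points) would then give an RDD of (x, y) tuples, since textFile yields one string per line and flatMap flattens the per-line lists.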
