Question 4 (k-means, 40/100)

Using the same terminology adopted in our course, we shall
refer to the "k-means" algorithm as the algorithm that initializes the centroids randomly,
followed by a "refinement phase" where the clusters are improved further. We shall refer to
"k-means++" as the algorithm that only selects the centroids, so as to provide some
theoretical guarantees. Consider the following points in the 2D Euclidean space: p1=(0,0),
p2=(0,1), p3=(0,2), p4=(2,0), p5=(3,0), p6=(4,1), p7=(5,0), p8=(7,0), p9=(8,0), p10=(8,1).
Let k=3.
a) (10/100) Run the k-means++ algorithm to select the initial centroids, assuming that 1)
p4 is selected as the first centroid and 2) the remaining two centroids are chosen so
that at each step the point with the 3rd largest probability is selected by k-means++
(breaking ties arbitrarily). In other words, let q1, q2, q3, ..., qn be the input points sorted
non-increasingly according to their probability of being selected at a given step of
k-means++. Then the point q3 is selected at that step. Which centroids have been
selected at the end of this initialization step?
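A minimal Python sketch of the seeding rule in a), assuming the standard k-means++ rule in which a point x is chosen with probability D(x)^2 / sum_y D(y)^2, where D(x) is the distance from x to the nearest centroid chosen so far. Because the denominator is the same for every point at a given step, ranking by probability is the same as ranking by D(x)^2. The names `POINTS`, `d2_weights` and `seed_with_rank` are hypothetical helpers, and ties are broken by input order, which is only one of the arbitrary choices the exercise allows.

```python
# The ten input points of Question 4 (integer coordinates).
POINTS = {
    "p1": (0, 0), "p2": (0, 1), "p3": (0, 2), "p4": (2, 0), "p5": (3, 0),
    "p6": (4, 1), "p7": (5, 0), "p8": (7, 0), "p9": (8, 0), "p10": (8, 1),
}


def sq_dist(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def d2_weights(centroids):
    """D(x)^2 for every point: squared distance to its nearest chosen centroid."""
    return {
        name: min(sq_dist(p, POINTS[c]) for c in centroids)
        for name, p in POINTS.items()
    }


def seed_with_rank(first, k, rank=3):
    """k-means++ seeding where each step deterministically takes the point with
    the `rank`-th largest selection probability instead of sampling one.

    Probabilities are D(x)^2 / sum D(y)^2, so ranking by D(x)^2 is equivalent.
    sorted() is stable even with reverse=True, so tied points keep the order
    of POINTS; this is one arbitrary tie-break among those the exercise allows.
    """
    centroids = [first]
    while len(centroids) < k:
        weights = d2_weights(centroids)
        ranked = sorted(POINTS, key=lambda name: weights[name], reverse=True)
        centroids.append(ranked[rank - 1])
    return centroids


print(seed_with_rank("p4", 3))  # ['p4', 'p8', 'p6']
```

At the second step p10, p9 and p8 have distinct weights 37, 36 and 25, so p8 is forced. At the third step p3 leads with weight 8 and p2 and p6 tie at weight 5, so breaking that tie the other way yields p2 instead of p6.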
b) (10/100) What is the probability that p5 is selected as a centroid at step 3 of
k-means++?
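A minimal sketch for b), assuming the first two centroids are p4 and p8 as obtained in a) and the same D(x)^2 rule. `Fraction` keeps the probabilities exact; `selection_probabilities` is a hypothetical helper name.

```python
from fractions import Fraction

# The ten input points of Question 4 (integer coordinates).
POINTS = {
    "p1": (0, 0), "p2": (0, 1), "p3": (0, 2), "p4": (2, 0), "p5": (3, 0),
    "p6": (4, 1), "p7": (5, 0), "p8": (7, 0), "p9": (8, 0), "p10": (8, 1),
}


def sq_dist(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def selection_probabilities(centroids):
    """Exact k-means++ probabilities D(x)^2 / sum_y D(y)^2 for the next pick."""
    weights = {
        name: min(sq_dist(p, POINTS[c]) for c in centroids)
        for name, p in POINTS.items()
    }
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


# Assumes step 3 starts from the centroids chosen in a): p4, then p8.
probs = selection_probabilities(["p4", "p8"])
print(probs["p5"])  # 1/30
```

Here the D(x)^2 weights sum to 30 and p5 has weight 1 (its nearest centroid, p4, is at distance 1). If b) is instead read as asking about a fully random run after p4, one would weight this conditional probability by the step-2 selection probabilities and sum, rather than fixing p8.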
c) (10/100) Run the refinement phase of the k-means algorithm until it terminates, using
the centroids selected in a). What are the final clusters?
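A sketch of the refinement phase for c), assuming Lloyd-style iterations: assign every point to its nearest centroid, recompute each centroid as the mean of its cluster, and stop once an assignment repeats. `lloyd` and `mean` are hypothetical helpers. Exact `Fraction` arithmetic avoids floating-point ties, distance ties (none arise in the two runs below) go to the lowest-index centroid, and the code assumes no cluster becomes empty.

```python
from fractions import Fraction

# The ten input points of Question 4 (integer coordinates).
POINTS = {
    "p1": (0, 0), "p2": (0, 1), "p3": (0, 2), "p4": (2, 0), "p5": (3, 0),
    "p6": (4, 1), "p7": (5, 0), "p8": (7, 0), "p9": (8, 0), "p10": (8, 1),
}


def sq_dist(a, b):
    """Squared Euclidean distance; works for ints and Fractions alike."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mean(names):
    """Exact centroid of a non-empty cluster given by point names."""
    n = len(names)
    return (
        Fraction(sum(POINTS[m][0] for m in names), n),
        Fraction(sum(POINTS[m][1] for m in names), n),
    )


def lloyd(initial):
    """Refine from the named initial centroids until an assignment repeats.

    Distance ties go to the lowest-index centroid (an assumption), and the
    sketch assumes no cluster ever becomes empty.
    """
    centroids = [POINTS[c] for c in initial]
    clusters = None
    while True:
        new = [[] for _ in centroids]
        for name, p in POINTS.items():
            dists = [sq_dist(p, c) for c in centroids]
            new[dists.index(min(dists))].append(name)
        if new == clusters:
            return clusters
        clusters = new
        centroids = [mean(cluster) for cluster in clusters]


for start in (["p4", "p8", "p6"], ["p4", "p8", "p2"]):
    print(start, "->", lloyd(start))
```

Under these assumptions, starting from p4, p8, p6 the run ends with clusters {p1, p2, p3, p4}, {p8, p9, p10}, {p5, p6, p7}; starting from p4, p8, p2 (the other tie-break in a)) it ends with {p4, p5, p6}, {p7, p8, p9, p10}, {p1, p2, p3}.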
d) (10/100) Consider the variant of the k-means algorithm where 1) any given point p
can be moved from cluster C with centroid c to a cluster C' with centroid c' even if
d(p,c)=d(p,c'), and 2) we stop as soon as we obtain the same clustering in two
consecutive iterations. Show that this variant of k-means might never terminate by
providing an example with at most 5 points in the 1-dimensional Euclidean space.
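One way to reason about d) is to check a candidate run mechanically. The sketch below, with hypothetical helpers `is_legal_step` and `never_stabilises`, verifies that a cyclic sequence of 1D clusterings is a legal run of the variant: every point of the next clustering must sit at minimum distance from its assigned centroid, with ties resolved either way, and no two consecutive clusterings may coincide as partitions. The candidate instance (points -2, -1, 0, 1, 2 with k=2) is an illustrative assumption, not the course's reference answer, and it relies on the random initialization placing both centroids at 0.

```python
from fractions import Fraction


def centroid(cluster):
    """Exact mean of a non-empty 1D cluster."""
    return Fraction(sum(cluster), len(cluster))


def is_legal_step(current, nxt):
    """True if clustering `nxt` may follow `current` in the variant of d).

    Cluster j of `nxt` is served by centroid j of `current`; a point may go to
    any centroid at minimum distance, so ties can be resolved either way.
    """
    if any(not c for c in current) or any(not c for c in nxt):
        return False  # empty clusters have no centroid
    if sorted(x for c in current for x in c) != sorted(x for c in nxt for x in c):
        return False  # both clusterings must cover the same points
    cents = [centroid(c) for c in current]
    for j, cluster in enumerate(nxt):
        for x in cluster:
            best = min(abs(x - c) for c in cents)
            if abs(x - cents[j]) != best:
                return False
    return True


def as_partition(clustering):
    """Unordered view of a clustering, so relabelling clusters is not a change."""
    return frozenset(frozenset(c) for c in clustering)


def never_stabilises(cycle):
    """True if repeating `cycle` forever is a legal run in which no two
    consecutive clusterings are equal, so the stopping rule never fires."""
    n = len(cycle)
    for i in range(n):
        a, b = cycle[i], cycle[(i + 1) % n]
        if as_partition(a) == as_partition(b) or not is_legal_step(a, b):
            return False
    return True


# Hypothetical candidate: 5 points, k = 2, assuming the random initialization
# puts both centroids at 0, so every point is tied between the two centroids.
A = [[-1, 1], [-2, 0, 2]]
B = [[-2, 2], [-1, 0, 1]]
print(never_stabilises([A, B]))  # True
```

In both A and B each cluster has mean 0, so every point is equidistant from the two centroids and may be moved freely; alternating A and B therefore never produces the same clustering in two consecutive iterations.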