Prove that k-means always produces k clusters, all nonempty, or give a counterexample: a set D of data points (with no repeated data point), a value for k (k <= n, where n is the number of data objects), and a set of k data points as initial seeds, such that some cluster becomes empty. A motivation for this problem: many or most of you will become professional programmers. The programs you write are supposed to work all the time, not just 999 times out of 1000. The pseudocode for k-means in the textbook will fail if one of the clusters does become empty.
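To make the failure mode concrete, here is a minimal sketch of the centroid-update step in plain Python. The function name `update_centroid` is hypothetical and not taken from the textbook; the assumption is only that the update averages the points assigned to a cluster, so an empty cluster leads to a division by zero.

```python
def update_centroid(cluster):
    """Mean of a list of (x, y) points, as in a typical k-means update step."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

# A nonempty cluster averages without trouble.
print(update_centroid([(0.0, 0.0), (10.0, 0.0)]))  # (5.0, 0.0)

# An empty cluster has n == 0, so the average is undefined.
try:
    update_centroid([])
except ZeroDivisionError as err:
    print("update failed:", err)
```

Libraries that average with arrays may return NaN centroids instead of raising, which can be harder to notice.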

Question: is it safe to write the code for k-means as the textbook does, or will code written like that get you into trouble with your manager (not initially, but after a while)? If you decide to look for a malevolent data set, you can try the following. Construct a data file of 6 to 12 data points in the plane so that when k-means is run with k = 3 or k = 4 clusters, in iteration 2 (or iteration 3 or ...), one of the clusters becomes empty. The initial seeds must be data points, and your example should not rely on ties to accomplish its goal. (Comment: this effort might be easier if you let most of the points be collinear.) Show the step-by-step process of the clustering. Finally, propose a solution for the case of an empty cluster.
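One candidate data set, offered as a sketch rather than as the official solution: six collinear points with x-coordinates -4, 0, 10, 11, 12, 21 (all with y = 0), k = 3, and seeds -4, 0, 21. In iteration 1 the clusters are {-4}, {0, 10}, {11, 12, 21}, with centroids -4, 5, and about 14.67. In iteration 2 the point 0 is closer to -4 (distance 4) than to 5 (distance 5), and the point 10 is closer to 14.67 (distance about 4.67) than to 5 (distance 5), so the middle cluster becomes empty without any ties. The Python code below traces this. All names in it (`assign`, `mean`, `repair_empty`, `kmeans`) are hypothetical, and the repair rule, moving the point farthest from its own centroid into the empty cluster, is only one possible remedy among several (others include reseeding with a random point or dropping the cluster).

```python
import math

# Six distinct collinear points in the plane; all seeds are data points.
POINTS = [(-4.0, 0.0), (0.0, 0.0), (10.0, 0.0),
          (11.0, 0.0), (12.0, 0.0), (21.0, 0.0)]
SEEDS = [(-4.0, 0.0), (0.0, 0.0), (21.0, 0.0)]

def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        clusters[dists.index(min(dists))].append(p)
    return clusters

def mean(cluster):
    """Centroid of a nonempty list of (x, y) points."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)

def repair_empty(clusters, centroids):
    """Assumed remedy: move the point farthest from its centroid
    into each empty cluster."""
    for i, cluster in enumerate(clusters):
        if cluster:
            continue
        # Only clusters with at least two points may donate,
        # so the repair never empties another cluster.
        donors = [(math.dist(p, centroids[j]), j, p)
                  for j, c in enumerate(clusters) if len(c) > 1 for p in c]
        _, j, p = max(donors)
        clusters[j].remove(p)
        clusters[i].append(p)
    return clusters

def kmeans(points, seeds, repair=False, max_iter=20):
    """Return the clusters of every iteration.
    Stops at convergence, or at the first empty cluster if repair is off."""
    centroids, history = list(seeds), []
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        if repair:
            clusters = repair_empty(clusters, centroids)
        history.append(clusters)
        if any(not c for c in clusters):
            break  # a plain averaging step would divide by zero here
        new_centroids = [mean(c) for c in clusters]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return history

# Step-by-step trace of the unrepaired algorithm (x-coordinates only).
for step, clusters in enumerate(kmeans(POINTS, SEEDS), start=1):
    print(step, [[x for x, _ in c] for c in clusters])
# 1 [[-4.0], [0.0, 10.0], [11.0, 12.0, 21.0]]
# 2 [[-4.0, 0.0], [], [10.0, 11.0, 12.0, 21.0]]
```

Calling `kmeans(POINTS, SEEDS, repair=True)` instead moves the point 21 into the empty cluster in iteration 2 and then converges to {-4, 0}, {21}, {10, 11, 12}. Because k <= n and the points are distinct, an empty cluster implies that some other cluster holds at least two points, so a donor always exists.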
