Initialization is an important part of the k-means algorithm. Assume we have a dataset of 5 points, as shown in Figure 2: {(2,1), (1,1), (1,5), (1,4), (2,4)}. Suppose we are running k-means with two clusters.

(a) Let the initialization be the points (2,1) and (1,5). What are the coordinates of the cluster centers at convergence? What is the k-means loss at convergence?

(b) Background: In class, we briefly saw that k-means++ is an algorithm with an improved strategy for choosing the initial cluster centers. It proceeds as follows. First, let $X$ be the set of data points, and let $C = \{c_i\}_i$ be a set of cluster centers. Let $D(x)$ be the shortest distance from a point $x \in X$ to any center in $C$:

$$D(x) = \min_{c \in C} \lVert x - c \rVert$$

Then:

i. Take one center $c_1$, chosen uniformly at random from the set of points $X$.
ii. Take a new center $c_i$, choosing $x \in X$ with probability $\frac{D(x)^2}{\sum_{y \in X} D(y)^2}$.
iii. Repeat step ii until we have chosen $k$ initial centers.
iv. Proceed as with the standard k-means algorithm.

See the linked k-means++ paper for more details.

Question: Now we want to have 3 centers ($k = 3$). Following the k-means++ initialization procedure outlined above, and given the first and second centers $c_1 = (2,1)$, $c_2 = (1,5)$, compute the probabilities of choosing each of the remaining points as the third center.
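Both parts can be checked numerically. The following is a minimal sketch, not an official solution. It assumes the k-means loss is the sum of squared Euclidean distances from each point to its nearest center (some courses divide by the number of points instead). The helper names `sq_dist`, `lloyd`, and `kmeanspp_probabilities` are hypothetical and chosen only for illustration. Exact fractions are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction

# The five data points from the question.
POINTS = [(2, 1), (1, 1), (1, 5), (1, 4), (2, 4)]


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def lloyd(points, centers, max_iter=100):
    """Standard k-means (Lloyd) iterations from the given initial centers.

    Returns the final centers and the loss, taken here (an assumption) to be
    the sum of squared distances from each point to its nearest center.
    """
    centers = [tuple(Fraction(v) for v in c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster.
        # An empty cluster keeps its old center.
        new_centers = []
        for cluster, old in zip(clusters, centers):
            if cluster:
                n = len(cluster)
                new_centers.append((Fraction(sum(p[0] for p in cluster), n),
                                    Fraction(sum(p[1] for p in cluster), n)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break  # no center moved, so the algorithm has converged
        centers = new_centers
    loss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, loss


def kmeanspp_probabilities(points, centers):
    """Probability D(x)^2 / sum_y D(y)^2 of picking each point as the next center."""
    d2 = {p: min(sq_dist(p, c) for c in centers) for p in points}
    total = sum(d2.values())
    return {p: Fraction(d2[p], total) for p in points}


if __name__ == "__main__":
    # Part (a): two clusters, initialized at (2,1) and (1,5).
    final_centers, loss = lloyd(POINTS, [(2, 1), (1, 5)])
    print("centers:", final_centers)
    print("loss:", loss)

    # Part (b): k-means++ with c1 = (2,1) and c2 = (1,5) already chosen.
    for point, prob in kmeanspp_probabilities(POINTS, [(2, 1), (1, 5)]).items():
        print(point, prob)
```

Under these assumptions the centers converge to (3/2, 1) and (4/3, 13/3) with a loss of 11/6, and the third-center probabilities are 1/4 for (1,1), 1/4 for (1,4), and 1/2 for (2,4). The two points already chosen as centers have $D(x) = 0$ and therefore probability 0.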

