Part II Inference Algorithms and Methods (20 points) The K-means algorithm, also known as the Lloyd-Max...
Question:
Part II Inference Algorithms and Methods (20 points)

The K-means algorithm, also known as the Lloyd-Max algorithm in Information Theory, aims at assigning K centroids to a set of data. The centroids are points (or vectors) in $\mathbb{R}^{\ell}$, for a given dimension $\ell$. The set of data, represented as a cloud of points, has N points in $\mathbb{R}^{\ell}$. Figure 2 shows a dataset $\{x_n\}_{n=1}^{N}$ of N = 2000 points and K = 10 centroids $\{c_k\}_{k=1}^{K}$. The centroids are also called means because each centroid is the mean of its neighboring points.

In Figure 2, the whole dataset is grouped into a unique cluster. In this case, the K centroids serve for vector quantization, i.e. they approximate the shape of the entire cloud such that the mean squared error (MSE) is minimized. The MSE is

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \| x_n - c(x_n) \|^2 \qquad (3)$$

where $c(x_n)$ is the centroid of the Voronoi cell (local neighborhood or local cluster) to which $x_n$ belongs.

After a random initialization of the K centroids, the K-means algorithm proceeds in two steps:

Update the Voronoi Cells.

$$c(x_n) = c_{k(n)}, \qquad k(n) = \arg\min_{k} \| x_n - c_k \| \quad \text{(closest centroid to } x_n\text{)}, \qquad n = 1 \ldots N. \qquad (4)$$

See (20.3) in Chap. 20 of Sir David MacKay's book. Now, for each point $x_n$, we know that it belongs to the cell $k(n)$ with centroid $c(x_n) = c_{k(n)}$. $c(x_n)$ is the closest centroid to $x_n$ as given by (4). Let $R(k)$ be the number of points in cell k; we have $\sum_{k=1}^{K} R(k) = N$.

Update the Centroids.

$$c_k = \frac{1}{R(k)} \sum_{n \,:\, c(x_n) = c_k} x_n = \frac{\text{sum of all points in cell } k}{\text{number of points in cell } k}, \qquad k = 1 \ldots K. \qquad (5)$$

See (20.5) in Chap. 20 of Sir David MacKay's book.

The algorithm iterates over the two steps until the MSE stops decreasing or until reaching a maximum number of iterations.

In some cases, as shown in Figure 3, the dataset is already organized in clusters. The K-means algorithm is used to find the best positions of the K centroids. Then, given a new point, you can use its distance to the centroids to infer the cluster (i.e. the Voronoi cell) this new point belongs to. In this case, the centroids serve for classification.
(a) The MSE versus the iteration number is shown in Figure 4 for the dataset of N = 2000 blue points of Figure 2. How many iterations are needed for the K-means algorithm to reach convergence? Hint: Just follow the plot of Figure 4.

(b) The initialization of the K centroids can be done with two different methods: 1) select K random points in the space $\mathbb{R}^{\ell}$, 2) select K random points among the N points of the dataset. Which method is better? Explain.

(c) Assume that the dataset is organized in three clusters, before running the K-means algorithm, as in Figure 3. Running K-means with K = 3 centroids should lead to one centroid per cluster as shown in Figure 3. What happens if we run the K-means algorithm with K = 4 centroids or more?
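The two steps described in the question can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from the course or from MacKay's book: the function names (`assign_cells`, `update_centroids`, `kmeans`), the `tol` and `seed` parameters, and the choice to leave the centroid of an empty cell where it is are all assumptions made here. Initialization follows method 2 from part (b), drawing K points from the dataset.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_cells(points, centroids):
    """Step (4): for each point, the index k(n) of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for x in points
    ]


def update_centroids(points, labels, centroids):
    """Step (5): each centroid becomes the mean of the points in its cell."""
    dim = len(points[0])
    new_centroids = []
    for k, old in enumerate(centroids):
        cell = [x for x, label in zip(points, labels) if label == k]
        if not cell:
            # Assumption: an empty cell keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(x[i] for x in cell) / len(cell) for i in range(dim))
        )
    return new_centroids


def mse(points, centroids, labels):
    """Equation (3): mean squared distance to the assigned centroid."""
    total = sum(squared_distance(x, centroids[k]) for x, k in zip(points, labels))
    return total / len(points)


def kmeans(points, K, max_iter=100, tol=1e-12, seed=0):
    """Alternate steps (5) and (4) until the MSE stops decreasing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, K)  # method 2: K points from the dataset
    labels = assign_cells(points, centroids)
    history = [mse(points, centroids, labels)]
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        labels = assign_cells(points, centroids)
        history.append(mse(points, centroids, labels))
        if history[-2] - history[-1] <= tol:
            break
    return centroids, labels, history
```

Calling `kmeans(points, K=3)` on a list of coordinate tuples returns the final centroids, the cell index of every point, and the MSE recorded after each assignment step. Plotting that history against the iteration number gives a curve of the kind part (a) refers to. The result depends on the random initialization, so different seeds can end in different local minima.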
Expert Answer:
(a) The number of iterations needed for the K-means algorithm to reach convergence cannot be determined ...