Question: 2. [10 marks] The k-means algorithm is widely used in cluster analysis for its simplicity. Since sample means can be severely affected by outliers,

2. [10 marks] The k-means algorithm is widely used in cluster analysis

 

2. [10 marks] The k-means algorithm is widely used in cluster analysis for its simplicity. Since sample means can be severely affected by outliers, one may want to replace sample means with sample medians as cluster centers and thus obtain the following k-means algorithm based on medians for univariate data (perhaps we can call it the "k-medians algorithm"): Given k initial cluster centers c,...,Ck for a sample 21,...,n, repeat the following two steps until cluster centers do not change: (1) Calculate dij = -C| for i=1,...,n and j = 1,..., k. Classify x; into cluster m, if dim is the smallest of dil...., dik. (2) For j = 1,...,k, set c; = median(x) for all x; in cluster j. Implement the "k-medians algorithm" in an R. function which, given a univariate sample and an initial set of cluster centers (both in vectors), computes and returns the final cluster centers. Apply it to the variable eruptions (for eruption times of the well-known Old Faithful Geyser) in the data set faithful in R, using the following initial cluster centers, respectively: (a) (2, 4); (b) (2, 3, 4); (c) (2,3,4,5)

Step by Step Solution

3.42 Rating (146 Votes )

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock

a Final centers are 19830 and 43415 See the plot below showing ... View full answer

blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Accounting Questions!