Question:

3. In this problem, you will generate simulated data, and then perform K-means clustering on the data.
(a) Generate a simulated data set with 20 observations in each of K = 5 well-separated clusters, with p = 2 variables describing each observation. Do it in a similar fashion to the K = 3 case in the "K-means: Selecting K" lecture slides. Plot the resulting data points.
(b) Run the K-means algorithm on your simulated data for K = 4, 5, and 6, using 50 random starts in each case. Provide the code and report the total within-cluster sum of squares (WSS) for each solution. Which transition causes the bigger drop in total WSS: from K = 4 to K = 5, or from K = 5 to K = 6?
(c) Plot the clustering solutions for K = 4, 5, and 6 (just use the plots automatically generated by the eclust() function). Judging by the plots, explain why the respective sizes of the WSS drops in part (b) were to be expected. Use the lecture slides (the K = 3 simulation study) for reference.
(d) Run K-means clustering on your simulated data for K = 1, 2, ..., 10 and record the total WSS for each value of K. Plot the progression of WSS values. According to the elbow method, which value of K appears optimal?
(e) For a more formal approach (an alternative to the elbow method from part (d)), compute the gap statistic for K = 1, ..., 10 with 50 replicates each. Which value of K is optimal? Does it match the number of well-separated clusters in the simulated data? Provide the plot of the gap statistic values.
Please answer this question in R (RStudio).
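A minimal R sketch for parts (a) to (c), using base R's kmeans() and, for the part (c) plots, factoextra::eclust(). The cluster centres, the within-cluster standard deviation (0.5), and the seed are illustrative assumptions, not the generator from the lecture slides; any centres that are far apart relative to the spread should behave similarly. The reasoning behind the expected WSS pattern: with K = 4, two genuine clusters are forced to share one centroid, so the squared distance between their true centres is counted inside the WSS and the drop from K = 4 to K = 5 should be large. Going from K = 5 to K = 6 only splits one already compact cluster, so that drop should be small.

```r
# Parts (a)-(c): simulate K = 5 well-separated clusters and fit K-means for K = 4, 5, 6.
# ASSUMPTION: centres, spread (sd = 0.5) and seed are illustrative choices,
# not the exact generator used in the lecture slides.
set.seed(1)

# (a) 20 observations per cluster, p = 2 variables
centres <- matrix(c(0, 0,
                    6, 0,
                    0, 6,
                    6, 6,
                    3, 12), ncol = 2, byrow = TRUE)
n_per <- 20
true_cluster <- rep(1:5, each = n_per)
x <- centres[true_cluster, ] + matrix(rnorm(5 * n_per * 2, sd = 0.5), ncol = 2)
colnames(x) <- c("x1", "x2")

plot(x, col = true_cluster, pch = 19,
     main = "Simulated data: 5 clusters of 20 points")

# (b) K-means with 50 random starts for each K; tot.withinss is the total WSS
ks <- 4:6
fits <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 50))
wss <- sapply(fits, function(f) f$tot.withinss)
names(wss) <- paste0("K=", ks)
print(round(wss, 2))

drop_4_to_5 <- unname(wss["K=4"] - wss["K=5"])
drop_5_to_6 <- unname(wss["K=5"] - wss["K=6"])
cat("WSS drop K=4 -> K=5:", round(drop_4_to_5, 2), "\n")
cat("WSS drop K=5 -> K=6:", round(drop_5_to_6, 2), "\n")

# (c) Cluster plots from factoextra::eclust(); extra arguments such as nstart
# are passed on to kmeans(). Skipped if factoextra is not installed.
if (requireNamespace("factoextra", quietly = TRUE)) {
  for (k in ks) {
    factoextra::eclust(x, FUNcluster = "kmeans", k = k, nstart = 50, graph = TRUE)
  }
}
```

In the eclust() plots, the K = 4 panel should show two of the simulated groups sharing one colour, while the K = 6 panel should show one compact group split in two; these are the visual counterparts of the large and small WSS drops.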

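A sketch for parts (d) and (e). So that it runs on its own, it regenerates data with the same illustrative centres, spread, and seed (assumptions, as above). The elbow curve comes from repeated kmeans() calls; the gap statistic uses cluster::clusGap() with B = 50 reference data sets, and K is read off with cluster::maxSE() using the "firstSEmax" rule, which is one of several rules the package offers (choosing it here is an assumption). With clusters this well separated, both the elbow and the gap statistic would be expected to point to K = 5, though the selected K can depend on the seed and on the maxSE() rule.

```r
# Parts (d)-(e): elbow plot and gap statistic for K = 1, ..., 10.
# ASSUMPTION: illustrative centres, spread and seed; swap in your own data from part (a).
library(cluster)  # clusGap(), maxSE(); a recommended package bundled with R

set.seed(1)
centres <- matrix(c(0, 0, 6, 0, 0, 6, 6, 6, 3, 12), ncol = 2, byrow = TRUE)
true_cluster <- rep(1:5, each = 20)
x <- centres[true_cluster, ] + matrix(rnorm(100 * 2, sd = 0.5), ncol = 2)

# (d) Total WSS for each K (50 random starts); for K = 1 this equals the total SS
k_grid <- 1:10
wss_all <- sapply(k_grid, function(k) kmeans(x, centers = k, nstart = 50)$tot.withinss)
print(data.frame(K = k_grid, WSS = round(wss_all, 2)))
plot(k_grid, wss_all, type = "b", pch = 19,
     xlab = "Number of clusters K", ylab = "Total within-cluster SS",
     main = "Elbow method")

# (e) Gap statistic: B = 50 reference data sets; nstart is passed on to kmeans()
gap <- clusGap(x, FUNcluster = kmeans, K.max = 10, B = 50,
               nstart = 50, verbose = FALSE)
k_gap <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")
cat("Optimal K by gap statistic (firstSEmax rule):", k_gap, "\n")
plot(gap, main = "Gap statistic (B = 50)")
```

Comparing k_gap with the 5 clusters used to generate the data answers the last question in part (e).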