Question 3. In this problem, you will generate simulated data and then perform K-means clustering on the data.
(a) Generate a simulated data set with observations in each of K well-separated clusters, with p variables describing each observation. Do it in a similar fashion to the K case in the "K-means: Selecting K" lecture slides. Plot the resulting data points.
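One possible way to start part (a) is sketched below. The cluster count, the number of observations per cluster, the number of variables, and the cluster centers did not survive in the question text, so `K <- 3`, `n_per <- 20`, `p <- 2`, and the `centers` matrix are placeholder assumptions rather than the values from the lecture slides. The approach (standard normal noise shifted by a per-cluster mean) is also an assumption about what "similar fashion" means.

```r
# ASSUMED values: the original counts are missing from the question text.
set.seed(1)
K <- 3       # assumed number of true clusters
n_per <- 20  # assumed observations per cluster
p <- 2       # assumed number of variables

# Standard normal noise, then shift each cluster to its own center.
x <- matrix(rnorm(K * n_per * p), ncol = p)
truth <- rep(seq_len(K), each = n_per)
centers <- rbind(c(0, 0), c(6, 0), c(3, 6))  # hypothetical centers, far apart relative to sd = 1
x <- x + centers[truth, ]

# Plot the points, colored by the cluster they were generated from.
plot(x, col = truth, pch = 19, xlab = "X1", ylab = "X2",
     main = "Simulated data, colored by true cluster")
```

Moving the centers closer together makes the clusters overlap, which is a quick way to check how much the later parts depend on good separation.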
(b) Run the K-means algorithm on your simulated data for K & Use random starts in each case. Provide the code and report the total within-cluster sum of squares (WSS) for each solution. Which transition caused the bigger drop in total WSS: from K to K, or from K to K?
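A minimal sketch for part (b) follows. The K values and the number of random starts are missing from the question, so `ks <- 2:4` and `nstart = 20` are assumptions, as is the simulated data (three clusters of 20 points in two dimensions). The total WSS is read from the `tot.withinss` component of the object returned by `kmeans()`.

```r
# ASSUMED data: 3 clusters, 20 points each, 2 variables (the real values are missing).
set.seed(1)
truth <- rep(1:3, each = 20)
mu <- rbind(c(0, 0), c(6, 0), c(3, 6))  # hypothetical cluster centers
x <- matrix(rnorm(60 * 2), ncol = 2) + mu[truth, ]

# ASSUMED K values and number of random starts.
ks <- 2:4
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)
names(wss) <- paste0("K=", ks)
print(round(wss, 2))

# Drop in total WSS for each transition between consecutive K values.
drops <- -diff(wss)
names(drops) <- c("2 to 3", "3 to 4")
print(round(drops, 2))
```

Under these assumptions the drop from 2 to 3 clusters should be much larger than the drop from 3 to 4, because moving to K = 3 separates two true groups that K = 2 had to merge, while K = 4 only splits one compact group.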
(c) Proceed to plot the clustering solutions for K & (just use the plots automatically generated by the eclust function). Judging by the plots, explain why the respective sizes of the WSS drops in part (b) were expected. Use the lecture slides (the K simulation study) for reference.
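For part (c), the plots could be produced roughly as below. This sketch assumes the `factoextra` package (which provides `eclust`) is installed, and it reuses assumed values throughout: three true clusters of 20 points each, K values 2 to 4, and 20 random starts passed through to `kmeans` via `nstart`.

```r
# ASSUMED data: 3 clusters, 20 points each, 2 variables (the real values are missing).
set.seed(1)
truth <- rep(1:3, each = 20)
mu <- rbind(c(0, 0), c(6, 0), c(3, 6))  # hypothetical cluster centers
x <- as.data.frame(matrix(rnorm(60 * 2), ncol = 2) + mu[truth, ])
names(x) <- c("X1", "X2")

if (requireNamespace("factoextra", quietly = TRUE)) {
  library(factoextra)
  for (k in 2:4) {  # ASSUMED K values
    # graph = TRUE asks eclust to draw its cluster plot for this fit.
    fit <- eclust(x, FUNcluster = "kmeans", k = k, nstart = 20, graph = TRUE)
    cat("K =", k, " total WSS =", round(fit$tot.withinss, 2), "\n")
  }
} else {
  message("factoextra is not installed; run install.packages('factoextra') first")
}
```

If the data really contain three well-separated groups, the K = 2 plot should show one cluster spanning two true groups, and the K = 4 plot should show one true group cut in two. That is the visual reason a large WSS drop is expected when going to the true K and only a small one beyond it.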
(d) Run K-means clustering on your simulated data for K and record the total WSS for each K value. Plot the progression of WSS values. According to the elbow method logic, which K value appears to be the optimal one?
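The elbow plot in part (d) can be sketched as follows. The range of K is missing from the question, so `ks <- 1:10` is an assumption, along with the simulated data and `nstart = 20`.

```r
# ASSUMED data: 3 clusters, 20 points each, 2 variables (the real values are missing).
set.seed(1)
truth <- rep(1:3, each = 20)
mu <- rbind(c(0, 0), c(6, 0), c(3, 6))  # hypothetical cluster centers
x <- matrix(rnorm(60 * 2), ncol = 2) + mu[truth, ]

# ASSUMED range of K; record the total WSS for each value.
ks <- 1:10
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)
print(data.frame(K = ks, total_WSS = round(wss, 2)))

# Elbow plot: look for the K after which the curve flattens out.
plot(ks, wss, type = "b", pch = 19,
     xlab = "Number of clusters K", ylab = "Total within-cluster sum of squares",
     main = "Elbow plot")
```

The elbow is the K at which the curve stops falling steeply. With the assumed data this should be visible at K = 3, but the reading is a visual judgment rather than a formal test.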
(e) For a more formal approach (an alternative to the elbow method from part (d)), run the gap statistic calculations for K with replicates each. Which K value is optimal? Does it match the number of well-separated clusters in our simulated data? Provide the plot of gap statistic values.
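One way to approach part (e) is with `clusGap()` from the `cluster` package, sketched below. The maximum K and the number of replicates are missing from the question, so `K.max = 10` and `B = 50` are assumptions, as are the simulated data and `nstart = 20`. The selection rule used here (`"firstSEmax"`) is one of several that `maxSE()` offers and may differ from the rule used in the lecture slides.

```r
library(cluster)

# ASSUMED data: 3 clusters, 20 points each, 2 variables (the real values are missing).
set.seed(1)
truth <- rep(1:3, each = 20)
mu <- rbind(c(0, 0), c(6, 0), c(3, 6))  # hypothetical cluster centers
x <- matrix(rnorm(60 * 2), ncol = 2) + mu[truth, ]

# ASSUMED K.max and B (number of reference replicates); nstart is passed on to kmeans.
gap <- clusGap(x, FUNcluster = kmeans, K.max = 10, B = 50, nstart = 20)
print(gap, method = "firstSEmax")

# Pick K from the gap values and their simulation standard errors.
best_k <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")
cat("Suggested K:", best_k, "\n")

# Plot of the gap statistic with standard-error bars.
plot(gap, main = "Gap statistic")
```

Comparing `best_k` with the number of clusters used to generate the data answers the "does it match" question. The result can change with the seed, the number of replicates, and the selection rule.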
Please give the answer to this question in RStudio.
