Question: Part 3: Clustering (50 pts total)
For Part 3, you will apply k-means clustering to two different simulated datasets and compare
its performance on each. Both datasets are generated with … observations, but each was
generated with a different number of "clusters." These datasets, pscsv and pscsv,
will be labeled as "one" and "two," respectively.
Vanilla k-means (… pts total)
Visualizing the data (… pts)
Make a scatterplot of each dataset. Visually, guess how many groups were used to simulate
the data, and include the name of the dataset and your guess in the title of each plot.
# TODO: Two plots
Implementing k-means with Manhattan distance (… pts total)
Computing Manhattan distance (… pts)
Implement the Manhattan distance metric in a function that you can use in your k-means
implementation. Write three test cases to make sure your function works correctly.
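A minimal sketch in Python, assuming Python is the course language (the `# TODO` placeholders suggest a notebook). The function name `manhattan_distance` is hypothetical. The Manhattan (L1) distance between points a and b is the sum of |a_i - b_i| over all coordinates, and the three assertions play the role of the requested test cases.

```python
def manhattan_distance(a, b):
    """Return the Manhattan (L1) distance between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    # Sum of absolute coordinate-wise differences
    return sum(abs(x - y) for x, y in zip(a, b))


# Three test cases
assert manhattan_distance((0, 0), (0, 0)) == 0    # identical points are 0 apart
assert manhattan_distance((1, 2), (4, 6)) == 7    # |1-4| + |2-6| = 3 + 4
assert manhattan_distance((-1, 3), (2, -1)) == 7  # |-1-2| + |3-(-1)| = 3 + 4
```

Testing a negative coordinate and the zero case catches the two most common slips: forgetting `abs` and comparing a point with itself.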
# TODO: Manhattan distance
Running k-means again (… pts)
Using your implementation from …, implement the k-means algorithm with Manhattan
distance. Recall that you will be assigning points to clusters based on minimum
Manhattan distance. Be sure to compute the within-cluster sum of squared Manhattan
distances (WCSS). For each dataset, use a for loop to run the algorithm … times with
your optimal value of k (from …). Plot the best solution for that value of k, with points
colored by cluster assignment. Include the dataset, k, and the final WCSS in the title of each
plot.
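One possible shape for this step, again a Python sketch whose names (`kmeans_manhattan`, `best_of_runs`, `max_iter`, `n_runs`) are hypothetical. Points are assigned to the center with minimum Manhattan distance, as the task states. The update step assumes the usual k-means rule of moving each center to the mean of its points. A coordinate-wise median (the k-medians variant) is the center that minimizes the unsquared within-cluster Manhattan distance, so swap it in if that is what your course expects. WCSS is computed as the sum, over all points, of the squared Manhattan distance to the assigned center, and the best run is the one with the lowest WCSS. The `n_runs=10` default is only a placeholder for the number of runs your assignment specifies.

```python
import random


def manhattan_distance(a, b):
    # Sum of absolute coordinate-wise differences (L1 distance)
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans_manhattan(points, k, max_iter=100, seed=None):
    """One run of k-means with Manhattan-distance assignment.

    Returns (labels, centers, wcss), where wcss is the within-cluster sum of
    squared Manhattan distances from each point to its assigned center.
    """
    rng = random.Random(seed)
    # Initialize the centers as k distinct data points chosen at random
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the center at minimum Manhattan distance
        new_labels = [
            min(range(k), key=lambda j: manhattan_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step (assumed): move each center to the mean of its points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    wcss = sum(
        manhattan_distance(p, centers[lab]) ** 2 for p, lab in zip(points, labels)
    )
    return labels, centers, wcss


def best_of_runs(points, k, n_runs=10):
    """Run kmeans_manhattan n_runs times and keep the lowest-WCSS solution."""
    best = None
    for run in range(n_runs):
        result = kmeans_manhattan(points, k, seed=run)
        if best is None or result[2] < best[2]:
            best = result
    return best


# Usage on a toy dataset with two well-separated groups
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, wcss = best_of_runs(pts, k=2, n_runs=10)
```

For the plot, the returned `labels` can be passed as the `c=` argument of `matplotlib.pyplot.scatter` to color points by cluster, with the dataset name, k, and `wcss` formatted into the title. Seeding each run with its loop index keeps the "best of n runs" result reproducible.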
