Question:

Part 3: Clustering (50pts total)
For Part 3, you will apply k-means clustering to two different simulated datasets and compare
its performance. Both datasets are generated with N=500 observations, but each was
generated with a different number of "clusters." These datasets, ps2-1.csv and ps2-2.csv,
will be labeled as "one" and "two," respectively.
3.1 Vanilla k-means (25 pts total)
3.1.1 Visualizing the data (5pts)
Make a scatterplot of each dataset. Visually, guess at how many groups were used to simulate
the data and include the name of the dataset and your guess in the title of each.
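A minimal sketch of the two plots in Python, assuming each CSV file has a header row followed by two numeric columns (the assignment does not show the file layout). The helper names load_points, make_title, plot_dataset, and main are hypothetical, and the guesses of 3 groups are placeholders to replace after looking at the data.

```python
import csv


def load_points(path):
    """Read a two-column CSV into a list of (x, y) floats.

    Assumes a header row followed by numeric rows; adjust if the real
    files are laid out differently.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the assumed header row
        return [(float(row[0]), float(row[1])) for row in reader if row]


def make_title(name, guess):
    """Build a title holding the dataset name and the guessed group count."""
    return f"Dataset {name}: visual guess of {guess} groups"


def plot_dataset(points, name, guess, ax):
    """Draw one scatterplot on a matplotlib Axes object."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    ax.scatter(xs, ys, s=10)
    ax.set_title(make_title(name, guess))
    ax.set_xlabel("x")
    ax.set_ylabel("y")


def main():
    # Imported here so the helpers above can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    # The guesses (3 and 3) are placeholders, not answers.
    datasets = [("ps2-1.csv", "one", 3), ("ps2-2.csv", "two", 3)]
    for ax, (path, name, guess) in zip(axes, datasets):
        plot_dataset(load_points(path), name, guess, ax)
    fig.tight_layout()
    plt.show()
```

Calling main() from the directory that holds both CSV files should show the two panels side by side.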
# TODO: Two plots
3.2 Implementing k-means with Manhattan distance (25 pts total)
3.2.1 Computing Manhattan distance (5pts)
Implement the Manhattan distance metric in a function that you can use in your k-means
implementation. Write three test cases to be sure your function works correctly.
# TODO: Manhattan distance
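The Manhattan distance between two points is the sum of the absolute differences of their coordinates. One possible Python sketch follows, with three test cases as the exercise requests. The function name manhattan_distance and the choice to raise ValueError on mismatched lengths are assumptions, not requirements of the assignment.

```python
def manhattan_distance(a, b):
    """Return the Manhattan (L1) distance between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(x - y) for x, y in zip(a, b))


# Three test cases, matching the count the exercise asks for.
assert manhattan_distance((0, 0), (3, 4)) == 7        # |0-3| + |0-4|
assert manhattan_distance((1.5, -2), (1.5, -2)) == 0  # identical points
assert manhattan_distance((-1, 2), (2, -2)) == 7      # |-1-2| + |2-(-2)|
```

Because the function only iterates over coordinates, it works for points of any dimension, not just the two-dimensional data plotted earlier.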
3.2.2 Running k-means again (5 pts)
Using your implementation from 3.1.2, implement the k-means algorithm with Manhattan
distance. Recall that you will be assigning points to clusters based on minimum
Manhattan distance. Be sure to produce the within-cluster sum of squared Manhattan
distances (WCSS). For each dataset, run the algorithm M=10 times using a for() loop for
your optimal values of k from 3.1.3. Plot the best solution for that value of k, with points
colored by cluster assignment. Include the dataset, k, and the final WCSS in the title of each
plot.
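The procedure above can be sketched in plain Python as follows. This is not the implementation from 3.1.2, which is not shown here. It assumes centroids are still updated as coordinate means (only the assignment step and the WCSS use Manhattan distance), that initial centroids are random data points, and that the toy points stand in for the real CSV data. The names kmeans_manhattan and best_of_m are hypothetical.

```python
import random


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans_manhattan(points, k, rng, max_iter=100):
    """One k-means run with Manhattan-distance assignment.

    Returns (labels, centroids, wcss). Centroids are updated as coordinate
    means, an assumption carried over from ordinary k-means.
    """
    centroids = [tuple(p) for p in rng.sample(points, k)]  # random data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under Manhattan distance.
        new_labels = [
            min(range(k), key=lambda j: manhattan_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: cluster means; an empty cluster keeps its old centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # WCSS: sum of squared Manhattan distances to the assigned centroid.
    wcss = sum(
        manhattan_distance(p, centroids[lab]) ** 2 for p, lab in zip(points, labels)
    )
    return labels, centroids, wcss


def best_of_m(points, k, m=10, seed=0):
    """Run the algorithm m times in a for loop and keep the lowest-WCSS run."""
    rng = random.Random(seed)
    best = None
    for _ in range(m):
        result = kmeans_manhattan(points, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best


# Toy data standing in for a real dataset: two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids, wcss = best_of_m(points, k=2)
title = f"Dataset one, k=2, WCSS={wcss:.2f}"
```

To draw the best solution, the labels list can be passed as the c argument of matplotlib's scatter so that points are colored by cluster, with title used as the plot title.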