Question:

Use the kMeans algorithm to automatically identify clusters of similar elements for the datasets "normal.txt" and "unbalance.txt".
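For reference, the core of kMeans (Lloyd's iteration) can be sketched in C++ as follows. This is a minimal sketch, not a required design. The names Point, sqDist and lloyd are hypothetical, points are assumed to be 2-D (x, y) pairs, and maxIter = 100 is an arbitrary cap. The caller supplies the initial centroids, so the same routine can serve both Random Restart and kMeans++ seeding.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<double, 2>;  // (x, y), as in normal.txt and unbalance.txt

// Squared Euclidean distance; sqrt is unnecessary when only comparing.
double sqDist(const Point& a, const Point& b) {
    const double dx = a[0] - b[0], dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Lloyd's kMeans: alternate the assignment and update steps until no label
// changes or maxIter is reached. `centroids` holds the initial centroids on
// entry and the refined ones on exit; the return value has one label per point.
std::vector<int> lloyd(const std::vector<Point>& data,
                       std::vector<Point>& centroids, int maxIter = 100) {
    assert(!centroids.empty());
    const std::size_t k = centroids.size();
    std::vector<int> labels(data.size(), -1);
    for (int iter = 0; iter < maxIter; ++iter) {
        // Assignment step: every point joins its nearest centroid.
        bool changed = false;
        for (std::size_t i = 0; i < data.size(); ++i) {
            int best = 0;
            double bestD = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                const double d = sqDist(data[i], centroids[c]);
                if (d < bestD) { bestD = d; best = static_cast<int>(c); }
            }
            if (labels[i] != best) { labels[i] = best; changed = true; }
        }
        if (!changed) break;  // converged: the centroids are already the means
        // Update step: move each centroid to the mean of its members.
        std::vector<Point> sum(k, Point{0.0, 0.0});
        std::vector<std::size_t> count(k, 0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            sum[labels[i]][0] += data[i][0];
            sum[labels[i]][1] += data[i][1];
            ++count[labels[i]];
        }
        for (std::size_t c = 0; c < k; ++c)
            if (count[c] > 0)  // an empty cluster keeps its previous centroid
                centroids[c] = {sum[c][0] / count[c], sum[c][1] / count[c]};
    }
    return labels;
}
```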
In the solution, implement Random Restart and evaluate the quality of the clusters it produces. Use the following evaluation metrics:
1. Within-Cluster Sum of Squares (WCSS)
   Type: Intra-cluster.
   Explanation: Measures the compactness of clusters by summing the squared distances of points from their respective cluster centroids. It focuses on intra-cluster cohesion; lower values mean more compact clusters.
2. Silhouette Score
   Type: Both (intra-cluster and inter-cluster).
   Explanation: Measures both the compactness of points within a cluster and the separation between clusters. Values range from -1 to 1; high silhouette values indicate well-separated, compact clusters.
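Both metrics can be computed from the data, the labels and the centroids. The sketch below uses the standard definitions. WCSS is the sum of squared point-to-centroid distances. The silhouette of point i is s(i) = (b - a) / max(a, b), where a is the mean distance from i to the rest of its own cluster and b is the smallest mean distance from i to the points of another cluster. The function names are hypothetical. Singleton clusters get s(i) = 0, a common convention. This exact silhouette costs O(n^2) distance evaluations, which may be slow on large files.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<double, 2>;

double dist(const Point& a, const Point& b) {
    return std::hypot(a[0] - b[0], a[1] - b[1]);
}

// WCSS = sum over points of the squared distance to their own centroid.
// Lower is better: the clusters are more compact.
double wcss(const std::vector<Point>& data, const std::vector<Point>& centroids,
            const std::vector<int>& labels) {
    double total = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const double d = dist(data[i], centroids[labels[i]]);
        total += d * d;
    }
    return total;
}

// Mean silhouette over all points. For point i: a = mean distance to the other
// members of its cluster, b = lowest mean distance to the members of any other
// cluster, s(i) = (b - a) / max(a, b). The result lies in [-1, 1]; higher is better.
double silhouette(const std::vector<Point>& data, const std::vector<int>& labels, int k) {
    assert(k >= 2 && !data.empty());
    double total = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Per-cluster sum of distances from point i, and member counts.
        std::vector<double> sum(static_cast<std::size_t>(k), 0.0);
        std::vector<std::size_t> cnt(static_cast<std::size_t>(k), 0);
        for (std::size_t j = 0; j < data.size(); ++j) {
            if (j == i) continue;
            sum[labels[j]] += dist(data[i], data[j]);
            ++cnt[labels[j]];
        }
        const int own = labels[i];
        if (cnt[own] == 0) continue;  // singleton cluster: s(i) = 0 by convention
        const double a = sum[own] / cnt[own];
        double b = std::numeric_limits<double>::max();
        for (int c = 0; c < k; ++c)
            if (c != own && cnt[c] > 0) b = std::min(b, sum[c] / cnt[c]);
        if (b == std::numeric_limits<double>::max()) continue;  // no other cluster
        const double m = std::max(a, b);
        if (m > 0.0) total += (b - a) / m;  // identical points would give 0/0
    }
    return total / static_cast<double>(data.size());
}
```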
Compare the results.
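One way to structure Random Restart is sketched below, under stated assumptions. Each restart seeds kMeans with k distinct data points drawn uniformly at random (Forgy initialisation, one common choice). The run with the best score is kept: the lowest WCSS or the highest silhouette. The kMeans routine and the scoring function are passed in as callbacks, so KMeansFn, ScoreFn, Clustering and randomRestart are all hypothetical names. std::sample requires C++17.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <random>
#include <utility>
#include <vector>

using Point = std::array<double, 2>;

struct Clustering {
    std::vector<Point> centroids;
    std::vector<int> labels;
};

// Hypothetical plug-in points: KMeansFn refines the given centroids in place
// and returns one label per point (e.g. Lloyd's kMeans); ScoreFn rates a
// finished clustering (e.g. WCSS or silhouette).
using KMeansFn = std::function<std::vector<int>(const std::vector<Point>&, std::vector<Point>&)>;
using ScoreFn = std::function<double(const std::vector<Point>&, const Clustering&)>;

// Random Restart: run kMeans `restarts` times, each from k distinct data points
// drawn uniformly at random, and keep the run with the best score. Pass
// higherIsBetter = false for WCSS and true for silhouette.
Clustering randomRestart(const std::vector<Point>& data, std::size_t k, int restarts,
                         const KMeansFn& kmeans, const ScoreFn& score,
                         bool higherIsBetter, std::mt19937& rng) {
    assert(k >= 1 && k <= data.size() && restarts >= 1);
    Clustering best;
    double bestScore = 0.0;
    for (int r = 0; r < restarts; ++r) {
        Clustering run;
        // Random initialisation: k distinct positions sampled without replacement.
        std::sample(data.begin(), data.end(), std::back_inserter(run.centroids), k, rng);
        run.labels = kmeans(data, run.centroids);
        const double s = score(data, run);
        const bool better = higherIsBetter ? s > bestScore : s < bestScore;
        if (r == 0 || better) {
            best = std::move(run);
            bestScore = s;
        }
    }
    return best;
}
```

Running it once with a WCSS scorer and once with a silhouette scorer on the same file can select different runs. WCSS rewards only compactness, while the silhouette also rewards separation, so comparing the two winners is one concrete way to compare the metrics.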
Additionally, implement kMeans++ (without random restart) and compare its results with those obtained above.
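kMeans++ replaces uniform random seeding with D^2 weighting, which is why a single run is expected to suffice. Below is a minimal sketch of the seeding step, assuming std::mt19937 as the random source. kmeansPlusPlusInit is a hypothetical name. Its output would be passed to an ordinary kMeans (Lloyd) loop as the initial centroids.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::array<double, 2>;

double sqDist(const Point& a, const Point& b) {
    const double dx = a[0] - b[0], dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// kMeans++ seeding: the first centroid is a uniformly random data point; each
// further centroid is a data point drawn with probability proportional to
// D(x)^2, its squared distance to the nearest centroid chosen so far.
// Assumes the data contain at least k distinct points.
std::vector<Point> kmeansPlusPlusInit(const std::vector<Point>& data, std::size_t k,
                                      std::mt19937& rng) {
    assert(k >= 1 && k <= data.size());
    std::uniform_int_distribution<std::size_t> uniform(0, data.size() - 1);
    std::vector<Point> centroids;
    centroids.push_back(data[uniform(rng)]);
    std::vector<double> d2(data.size(), std::numeric_limits<double>::max());
    while (centroids.size() < k) {
        // Only the newest centroid can lower a point's nearest distance.
        for (std::size_t i = 0; i < data.size(); ++i)
            d2[i] = std::min(d2[i], sqDist(data[i], centroids.back()));
        // Index i is sampled with probability d2[i] / sum(d2); points that are
        // already centroids have weight 0.
        std::discrete_distribution<std::size_t> next(d2.begin(), d2.end());
        centroids.push_back(data[next(rng)]);
    }
    return centroids;
}
```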
Input:
File name, algorithm, metric, and number of clusters.
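The Input line lists four values, while the example input supplies only three (no metric). The parser below is a hedged sketch that accepts both forms. The option spellings "kmeans++", "wcss" and "silhouette", and the WCSS default when the metric is omitted, are assumptions, not part of the task statement. Options and parseArgs are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical option values: algorithm is "kmeans" (with random restart) or
// "kmeans++"; metric is "wcss" or "silhouette".
struct Options {
    std::string file;
    std::string algorithm;
    std::string metric = "wcss";  // assumed default when the metric is omitted
    int k = 0;
};

// Accepts either "file algorithm metric k" or "file algorithm k".
Options parseArgs(const std::vector<std::string>& args) {
    if (args.size() != 3 && args.size() != 4)
        throw std::invalid_argument("usage: <file> <kmeans|kmeans++> [wcss|silhouette] <k>");
    Options opt;
    opt.file = args[0];
    opt.algorithm = args[1];
    if (args.size() == 4) opt.metric = args[2];
    opt.k = std::stoi(args.back());  // throws std::invalid_argument if not a number
    if (opt.algorithm != "kmeans" && opt.algorithm != "kmeans++")
        throw std::invalid_argument("unknown algorithm: " + opt.algorithm);
    if (opt.metric != "wcss" && opt.metric != "silhouette")
        throw std::invalid_argument("unknown metric: " + opt.metric);
    if (opt.k < 1) throw std::invalid_argument("number of clusters must be positive");
    return opt;
}
```

In main, it could be called as parseArgs(std::vector<std::string>(argv + 1, argv + argc)).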
Output:
A plot showing the identified clusters in different colors. (All examples in the datasets are described by two attributes: x and y, representing the position of the point in Euclidean space.)
You can use the provided Python script "plot_clusters.py" to generate the plot. It takes three arguments:
a file with the data points, a file with the centroids, and a file with the cluster labels corresponding to each data point.
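plot_clusters.py reads all three files with np.loadtxt, so whitespace-separated plain text is enough. Here is a sketch of the writers on the C++ side. The function names are hypothetical, and the output file names are left to the caller.

```cpp
#include <array>
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

using Point = std::array<double, 2>;

// Writes one "x y" pair per line (used for both the data and the centroids).
void writePoints(const std::string& path, const std::vector<Point>& pts) {
    std::ofstream out(path);
    if (!out) throw std::runtime_error("cannot open " + path);
    out.precision(17);  // keep full double precision in the text output
    for (const Point& p : pts) out << p[0] << ' ' << p[1] << '\n';
}

// Writes one integer label per line, matching np.loadtxt(labels_file, dtype=int).
void writeLabels(const std::string& path, const std::vector<int>& labels) {
    std::ofstream out(path);
    if (!out) throw std::runtime_error("cannot open " + path);
    for (int l : labels) out << l << '\n';
}
```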
Example Input:
unbalance.txt kmeans 18
Solve the problem in C++.
Here are a few rows from the normal.txt file:
5.275 4.893
5.339 4.476
4.887 4.234
5.895 4.843
...
Here are a few rows from the unbalance.txt file:
151700 351102
155799 354358
142857 352716
152726 349144
151008 349692
...
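Both files hold two numbers per line (decimals in normal.txt, integers in unbalance.txt), so one reader covers both. The exact separator, space or tab, is an assumption; the reader accepts either. Here is a minimal sketch; readPoints and readPointsFile are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Point = std::array<double, 2>;

// Reads whitespace-separated "x y" pairs until the end of the stream and
// throws if extraction stops early on something that is not a number.
std::vector<Point> readPoints(std::istream& in) {
    std::vector<Point> pts;
    double x = 0.0, y = 0.0;
    while (in >> x >> y) pts.push_back({x, y});
    if (!in.eof()) throw std::runtime_error("malformed data line");
    return pts;
}

std::vector<Point> readPointsFile(const std::string& path) {
    std::ifstream in(path);
    if (!in) throw std::runtime_error("cannot open " + path);
    return readPoints(in);
}
```

Because readPoints takes any std::istream, it can also be exercised on a std::istringstream without touching the real files.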
Here is the Python script (plot_clusters.py) to which the C++ program's output files must be passed:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys

def plot_data_and_centroids(data_file, centroids_file, labels_file):
    # All three files are whitespace-separated text read with np.loadtxt:
    # "x y" rows for the points and the centroids, one integer label per row.
    X = np.loadtxt(data_file)
    centroids = np.loadtxt(centroids_file)
    labels = np.loadtxt(labels_file, dtype=int)
    plt.figure(figsize=(8,6))
    # Points coloured by cluster label; centroids drawn as large black X markers.
    sns.scatterplot(x=X[:,0], y=X[:,1], hue=labels, palette='Set1', s=100, legend='full')
    plt.scatter(centroids[:,0], centroids[:,1], c='black', s=300, marker='X', label='Centroids')
    plt.title('Data and Centroids Visualization')
    plt.legend()
    plt.show()

if __name__=="__main__":
    # Expects exactly three arguments: the data, centroids and labels files.
    if len(sys.argv)!=4:
        print("Usage: python plot_clusters.py <data_file> <centroids_file> <labels_file>")
        sys.exit(1)
    data_file = sys.argv[1]
    centroids_file = sys.argv[2]
    labels_file = sys.argv[3]
    plot_data_and_centroids(data_file, centroids_file, labels_file)
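Once the three files are written, the C++ program can launch the script itself through std::system. In this sketch, the interpreter name "python" (some systems need "python3") and the helper names are assumptions. The script is assumed to be in the working directory, and plt.show() blocks until the plot window is closed. The simple quoting does not handle paths that themselves contain double quotes.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Builds the command line plot_clusters.py expects: data, centroids, labels.
std::string plotCommand(const std::string& dataFile, const std::string& centroidsFile,
                        const std::string& labelsFile, const std::string& python = "python") {
    return python + " plot_clusters.py \"" + dataFile + "\" \"" + centroidsFile +
           "\" \"" + labelsFile + "\"";
}

// Runs the script; a zero return from std::system conventionally means success.
bool runPlot(const std::string& dataFile, const std::string& centroidsFile,
             const std::string& labelsFile) {
    return std::system(plotCommand(dataFile, centroidsFile, labelsFile).c_str()) == 0;
}
```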
