Question: Use the kMeans algorithm to automatically identify clusters of similar elements for the datasets "normal.txt" and "unbalance.txt".

In the solution, implement Random Restart with an evaluation of the quality of the obtained clusters. As evaluation metrics, use:

1. Within-Cluster Sum of Squares (WCSS): Type: Intra-cluster. Explanation: Measures the compactness of clusters by summing the squared distances of points from their respective cluster centroids. It focuses on intra-cluster cohesion.

2. Silhouette Score: Type: Both (intra-cluster and inter-cluster). Explanation: Measures both the compactness of points within a cluster and the separation between clusters. High silhouette values indicate well-separated, compact clusters.
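As a rough illustration of the two metrics, here is a minimal C++ sketch. The names `Point`, `sqDist`, `wcss`, and `silhouette` are hypothetical helpers, not part of the assignment, and the silhouette is the plain O(n^2) mean over all points, with a point in a singleton cluster contributing 0. Labels are assumed to lie in the range 0..k-1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };  // hypothetical 2-D point type

// Squared Euclidean distance between two points.
double sqDist(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// WCSS: sum of squared distances from each point to its assigned centroid.
double wcss(const std::vector<Point>& pts, const std::vector<Point>& centroids,
            const std::vector<int>& labels) {
    assert(pts.size() == labels.size());
    double total = 0.0;
    for (std::size_t i = 0; i < pts.size(); ++i)
        total += sqDist(pts[i], centroids[labels[i]]);
    return total;
}

// Mean silhouette: s(i) = (b - a) / max(a, b), where a is the mean distance
// to the other points of i's cluster and b is the smallest mean distance to
// the points of any other non-empty cluster.
double silhouette(const std::vector<Point>& pts, const std::vector<int>& labels, int k) {
    assert(pts.size() == labels.size() && k > 0);
    const std::size_t n = pts.size();
    std::vector<std::size_t> count(k, 0);
    for (int l : labels) ++count[l];
    std::vector<double> sum(k);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (count[labels[i]] <= 1) continue;  // singleton cluster: s(i) = 0
        std::fill(sum.begin(), sum.end(), 0.0);
        for (std::size_t j = 0; j < n; ++j)
            if (j != i) sum[labels[j]] += std::sqrt(sqDist(pts[i], pts[j]));
        const double a = sum[labels[i]] / (count[labels[i]] - 1);
        double b = std::numeric_limits<double>::infinity();
        for (int c = 0; c < k; ++c)
            if (c != labels[i] && count[c] > 0) b = std::min(b, sum[c] / count[c]);
        if (std::isinf(b)) continue;  // only one non-empty cluster: s(i) = 0
        const double m = std::max(a, b);
        if (m > 0.0) total += (b - a) / m;
    }
    return n ? total / n : 0.0;
}
```

Note the opposite directions when comparing runs: a lower WCSS is better, while a higher silhouette is better.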

Compare the results.
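The random-restart requirement could be sketched as follows, assuming standard Lloyd iterations, initial centroids sampled uniformly from the data, and WCSS as the selection metric. `Clustering`, `lloyd`, and `randomRestart` are hypothetical names, and the iteration cap of 100 is an arbitrary choice. Selecting by silhouette instead would keep the run with the highest score rather than the lowest.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <utility>
#include <vector>

struct Point { double x, y; };  // hypothetical 2-D point type

struct Clustering {  // result of one k-means run
    std::vector<Point> centroids;
    std::vector<int> labels;
    double wcss = std::numeric_limits<double>::infinity();
};

double sqDist(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// One run of Lloyd's algorithm from uniformly sampled initial centroids.
Clustering lloyd(const std::vector<Point>& pts, int k, std::mt19937& rng, int maxIter = 100) {
    std::uniform_int_distribution<std::size_t> pick(0, pts.size() - 1);
    Clustering c;
    for (int j = 0; j < k; ++j) c.centroids.push_back(pts[pick(rng)]);
    c.labels.assign(pts.size(), -1);
    for (int it = 0; it < maxIter; ++it) {
        // Assignment step: attach each point to its nearest centroid.
        bool changed = false;
        for (std::size_t i = 0; i < pts.size(); ++i) {
            int best = 0;
            for (int j = 1; j < k; ++j)
                if (sqDist(pts[i], c.centroids[j]) < sqDist(pts[i], c.centroids[best])) best = j;
            if (best != c.labels[i]) { c.labels[i] = best; changed = true; }
        }
        if (!changed) break;  // converged
        // Update step: move each centroid to the mean of its points.
        std::vector<Point> sum(k, Point{0.0, 0.0});
        std::vector<std::size_t> cnt(k, 0);
        for (std::size_t i = 0; i < pts.size(); ++i) {
            sum[c.labels[i]].x += pts[i].x;
            sum[c.labels[i]].y += pts[i].y;
            ++cnt[c.labels[i]];
        }
        for (int j = 0; j < k; ++j)  // an empty cluster keeps its old centroid
            if (cnt[j] > 0) c.centroids[j] = {sum[j].x / cnt[j], sum[j].y / cnt[j]};
    }
    c.wcss = 0.0;
    for (std::size_t i = 0; i < pts.size(); ++i)
        c.wcss += sqDist(pts[i], c.centroids[c.labels[i]]);
    return c;
}

// Random restart: repeat k-means and keep the run with the lowest WCSS.
Clustering randomRestart(const std::vector<Point>& pts, int k, int restarts, unsigned seed) {
    assert(!pts.empty() && k > 0 && restarts > 0);
    std::mt19937 rng(seed);
    Clustering best;
    for (int r = 0; r < restarts; ++r) {
        Clustering c = lloyd(pts, k, rng);
        if (c.wcss < best.wcss) best = std::move(c);
    }
    return best;
}
```

A call such as `randomRestart(points, 8, 20, 42)` would correspond to 8 clusters and 20 restarts; the restart count and seed are free parameters that the assignment does not fix.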

Additionally, implement kMeans++ (without random restart) and compare the results.
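For the kMeans++ variant, only the seeding differs from plain kMeans: the first centroid is chosen uniformly at random, and each further centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The sketch below assumes that standard scheme; `kmeansppInit` is a hypothetical name, and the returned centroids would then be passed to the usual assignment and update iterations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

struct Point { double x, y; };  // hypothetical 2-D point type

double sqDist(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// k-means++ seeding: returns k initial centroids taken from the data.
std::vector<Point> kmeansppInit(const std::vector<Point>& pts, int k, std::mt19937& rng) {
    assert(!pts.empty() && k > 0);
    std::uniform_int_distribution<std::size_t> uniform(0, pts.size() - 1);
    std::vector<Point> centroids;
    centroids.push_back(pts[uniform(rng)]);  // first centroid: uniform choice

    // d2[i] holds the squared distance from point i to its nearest centroid.
    std::vector<double> d2(pts.size(), std::numeric_limits<double>::infinity());
    while (static_cast<int>(centroids.size()) < k) {
        double total = 0.0;
        for (std::size_t i = 0; i < pts.size(); ++i) {
            d2[i] = std::min(d2[i], sqDist(pts[i], centroids.back()));
            total += d2[i];
        }
        if (total <= 0.0) {
            // Every point coincides with a centroid: fall back to a uniform pick.
            centroids.push_back(pts[uniform(rng)]);
            continue;
        }
        // Draw the next centroid with probability proportional to d2[i].
        std::discrete_distribution<std::size_t> next(d2.begin(), d2.end());
        centroids.push_back(pts[next(rng)]);
    }
    return centroids;
}
```

Because points far from every existing centroid are favoured, a single run tends to start from well-spread seeds, which is why the task compares it against kMeans with random restart.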

Input:

File name, algorithm, metric, and number of clusters.

Output:

A plot showing the identified clusters in different colors.

(All examples in the datasets are described by two attributes, x and y, representing the position of the point in Euclidean space.)

You can use the provided Python script "plot_clusters.py" to generate the plot, which takes: a file with the data points, a file with the centroids, and a file with the cluster labels corresponding to each data point.
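One possible way to connect a C++ program to the script is to write the centroids and labels to text files and then launch the script through `std::system`. This is only a sketch: `writeResults`, `plotCommand`, and `plotClusters` are hypothetical helpers, the file layout (one "x y" pair or one integer label per line) is assumed from the script's use of `np.loadtxt`, and the interpreter is assumed to be reachable as `python` with the script in the working directory.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

struct Point { double x, y; };  // hypothetical 2-D point type

// Writes one "x y" line per centroid and one integer label per data point.
bool writeResults(const std::vector<Point>& centroids, const std::vector<int>& labels,
                  const std::string& centroidsFile, const std::string& labelsFile) {
    std::ofstream cf(centroidsFile), lf(labelsFile);
    if (!cf || !lf) return false;
    cf.precision(10);  // keep enough digits for large coordinates
    for (const Point& c : centroids) cf << c.x << ' ' << c.y << '\n';
    for (int l : labels) lf << l << '\n';
    return cf.good() && lf.good();
}

// Builds the command line expected by plot_clusters.py (three file arguments).
std::string plotCommand(const std::string& dataFile, const std::string& centroidsFile,
                        const std::string& labelsFile) {
    return "python plot_clusters.py \"" + dataFile + "\" \"" + centroidsFile +
           "\" \"" + labelsFile + "\"";
}

// Runs the plotting script and returns the value reported by std::system.
int plotClusters(const std::string& dataFile, const std::string& centroidsFile,
                 const std::string& labelsFile) {
    assert(!dataFile.empty());
    return std::system(plotCommand(dataFile, centroidsFile, labelsFile).c_str());
}
```

After clustering, a call to `writeResults(centroids, labels, "centroids.txt", "labels.txt")` followed by `plotClusters("unbalance.txt", "centroids.txt", "labels.txt")` would produce the plot; the two output file names are placeholders.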

Example Input:

unbalance.txt kmeans 1 8

Solve the problem in C++.

Here are a few rows from normal.txt file:

5.275 4.893
5.339 4.476
4.887 4.234
5.895 4.843
...

Here are a few rows from unbalance.txt file:

151700 351102
155799 354358
142857 352716
152726 349144
151008 349692
...

Here is the Python script that you have to connect the C++ code to:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys

def plot_data_and_centroids(data_file, centroids_file, labels_file):
    # Load the points, the centroids, and one integer cluster label per point.
    data = np.loadtxt(data_file)
    centroids = np.loadtxt(centroids_file)
    labels = np.loadtxt(labels_file, dtype=int)

    plt.figure(figsize=(8, 6))
    # Draw the data points, colored by cluster label.
    sns.scatterplot(x=data[:, 0], y=data[:, 1], hue=labels, palette='Set1', s=100, legend='full')
    # The color keyword and the marker symbol were lost in the source;
    # c= and 'X' are assumptions.
    plt.scatter(centroids[:, 0], centroids[:, 1], c='black', s=300, marker='X', label='Centroids')
    plt.title('Data and Centroids Visualization')
    plt.legend()
    plt.show()

if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("Usage: python plot_clusters.py <data_file> <centroids_file> <labels_file>")
        sys.exit(1)

    data_file = sys.argv[1]
    centroids_file = sys.argv[2]
    labels_file = sys.argv[3]
    plot_data_and_centroids(data_file, centroids_file, labels_file)
