Question:

Objective: Implement external validation methods to compare automatically generated partitions with external ones.

In many applications, we have multiple partitions generated either by different clustering algorithms (e.g., k-means and another partitional clustering algorithm) or by the same clustering algorithm, but using different parameters (e.g., different initial centers), and we would like to know which of these partitions fits the data better. If we know the true cluster label of each point, we can use an external validity method to quantify the degree of similarity between an automatically generated partition (e.g., a partition produced by k-means) and the (external) partition implied by the true cluster labels.

In this phase, you will implement external validation methods to compare automatically generated partitions with external partitions. External validation is often accomplished using an external validity index, a function that takes two partitions and possibly some additional parameters as input and gives a numerical value indicating the degree of match between the two partitions as output.

In this phase, you will implement the Rand and Jaccard external validity indices. These indices are described in many resources; see, for example, the book by Zaki and Meira Jr.
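Both indices are commonly defined by counting pairs of points. For every unordered pair, the two partitions either agree or disagree about whether the points share a cluster: TP (together in both), FN (together only in the true partition), FP (together only in the generated partition), and TN (apart in both). Then Rand = (TP + TN) / (TP + FN + FP + TN) and Jaccard = TP / (TP + FN + FP). The following is a minimal sketch of that pair-counting formulation, not necessarily the form the assignment's reference solution uses. The class and method names are illustrative, and returning 1.0 when a denominator is zero is an assumed convention.

```java
class ExternalIndices {
    // Returns {TP, FN, FP, TN} counted over all unordered pairs of points.
    static long[] pairCounts(int[] truth, int[] predicted) {
        if (truth.length != predicted.length) {
            throw new IllegalArgumentException("Label arrays must have equal length");
        }
        long tp = 0, fn = 0, fp = 0, tn = 0;
        for (int i = 0; i < truth.length; i++) {
            for (int j = i + 1; j < truth.length; j++) {
                boolean sameTruth = truth[i] == truth[j];
                boolean samePred = predicted[i] == predicted[j];
                if (sameTruth && samePred) tp++;
                else if (sameTruth) fn++;
                else if (samePred) fp++;
                else tn++;
            }
        }
        return new long[] {tp, fn, fp, tn};
    }

    // Rand index: fraction of pairs on which the two partitions agree.
    static double rand(int[] truth, int[] predicted) {
        long[] c = pairCounts(truth, predicted);
        long total = c[0] + c[1] + c[2] + c[3];
        // Assumed convention: with fewer than two points there is nothing to disagree on.
        return total == 0 ? 1.0 : (double) (c[0] + c[3]) / total;
    }

    // Jaccard index: like Rand, but pairs that are apart in both partitions are ignored.
    static double jaccard(int[] truth, int[] predicted) {
        long[] c = pairCounts(truth, predicted);
        long denom = c[0] + c[1] + c[2];
        // Assumed convention: if no pair is together in either partition, return 1.0.
        return denom == 0 ? 1.0 : (double) c[0] / denom;
    }

    public static void main(String[] args) {
        int[] truth = {0, 0, 1, 1};
        int[] predicted = {0, 0, 0, 1};
        // Here TP = 1, FN = 1, FP = 2, TN = 2.
        System.out.println("Rand    = " + rand(truth, predicted));    // prints 0.5
        System.out.println("Jaccard = " + jaccard(truth, predicted)); // prints 0.25
    }
}
```

The double loop is quadratic in the number of points, which should be acceptable for data sets of the size shown here; a contingency-table formulation would scale better but is harder to check by hand. Both indices depend only on which points are grouped together, so permuting the cluster labels does not change the result.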

In this phase, you will be supplied with a similar collection of data sets, but the file format will be slightly different: the very first line will contain three integers (# points, # attributes + 1, # true clusters) and, at the end of each subsequent line, the true cluster label of the point, between 0 and (# true clusters - 1), inclusive, will be given. For example, the first four lines of iris_bezdek will be

150 5 3
5.1 3.5 1.4 0.2 0
4.9 3.0 1.4 0.2 0
4.7 3.2 1.3 0.2 0

meaning that the data set contains 150 points, each point is 5 - 1 = 4 dimensional, and there are 3 true clusters (that is, each point belongs to cluster 0, 1, or 2). In the example given above, all three points belong to cluster 0.
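One way to read this format is sketched below, assuming whitespace-separated values and integer labels. `LabeledDataSet` and `parse` are hypothetical names. The input is a list of lines so the parser can be tried without a file; in a real program the lines could come from `java.nio.file.Files.readAllLines`. The header in `main` is shortened to 3 points only so that the three sample rows form a complete input.

```java
import java.util.Arrays;
import java.util.List;

class LabeledDataSet {
    final double[][] points;   // points[i][d]: attribute d of point i
    final int[] labels;        // labels[i]: true cluster of point i
    final int numClusters;     // number of true clusters from the header

    LabeledDataSet(double[][] points, int[] labels, int numClusters) {
        this.points = points;
        this.labels = labels;
        this.numClusters = numClusters;
    }

    static LabeledDataSet parse(List<String> lines) {
        // Header: # points, # attributes + 1, # true clusters
        String[] header = lines.get(0).trim().split("\\s+");
        int n = Integer.parseInt(header[0]);
        int dims = Integer.parseInt(header[1]) - 1; // the last column is the label
        int k = Integer.parseInt(header[2]);

        double[][] points = new double[n][dims];
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) {
            String[] tok = lines.get(i + 1).trim().split("\\s+");
            if (tok.length != dims + 1) {
                throw new IllegalArgumentException(
                    "Line " + (i + 2) + ": expected " + (dims + 1) + " values");
            }
            for (int d = 0; d < dims; d++) {
                points[i][d] = Double.parseDouble(tok[d]);
            }
            labels[i] = Integer.parseInt(tok[dims]);
            if (labels[i] < 0 || labels[i] >= k) {
                throw new IllegalArgumentException(
                    "Line " + (i + 2) + ": label out of range");
            }
        }
        return new LabeledDataSet(points, labels, k);
    }

    public static void main(String[] args) {
        // Header shortened to 3 points for this snippet; the real file starts with "150 5 3".
        List<String> lines = Arrays.asList(
            "3 5 3",
            "5.1 3.5 1.4 0.2 0",
            "4.9 3.0 1.4 0.2 0",
            "4.7 3.2 1.3 0.2 0");
        LabeledDataSet ds = parse(lines);
        System.out.println(ds.points.length + " points, "
            + ds.points[0].length + " dimensions, labels "
            + Arrays.toString(ds.labels));
    }
}
```

Keeping the labels in a separate array means the clustering code only ever sees the attributes, and the labels are used solely for validation.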

For the aforementioned external indices, higher values are better. The way these indices are used is quite simple. For a given data set, first run k-means R times, each time with the same K value (equal to the # true clusters specified in the data set file), but with a different set of randomly selected centers (as in phase 1). Each run will give us a partition (of the data set) and we will compute the external validity index (say, Rand index) between that partition and the true partition (that is implied by the true cluster labels). Finally, the run that gives the highest Rand index value is declared as the best run and the partition produced by this run is taken as the best partition of the data set.
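The selection loop described above can be sketched as follows. The phase 1 k-means code is not part of this text, so it is represented by a hypothetical `IntFunction<int[]>` that maps a run number to the cluster assignment produced by that run; `BestRunSelector`, `Result`, and `select` are likewise illustrative names. Ties are resolved in favor of the earliest run, which is an assumption.

```java
import java.util.function.IntFunction;

class BestRunSelector {
    // Holds the winning run, its index value, and its partition.
    static final class Result {
        final int run;
        final double score;
        final int[] partition;

        Result(int run, double score, int[] partition) {
            this.run = run;
            this.score = score;
            this.partition = partition;
        }
    }

    // Rand index: fraction of point pairs on which both partitions agree.
    static double rand(int[] truth, int[] predicted) {
        long agree = 0, total = 0;
        for (int i = 0; i < truth.length; i++) {
            for (int j = i + 1; j < truth.length; j++) {
                boolean sameTruth = truth[i] == truth[j];
                boolean samePred = predicted[i] == predicted[j];
                if (sameTruth == samePred) agree++;
                total++;
            }
        }
        return total == 0 ? 1.0 : (double) agree / total;
    }

    // Runs the clusterer `runs` times and keeps the run with the highest Rand index.
    static Result select(int[] truth, IntFunction<int[]> clusterer, int runs) {
        Result best = null;
        for (int r = 0; r < runs; r++) {
            int[] partition = clusterer.apply(r); // stand-in for one k-means run
            double score = rand(truth, partition);
            if (best == null || score > best.score) {
                best = new Result(r, score, partition);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] truth = {0, 0, 1, 1};
        // Fixed partitions standing in for the output of three k-means runs.
        int[][] candidates = {{0, 1, 0, 1}, {0, 0, 0, 1}, {1, 1, 0, 0}};
        Result best = select(truth, r -> candidates[r], candidates.length);
        System.out.println("best run = " + best.run + ", Rand = " + best.score);
        // prints: best run = 2, Rand = 1.0
    }
}
```

In the actual assignment the lambda would be replaced by a call into the k-means implementation with a fresh set of random centers per run, and the same loop could be repeated with the Jaccard index.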

Output: Tabulate the best values of each external validity index for each data set (for each data set, k-means should be run R = 100 times, each time with a different set of randomly selected centers). All data sets must be normalized using the min-max method.
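Min-max normalization is usually applied per attribute, rescaling each column to [0, 1] via x' = (x - min) / (max - min). A minimal sketch under that reading follows. `MinMaxNormalizer` is an illustrative name, and mapping a constant attribute (max = min) to 0 is an assumed convention, since the formula is undefined in that case.

```java
import java.util.Arrays;

class MinMaxNormalizer {
    // Returns a new array in which every attribute (column) is rescaled to [0, 1].
    static double[][] normalize(double[][] data) {
        int n = data.length;
        if (n == 0) return new double[0][];
        int dims = data[0].length;
        double[][] out = new double[n][dims];
        for (int d = 0; d < dims; d++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < n; i++) {
                min = Math.min(min, data[i][d]);
                max = Math.max(max, data[i][d]);
            }
            double range = max - min;
            for (int i = 0; i < n; i++) {
                // Assumed convention: a constant attribute is mapped to 0.
                out[i][d] = range == 0 ? 0.0 : (data[i][d] - min) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The three sample rows from the file-format example, without labels.
        double[][] data = {
            {5.1, 3.5, 1.4, 0.2},
            {4.9, 3.0, 1.4, 0.2},
            {4.7, 3.2, 1.3, 0.2}};
        for (double[] row : normalize(data)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Only the attribute columns should be passed in; the true cluster labels are not features and must not be rescaled.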

Language: Java. You may only use the built-in facilities of the language. In other words, you may not use third-party libraries.
