Question: CSCI 460 Clustering Project

Please implement the Expectation Maximization based k-means clustering algorithm we discussed in class and apply it to the data set provided (3.data) for $k \in \{2, 3, 4\}$. Consider various values for $\sigma$, including $\sigma \in \{0.5, 1.0, 2.0\}$, and run the clustering multiple times for each $\sigma$ value under different initial conditions. Show these results. Evaluate the quality of the clustering for each value of $\sigma$ using an average of the Davies-Bouldin Index for each clustering.
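The assignment does not spell out the in-class algorithm, so the following is only a sketch of one common reading of "EM-based k-means": soft k-means, where the E-step gives each point a responsibility for each center through a Gaussian kernel of fixed width sigma, and the M-step moves each center to the responsibility-weighted mean of the points. The function names, default arguments, and synthetic two-blob data are assumptions made for illustration, not part of the assignment.

```python
import numpy as np


def responsibilities(points, centers, sigma):
    """E-step: soft assignment of each point to each center (rows sum to 1)."""
    # Squared Euclidean distance from every point to every center: shape (n, k).
    sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -sq_dist / (2.0 * sigma ** 2)
    # Subtract the row maximum before exponentiating to avoid underflow.
    logits -= logits.max(axis=1, keepdims=True)
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)


def soft_kmeans(points, k, sigma, n_iter=100, tol=1e-6, seed=None):
    """Soft k-means sketch (hypothetical helper, not the course's reference code)."""
    rng = np.random.default_rng(seed)
    # Initial condition: k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        resp = responsibilities(points, centers, sigma)
        # M-step: each center becomes the responsibility-weighted mean.
        weights = resp.sum(axis=0)
        new_centers = (resp.T @ points) / np.maximum(weights, 1e-12)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:  # stop once the centers have effectively settled
            break
    return centers, responsibilities(points, centers, sigma)


# Synthetic stand-in for the real data set: two well-separated blobs.
rng = np.random.default_rng(0)
toy_points = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(100, 2)),
    rng.normal(loc=10.0, scale=0.3, size=(100, 2)),
])
toy_centers, toy_resp = soft_kmeans(toy_points, k=2, sigma=0.5, seed=1)
toy_labels = toy_resp.argmax(axis=1)  # hard labels, e.g. for scoring or plotting
print(toy_centers)
```

Changing `seed` gives a different initial condition for the same sigma, which is one way to produce the repeated runs the assignment asks for.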

In addition to providing the code and results just described, please answer the following questions:

How sensitive is the method to $\sigma$?

Which value for $\sigma$ appeared to work the best?

How sensitive is the method to k?

Which value for k appeared to work the best?
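Answering these questions comes down to comparing Davies-Bouldin scores, where lower values indicate tighter, better-separated clusters. The sketch below follows the standard definition (average within-cluster distance to the centroid as scatter, Euclidean distance between centroids as separation) and works on hard labels. The function name and the toy data are illustrative assumptions.

```python
import numpy as np


def davies_bouldin(points, labels):
    """Davies-Bouldin Index for hard cluster labels (lower is better)."""
    cluster_ids = np.unique(labels)
    if len(cluster_ids) < 2:
        raise ValueError("the index needs at least two non-empty clusters")
    centroids = np.array([points[labels == c].mean(axis=0) for c in cluster_ids])
    # Scatter: mean distance from each cluster's points to its centroid.
    scatter = np.array([
        np.linalg.norm(points[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(cluster_ids)
    ])
    # Separation: distance between every pair of centroids.
    separation = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(separation, np.inf)  # a cluster is never compared with itself
    ratios = (scatter[:, None] + scatter[None, :]) / separation
    # Worst-case ratio per cluster, averaged over clusters.
    return ratios.max(axis=1).mean()


# Toy check: each cluster has scatter 1 and the centroids are 10 apart,
# so the index is (1 + 1) / 10 = 0.2.
toy_points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
toy_labels = np.array([0, 0, 1, 1])
print(davies_bouldin(toy_points, toy_labels))
```

To get the average the assignment asks for, collect one score per run for a given $\sigma$ (and k) and take `np.mean` over that list.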

Please visualize the data in some way and use your intuition to try to explain your results.

Please submit your answers in the BlackBoard assignment submission field text box, but I will grade your source code by pulling from your GitHub repository. So make sure it is pushed by the due date.

Please make sure your code is documented sufficiently so that it is easy to know how to read and execute your program. Do not collaborate with other students, and do not use code off of the Internet (other than what I give you).

Reading And Dealing with the Data

You may read in the data in whatever way you like; however, I suggest using the Pandas package. I also suggest using Numpy to deal with vectors. Some code examples below may be useful to you:

import pandas as pd
import numpy as np

# The variable names below (data3, p, q, m, d) are placeholders; the column
# name 'X' is the one named in the comments.

# Load the data:
data3 = pd.read_csv('3.data')

# Get all values in column 1
data3['X']

# Get values associated with row 1
data3.iloc[0]

# Convert the whole dataset to a Numpy matrix:
np.array(data3)

# Use Numpy to compute the L2 norm distance between two points:
p = np.array(data3.iloc[1])
q = np.array(data3.iloc[0])
np.linalg.norm(p - q)

# Use Numpy to compute stats over data
m, d = np.shape(data3)  # Get the size and dimensionality of the dataset
np.sum(data3['X'])      # Sum of the X column
np.mean(data3['X'])     # Average of the X column
np.std(data3['X'])      # Standard deviation of the X column

# Numpy's random module may be helpful
np.random.choice(range(m), 3, replace=False)  # Choose three row indices from 0..m-1 w/o replacement
np.random.normal(loc=2.1, scale=0.2, size=4)  # Draw four numbers from a normal with mean 2.1, std. dev. 0.2

# Numpy has an exp function. So you can compute e^(2) as follows:
np.exp(2)
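The single-pair `np.linalg.norm` distance shown above can be extended to all points and all cluster centers at once with Numpy broadcasting, which avoids an explicit Python loop. This is a small sketch; the function name `pairwise_distances` and the toy arrays are assumptions for illustration.

```python
import numpy as np


def pairwise_distances(points, centers):
    """Return an (n, k) matrix of L2 distances between n points and k centers."""
    # points[:, None, :] has shape (n, 1, d); centers[None, :, :] has shape (1, k, d).
    # Broadcasting the subtraction yields every point-center difference: (n, k, d).
    diff = points[:, None, :] - centers[None, :, :]
    return np.linalg.norm(diff, axis=2)


toy_points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
toy_centers = np.array([[0.0, 0.0], [6.0, 8.0]])
dist = pairwise_distances(toy_points, toy_centers)
nearest = dist.argmin(axis=1)  # index of the closest center for each point
print(dist)
print(nearest)
```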
