Question: Problem 1 ( 2 0 Points ) : Lloyd's Method Given a dataset with seven data points { x 1 , cdots, x 7 }

Problem

1 (20

Points

)

: Lloyd's Method

Given a dataset with seven data points

{x_{1},

cdots,

x_{7}}

and the distances between all pairs of data points are in

the following table.

Assume the number of clusters

k = 2,

and the cluster centers are initialized to be

x_{3}

and

x_{6} .

5

Points. What's the two clusters formed at the end of the first iteration of Lloyd's algorithm?

5

Points. What's the two clusters formed at the end of the second iteration of Lloyd's algorithm?

10

Points. What's the two clusters formed when the Lloyd's algorithm converges?

Problem

2 (15

Points

)

: Guassian Mixture Model

(

GMM

)

: Latent Variable View

Consider a GMM in which the marginal distribution

p (z)

for the latent variable

z

is given by

p (z) = p r o d_{k} = 1^{K}_{k}^{z_{k}}

where

_{k = 1}^{K}_{k} = 1

;

z = [z_{1}, z_{2},

cdots,

z_{K}]

and

z_{k}

satisfies

z_{k} i n {0, 1}

and

_{k = 1}^{K} z_{k} = 1 .

Moreover, the conditional distribution

(z |)

for the observed variable

x

is given by:

(z |) (_{k},_{k} |)

Prove that

p (x),

obtaining by summing

(z |)

over all possible values of

z,

is a GMM

.

That is

,

(z |) (_{k},_{k} |)

Problem

4 (20

Points

)

: Generating GMMs

In this problem, you will write code to generate a mixture of

3

Gaussians satisfying the following requirements,

respectively. Please specify the mean vector and covariance matrix of each Gaussian in your answer

6

Points. Draw a data set where a mixture of

3

spherical Gaussians

(

where the covariance matrix is

the identity matrix times some positive scalar

)

can model the data well, but K

-

means cannot.

6

Points. Draw a data set where a mixture of

3

diagonal Gaussians

(

where the covariance matrix can

have non

-

zero values on the diagonal, and zeros elsewhere

)

can model the data well, but K

-

means and a

mixture of spherical Gaussians cannot.

8

Points. Draw a data set where a mixture of

3

Gaussians with unrestricted covariance matrices can

model the data well, but K

-

means and a mixture of diagonal Gaussians cannot.

Problem

4 (45

Points

)

: Implementing K

-

Means and Spectral Clustering

Given a number of

7

toy datasets

(

in toydata. zip

) .

Each dataset contains a number of clusters as shown in

Figure

1

and your task is to find these clusters using your implemented clustering algorithms.

20

Points. Implementing Lloyd's K

-

means: Your submitted function should be function

[

label

]

=

my kmeans

(

data

,

),

where label returns the

N -

dimensional clustering result, where

N

is the total

number of data points. data is with size

N d

and

K

is the number of

(

known

)

clusters. To initialize,

randomly select

K

samples to initialize your cluster centroids. Iterate your algorithm until convergence.

Use Euclidean distance as the distance measure. Name your file my

ckeeans.py

.

20

Points. Implementing Spectral Clustering: Your submitted function should be function

[

label

] =

_

spectralclusting

(

data

,

,

sigma

),

where label, data and

K

are the same as above

and sigma is the bandwith for Gaussian kernel used in spectral clustering. You will see sigma

important for your clustering performance. Adjust it case

-

-

case for every toy dataset to output the

best results. Name your file

-

spectralclusting.py

.

5

Points. Compare your spectral clustering results with k

-

means. It is natural that on certain hard

toy example, both method won't generate perfect results. In your report, briefly analyze what is the

advantage or disadvantage of spectral clustering over

k -

means. Why it is the case?

(

You do not need to

mathematically prove it but just need to give answers in your own language.

)

Remarks: Write a file named Run

_

clustering.py at top level to give the clustering results. In

your code should:

Load all the mat file data and generate clustering labels with your k

-

means and spectral clust

In the mat files,

"

"

is the data matrix and

"

"

is the ground truth label matrix. To run you

algorithms, you cannot use

"

"

as they only serve as a reference.

algorithms, you cannot use

"

"

as they only serve as a reference.

2 .

Show your clustering results in the solution

(

if you use word or latex

)

and indicate which

(

-

means or spectral

Problem 1 ( 2 0 Points ) : Lloyd's Method Given a

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!

firms(1).pdf windTurbineData(3).pdf newsVendorData(1).pdf ISQS-7339: Homework 4 MS-DS students: Solve problems 1-3 (Module 4). If you solve one of the problems 4 or 5 (Module 5), you receive up to 10...

please help me solve the following questions 15. It is known that machines A, B, and (3 work independently, and their working probabilities are l and {1.9 in a day, respectively. Find the probability...

I had posted the same question before but The sample dataset used in the solution is different from the one in question. The x(i,2) data points are 0,-1, 1, 2, 3, 2. The solutions is given for 0, -1,...

Just do Section 7 please. I've also uploaded the rest just for you if you need references. Thank you. (THIS IS IN PYTHON) The rest of the questions just for references if you need are here: Thanks...

Problem Statement - Selecting Projects for Bidding: A contractor has identified a set of eight major construction projects that are to be awarded on the basis of sealed bids. The contractor is...

# -*- coding: utf-8 -*- """ Hw3 test code 2021 Winter PIC 16 A """ a = LinkedList(0) a.append(1) a.append(2) print("7 points if this works") for n in a: print(n) print("") ''' print("2 points if this...

here is Debugging.java : public class Debugging{ public static void setHints(int[][] array){ //Iterate over every element in our 2D Array for(int i = 0; i = anArray.length) { toReturn = false; }...

testcode.py # -*- coding: utf-8 -*- """ a = LinkedList(0) a.append(1) a.append(2) print("7 points if this works") for n in a: print(n) print("") print("2 points if this works") for n in a: print(n)...

CASE Risk Pooling ACME, a company that produces and distributes electronic equipment in the Northeast of the United States, faces a distribution problem. The current distribution system partitions...

Draw the structure for chloric acid, HClO3. Optimize formalcharges.

Describe the orbitals used in bonding and the bond angles in the following compounds. a. BeH2 b. BH3 c. CCl4 d. CO2 e. HCOOH f. N2

Duke and S corporation distributes a partial of land to one of its shareholders the land has a basis of $ 1 0 , 0 0 0 and a fair market value of $ 2 0 , 0 0 0 the distribution will cause Shane to...

You are attending a business lunch with some friends who are also co-workers. Your friends ask you to join them in having an alcoholic drink. Later in the day you will have a meeting with some of your