Question: Problem 1 ( 2 0 Points ) : Lloyd's Method Given a dataset with seven data points { x 1 , cdots, x 7 }
Problem Points: Lloyd's Method
Given a dataset with seven data points cdots, and the distances between all pairs of data points are in
the following table.
Assume the number of clusters and the cluster centers are initialized to be and
Points. What's the two clusters formed at the end of the first iteration of Lloyd's algorithm?
Points. What's the two clusters formed at the end of the second iteration of Lloyd's algorithm?
Points. What's the two clusters formed when the Lloyd's algorithm converges?
Problem Points: Guassian Mixture Model GMM: Latent Variable View
Consider a GMM in which the marginal distribution for the latent variable is given by
where ;cdots, and satisfies and
Moreover, the conditional distribution for the observed variable is given by:
Prove that obtaining by summing over all possible values of is a GMM That is
Problem Points: Generating GMMs
In this problem, you will write code to generate a mixture of Gaussians satisfying the following requirements,
respectively. Please specify the mean vector and covariance matrix of each Gaussian in your answer
Points. Draw a data set where a mixture of spherical Gaussians where the covariance matrix is
the identity matrix times some positive scalar can model the data well, but Kmeans cannot.
Points. Draw a data set where a mixture of diagonal Gaussians where the covariance matrix can
have nonzero values on the diagonal, and zeros elsewhere can model the data well, but Kmeans and a
mixture of spherical Gaussians cannot.
Points. Draw a data set where a mixture of Gaussians with unrestricted covariance matrices can
model the data well, but Kmeans and a mixture of diagonal Gaussians cannot.
Problem Points: Implementing KMeans and Spectral Clustering
Given a number of toy datasets in toydata. zip Each dataset contains a number of clusters as shown in
Figure and your task is to find these clusters using your implemented clustering algorithms.
Points. Implementing Lloyd's Kmeans: Your submitted function should be function label
my kmeans data K where label returns the dimensional clustering result, where is the total
number of data points. data is with size and is the number of known clusters. To initialize,
randomly select samples to initialize your cluster centroids. Iterate your algorithm until convergence.
Use Euclidean distance as the distance measure. Name your file my
ckeeans.py
Points. Implementing Spectral Clustering: Your submitted function should be function
label myspectralclustingdata K sigma where label, data and are the same as above
and sigma is the bandwith for Gaussian kernel used in spectral clustering. You will see sigma
important for your clustering performance. Adjust it casebycase for every toy dataset to output the
best results. Name your file
myspectralclusting.py
Points. Compare your spectral clustering results with kmeans. It is natural that on certain hard
toy example, both method won't generate perfect results. In your report, briefly analyze what is the
advantage or disadvantage of spectral clustering over means. Why it is the case? You do not need to
mathematically prove it but just need to give answers in your own language.
Remarks: Write a file named Runclustering.py at top level to give the clustering results. In
your code should:
Load all the mat file data and generate clustering labels with your kmeans and spectral clust
In the mat files, D is the data matrix and L is the ground truth label matrix. To run you
algorithms, you cannot use L as they only serve as a reference.
algorithms, you cannot use L as they only serve as a reference.
Show your clustering results in the solution if you use word or latex and indicate which
kmeans or spectral
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
