Question: CSCI 460 Clustering Project
Please implement the Expectation Maximization based k-means clustering algorithm we discussed in class and apply it to the provided data set (prdata) for k in […]. Consider various values for sigma, including sigma in […], and run the clustering multiple times for each sigma value under different initial conditions. Show these results. Evaluate the quality of the clustering for each value of sigma using an average of the Davies-Bouldin Index for each clustering.
In addition to providing the code and results just described, please answer the following questions:
How sensitive is the method to sigma?
Which value for sigma appeared to work the best?
How sensitive is the method to k?
Which value for k appeared to work the best?
Please visualize the data in some way and use your intuition to try to explain your results.
Please submit your answers in the BlackBoard assignment submission field text box, but I will grade your source code by pulling from your GitHub repository. So make sure it is pushed by the due date.
Please make sure your code is documented sufficiently so that it is easy to know how to read and execute your program. Do not collaborate with other students, and do not use code off of the Internet other than what I give you.
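For reference, the Davies-Bouldin Index named in the evaluation step can be sketched as follows. This is a minimal illustration using the standard textbook definition (mean distance to the centroid as the cluster scatter, Euclidean distance between centroids as the separation), which may differ in detail from the version presented in class. The function name `davies_bouldin` and its arguments are hypothetical, not part of the assignment.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin Index for a hard clustering (lower is better).

    Assumption: standard definition with Euclidean distances; clusters
    that received no points are ignored.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)  # only non-empty clusters appear here
    if len(ids) < 2:
        raise ValueError("need at least two non-empty clusters")
    # Centroid of each cluster, recomputed from the hard labels.
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
    # Scatter S_i: average distance from the cluster's points to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    # Separation M_ij: distance between centroids i and j.
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # makes the i == j ratio 0 so it never wins the max
    # R_ij = (S_i + S_j) / M_ij; the index averages the worst ratio per cluster.
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return ratios.max(axis=1).mean()
```

To obtain the average the assignment asks for, one option is to call this once per run for a given sigma and take `np.mean` of the returned values.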
Reading And Dealing with the Data
You may read in the data in whatever way you like; however, I suggest using the Pandas package. I also suggest using Numpy to deal with vectors. Some code examples below may be useful to you:
import pandas as pd
import numpy as np
# Note: i, j, mu, sd and a below are placeholders; the literal values are missing from this copy.
# Load the data:
pr = pd.read_csv('prdata')
# Get all values in column X:
pr['X']
# Get values associated with row i:
pr.iloc[i]
# Convert the whole dataset to a Numpy matrix:
np.array(pr)
# Use Numpy to compute the norm distance between two points (np.linalg.norm defaults to the L2 norm):
x = np.array(pr.iloc[i])
z = np.array(pr.iloc[j])
np.linalg.norm(x - z)
# Use Numpy to compute stats over data
m, d = np.shape(pr)  # Get the size and dimensionality of the dataset
np.sum(pr['X'])      # Sum of the X column
np.mean(pr['X'])     # Average of the X column
np.std(pr['X'])      # Standard deviation of the X column
# Numpy's random module may be helpful
np.random.choice(range(m), 3, replace=False)  # Choose three indices from range(m) without replacement
np.random.normal(loc=mu, scale=sd, size=4)    # Draw four numbers from a normal with mean mu and standard deviation sd
# Numpy has an exp function. So you can compute e raised to the power a as follows:
np.exp(a)
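The NumPy helpers listed above (distance computation, random selection without replacement, `np.exp`) can be combined into an EM-style soft k-means loop. The sketch below assumes the common variant in which each point's responsibility for a centroid is proportional to exp(-||x - c||^2 / (2 sigma^2)); the algorithm covered in class may differ. The function name `soft_kmeans` and the parameters `init`, `n_iter`, `tol`, and `seed` are hypothetical.

```python
import numpy as np

def soft_kmeans(X, k, sigma, init=None, n_iter=100, tol=1e-6, seed=None):
    """EM-style soft k-means (assumed Gaussian responsibilities with width sigma)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    if init is None:
        # Initial condition: k distinct data points chosen at random.
        centroids = X[rng.choice(m, size=k, replace=False)]
    else:
        centroids = np.array(init, dtype=float)
    for _ in range(n_iter):
        # E-step: squared distance from every point to every centroid.
        sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        logits = -sq_dist / (2.0 * sigma ** 2)
        logits -= logits.max(axis=1, keepdims=True)  # avoid underflow in exp
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)      # each row sums to 1
        # M-step: each centroid becomes the responsibility-weighted mean.
        weights = np.maximum(resp.sum(axis=0), 1e-12)
        new_centroids = (resp.T @ X) / weights[:, None]
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Hard labels taken from the last computed responsibilities.
    labels = resp.argmax(axis=1)
    return centroids, labels, resp
```

Calling this several times with different `seed` values (or explicit `init` arrays) for each sigma gives the repeated runs under different initial conditions that the assignment describes. A small sigma makes the assignments nearly hard, while a large sigma spreads each point's weight across all centroids.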
