Question: Given $m$ data points $x_i \in \mathbb{R}^n$, $i = 1, \dots, m$, the K-means clustering algorithm groups them into $k$ clusters by minimizing the distortion function over $\{r_{ij}, \mu_j\}$:

$$J = \sum_{i=1}^{m} \sum_{j=1}^{k} r_{ij} \, \lVert x_i - \mu_j \rVert^2,$$

where $r_{ij} = 1$ if $x_i$ belongs to the $j$-th cluster and $r_{ij} = 0$ otherwise.
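To make the objective concrete, here is a minimal Python sketch of the distortion function. The function name `distortion`, the encoding of the assignments as one cluster index per point, and the sample data are illustrative assumptions, not part of the question.

```python
def distortion(points, assignments, centroids):
    """Return J = sum_i ||x_i - mu_{c(i)}||^2.

    assignments[i] = j encodes r_ij = 1 (and r_il = 0 for every l != j).
    """
    total = 0.0
    for x, j in zip(points, assignments):
        mu = centroids[j]
        # Squared Euclidean distance between x_i and its assigned centroid.
        total += sum((xd - md) ** 2 for xd, md in zip(x, mu))
    return total


# Made-up data: three points on a line, two clusters.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centroids = [(1.0, 0.0), (10.0, 0.0)]
assignments = [0, 0, 1]

J = distortion(points, assignments, centroids)
print(J)  # 1 + 1 + 0 = 2.0
```

Storing a single index per point is equivalent to the one-hot $r_{ij}$ in the formula, since each point belongs to exactly one cluster.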
Derive mathematically that, using the squared Euclidean distance $\lVert x_i - \mu_j \rVert^2$ as the dissimilarity function, the centroids that minimize the distortion function $J$ for given assignments $r_{ij}$ are given by

$$\mu_j = \frac{\sum_i r_{ij} x_i}{\sum_i r_{ij}}.$$

That is, $\mu_j$ is the center of the $j$-th cluster.
Hint: You may start by taking the partial derivative of $J$ with respect to $\mu_j$, with $r_{ij}$ fixed.
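Following the hint, the partial derivative is $\partial J / \partial \mu_j = -2 \sum_i r_{ij} (x_i - \mu_j)$, and setting it to zero yields the cluster mean. The sketch below checks this numerically; the helper names `centroid` and `grad_J_wrt_mu` and the sample points are assumptions made for illustration.

```python
def centroid(points, members):
    """Mean of the points whose indices are in `members` (the i with r_ij = 1)."""
    n = len(members)
    dim = len(points[0])
    return tuple(sum(points[i][d] for i in members) / n for d in range(dim))


def grad_J_wrt_mu(points, members, mu):
    """Analytic gradient dJ/dmu_j = -2 * sum_i r_ij (x_i - mu_j)."""
    dim = len(mu)
    return tuple(
        -2.0 * sum(points[i][d] - mu[d] for i in members) for d in range(dim)
    )


# Made-up cluster of three points, all assigned to the same cluster j.
points = [(0.0, 0.0), (2.0, 4.0), (4.0, 2.0)]
members = [0, 1, 2]

mu = centroid(points, members)
grad = grad_J_wrt_mu(points, members, mu)       # vanishes at the mean
grad_off = grad_J_wrt_mu(points, members, (0.0, 0.0))  # nonzero elsewhere
```

Because $J$ is a convex quadratic in $\mu_j$, the point where the gradient vanishes is the minimizer rather than a saddle or maximum.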
Derive mathematically what the assignment variables $r_{ij}$ should be to minimize the distortion function $J$ when the centroids $\mu_j$ are fixed.
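With the centroids fixed, $J$ is a sum of independent terms, one per point, so each point can be handled separately by picking the cluster with the smallest squared distance. A minimal sketch of that assignment step follows; the function name `assign`, the sample data, and the rule of breaking ties by lowest index are assumptions for illustration.

```python
def assign(points, centroids):
    """For each x_i, return the index j of the nearest centroid (r_ij = 1 for that j)."""

    def sq_dist(x, mu):
        return sum((a - b) ** 2 for a, b in zip(x, mu))

    labels = []
    for x in points:
        # min() returns the first minimizer, so ties go to the lowest index.
        j_best = min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
        labels.append(j_best)
    return labels


# Made-up data: two points near the first centroid, one near the second.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centroids = [(1.0, 0.0), (9.0, 0.0)]

labels = assign(points, centroids)
print(labels)  # [0, 0, 1]
```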
For the question above, now suppose we change the similarity score to a quadratic distance (also known as the Mahalanobis distance) for a given, fixed positive definite matrix $\Sigma \in \mathbb{R}^{n \times n}$, so that the distortion function becomes

$$J = \sum_{i=1}^{m} \sum_{j=1}^{k} r_{ij} \, (x_i - \mu_j)^T \Sigma \, (x_i - \mu_j).$$

Derive what $\mu_j$ and $r_{ij}$ become in this case.
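For the quadratic distance, the stationarity condition is $-2 \Sigma \sum_i r_{ij} (x_i - \mu_j) = 0$. Since a positive definite $\Sigma$ is invertible, the centroid is still the cluster mean, and only the assignment rule changes. The sketch below assumes $\Sigma$ enters the quadratic form directly, as written in the question, rather than through its inverse. The function names and data are illustrative.

```python
def quad_form(x, mu, sigma):
    """(x - mu)^T Sigma (x - mu), with Sigma given as a nested list."""
    d = [a - b for a, b in zip(x, mu)]
    n = len(d)
    return sum(d[p] * sigma[p][q] * d[q] for p in range(n) for q in range(n))


def assign_quadratic(points, centroids, sigma):
    """For each x_i, pick the j minimizing the quadratic distance to mu_j."""
    labels = []
    for x in points:
        j_best = min(
            range(len(centroids)), key=lambda j: quad_form(x, centroids[j], sigma)
        )
        labels.append(j_best)
    return labels


# Made-up example where the two distances disagree.
points = [(2.0, 1.0)]
centroids = [(0.0, 1.0), (2.0, 0.0)]
identity = [[1.0, 0.0], [0.0, 1.0]]      # reduces to squared Euclidean distance
sigma = [[1.0, 0.0], [0.0, 100.0]]       # penalizes the second coordinate heavily

labels_euclid = assign_quadratic(points, centroids, identity)  # distances 4 vs 1
labels_quad = assign_quadratic(points, centroids, sigma)       # distances 4 vs 100
```

With the identity matrix the point goes to the second centroid, while the heavier weight on the second coordinate moves it to the first, which shows that $\Sigma$ changes the assignments even though the centroid formula is unchanged.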
