Given $m$ data points $x_i \in \mathbb{R}^n$, $i = 1, \dots, m$, the K-means clustering algorithm groups them into $k$ clusters by minimizing the distortion function over $\{r_{ij}, \mu_j\}$:

$$J = \sum_{i=1}^{m} \sum_{j=1}^{k} r_{ij} \| x_i - \mu_j \|^2,$$

where $r_{ij} = 1$ if $x_i$ belongs to the $j$-th cluster and $r_{ij} = 0$ otherwise.
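To make the definition concrete, here is a minimal pure-Python sketch that evaluates the distortion J for a tiny data set. The function name `distortion` and the encoding of the indicators as a list of cluster indices (`labels[i] = j` standing for "r_ij = 1") are assumptions made for illustration; they are not part of the question.

```python
def distortion(points, centroids, labels):
    """J = sum_i sum_j r_ij * ||x_i - mu_j||^2.

    labels[i] = j encodes r_ij = 1 (and r_il = 0 for every other l),
    so the inner sum over j collapses to a single term per point.
    """
    total = 0.0
    for x, j in zip(points, labels):
        mu = centroids[j]
        total += sum((a - b) ** 2 for a, b in zip(x, mu))
    return total


# Toy data (hypothetical): three points in R^2, two clusters.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centroids = [(1.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1]

J = distortion(points, centroids, labels)
print(J)  # 1 + 1 + 0 = 2.0
```

The one-index-per-point encoding assumes every point belongs to exactly one cluster, which is how the indicators in the question are normally read.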
1. (10 points) Derive mathematically that, using the squared Euclidean distance $\| x_i - \mu_j \|^2$ as the dissimilarity function, the centroids that minimize the distortion function $J$ for given assignments $r_{ij}$ are given by

$$\mu_j = \frac{\sum_i r_{ij} x_i}{\sum_i r_{ij}}.$$

That is, $\mu_j$ is the center of the $j$-th cluster.
Hint: You may start by taking the partial derivative of $J$ with respect to $\mu_j$, with $r_{ij}$ fixed.
2. (10 points) Derive mathematically what the assignment variables $r_{ij}$ should be to minimize the distortion function $J$ when the centroids $\mu_j$ are fixed.
3. (5 points) For the question above, now suppose we change the dissimilarity function to a quadratic distance (also known as Mahalanobis distance) for a given and fixed positive definite matrix $\Sigma \in \mathbb{R}^{n \times n}$, so that the distortion function becomes

$$J = \sum_{i=1}^{m} \sum_{j=1}^{k} r_{ij} (x_i - \mu_j)^T \Sigma (x_i - \mu_j).$$

Derive what $\mu_j$ and $r_{ij}$ become in this case.
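
The pure-Python sketch below can be used as a numerical sanity check for the three parts; it does not replace the derivations. It assumes the standard results: with assignments fixed, each centroid is the mean of its assigned points; with centroids fixed, each point goes to the centroid with the smallest dissimilarity; and under the quadratic distance the centroid formula is unchanged because Σ is invertible, while the assignment compares the quadratic form instead. The function names (`quad_dist`, `update_centroids`, `assign`) and the toy data are hypothetical.

```python
def quad_dist(x, mu, sigma=None):
    """(x - mu)^T Sigma (x - mu); sigma=None means identity (squared Euclidean)."""
    d = [a - b for a, b in zip(x, mu)]
    if sigma is None:
        return sum(v * v for v in d)
    n = len(d)
    return sum(d[r] * sigma[r][c] * d[c] for r in range(n) for c in range(n))


def update_centroids(points, labels, k):
    """Part 1: mu_j = mean of the points with r_ij = 1.

    Assumes every cluster has at least one point (otherwise the
    denominator sum_i r_ij would be zero).
    """
    dim = len(points[0])
    centroids = []
    for j in range(k):
        members = [x for x, a in zip(points, labels) if a == j]
        centroids.append(
            tuple(sum(x[d] for x in members) / len(members) for d in range(dim))
        )
    return centroids


def assign(points, centroids, sigma=None):
    """Parts 2 and 3: r_ij = 1 for the j with the smallest distance.

    Ties are broken in favour of the lowest cluster index.
    """
    return [
        min(range(len(centroids)), key=lambda j: quad_dist(x, centroids[j], sigma))
        for x in points
    ]


# Part 1 check: the mean of cluster 0 is (2.0, 1.0).
points = [(0.0, 0.0), (2.0, 0.0), (4.0, 3.0), (10.0, 0.0)]
mus = update_centroids(points, [0, 0, 0, 1], k=2)

# Parts 2 and 3 check: the same point switches cluster once Sigma
# penalises the second coordinate heavily.
centroids = [(0.0, 0.0), (3.0, 1.0)]
sigma = [[1.0, 0.0], [0.0, 100.0]]  # symmetric positive definite
euclid_labels = assign([(2.5, 0.0)], centroids)       # distances 6.25 vs 1.25
quad_labels = assign([(2.5, 0.0)], centroids, sigma)  # distances 6.25 vs 100.25
print(mus, euclid_labels, quad_labels)
```

In this example the Euclidean assignment picks cluster 1 while the quadratic-distance assignment picks cluster 0, which illustrates that Σ changes the assignment rule even though the centroid update stays a plain mean.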
