Question: We want to cluster categorical data, i.e. data that have categorical attribute domains. The k-medoid algorithm can be applied to

We want to cluster categorical data, i.e. data that have categorical attribute domains. The k-medoid
algorithm can be applied to any dataset with a given pairwise distance function and is therefore
also applicable to categorical data. The k-means algorithm, on the other hand, is much more efficient
than the k-medoid algorithm, but it requires numeric data. The task of this assignment is to develop
an analogion to the k-means algorithm for categorical data. We assume the following distance
function for pairs of categorical objects:
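$$\mathrm{dist}(x, y) = \sum_{i=1}^{d} \delta(x_i, y_i) \quad \text{with} \quad \delta(x_i, y_i) = \begin{cases} 0 & \text{if } x_i = y_i \\ 1 & \text{else} \end{cases}$$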
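To make the distance function concrete, here is a minimal Python sketch. It assumes each object is a fixed-length sequence of d categorical values; the function name `dist` mirrors the formula, while the sample objects `a` and `b` are made up purely for illustration.

```python
from typing import Sequence


def dist(x: Sequence[str], y: Sequence[str]) -> int:
    """Count the attributes on which x and y take different values."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    # Each attribute contributes 0 if the values match and 1 otherwise.
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


# Hypothetical objects with d = 3 categorical attributes.
a = ("red", "small", "round")
b = ("red", "large", "round")
print(dist(a, b))  # 1 (the objects differ only in the second attribute)
```

The distance is simply the number of mismatching attributes, so it ranges from 0 (identical objects) to d (no attribute in common).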
a) What is the analogion m for the means of a cluster C for categorical data? Note that m must be
computable by scanning the set of objects of C once (similar to the computation of the cluster
means). (10 marks)
[Hint1: If a concept A is in analogy to another concept B, then A is said to be an analogion for B]
[Hint2: an analogion m for the means of a cluster C refers to a new definition of the means of a
cluster of categorical data, where the original definition of cluster centre used by k-means does
not work anymore.]
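One natural candidate for m, offered here as a sketch rather than as the official solution, is the attribute-wise mode: for every attribute, take the value that occurs most often in C. This is the idea behind the k-modes variant of k-means. Under the mismatch-counting distance above, each attribute contributes |C| minus the frequency of the chosen value to the total distance, so the most frequent value minimises the sum of distances from m to the objects of C. The function name `cluster_mode` and the sample cluster are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def cluster_mode(cluster: Iterable[Sequence[str]]) -> Tuple[str, ...]:
    """Return the attribute-wise mode of a cluster after a single scan."""
    counters: List[Counter] = []
    for obj in cluster:  # the only pass over the objects of C
        if not counters:
            # One frequency table per attribute, created on the first object.
            counters = [Counter() for _ in obj]
        for counter, value in zip(counters, obj):
            counter[value] += 1
    if not counters:
        raise ValueError("cluster must contain at least one object")
    # Ties are broken in favour of the value seen first (an arbitrary choice).
    return tuple(counter.most_common(1)[0][0] for counter in counters)


# Hypothetical cluster with d = 2 categorical attributes.
C = [("red", "small"), ("red", "large"), ("blue", "large")]
print(cluster_mode(C))  # ('red', 'large')
```

Like the mean in k-means, the frequency tables are built in one pass over C. Unlike the mean, the result need not be unique when two values of an attribute are equally frequent, and m need not coincide with any object in C.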