Question:

Hierarchical clustering algorithms require O(m² log m) time and, consequently, are impractical to use directly on larger data sets. One possible technique for reducing the time required is to sample the data set. For example, if K clusters are desired and √m points are sampled from the m points, then a hierarchical clustering algorithm will produce a hierarchical clustering in roughly O(m) time. K clusters can be extracted from this hierarchical clustering by taking the clusters on the Kth level of the dendrogram. The remaining points can then be assigned to a cluster in linear time by using various strategies. To give a specific example, the centroids of the K clusters can be computed, and then each of the m − √m remaining points can be assigned to the cluster associated with the closest centroid. For each of the following types of data, discuss whether this approach would work well:
(a) Data with very different sized clusters.
(b) High-dimensional data.
(c) Data with outliers, i.e., atypical points.
(d) Data with highly irregular regions.
(e) Data with globular clusters.
(f) Data with widely different densities.
(g) Data with a small percentage of noise points.
(h) Non-Euclidean data.
(i) Euclidean data.
(j) Data with many and mixed attribute types.
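
The sampling scheme described in the question can be sketched as follows. This is a minimal illustration, not the book's implementation: the function name `sample_then_cluster` is hypothetical, and it uses a naive centroid-linkage agglomerative step (merging the closest pair of cluster centroids until K clusters remain, equivalent to cutting the dendrogram at the K-cluster level) rather than an optimized O(n² log n) hierarchical algorithm.

```python
import numpy as np

def sample_then_cluster(X, K, seed=None):
    """Cluster m points into K clusters by hierarchically clustering a
    sqrt(m)-point sample, then assigning all points to the nearest
    sample-cluster centroid. (Hypothetical helper for illustration.)"""
    rng = np.random.default_rng(seed)
    m = len(X)
    n = max(K, int(np.sqrt(m)))               # sample size: sqrt(m), at least K
    idx = rng.choice(m, size=n, replace=False)
    sample = X[idx]

    # Naive agglomerative clustering on the sample: start from singleton
    # clusters and repeatedly merge the pair with the closest centroids
    # until only K clusters remain.
    clusters = [[i] for i in range(n)]
    while len(clusters) > K:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = sample[clusters[a]].mean(axis=0)
                cb = sample[clusters[b]].mean(axis=0)
                d = np.linalg.norm(ca - cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]

    # Centroids of the K sample clusters.
    centroids = np.array([sample[c].mean(axis=0) for c in clusters])

    # Assign every point to the closest centroid: O(m * K) distance checks,
    # i.e. linear in m for fixed K.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids
```

On well-separated globular clusters this works as intended; several of the listed data characteristics (small clusters missed by the √m sample, outliers pulling centroids, irregular shapes poorly summarized by centroids) are exactly the cases where it breaks down.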

Related Book:

Introduction to Data Mining, 1st edition

Authors: Pang-Ning Tan, Michael Steinbach, Vipin Kumar

ISBN: 978-0321321367
