Question:

Hierarchical clustering algorithms require O(m² log(m)) time, and consequently, are impractical to use directly on larger data sets. One possible technique for reducing the time required is to sample the data set. For example, if K clusters are desired and √m points are sampled from the m points, then a hierarchical clustering algorithm will produce a hierarchical clustering in roughly O(m) time. K clusters can be extracted from this hierarchical clustering by taking the clusters on the Kth level of the dendrogram. The remaining points can then be assigned to a cluster in linear time by using various strategies. To give a specific example, the centroids of the K clusters can be computed, and then each of the m − √m remaining points can be assigned to the cluster associated with the closest centroid. For each of the following types of data or clusters, discuss briefly (1) whether sampling will cause problems for this approach and (2) what those problems are.
(a) Data with very different sized clusters.
(b) High-dimensional data.
(c) Data with outliers, i.e., atypical points.
(d) Data with highly irregular regions.
(e) Data with globular clusters.
(f) Data with widely different densities.
(g) Data with a small percentage of noise points.
(h) Non-Euclidean data.
(i) Euclidean data.
(j) Data with many and mixed attribute types.
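
As an illustration of the procedure described in the question, here is a minimal Python sketch. The function name and parameters are illustrative and not part of the original exercise; it uses SciPy's agglomerative clustering, and cutting the dendrogram into K flat clusters with `fcluster(..., criterion="maxclust")` stands in for "taking the clusters on the Kth level of the dendrogram."

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

def sampled_hierarchical_clustering(X, K, rng=None):
    """Cluster the m points in X into K clusters by hierarchically
    clustering only ~sqrt(m) sampled points (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    m = X.shape[0]
    n_sample = max(K, int(np.sqrt(m)))  # need at least K sampled points

    # 1. Randomly sample sqrt(m) of the m points.
    sample_idx = rng.choice(m, size=n_sample, replace=False)
    sample = X[sample_idx]

    # 2. Hierarchically cluster the sample: O(n^2 log n) with n = sqrt(m),
    #    far cheaper than clustering all m points directly.
    Z = linkage(sample, method="average")

    # 3. Cut the dendrogram so that (at most) K flat clusters remain.
    sample_labels = fcluster(Z, t=K, criterion="maxclust")

    # 4. Compute the centroid of each sample cluster.
    centroids = np.vstack([sample[sample_labels == k].mean(axis=0)
                           for k in np.unique(sample_labels)])

    # 5. Assign every point to the nearest centroid in linear time.
    labels = cdist(X, centroids).argmin(axis=1)
    return labels, centroids

# Example usage on synthetic data:
X = np.random.default_rng(0).normal(size=(10_000, 2))
labels, centroids = sampled_hierarchical_clustering(X, K=5, rng=0)
```

Thinking about where this sketch goes wrong on the data characteristics listed in (a)-(j), e.g. small clusters that the √m sample may miss entirely, or centroid assignment distorting irregularly shaped clusters, is the substance of the exercise.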
