Question:

Hierarchical clustering algorithms require O(m² log(m)) time, and consequently, are impractical to use directly on larger data sets. One possible technique for reducing the time required is to sample the data set. For example, if K clusters are desired and √m points are sampled from the m points, then a hierarchical clustering algorithm will produce a hierarchical clustering in roughly O(m) time. K clusters can be extracted from this hierarchical clustering by taking the clusters on the Kth level of the dendrogram. The remaining points can then be assigned to a cluster in linear time by using various strategies. To give a specific example, the centroids of the K clusters can be computed, and then each of the m − √m remaining points can be assigned to the cluster associated with the closest centroid. For each of the following types of data or clusters, discuss briefly (1) whether sampling will cause problems for this approach and (2) what those problems are.
(a) Data with very different sized clusters.
(b) High-dimensional data.
(c) Data with outliers, i.e., atypical points.
(d) Data with highly irregular regions.
(e) Data with globular clusters.
(f) Data with widely different densities.
(g) Data with a small percentage of noise points.
(h) Non-Euclidean data.
(i) Euclidean data.
(j) Data with many and mixed attribute types.
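
As an illustration of the procedure described in the question, here is a minimal Python sketch. The function name and parameters are illustrative and not part of the original exercise; it uses SciPy's agglomerative clustering, and cutting the dendrogram into K flat clusters with `fcluster(..., criterion="maxclust")` stands in for "taking the clusters on the Kth level of the dendrogram."

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

def sampled_hierarchical_clustering(X, K, rng=None):
    """Cluster the m points in X into K clusters by hierarchically
    clustering only ~sqrt(m) sampled points (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    m = X.shape[0]
    n_sample = max(K, int(np.sqrt(m)))  # need at least K sampled points

    # 1. Randomly sample sqrt(m) of the m points.
    sample_idx = rng.choice(m, size=n_sample, replace=False)
    sample = X[sample_idx]

    # 2. Hierarchically cluster the sample: O(n^2 log n) with n = sqrt(m),
    #    far cheaper than clustering all m points directly.
    Z = linkage(sample, method="average")

    # 3. Cut the dendrogram so that (at most) K flat clusters remain.
    sample_labels = fcluster(Z, t=K, criterion="maxclust")

    # 4. Compute the centroid of each sample cluster.
    centroids = np.vstack([sample[sample_labels == k].mean(axis=0)
                           for k in np.unique(sample_labels)])

    # 5. Assign every point to the nearest centroid in linear time.
    labels = cdist(X, centroids).argmin(axis=1)
    return labels, centroids

# Example usage on synthetic data:
X = np.random.default_rng(0).normal(size=(10_000, 2))
labels, centroids = sampled_hierarchical_clustering(X, K=5, rng=0)
```

Thinking about where this sketch goes wrong on the data characteristics listed in (a)-(j), e.g. small clusters that the √m sample may miss entirely, or centroid assignment distorting irregularly shaped clusters, is the substance of the exercise.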
