In projection-based clustering, we note that, if the data is well separated into clusters with means...
Question:
In projection-based clustering, we note that, if the data is well separated into clusters with means \(\mu_1, \ldots, \mu_K\), then the top \(K\) eigenvectors of the data covariance matrix, say \((v_1, \ldots, v_K)\), tend to align with \(\operatorname{span}(\mu_1, \ldots, \mu_K)\). It follows that PCA will approximately preserve the distance between cluster means. This intuition (and choice of projection) implicitly assumes that the Euclidean metric is the right way of measuring distance for our particular data. Where is that assumption coming in? What would you do otherwise, i.e., if it turns out that a different notion of distance \(\operatorname{dist}(X_i, X_j)\) were more appropriate, would you prefer some other transformation over PCA?

Following the notation from the previous problem, let \(V = [v_1, \ldots, v_K] \in \mathbb{R}^{d \times K}\), the matrix of top principal directions, with \(\|v_k\| = 1\). Consider two possible transformations of the data: (i) \(\tilde{X}_i \leftarrow V^\top X_i\), and (ii) \(\hat{X}_i \leftarrow V V^\top X_i\). Would the clusters learnt from the data \(\tilde{X}_i\) differ from those learnt from the data \(\hat{X}_i\)? If we were to run Lloyd's on the transformed data (i) and separately on the transformed data (ii), would one be quicker to execute than the other, i.e., is there a difference in runtime?
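A minimal NumPy sketch of the (i) versus (ii) comparison, assuming synthetic, well-separated Gaussian clusters and a hand-rolled Lloyd's loop started from the same point indices in both representations (the data, sizes, and the helper names `top_k_directions` and `lloyd` are illustrative, not from the question). Because the columns of \(V\) are orthonormal, \(V^\top V = I_K\), so \(\|V V^\top (x - y)\|^2 = (x - y)^\top V V^\top V V^\top (x - y) = \|V^\top (x - y)\|^2\). Every distance Lloyd's computes is therefore the same in both representations, and with matched initializations so are the resulting clusters (up to floating-point ties). The cost differs: each assignment step works with \(K\) coordinates per point-center pair in (i) but \(d\) coordinates in (ii).

```python
import numpy as np


def top_k_directions(X, k):
    """Return V (d x k): the top-k principal directions of the rows of X (n x d)."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data = eigenvectors of its covariance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


def lloyd(X, init_idx, n_iter=50):
    """Plain Lloyd's algorithm (k-means) started from the rows X[init_idx]."""
    centers = X[init_idx].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: O(n * k * dim).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its assigned points (skip empty clusters).
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


rng = np.random.default_rng(0)
d, k, n_per = 50, 3, 100
means = rng.normal(scale=10.0, size=(k, d))
X = np.vstack([m + rng.normal(size=(n_per, d)) for m in means])

V = top_k_directions(X, k)
X_low = X @ V           # (i)  rows are (V^T x_i)^T: n x k
X_proj = X @ V @ V.T    # (ii) rows are (V V^T x_i)^T: n x d

init_idx = rng.choice(len(X), size=k, replace=False)
labels_low = lloyd(X_low, init_idx)
labels_proj = lloyd(X_proj, init_idx)
print(np.array_equal(labels_low, labels_proj))
```

In this setup the two label vectors are expected to coincide, while the assignment step on (ii) does roughly \(d/K\) times more arithmetic per iteration (here about 50/3), on top of the extra cost of forming \(V V^\top X_i\) at all.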
Expert Answer:
You're right that PCA makes the implicit assumption that the Euclidean distance is the appropriate me...
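The question leaves the alternative metric open. As one illustrative case only, suppose the appropriate distance were a Mahalanobis-type distance \(\operatorname{dist}(x, y) = \sqrt{(x - y)^\top M (x - y)}\) for some positive-definite \(M\) (an assumption, not something the question specifies). Writing \(M = L L^\top\) (Cholesky factorization), the map \(x \mapsto L^\top x\) turns this distance into the ordinary Euclidean one, since \(\|L^\top (x - y)\|^2 = (x - y)^\top L L^\top (x - y)\). PCA and Lloyd's can then be run on the mapped data. A minimal NumPy sketch, with a randomly generated \(M\) and a hypothetical helper `mahalanobis`:

```python
import numpy as np


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive-definite matrix M."""
    diff = x - y
    return float(np.sqrt(diff @ M @ diff))


rng = np.random.default_rng(1)
n, d, k = 200, 5, 2

# Hypothetical metric matrix: any symmetric positive-definite M would do.
A = rng.normal(size=(d, d))
M = A @ A.T + d * np.eye(d)
L = np.linalg.cholesky(M)        # M = L @ L.T, L lower-triangular

X = rng.normal(size=(n, d))
Z = X @ L                        # row i is (L^T x_i)^T

# Euclidean distance between rows of Z equals the M-distance between rows of X.
i, j = 3, 17
print(np.isclose(np.linalg.norm(Z[i] - Z[j]), mahalanobis(X[i], X[j], M)))

# PCA is then applied to Z instead of X.
Zc = Z - Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
Z_low = Z @ Vt[:k].T             # n x k projection, analogous to transformation (i)
```

This relies on the distance being induced by a quadratic form; for metrics that are not of this kind, this particular map does not apply.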