In projection-based clustering, we note that, if the data is well separated into clusters with means...
Question:
In projection-based clustering, we note that, if the data is well separated into clusters with means $\mu_1, \dots, \mu_K$, then the top $K$ eigenvectors of the data covariance matrix, say $(v_1, \dots, v_K)$, tend to align with $\operatorname{span}(\mu_1, \dots, \mu_K)$. It follows that PCA will approximately preserve the distance between cluster means. This intuition (and choice of projection) implicitly assumes that the Euclidean metric is the right way of measuring distance for our particular data. Where is that assumption coming in? What would you do otherwise, i.e., if it turns out that a different notion of distance $\operatorname{dist}(X_i, X_j)$ were more appropriate, would you prefer some other transformation over PCA?
Following the notation from the previous problem, let $V = [v_1, \dots, v_K] \in \mathbb{R}^{d \times K}$ be the matrix of top principal directions, with $\|v_k\| = 1$. Consider two possible transformations of the data: (i) $\tilde{X}_i \leftarrow V^\top X_i$, and (ii) $\hat{X}_i \leftarrow V V^\top X_i$. Would the clusters learnt from the data $\tilde{X}_i$ differ from those learnt from the data $\hat{X}_i$? If we were to run Lloyd's on the transformed data (i) and separately on the transformed data (ii), would one be quicker to execute than the other, i.e., is there a difference in runtime?
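A minimal NumPy sketch of the two transformations, offered as an illustration under stated assumptions rather than as the answer's method: the data are synthetic, well-separated Gaussian clusters (sizes and seed are hypothetical), principal directions come from the SVD of the centered data, and both runs of Lloyd's start from corresponding points. The helpers `top_k_directions`, `lloyd`, and `pairwise_sq_dists` are hypothetical names. The fact being checked is that $V$ has orthonormal columns, so $V^\top V = I_K$ and, for any $z = X_i - X_j$, $\|V V^\top z\|^2 = z^\top V V^\top V V^\top z = z^\top V V^\top z = \|V^\top z\|^2$; the two representations share all pairwise Euclidean distances.

```python
import numpy as np

def top_k_directions(X, k):
    """Return a d x k matrix whose columns are the top-k principal directions (unit norm)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are right singular vectors of the centered data, i.e. eigenvectors
    # of the sample covariance, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def lloyd(X, init_idx, n_iter=20):
    """Plain Lloyd's algorithm with centers initialized at the rows X[init_idx] (hypothetical helper)."""
    centers = X[init_idx].astype(float)
    for _ in range(n_iter):
        # n x K squared distances: O(n * K * dim) work per iteration.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center where it is
                centers[j] = members.mean(axis=0)
    return labels

def pairwise_sq_dists(Z):
    """All pairwise squared Euclidean distances between rows of Z."""
    return ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)

# Synthetic, well-separated Gaussian clusters (hypothetical sizes and seed).
rng = np.random.default_rng(0)
d, K, n_per = 50, 3, 100
means = rng.normal(scale=5.0, size=(K, d))
X = np.vstack([m + rng.normal(size=(n_per, d)) for m in means])
true_labels = np.repeat(np.arange(K), n_per)

V = top_k_directions(X, K)   # d x K, orthonormal columns
X_tilde = X @ V              # (i)  rows are V^T X_i, shape (n, K)
X_hat = X @ V @ V.T          # (ii) rows are V V^T X_i, shape (n, d)

# Start both runs from the same data points, one per true cluster.
init = [0, n_per, 2 * n_per]
labels_i = lloyd(X_tilde, init)
labels_ii = lloyd(X_hat, init)
```

Up to floating-point rounding, the pairwise distances and the resulting labels agree. The runs differ in cost: each iteration computes $nK$ distances between $K$-dimensional vectors for (i) but between $d$-dimensional vectors for (ii), roughly $O(nK^2)$ versus $O(nKd)$ work per iteration, and (ii) also stores an $n \times d$ array instead of an $n \times K$ one.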
Expert Answer:
You're right that PCA makes the implicit assumption that the Euclidean distance is the appropriate me...
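The question leaves the alternative distance unspecified. As one illustration, assuming (this is not stated in the source) that it is a Mahalanobis distance $\operatorname{dist}(X_i, X_j)^2 = (X_i - X_j)^\top M (X_i - X_j)$ with a symmetric positive-definite $M$, factoring $M = L L^\top$ and mapping $X_i \mapsto L^\top X_i$ turns that distance into the ordinary Euclidean one, since $\|L^\top (X_i - X_j)\|^2 = (X_i - X_j)^\top L L^\top (X_i - X_j)$. The sketch below checks this numerically; the matrix `M`, the data, and the helper `mahalanobis_to_euclidean` are hypothetical.

```python
import numpy as np

def mahalanobis_to_euclidean(X, M):
    """Map each row x of X to L^T x, where M = L L^T is the Cholesky factorization.

    Assumes M is symmetric positive definite (hypothetical helper). After the map,
    ||L^T x - L^T y||^2 equals (x - y)^T M (x - y).
    """
    L = np.linalg.cholesky(M)  # lower-triangular L with M = L @ L.T
    return X @ L               # row form of L^T x is x^T L

# Hypothetical SPD matrix M and random data, for illustration only.
rng = np.random.default_rng(1)
d = 5
A = rng.normal(size=(d, d))
M = A @ A.T + d * np.eye(d)
X = rng.normal(size=(20, d))

Z = mahalanobis_to_euclidean(X, M)

# Mahalanobis distance in the original space versus Euclidean distance
# in the transformed space, for one pair of points.
diff = X[0] - X[1]
maha_sq = diff @ M @ diff
eucl_sq = np.sum((Z[0] - Z[1]) ** 2)
```

Under this assumption, PCA and Lloyd's would then be run on `Z` in place of `X`. A linear map can only produce distances of this quadratic form, so this particular trick does not cover other notions of distance, and `np.linalg.cholesky` raises an error if `M` is not positive definite.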