Challenge Problems C1. Johnson-Lindenstrauss for Clustering As discussed in class, one of the most popular clustering...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
Challenge Problems C1. Johnson-Lindenstrauss for Clustering As discussed in class, one of the most popular clustering objectives is k-means clustering. Given a set of n data points X {x, ..., xn} in Rd, the goal is to partition [n] into k sets (clusters) C = {C,... Ck} minimizing: = k cost (C, X) = ||xi - Mj||/2 j=1 iCj Liec, xi is the centroid of cluster C; (i.e., the mean of the points in that cluster.) where = 1. Prove the fact discussed in class that cost (C) can be equivalently written as: k 1 cost(C, X) = 2|C| j=1 ||xia - Xill3. in EC'j iz ECj (1) Hint: Show that both (1) and (2) can be rewritten as =1 [(Ciec; ||xi||2) |C;| ||;||2]. k 2. Suppose that II Rmxd is a random projection matrix with each entry chosen independently as N(0, 1/m). For each x; in the dataset, let x = IIx, and let X = {x,...,xn} denote our set of sketched data points in Rm. Conclude from part (1) that if m= 0 (log(n/5)), then with probability 1-6, for every possible clustering C, (1 ) cost(C, X) cost(C, X) (1 + ) cost(C, X). Hint: To keep your calculations simple, you can use the version of the JL Lemma which says that all squared norms are preserved. I.e., that for all x, x; in our input set, (1-)||xi-xj|| |||IIx; IIx;|| (1 + )||x; xj||2. This is what we actually proved in class and directly implies the version with unsquared norms by taking a square root. 3. Assuming that the guarantee of part (2) holds for some < 1/2, prove that if C is an optimal clustering for the compressed dataset X, i.e., cost(, X) = minc cost(C, X), then: cost (, X) (1+ 4) min cost (C, X). That is, is a near optimal clustering for the original dataset X. Hint: You will have to apply the guarantee of part (2) to two different clusterings in the course of the proof. You may want to recall from Problem Set 1 that for any x (0, 1/2), 1+ 2x. Challenge Problems C1. Johnson-Lindenstrauss for Clustering As discussed in class, one of the most popular clustering objectives is k-means clustering. Given a set of n data points X {x, ..., xn} in Rd, the goal is to partition [n] into k sets (clusters) C = {C,... Ck} minimizing: = k cost (C, X) = ||xi - Mj||/2 j=1 iCj Liec, xi is the centroid of cluster C; (i.e., the mean of the points in that cluster.) where = 1. Prove the fact discussed in class that cost (C) can be equivalently written as: k 1 cost(C, X) = 2|C| j=1 ||xia - Xill3. in EC'j iz ECj (1) Hint: Show that both (1) and (2) can be rewritten as =1 [(Ciec; ||xi||2) |C;| ||;||2]. k 2. Suppose that II Rmxd is a random projection matrix with each entry chosen independently as N(0, 1/m). For each x; in the dataset, let x = IIx, and let X = {x,...,xn} denote our set of sketched data points in Rm. Conclude from part (1) that if m= 0 (log(n/5)), then with probability 1-6, for every possible clustering C, (1 ) cost(C, X) cost(C, X) (1 + ) cost(C, X). Hint: To keep your calculations simple, you can use the version of the JL Lemma which says that all squared norms are preserved. I.e., that for all x, x; in our input set, (1-)||xi-xj|| |||IIx; IIx;|| (1 + )||x; xj||2. This is what we actually proved in class and directly implies the version with unsquared norms by taking a square root. 3. Assuming that the guarantee of part (2) holds for some < 1/2, prove that if C is an optimal clustering for the compressed dataset X, i.e., cost(, X) = minc cost(C, X), then: cost (, X) (1+ 4) min cost (C, X). That is, is a near optimal clustering for the original dataset X. Hint: You will have to apply the guarantee of part (2) to two different clusterings in the course of the proof. You may want to recall from Problem Set 1 that for any x (0, 1/2), 1+ 2x.
Expert Answer:
Related Book For
International Marketing And Export Management
ISBN: 9781292016924
8th Edition
Authors: Gerald Albaum , Alexander Josiassen , Edwin Duerr
Posted Date:
Students also viewed these databases questions
-
Planning is one of the most important management functions in any business. A front office managers first step in planning should involve determine the departments goals. Planning also includes...
-
This case was written by Professor Michele Greenwald, Visiting Professor of Marketing at HEC Paris, for use with Advertising and Promotion: An Integrated Marketing Communications Perspective 7th...
-
Factor completely. p(p + 2) + p(p + 2) - 6(p + 2)
-
Latoya Company provides the following selected information related to its defined benefit pension plan for 2019. Pension asset/liability (January 1)...............................................$...
-
Presented below are data taken from the records of Alee Company. Additional information: 1. Held-to-maturity securities carried at a cost of $43,000 on December 31, 2013, were sold in 2014 for...
-
As the statistician on a multidisciplinary research team, you are asked to create a randomization scheme to assign three treatments A, B, and $\mathrm{C}$, to 150 units in blocks of 6 . The scheme...
-
Continue with the facts of Problem 57. What are the Federal income tax withholding requirements with respect to Martinho's sale? Who pays the withheld amount to the U.S. Treasury?
-
6 a. Show by induction on n that 12 +22 +32 +...+n = n(n+1)(2n+1) b. Let T(n) be defined recursively as follows: T(1) = c, and T(n) = 5T (#)+c vn5 C. where c is an arbitrary constant, and n = 5k for...
-
Your task is to build a JEE web application that uses Enterprise JavaBeans (EJB) and Java Persistence API (JPA) to manage the persistence of Java objects to a relational database. The application...
-
Bemma plc (B) wishes to sell some land to Ayunfei plc (A). Both A and B each have four possible negotiating strategies available to them. The price which A will pay for the land (M) is dependent upon...
-
UMPI, Inc. purchased a magic pencil making machine for $735,000 on September 1 by paying $107,000 in cash and placing the remainder on account. They signed an interest-only note agreement with...
-
Professional standards indicate that the integrity and business reputation of the client and the firm's ability to adequately perform the engagement with an appropriate level of professional...
-
Examine the following table: Value of Exports (in billions) Value of Imports (in billions) Balance Goods $100 $200 -$100 Services $150 $100 $50 Income Receipts and Payments $300 $200 $100 Unilateral...
-
How does the use of artificial intelligence (AI) and machine learning (ML) in the creation of intellectual property impact traditional notions of copyright and patent law?
-
Explain how should Samra handle the situation with Ms. McConnley? 2. When Ms. McConnley is ready to check out, she informs Samra that she does not have any cash to pay her $25 copayment. How should...
-
Territory developers may use a number of bases to develop Territories in a systematic way. You are required to explain the following popular bases which are the key bases for territory development:...
-
In Problem use geometric formulas to find the unsigned area between the graph of y = f(x) and the x axis over the indicated interval. f(x) = x + 5; [0, 4]
-
EFI Logistics, headquartered in San Francisco, is a company providing a wide range of services to exporters and importers. In addition to performing the traditional foreign freight-forwarding and...
-
Introduction Dell, Inc. was the worlds market leader in personal computers using direct sales through the Internet and over the telephone until 2005. In 2006, it experienced an unexpected decline in...
-
Under what conditions would a strategy of multiple entry modes be most appropriate and under what conditions would it be inappropriate? Discuss.
-
Do you believe that fraud in the United States is more or less prevalent than fraud in countries outside the United States? Why?
-
How would a fraudster conceal missing cash, especially amounts of this magnitude?
-
What makes channel stuffing illegal by comparison to a car dealership that runs mega end-of-year sales?
Study smarter with the SolutionInn App