Data Mining Questions using numpy library
I can figure out the first part alright, but I get lost on computing the variance of the reduced-dimensionality data in the second part; once I figure that out, the rest seems like smooth sailing.
1: Write a function to compute the sample covariance matrix as inner products between the columns of the centered data matrix (see equation 2.31). Show that the result from your function matches the one from the numpy.cov function. (Note: numpy.cov has a bias parameter, which you should set to True.)
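A minimal sketch of Question 1. The actual dataset isn't shown in the post, so random data stands in for it, and points are assumed to be rows with the 10 attributes as columns:

```python
import numpy as np

def sample_cov(X):
    # Center each column, then form the covariance as (1/n) * Z^T Z,
    # i.e. inner products between the centered columns (biased estimate).
    Z = X - X.mean(axis=0)
    return (Z.T @ Z) / X.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))      # placeholder for the real data

C = sample_cov(X)
# rowvar=False treats columns as variables; bias=True divides by n, not n-1
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))  # True
```

Setting bias=True matters: np.cov defaults to dividing by n-1, while the inner-product formula above divides by n.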
2: Use linalg.eig to find the first two dominant eigenvectors of the covariance matrix (that you obtained in Question 1), and reduce the data dimensionality from 10 to 2 by computing the projection of the data points along these eigenvectors. Now compute the variance of the data points (in the reduced dimensionality) using the subroutine you wrote for Question 1. (Do not print the projected data points to stdout; only print the value of the variance. Also confirm that the variance equals the sum of the two dominant eigenvalues.)
3: For the same eigenvectors as in Question 2, compute a projection matrix that projects the data points into the subspace spanned by these two eigenvectors, and use it to obtain the coordinates of the projected data points in the standard basis (your data points are still 10-dimensional vectors, but they actually lie in a 2-D subspace inside the 10-dimensional space). Compute the reconstruction error and show that it equals the sum of the remaining 8 eigenvalues of the covariance matrix.
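A sketch of Question 3 under the same placeholder-data assumption, with the error measured as the mean squared reconstruction error over the centered points (the convention that matches the biased covariance, and the one for which the remaining-eigenvalue identity holds exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # placeholder for the real data
Z = X - X.mean(axis=0)                  # centered data
C = (Z.T @ Z) / X.shape[0]              # biased sample covariance

vals, vecs = np.linalg.eig(C)
order = np.argsort(vals.real)[::-1]
U2 = vecs[:, order[:2]].real            # two dominant eigenvectors

P = U2 @ U2.T                           # 10x10 projection matrix onto their span
Zhat = Z @ P                            # projected points in the standard basis
                                        # (still 10-D, but lying in a 2-D subspace)

err = np.mean(np.sum((Z - Zhat) ** 2, axis=1))      # mean squared error
print(np.isclose(err, vals.real[order[2:]].sum()))  # True
```

The identity follows from err = trace(C(I − P)) = trace(C) − (λ1 + λ2), i.e. the total variance minus what the 2-D subspace captures.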
4: Use linalg.eig to find all the eigenvectors, and print the covariance matrix in its eigendecomposition form (UΛUᵀ).
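Question 4 is a direct check of the eigendecomposition C = UΛUᵀ. Placeholder data again; for a symmetric covariance matrix the eigenvectors returned by linalg.eig are (numerically) orthonormal, which is what makes U.T the inverse of U here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # placeholder for the real data
Z = X - X.mean(axis=0)
C = (Z.T @ Z) / X.shape[0]

vals, U = np.linalg.eig(C)
U = U.real
Lam = np.diag(vals.real)                # eigenvalues on the diagonal

print(Lam)
print(np.allclose(C, U @ Lam @ U.T))    # True: C = U Lambda U^T
```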
5: Write a subroutine to implement the PCA algorithm (Algorithm 7.1, page 198).
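The referenced Algorithm 7.1 isn't reproduced in the post, but a standard PCA subroutine along those lines (center, eigendecompose, choose the smallest dimension r whose eigenvalues preserve at least a fraction alpha of the total variance, project) might look like:

```python
import numpy as np

def pca(D, alpha):
    """Sketch of a PCA subroutine: returns the reduced coordinates and the
    basis Ur, where r is the smallest dimension preserving at least an
    alpha fraction of the total variance."""
    Z = D - D.mean(axis=0)                      # center the data
    C = (Z.T @ Z) / D.shape[0]                  # biased sample covariance
    vals, vecs = np.linalg.eig(C)
    order = np.argsort(vals.real)[::-1]         # sort eigenpairs descending
    vals, vecs = vals.real[order], vecs.real[:, order]
    frac = np.cumsum(vals) / vals.sum()         # f(r): fraction of variance kept
    r = int(np.searchsorted(frac, alpha)) + 1   # smallest r with f(r) >= alpha
    Ur = vecs[:, :r]                            # reduced basis (d x r)
    return Z @ Ur, Ur
```

Whether the book's version projects the centered or the raw data may differ; centering first is the common convention.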
6: Use the program above to find the principal vectors needed to preserve 90% of the variance. Print the coordinates of the first 10 data points using this set of vectors as the new basis.
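Question 6 is the 90%-variance case of the same routine. A self-contained sketch (placeholder data; coordinates taken for the centered points, one common convention):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))              # placeholder for the real data
Z = X - X.mean(axis=0)
C = (Z.T @ Z) / X.shape[0]

vals, vecs = np.linalg.eig(C)
order = np.argsort(vals.real)[::-1]
vals, vecs = vals.real[order], vecs.real[:, order]

frac = np.cumsum(vals) / vals.sum()
r = int(np.searchsorted(frac, 0.90)) + 1    # smallest r preserving 90% variance
Ur = vecs[:, :r]                            # the principal vectors to keep

print(r, "principal vectors preserve a fraction", frac[r - 1], "of the variance")
print(Z[:10] @ Ur)                          # first 10 points in the new basis
```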
