Data Mining Questions using numpy library
I can figure out the first part alright, but I get lost on computing the variance of the reduced-dimensionality data in the second part; once I figure that out, the rest seems like smooth sailing.
1: Write a function to compute the sample covariance matrix as inner products between the columns of the centered data matrix (see equation 2.31). Show that the result from your function matches the one from the numpy.cov function. (Note: numpy.cov has a bias parameter, which you should set to True.)
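A minimal sketch of Question 1. The actual dataset isn't shown in the post, so random data stands in for it, and points are assumed to be rows with the 10 attributes as columns:

```python
import numpy as np

def sample_cov(X):
    # Center each column, then form the covariance as (1/n) * Z^T Z,
    # i.e. inner products between the centered columns (biased estimate).
    Z = X - X.mean(axis=0)
    return (Z.T @ Z) / X.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))      # placeholder for the real data

C = sample_cov(X)
# rowvar=False treats columns as variables; bias=True divides by n, not n-1
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))  # True
```

Setting bias=True matters: np.cov defaults to dividing by n-1, while the inner-product formula above divides by n.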
2: Use linalg.eig to find the first two dominant eigenvectors of the covariance matrix (that you obtained in Question 1), and reduce the data dimensionality from 10 to 2 by computing the projection of the data points along these eigenvectors. Now compute the variance of the data points (in the reduced dimensionality) using the subroutine you wrote for Question 1. (Do not print the projected data points to stdout; only print the value of the variance. Also confirm that the variance equals the sum of the two dominant eigenvalues.)
3: For the same eigenvectors as in Question 2, compute a projection matrix that projects the data points into the subspace spanned by these two eigenvectors, and use it to obtain the coordinates of the projected data points in the standard basis (your data points are still 10-dimensional vectors, but they actually lie in a 2-D subspace inside the 10-dimensional space). Compute the reconstruction error and show that it equals the sum of the remaining 8 eigenvalues of the covariance matrix.
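A sketch of Question 3 under the same placeholder-data assumption, with the error measured as the mean squared reconstruction error over the centered points (the convention that matches the biased covariance, and the one for which the remaining-eigenvalue identity holds exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # placeholder for the real data
Z = X - X.mean(axis=0)                  # centered data
C = (Z.T @ Z) / X.shape[0]              # biased sample covariance

vals, vecs = np.linalg.eig(C)
order = np.argsort(vals.real)[::-1]
U2 = vecs[:, order[:2]].real            # two dominant eigenvectors

P = U2 @ U2.T                           # 10x10 projection matrix onto their span
Zhat = Z @ P                            # projected points in the standard basis
                                        # (still 10-D, but lying in a 2-D subspace)

err = np.mean(np.sum((Z - Zhat) ** 2, axis=1))      # mean squared error
print(np.isclose(err, vals.real[order[2:]].sum()))  # True
```

The identity follows from err = trace(C(I − P)) = trace(C) − (λ1 + λ2), i.e. the total variance minus what the 2-D subspace captures.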
4: Use linalg.eig to find all the eigenvectors, and print the covariance matrix in its eigendecomposition form (UΛUᵀ).
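Question 4 is a direct check of the eigendecomposition C = UΛUᵀ. Placeholder data again; for a symmetric covariance matrix the eigenvectors returned by linalg.eig are (numerically) orthonormal, which is what makes U.T the inverse of U here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # placeholder for the real data
Z = X - X.mean(axis=0)
C = (Z.T @ Z) / X.shape[0]

vals, U = np.linalg.eig(C)
U = U.real
Lam = np.diag(vals.real)                # eigenvalues on the diagonal

print(Lam)
print(np.allclose(C, U @ Lam @ U.T))    # True: C = U Lambda U^T
```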
5: Write a subroutine to implement the PCA algorithm (Algorithm 7.1, page 198).
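The referenced Algorithm 7.1 isn't reproduced in the post, but a standard PCA subroutine along those lines (center, eigendecompose, choose the smallest dimension r whose eigenvalues preserve at least a fraction alpha of the total variance, project) might look like:

```python
import numpy as np

def pca(D, alpha):
    """Sketch of a PCA subroutine: returns the reduced coordinates and the
    basis Ur, where r is the smallest dimension preserving at least an
    alpha fraction of the total variance."""
    Z = D - D.mean(axis=0)                      # center the data
    C = (Z.T @ Z) / D.shape[0]                  # biased sample covariance
    vals, vecs = np.linalg.eig(C)
    order = np.argsort(vals.real)[::-1]         # sort eigenpairs descending
    vals, vecs = vals.real[order], vecs.real[:, order]
    frac = np.cumsum(vals) / vals.sum()         # f(r): fraction of variance kept
    r = int(np.searchsorted(frac, alpha)) + 1   # smallest r with f(r) >= alpha
    Ur = vecs[:, :r]                            # reduced basis (d x r)
    return Z @ Ur, Ur
```

Whether the book's version projects the centered or the raw data may differ; centering first is the common convention.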
6: Use the program above to find the principal vectors needed to preserve 90% of the variance. Print the coordinates of the first 10 data points using this set of vectors as the new basis.
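Question 6 is the 90%-variance case of the same routine. A self-contained sketch (placeholder data; coordinates taken for the centered points, one common convention):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))              # placeholder for the real data
Z = X - X.mean(axis=0)
C = (Z.T @ Z) / X.shape[0]

vals, vecs = np.linalg.eig(C)
order = np.argsort(vals.real)[::-1]
vals, vecs = vals.real[order], vecs.real[:, order]

frac = np.cumsum(vals) / vals.sum()
r = int(np.searchsorted(frac, 0.90)) + 1    # smallest r preserving 90% variance
Ur = vecs[:, :r]                            # the principal vectors to keep

print(r, "principal vectors preserve a fraction", frac[r - 1], "of the variance")
print(Z[:10] @ Ur)                          # first 10 points in the new basis
```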
