Problem 1. (10 points) Consider the hyperplane $w^T x + b = 0$ of known parameters $w$ and $b$.

a) (5 points) Use Lagrangian optimization to find the point $x_0$ in the plane closest to the origin, as a function of $w$ and $b$. (A setup sketch is given after Problem 2.)

b) (5 points) Use the result of a) to derive a geometric interpretation for the parameters $w$ and $b$ when $\|w\| = 1$.

Problem 2. (10 points) The entropy of a probability density function (pdf) $p(x)$ is defined as

\[ h(X) = -\int p(x) \log p(x)\, dx. \]

a) (5 points) Use Lagrangian optimization to find the pdf of maximal entropy among those whose expected value is $\mu$ and whose variance is $\sigma^2$.

b) (5 points) Find the values of the Lagrange multipliers associated with the optimal solution.
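For Problem 1 a), one possible Lagrangian setup is the following (a sketch, not necessarily the intended derivation; minimizing the squared distance is equivalent to minimizing the distance and differentiates cleanly):

\[ L(x, \lambda) = \tfrac{1}{2}\|x\|^2 + \lambda\,(w^T x + b), \qquad \nabla_x L = x + \lambda w = 0 \;\Rightarrow\; x = -\lambda w. \]

Substituting into the constraint $w^T x + b = 0$ gives $\lambda = b/\|w\|^2$, hence

\[ x_0 = -\frac{b}{\|w\|^2}\, w, \qquad \|x_0\| = \frac{|b|}{\|w\|}, \]

which for $\|w\| = 1$ is the starting point for the geometric interpretation asked for in b). The setup for Problem 2 is analogous, with one multiplier each for the normalization, mean, and variance constraints on $p(x)$.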
Problem 3. (15 points) Extensive experimental evidence in the area of image compression has shown that the Discrete Cosine Transform (DCT) of image patches is a very good approximation to their PCA. It is also well known that all but one of the DCT coefficients (features) have zero mean; only one has non-zero mean. The latter is the so-called DC coefficient: it results from projecting the image patch onto the vector $\mathbf{1} = (1, 1, \ldots, 1)^T$ of dimension $d$, and is therefore proportional to the average (DC) value of the patch. In this problem, we explore the connection between the DCT and PCA to explain this fact. For this, we assume the following. An image patch is a collection of random variables $X = \{X_1, \ldots, X_{64}\}$ which are identically distributed,

\[ P_{X_i}(x) = f(x), \quad \forall i \in \{1, \ldots, 64\}, \]

where $f(x)$ is a common probability density function. The pixels in the image patch are correlated, according to the correlation coefficient

\[ \rho_{ij} = \frac{E[X_i X_j]}{\sqrt{E[X_i^2]}\,\sqrt{E[X_j^2]}}. \]

This obviously implies that we do not have an i.i.d. sample.

a) (5 points) Consider the PCA of X. Show that it is not affected by a change of variables of the type $Z = X - \mu$, where $\mu = E[X]$.

b) (5 points) Given a), we can assume that X has zero mean; we will do so for the remainder of the problem. Show that in the extreme of highly correlated pixel values, i.e. when $\rho_{ij} \to 1, \forall i, j$, the vector $\mathbf{1}$ is the largest principal component. (A short computation is sketched after this problem.) Since neighboring image pixels do tend to be highly correlated, this helps explain why the DC coefficient is always present.

c) (5 points) Let $\Phi$ be the matrix whose columns $\phi_i$ are the principal components, and consider the set of coefficients (features) Z resulting from the projection of an arbitrary image patch X onto these components, i.e. $z = \Phi^T x$. Consider the DCT coefficients $z_i = \phi_i^T x$, noting that $z_1 \propto \mathbf{1}^T x$ is the DC coefficient. Show that $\bar{z}_i = E[z_i] = 0, \forall i > 1$, i.e. that the remaining (AC) coefficients have zero mean.
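For Problem 3 b), the following short computation (a sketch under the problem's zero-mean assumption, writing $\sigma^2 = E[X_i^2]$ for the common pixel variance) shows why $\mathbf{1}$ emerges:

\[ \Sigma_{ij} = E[X_i X_j] = \sigma^2 \rho_{ij} \;\longrightarrow\; \Sigma = \sigma^2\, \mathbf{1}\mathbf{1}^T \;\text{ as } \rho_{ij} \to 1, \qquad \Sigma\, \mathbf{1} = \sigma^2\, \mathbf{1}\,(\mathbf{1}^T \mathbf{1}) = d\, \sigma^2\, \mathbf{1}. \]

Hence $\mathbf{1}/\sqrt{d}$ is an eigenvector of $\Sigma$ with eigenvalue $d\sigma^2$; since the limiting $\Sigma$ has rank one, every other eigenvalue is zero, making $\mathbf{1}$ the principal component of largest variance.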
Problem 4. (30 points) Consider a classification problem with two Gaussian classes

\[ P_{X|Y}(x|i) = \mathcal{G}(x, \mu_i, \Sigma_i), \quad i \in \{1, 2\}, \]

and class probabilities $P_Y(i)$.

a) (5 points) The random variable X is not Gaussian, but we can still compute its mean and covariance. Show that

\[ \mu = E[X] = P_Y(1)\,\mu_1 + P_Y(2)\,\mu_2, \]
\[ \Sigma = E[(X - \mu)(X - \mu)^T] = P_Y(1)\,\Sigma_1 + P_Y(2)\,\Sigma_2 + P_Y(1) P_Y(2)\,(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T. \]

b) (5 points) In the remainder of this problem, we consider the 2-dimensional case with

\[ \mu_1 = -\mu_2 = \begin{bmatrix} \alpha \\ 0 \end{bmatrix}, \qquad \Sigma_1 = \Sigma_2 = \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & \sigma^2 \end{bmatrix}, \]

where $\alpha > 0$, and $P_Y(1) = P_Y(2) = 1/2$. Using MATLAB, sample 1,000 points from the two Gaussian classes and make a plot of the sampled points for each of the following conditions: condition A: $\alpha = 10$, $\sigma = 2$; condition B: $\alpha = 2$, $\sigma = 10$.

c) (5 points) Use MATLAB to perform a principal component analysis of the random variable X. Determine the direction of the best 1-dimensional subspace, i.e. the transformation $z = \phi^T x$, where $\phi$ is the principal component of largest variance. Hand in a plot containing both the data points and this principal component for each of the conditions of b).

d) (5 points) Repeat c), but now for the 1-dimensional subspace spanned by the linear discriminant $w$ produced by LDA. (A MATLAB sketch for b)-d) follows the problem statement.)

e) (5 points) From the results above, is the PCA approach to dimensionality reduction always a good one from the classification point of view? How does it compare to LDA?

f) (5 points) We now pursue a theoretical explanation for the observations above. Using the results of a), derive the principal component $\phi$ and the linear discriminant $w$ as functions of the parameters $\alpha$ and $\sigma$. From a classification point of view, under what conditions is it 1) better to use PCA, 2) better to use LDA, or 3) identical to use either of the two?
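A minimal MATLAB sketch for Problem 4 b)-d), assuming the 2-dimensional setup of b) and the Statistics Toolbox (for mvnrnd); the LDA direction is taken as $w \propto S_w^{-1}(\hat{\mu}_1 - \hat{\mu}_2)$:

% Sample the two Gaussian classes, then compute the PCA and LDA
% directions of the pooled data; requires the Statistics Toolbox (mvnrnd).
alpha = 10; sigma = 2;              % condition A (use alpha = 2, sigma = 10 for B)
n = 1000;
mu1 = [alpha; 0]; mu2 = -mu1;
S = diag([1, sigma^2]);

X1 = mvnrnd(mu1', S, n/2);          % n/2 samples per class, since P_Y(1) = P_Y(2) = 1/2
X2 = mvnrnd(mu2', S, n/2);
X = [X1; X2];

% PCA: eigenvector of the pooled covariance with largest eigenvalue
[V, D] = eig(cov(X));
[~, imax] = max(diag(D));
phi = V(:, imax);

% LDA: w ~ Sw^{-1} (mu1 - mu2), from sample class means and covariances
m1 = mean(X1)'; m2 = mean(X2)';
Sw = cov(X1) + cov(X2);             % within-class scatter, up to scale
w = Sw \ (m1 - m2);
w = w / norm(w);

% plot the samples and both 1-dimensional subspaces
figure; hold on; axis equal;
scatter(X1(:, 1), X1(:, 2), 8, 'b');
scatter(X2(:, 1), X2(:, 2), 8, 'r');
t = [-30, 30];                      % extent of the plotted lines
plot(t * phi(1), t * phi(2), 'k-', 'LineWidth', 2);
plot(t * w(1), t * w(2), 'g--', 'LineWidth', 2);
legend('class 1', 'class 2', 'PCA direction', 'LDA direction');

Swapping between conditions A and B should visibly rotate the PCA direction from the axis separating the classes to the axis of largest noise variance, while the LDA direction stays aligned with the mean separation.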
Problem 5. (35 points) In this problem, we consider a dataset with faces from 6 people, divided into two sets: a training set, provided in the file trainset.zip, and a test set, provided in the file testset.zip. The original color images of 150 x 150 pixels were converted to 50 x 50 grayscale pixels, resulting in a vector space of 2,500 dimensions.

a) (10 points) Using the training data, compute the PCA of the data and make a 4 x 4 plot showing the 16 principal components of largest variance.

b) (10 points) Consider the 15 different pairs of classes that can be derived from the 6 face classes. Perform an LDA between the classes in each pair. Make a 4 x 4 plot containing the 15 linear discriminants (plus one empty plot). (Note: you will likely find that the within-class scatter matrix is singular. To address this, use an RDA of regularization parameter $\gamma = 1$ instead of the LDA.)

c) (5 points) For each image x in the training dataset, compute the projection $z = \Phi x$, where the rows of $\Phi$ are the 15 principal components of largest variance of the face training data. Note that this produces a dataset $\{z_i\}$ of 15-dimensional vectors. Use the PCA projections $z_i$ of the training images to learn a mean $\mu_i$ and covariance $\Sigma_i$ per class. Use these to implement the Gaussian classifier. Compute the PCA projections of the test images and use the Gaussian classifier above to classify them. Compute the average probability of classification error for each face class and the average error across all classes. (A MATLAB sketch follows the Note below.)

d) (5 points) Repeat c) using LDA projections $z_i$ instead of PCA projections. (Note: you will likely find that the within-class scatter matrix is singular. To address this, use an RDA of regularization parameter $\gamma = 1$ instead of the LDA.)

e) (5 points) A popular alternative to RDA, when LDA requires the inversion of a singular scatter matrix, consists of first applying PCA to an intermediate dimension k and then applying LDA to the resulting k-dimensional vectors. This approach is denoted PCA+LDA. Repeat d) using PCA+LDA with k = 30.

Note: For a random variable X with multiple Gaussian classes of mean $\mu_i$ and covariance $\Sigma_i$, the Gaussian classifier has decision rule

\[ i^* = \arg\min_i \left[ (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) + \log |\Sigma_i| \right]. \]

Given a training set $\{x_1, \ldots, x_n\}$ for class i, the class mean and covariance are estimated with

\[ \hat{\mu}_i = \frac{1}{n} \sum_{k=1}^{n} x_k, \qquad \hat{\Sigma}_i = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu}_i)(x_k - \hat{\mu}_i)^T. \]
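A minimal MATLAB sketch for Problem 5 c), assuming the training and test images have already been unpacked into row matrices Xtr and Xte (one 2,500-dimensional image per row) with label vectors ytr and yte; these variable names and the loading step are hypothetical, since they depend on the contents of trainset.zip and testset.zip. The pca function requires the Statistics Toolbox.

% Project training/test images onto the top-15 principal components,
% then apply the Gaussian classifier from the Note above.
[coeff, ~] = pca(Xtr);              % columns of coeff are principal components
Phi = coeff(:, 1:15)';              % rows of Phi: 15 components of largest variance
% note: pca() centers the data; projecting the raw images only shifts all
% class means by the same constant, which the classifier absorbs.
Ztr = (Phi * Xtr')';                % 15-dimensional PCA projections
Zte = (Phi * Xte')';

C = 6;                              % number of face classes
mu = zeros(C, 15); Sg = zeros(15, 15, C);
for c = 1:C
    Zc = Ztr(ytr == c, :);
    mu(c, :) = mean(Zc);
    Sg(:, :, c) = cov(Zc, 1);       % ML covariance estimate (normalizes by n)
end

% decision rule: i* = argmin_i (z - mu_i)' Sg_i^{-1} (z - mu_i) + log |Sg_i|
g = zeros(size(Zte, 1), C);
for c = 1:C
    d = Zte - mu(c, :);             % implicit expansion (R2016b or later)
    g(:, c) = sum((d / Sg(:, :, c)) .* d, 2) + log(det(Sg(:, :, c)));
end
[~, yhat] = min(g, [], 2);

% per-class and average error rates
err = zeros(1, C);
for c = 1:C
    err(c) = mean(yhat(yte == c) ~= c);
end
fprintf('per-class errors: %s\naverage error: %.3f\n', mat2str(err, 3), mean(err));

For d) and e), only the projection matrix Phi changes (RDA discriminants, or PCA to an intermediate dimension k = 30 followed by LDA); the classification and error-counting code can be reused as-is.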