Question: Problem 4 (PROBABILISTIC LATENT VARIABLE MODEL AND ITS RELATION TO PCA)



Problem 4 (PROBABILISTIC LATENT VARIABLE MODEL AND ITS RELATION TO PCA)

Consider the latent variable model

    $x = Wz + e$,    (12)

where $x \in \mathbb{R}^d$, $z \in \mathbb{R}^q$, $W \in \mathbb{R}^{d \times q}$ and $e \in \mathbb{R}^d$. The probabilistic model uses $z \sim \mathcal{N}(0, I)$ and $e \sim \mathcal{N}(0, \sigma^2 I)$, with $e$ independent of $z$. We make $N$ observations of $x$, namely $x_1, \dots, x_N$, and we do not know the matrix $W$ or the noise variance $\sigma^2$, which we estimate by maximum likelihood.

(a) Prove that, using (12), $x \sim \mathcal{N}(0, WW^T + \sigma^2 I)$.

(b) Using (a), show that the log-likelihood $\mathcal{L}$ of $x_1, \dots, x_N$, i.e., $\log f(x_1, \dots, x_N)$, is given by

    $\mathcal{L} = -\frac{N}{2}\left\{ d \log(2\pi) + \log|C| + \operatorname{trace}\left(C^{-1} S\right) \right\}$,    (13)

where $C = WW^T + \sigma^2 I$ and $S = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T$.
Hint: Stack $\bar{x} = (x_1^T, \dots, x_N^T)^T$ and notice that $\bar{x} \sim \mathcal{N}(0, \bar{C})$, where $\bar{C}$ is the block-diagonal matrix with $C$ repeated $N$ times along its diagonal, and use the property $\operatorname{trace}(AB) = \operatorname{trace}(BA)$.

(c) We can show (and you can assume) that

    $\frac{\partial \mathcal{L}}{\partial W} = N\left( C^{-1} S C^{-1} W - C^{-1} W \right)$.    (14)

Use (14) to show that the optimal solution satisfies

    $S C^{-1} W = W$.    (15)

(d) We will use a singular value decomposition form for $W$,

    $W = U L V^T$,    (16)

where $U \in \mathbb{R}^{d \times q}$ has orthonormal columns, $V \in \mathbb{R}^{q \times q}$ is orthonormal, i.e., $V V^T = I$, and $L = \operatorname{diag}(l_1, \dots, l_q)$ represents the singular values. Upon substituting (16) in (15), we can show (and you can assume) that

    $S U L = U \left( \sigma^2 I + L^2 \right) L$.    (17)

Use (17) to show that for $l_i \neq 0$, $S u_i = (\sigma^2 + l_i^2) u_i$, i.e., the $u_i$ are eigenvectors of $S$ when $l_i \neq 0$. Note that when $l_i = 0$, we can choose $u_i$ arbitrarily. Therefore the question becomes how many such eigenvectors we choose. Using this, we can show (and you can assume) that

    $W = U_q \left( K_q - \sigma^2 I \right)^{1/2} R$,    (18)

where $U_q \in \mathbb{R}^{d \times q}$ has $q$ eigenvectors of $S$ as its columns (not necessarily those with the largest eigenvalues), $R \in \mathbb{R}^{q \times q}$ is an arbitrary orthogonal matrix, and $K_q = \operatorname{diag}(k_1, \dots, k_q)$ with

    $k_j = \lambda_{p(j)}$, the eigenvalue corresponding to $u_j$, or $k_j = \sigma^2$ otherwise,    (19)

where, to avoid confusion with the ordered eigenvalues, we have used $p(\cdot)$ as some permutation of their indices.

(e) We will next show (and you can assume) that the maximum likelihood estimate of $W, \sigma^2$ from maximizing (13) is given by

    $W_{ML} = U_q \left( \Lambda_q - \sigma^2 I \right)^{1/2} R$,

where $U_q \in \mathbb{R}^{d \times q}$ has the $q$ eigenvectors of $S$ with the largest eigenvalues, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$, with diagonal matrix $\Lambda_q = \operatorname{diag}(\lambda_1, \dots, \lambda_q)$, and $R \in \mathbb{R}^{q \times q}$ is an arbitrary orthogonal matrix, i.e., $R R^T = I$. Moreover,

    $\sigma^2_{ML} = \frac{1}{d-q} \sum_{i=q+1}^{d} \lambda_i$.

The expression in (18), when substituted into (13) and maximized over $\sigma^2$, can be shown to be (and you can assume so)

    $\mathcal{L} = -\frac{N}{2}\left\{ d \log(2\pi) + d + \sum_{j=1}^{q'} \log(\lambda_{p(j)}) + (d - q') \log\left( \frac{1}{d-q'} \sum_{j=q'+1}^{d} \lambda_{p(j)} \right) \right\}$,    (20)

where $q'$ is the number of nonzero singular values $l_j$, i.e., the number of retained eigenvector directions. Show that, since $\sum_{j=1}^{d} \log(\lambda_{p(j)}) = \sum_{i=1}^{d} \log(\lambda_i) = \log|S|$, we can write (20) as

    $\mathcal{L} = -\frac{N}{2}\left\{ d\log(2\pi) + d + \log|S| \right\} - \frac{N}{2}\left\{ (d - q') \log\left( \frac{1}{d-q'} \sum_{j=q'+1}^{d} \lambda_{p(j)} \right) - \sum_{j=q'+1}^{d} \log(\lambda_{p(j)}) \right\}$.    (21)

(f) Therefore the maximum likelihood choice of $p(\cdot)$ amounts to minimizing, over $p(\cdot)$ and $q'$,

    $\log\left( \frac{1}{d-q'} \sum_{j=q'+1}^{d} \lambda_{p(j)} \right) - \frac{1}{d-q'} \sum_{j=q'+1}^{d} \log(\lambda_{p(j)})$.

Show that the optimal choice is $q' = q$ and that $p(\cdot)$ retains the $q$ largest eigenvalues $\lambda_1, \dots, \lambda_q$.

(g) Dimensionality reduction: Using (12), it can be shown (and you can assume so) that

    $z \mid x \sim \mathcal{N}\left( M^{-1} W^T x, \; \sigma^2 M^{-1} \right)$,

where $M \in \mathbb{R}^{q \times q}$ is given by $M = W^T W + \sigma^2 I$. Given observations $x_1, \dots, x_N$, the best estimates of $z_1, \dots, z_N$ can be expressed (you can assume this) as $\hat{z}_i = E[z_i \mid x_i]$. Using this, find the best estimates for the latent variables $z_1, \dots, z_N$ from the observations $x_1, \dots, x_N$. How does this compare to standard PCA, and when does it approach the standard PCA projection?
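As a quick sanity check on the claim in part (a), the following minimal sketch (Python/NumPy; the dimensions, noise level, and the random $W$ are arbitrary illustrative choices, not given in the problem) draws samples from the model (12) and compares the empirical covariance of $x$ against $WW^T + \sigma^2 I$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, sigma2, N = 5, 2, 0.3, 200_000   # illustrative sizes, not from the problem

W = rng.standard_normal((d, q))        # arbitrary loading matrix for the check

# Generative model (12): x = W z + e with z ~ N(0, I_q), e ~ N(0, sigma2 * I_d)
Z = rng.standard_normal((N, q))
E = np.sqrt(sigma2) * rng.standard_normal((N, d))
X = Z @ W.T + E

C_theory = W @ W.T + sigma2 * np.eye(d)        # covariance claimed in part (a)
C_empirical = X.T @ X / N                      # sample covariance (zero-mean model)

print(np.max(np.abs(C_empirical - C_theory)))  # small, and shrinks as N grows
```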
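Similarly, the closed forms quoted in parts (d)-(e) can be checked numerically against the stationarity condition (15). In the sketch below, $S$ is built from random data purely for illustration, $W_{ML}$ is formed as in (18) with the top-$q$ eigenvectors and the arbitrary rotation fixed to $R = I$, and the identity $S C^{-1} W = W$ is then verified.

```python
import numpy as np

rng = np.random.default_rng(1)
d, q, N = 6, 2, 500

X = rng.standard_normal((N, d)) @ rng.standard_normal((d, d))  # arbitrary data
S = X.T @ X / N                                                # sample covariance as in (13)

# Eigen-decomposition of S, eigenvalues sorted in decreasing order
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]

sigma2_ml = lam[q:].mean()                     # (1/(d-q)) * sum of discarded eigenvalues
Uq, Lam_q = U[:, :q], np.diag(lam[:q])
R = np.eye(q)                                  # any orthogonal R works; identity for simplicity
W_ml = Uq @ np.sqrt(Lam_q - sigma2_ml * np.eye(q)) @ R   # (18) with the top-q eigenvectors

C = W_ml @ W_ml.T + sigma2_ml * np.eye(d)      # model covariance C = W W^T + sigma^2 I
lhs = S @ np.linalg.solve(C, W_ml)             # S C^{-1} W

print(np.allclose(lhs, W_ml))                  # True: (15) holds at the ML solution
```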
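For part (g), one way to see the relation to PCA is to compare the reconstruction $W \hat{z}_i$ obtained from the posterior mean $\hat{z}_i = M^{-1} W^T x_i$ with the standard PCA reconstruction $U_q U_q^T x_i$. The sketch below is again purely illustrative (a random orthonormal $U_q$, hand-picked leading eigenvalues, and $R = I$); it shows the two reconstructions coinciding in the limit $\sigma^2 \to 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, q = 6, 2
U, _ = np.linalg.qr(rng.standard_normal((d, q)))   # orthonormal columns (stand-in for U_q)
lam_q = np.array([4.0, 2.5])                       # illustrative top-q eigenvalues of S
x = rng.standard_normal(d)                         # one observation

def latent_posterior_mean(x, U, lam_q, sigma2):
    """z_hat = M^{-1} W^T x with W = U (diag(lam_q) - sigma2 I)^{1/2} and M = W^T W + sigma2 I."""
    W = U @ np.diag(np.sqrt(lam_q - sigma2))
    M = W.T @ W + sigma2 * np.eye(len(lam_q))
    return np.linalg.solve(M, W.T @ x), W

pca_recon = U @ (U.T @ x)                          # standard PCA reconstruction of x

for sigma2 in (1.0, 0.1, 1e-6):
    z_hat, W = latent_posterior_mean(x, U, lam_q, sigma2)
    print(sigma2, np.linalg.norm(W @ z_hat - pca_recon))
# The gap shrinks as sigma2 -> 0: the probabilistic estimate approaches the PCA projection.
```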
