Question: Problem 2(a): Expectation Maximization for Factor Analysis

Generative model for Factor Analysis

We consider a simple linear Gaussian example, where the data points $x_t \in \mathbb{R}^{D \times 1}$ are i.i.d., with $t \in \{1, \dots, T\}$. The generative model of the data is defined as:

$$x_t = C z_t + \epsilon_t$$

where $C \in \mathbb{R}^{D \times K}$ is a matrix mapping from the latent space to the observation space; $z_t \in \mathbb{R}^{K \times 1}$, $z_t \sim \mathcal{N}(z \mid 0, Q)$; and $\epsilon_t \sim \mathcal{N}(\epsilon \mid 0, R)$ is the noise term in the observation space (we'll assume for now that the data is mean-centered and omit a bias term to keep the exposition cleaner).

What type of statistical structure does this model enforce on the data? If we calculate the mean and variance of the observations we find:

$$\mathbb{E}[x_t] = 0$$

$$\mathrm{Cov}[x_t] = C Q C^\top + R$$

Because both the latent variables and the observation noise are Gaussian, the resulting $x_t$ will also be Gaussian, with a mean and covariance given by the above equations.

We will now make two more assumptions: first, without loss of generality, we will set $Q$ to be equal to the identity matrix; second, we will constrain $R$ to be diagonal. We can see from the equations above that the resulting model, which is referred to as Factor Analysis (FA), models the covariance matrix of a high-dimensional Gaussian as a low-rank matrix plus a diagonal matrix (where the rank, equal to the number of latent variables, is a hyperparameter of the model).

Question 3: (E-step) Given $R$ and $C$, use Bayes' rule to calculate the posterior distribution of $z_t$, $p(z_t \mid x_t)$. Try to simplify the expression as much as possible.

Answer below this line
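The posted answer is not visible on this page, so what follows is only a sketch of the standard result, not the expert's solution. With $Q = I$, Bayes' rule gives $p(z_t \mid x_t) \propto \mathcal{N}(x_t \mid C z_t, R)\,\mathcal{N}(z_t \mid 0, I)$. Collecting the terms that are quadratic and linear in $z_t$ in the exponent and completing the square yields a Gaussian posterior with covariance $\Sigma = (I + C^\top R^{-1} C)^{-1}$ and mean $\mu_t = \Sigma\, C^\top R^{-1} x_t$. The Python snippet below computes these two quantities and cross-checks them against the equivalent form obtained by conditioning the joint Gaussian of $(z_t, x_t)$. The function name `fa_posterior`, the dimensions, and the randomly drawn $C$, $R$, and $x_t$ are illustrative assumptions, not part of the original problem.

```python
import numpy as np

def fa_posterior(x, C, R_diag):
    """Posterior p(z | x) for x = C z + eps, with z ~ N(0, I) and
    eps ~ N(0, diag(R_diag)). Returns (mean, covariance).
    Hypothetical helper written for illustration only."""
    K = C.shape[1]
    # C^T R^{-1}: R is diagonal, so divide each column of C^T by R_diag.
    Ct_Rinv = C.T / R_diag
    # Posterior precision: I + C^T R^{-1} C  (K x K).
    precision = np.eye(K) + Ct_Rinv @ C
    cov = np.linalg.inv(precision)
    # Posterior mean: Sigma C^T R^{-1} x.
    mean = cov @ (Ct_Rinv @ x)
    return mean, cov

# Illustrative random problem instance (assumed values).
rng = np.random.default_rng(0)
D, K = 5, 2
C = rng.normal(size=(D, K))
R_diag = rng.uniform(0.5, 2.0, size=D)
x = rng.normal(size=D)

mean, cov = fa_posterior(x, C, R_diag)

# Cross-check: condition the joint Gaussian of (z, x), where
# Cov[x] = C C^T + R and Cov[z, x] = C^T.
S = C @ C.T + np.diag(R_diag)
mean_joint = C.T @ np.linalg.solve(S, x)
cov_joint = np.eye(K) - C.T @ np.linalg.solve(S, C)

posterior_matches = bool(
    np.allclose(mean, mean_joint) and np.allclose(cov, cov_joint)
)
print("Both forms agree:", posterior_matches)
```

The first form inverts only a $K \times K$ matrix, which is cheaper than the $D \times D$ solve in the second form when $K \ll D$. Note also that the posterior covariance does not depend on $x_t$, so in an E-step it can be computed once and reused for every data point.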

