Question: CS229 Problem Set #3

3. [45 points] Variational Inference in a Linear Gaussian Model

In this problem, we will introduce an algorithm for probabilistic inference in latent variable models. We consider a latent variable model where p(x, z) = p(z) p(x | z), with z the latent variable and x the observed variable. We are interested in the following probabilistic inference tasks: given a latent variable model and an example x, we wish to determine the marginal distribution p(x) and the posterior distribution p(z | x).

(Broader context: latent variable models have many applications. For example, to model language, the observed variable x can be a document, and the latent variable z can be the topic of the document. Computing the posterior distribution in this case corresponds to inferring the topic of the document. Moreover, though in this question we always operate with a given latent variable model, computing the posterior in some latent variable model can also be used as a sub-procedure for learning the latent variable model (as in the EM algorithm). See the remark at the end of the question as well.)

Specifically, we will introduce and study a particular approximate inference algorithm: Stochastic Gradient Variational Bayes (SGVB), which is closely related to the EM algorithm introduced in the lectures. Concretely, consider a latent variable model with latent variables z ∈ R^m and observed variables x ∈ R^d, drawn according to

z ~ N(0, I_m),   (1)
x | z ~ N(f(z), γ² I_d),   (2)

where γ > 0, I_m and I_d are identity matrices of sizes m × m and d × d respectively, and f : R^m → R^d maps z to the mean of the conditional Gaussian distribution of x given z. In the subsequent text, we shall refer to the distributions in Eqs. (1) and (2) simply as p(z) and p(x | z) respectively. When f is chosen to be a deep neural network, the resulting "deep" latent variable model is capable of modeling highly complex distributions over x. In this problem, however, we shall consider the simplified case of a "Linear Gaussian Model", in which f is affine,

f(z) = Wz + b,   (3)

for some W ∈ R^{d×m} and b ∈ R^d.
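(To make the generative process above concrete, here is a minimal NumPy sketch of sampling from Eqs. (1)-(3). The dimensions m, d and the particular values of W, b, and γ below are arbitrary illustrative choices, not part of the problem statement.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (not prescribed) sizes and parameters.
m, d = 2, 3                  # latent dimension m, observed dimension d
W = rng.normal(size=(d, m))  # W in R^{d x m}
b = rng.normal(size=d)       # b in R^d
gamma = 0.5                  # noise scale, gamma > 0

def sample_xz(n):
    """Draw n pairs (z, x) from the linear Gaussian model of Eqs. (1)-(3)."""
    z = rng.normal(size=(n, m))                 # z ~ N(0, I_m)
    mean = z @ W.T + b                          # f(z) = W z + b
    x = mean + gamma * rng.normal(size=(n, d))  # x | z ~ N(f(z), gamma^2 I_d)
    return z, x

z, x = sample_xz(5)
print(z.shape, x.shape)  # (5, 2) (5, 3)
```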
(a) [5 points] Exact marginal inference. By exploiting the linearity of f, we will first show that p(x) can be determined analytically. We shall do so by explicitly finding a closed-form expression for p(x). To begin, we make the following observation: letting δ ~ N(0, γ² I_d), we can see that the process for generating x is the same as

x = Wz + b + δ.   (4)

Since z and δ are Gaussian random variables, x must also be a Gaussian random variable. Thus, p(x) must be a Gaussian distribution N(ν, Σ) for some choice of mean vector ν ∈ R^d and covariance matrix Σ ∈ R^{d×d}, where Σ is symmetric and positive definite.

Task: Express ν and Σ as functions of (W, b, γ). You do not need to prove that x is a Gaussian random variable or that p(x) is a Gaussian distribution in your answer; you only need to provide expressions for ν and Σ.

Hint: By definition of the Gaussian distribution parameters, ν = E[x] and Σ = Cov(x).
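(A quick way to sanity-check a candidate closed form for ν and Σ is to compare it against Monte Carlo estimates of E[x] and Cov(x) from the generative process. The sketch below is illustrative only; it compares against the standard linear Gaussian marginal, ν = b and Σ = WWᵀ + γ²I_d, which you should still derive yourself from Eq. (4).)

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 2, 3
W = rng.normal(size=(d, m))
b = rng.normal(size=d)
gamma = 0.5

# Monte Carlo estimate of E[x] and Cov(x) under Eqs. (1)-(3).
n = 200_000
z = rng.normal(size=(n, m))
x = z @ W.T + b + gamma * rng.normal(size=(n, d))
nu_mc = x.mean(axis=0)
Sigma_mc = np.cov(x, rowvar=False)

# Candidate closed form (standard linear Gaussian marginal):
# nu = b, Sigma = W W^T + gamma^2 I_d.
nu_candidate = b
Sigma_candidate = W @ W.T + gamma**2 * np.eye(d)

# Both gaps shrink toward 0 as n grows.
print(np.max(np.abs(nu_mc - nu_candidate)))
print(np.max(np.abs(Sigma_mc - Sigma_candidate)))
```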
(b) [5 points] Understanding the ELBO. From Q3a, we see that we can exploit the linearity of f to determine ln p(x) analytically. In general, however, exact calculation of ln p(x) is often intractable, and we must develop methods to instead approximate ln p(x). One such method is variational inference, which converts the estimation of ln p(x) into an optimization problem by using the Evidence Lower Bound (ELBO),

ELBO(x; q) = E_{z~q} [ ln ( p(x, z) / q(z) ) ],   (5)

where q is some choice of distribution over the space of z. Note that in the equation above q(t) denotes the density of q at t, and z ~ q means a random variable z is sampled from the distribution q. Crucially, as shown in the lecture, the ELBO always lower bounds ln p(x),

ln p(x) ≥ ELBO(x; q) = E_{z~q} [ ln ( p(x, z) / q(z) ) ],   (6)

no matter what choice of q you use. Since this bound holds for any choice of q, we can approximate ln p(x) by optimizing q over some space of distributions Q,

ln p(x) ≥ max_{q ∈ Q} ELBO(x; q),   (7)

so that the ELBO is as large as possible (thus giving the best approximation of ln p(x)). We refer to Q as the variational family, and each q ∈ Q as a proposal or variational distribution. Before we discuss how to optimize the ELBO, let us briefly familiarize ourselves with it by considering two decompositions that expose its relation to the Kullback-Leibler (KL) divergence.

i. Task: Prove that

ELBO(x; q) = E_{z~q} [ ln p(x | z) ] − D_KL(q ‖ p_z).   (8)

ii. Task: Prove that

ELBO(x; q) = ln p(x) − D_KL(q ‖ p_{z|x}).   (9)
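(The bound in Eq. (6) can also be checked numerically in the linear Gaussian setting, since part (a) gives ln p(x) in closed form. The sketch below is a rough illustration under the same toy parameters as above: the proposal q is an arbitrary Gaussian, and the ELBO is estimated by Monte Carlo from Eq. (5). Per Eq. (9), the gap ln p(x) − ELBO(x; q) equals D_KL(q ‖ p_{z|x}).)

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(0)
m, d = 2, 3
W = rng.normal(size=(d, m))
b = rng.normal(size=d)
gamma = 0.5
x = rng.normal(size=d)  # an arbitrary observed point

# ln p(x), using the closed-form marginal from part (a): N(b, W W^T + gamma^2 I_d).
log_px = mvn(mean=b, cov=W @ W.T + gamma**2 * np.eye(d)).logpdf(x)

# An arbitrary Gaussian proposal q(z) = N(mu_q, sigma_q^2 I_m).
mu_q, sigma_q = 0.3 * np.ones(m), 0.8

# Monte Carlo estimate of ELBO(x; q) = E_{z~q}[ln p(x, z) - ln q(z)], Eq. (5).
n = 100_000
z = mu_q + sigma_q * rng.normal(size=(n, m))
log_pz = mvn(mean=np.zeros(m), cov=np.eye(m)).logpdf(z)            # ln p(z)
log_px_given_z = mvn(mean=np.zeros(d),
                     cov=gamma**2 * np.eye(d)).logpdf(x - (z @ W.T + b))  # ln p(x | z)
log_qz = mvn(mean=mu_q, cov=sigma_q**2 * np.eye(m)).logpdf(z)      # ln q(z)
elbo = np.mean(log_pz + log_px_given_z - log_qz)

print(log_px, elbo)  # the Monte Carlo ELBO should fall below ln p(x), as in Eq. (6)
```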
