Question 2. [25 MARKS] We can combine simple distributions to produce more complex (multi-modal) distributions using...
Question:
Question 2. [25 MARKS] We can combine simple distributions to produce more complex (multi-modal) distributions using mixtures. The figure below shows what occurs if we take a convex combination of two Gaussians. We can write the distribution for such a random variable $X$ with density corresponding to an equal mixture of two Gaussians, with unknown means $\mu_1, \mu_2$ and unknown variances $\sigma_1^2, \sigma_2^2$, as follows.

$$p(x \mid w) = 0.5\,\mathcal{N}(x \mid \mu_1, \sigma_1^2) + 0.5\,\mathcal{N}(x \mid \mu_2, \sigma_2^2) \qquad (1)$$

where we write the four parameters $w = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\mathcal{N}(x \mid \mu, \sigma^2) = (2\pi)^{-1/2}\,\sigma^{-1} \exp\!\left(-(x - \mu)^2 / (2\sigma^2)\right)$.

Figure 1: The blue curve is a Gaussian with $\mu = -1$ and $\sigma = 0.5$ and the red curve is a Gaussian with $\mu = 1$ and $\sigma = 0.5$. The purple curve is the mixture of the two, as in Equation (1). The purple curve allows us to model a bimodal distribution (two peaks), where now the two most likely values are -1 and 1, with density decreasing from these points. If we sample from this distribution, then we will see points centered around -1 and 1, with a reasonable likelihood for a point between the two (including at zero), and very low likelihood for points outside -2 and 2.

It is easy to show that $p(x \mid w)$ is a valid density, because

$$\int p(x \mid w)\,dx = \int \left[ 0.5\,\mathcal{N}(x \mid \mu_1, \sigma_1^2) + 0.5\,\mathcal{N}(x \mid \mu_2, \sigma_2^2) \right] dx = 0.5 \int \mathcal{N}(x \mid \mu_1, \sigma_1^2)\,dx + 0.5 \int \mathcal{N}(x \mid \mu_2, \sigma_2^2)\,dx = 0.5 + 0.5 = 1. \qquad (2)$$

You set forth to learn this distribution $p(x \mid w)$. However, now when you take the log-likelihood, you find that the log does not help as much, because the sum gets in the way of the log being applied to the exponentials.

$$\ln p(x \mid w) = \ln\!\left( 0.5\,\mathcal{N}(x \mid \mu_1, \sigma_1^2) + 0.5\,\mathcal{N}(x \mid \mu_2, \sigma_2^2) \right)$$

The log still helps convert the product over samples into a sum, for a given dataset of $n$ iid samples from this distribution, $D = \{x_i\}_{i=1}^{n}$:

$$\ln p(D \mid w) = \ln \prod_{i=1}^{n} p(x_i \mid w) = \sum_{i=1}^{n} \ln p(x_i \mid w)$$

Despite this difficulty, you are determined to learn this distribution, because you are confident it will do a better job of modeling your data. Your goal in this question is to obtain a procedure to estimate $w = (\mu_1, \mu_2, \sigma_1, \sigma_2)$.

(a) [20 MARKS] Compute the gradient (partial derivatives) of your negative log-likelihood objective $c(w) = -\ln p(D \mid w)$. Start by computing the gradient of $\ln p(x_i \mid w)$. To simplify notation, consider defining $g(\mu_j, \sigma_j) = \sigma_j^{-1} \exp\!\left(-(x_i - \mu_j)^2 / (2\sigma_j^2)\right)$.

(b) [5 MARKS] Write the (first-order) gradient descent update rule for your parameters, using the gradient you compute, assuming you start from current point $w_t$ and have stepsize $\eta_t$.
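The quantities in parts (a) and (b) can be checked numerically. What follows is a minimal Python sketch, not an official solution. It assumes the parameters are ordered as w = (mu1, mu2, sigma1, sigma2) and that the derivatives are taken with respect to the standard deviations rather than the variances. The function names (`normal_pdf`, `mixture_pdf`, `nll`, `nll_grad`, `gd_step`) are hypothetical and chosen only for illustration. Under those assumptions, the chain rule gives, for each component j, a weight r_ij = 0.5 N(x_i | mu_j, sigma_j^2) / p(x_i | w), and the partial derivatives of ln p(x_i | w) are r_ij (x_i - mu_j) / sigma_j^2 for mu_j and r_ij ((x_i - mu_j)^2 / sigma_j^3 - 1 / sigma_j) for sigma_j. The gradient of c(w) is the negated sum of these over the data.

```python
import math


def normal_pdf(x, mu, sigma):
    """Gaussian density N(x | mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, w):
    """Equal-weight mixture of two Gaussians, as in Equation (1)."""
    mu1, mu2, s1, s2 = w
    return 0.5 * normal_pdf(x, mu1, s1) + 0.5 * normal_pdf(x, mu2, s2)


def nll(data, w):
    """Negative log-likelihood c(w) = -sum_i ln p(x_i | w)."""
    return -sum(math.log(mixture_pdf(x, w)) for x in data)


def nll_grad(data, w):
    """Gradient of c(w) with respect to (mu1, mu2, sigma1, sigma2)."""
    mu1, mu2, s1, s2 = w
    grad = [0.0, 0.0, 0.0, 0.0]
    for x in data:
        p = mixture_pdf(x, w)
        for j, (mu, s) in enumerate(((mu1, s1), (mu2, s2))):
            # r is the share of p(x | w) contributed by component j.
            r = 0.5 * normal_pdf(x, mu, s) / p
            # Subtract because c(w) is the NEGATIVE log-likelihood.
            grad[j] -= r * (x - mu) / s ** 2
            grad[2 + j] -= r * ((x - mu) ** 2 / s ** 3 - 1.0 / s)
    return grad


def gd_step(data, w, eta):
    """One first-order update: w_next = w - eta * grad c(w)."""
    g = nll_grad(data, w)
    return [wi - eta * gi for wi, gi in zip(w, g)]


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    data = [-1.2, -0.8, 0.1, 0.9, 1.3]
    w = [-0.5, 0.7, 0.8, 0.6]
    for _ in range(200):
        w = gd_step(data, w, eta=0.01)
    print("estimate:", w, "objective:", nll(data, w))
```

A plain update like this does not keep the standard deviations positive, so a small step size (or a reparameterization such as optimizing ln sigma_j) may be needed in practice. Comparing `nll_grad` against finite differences of `nll` is a quick way to catch sign or exponent mistakes in a hand derivation.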