Question: Question 2. [25 MARKS] We can combine simple distributions to produce more complex (multi-modal) distributions using mixtures. The below figure shows what occurs if

Question 2. [25 MARKS] We can combine simple distributions to produce more complex (multi-modal) distributions using mixtures. The below figure shows what occurs if

Question 2. [25 MARKS] We can combine simple distributions to produce more complex (multi-modal) distributions using mixtures. The below figure shows what occurs if we take a convex combination of two Gaussians. We can write the distribution for such a random variable X with density corresponding to an equal mixture of two Gaussians, with unknown means , 2 and unknown variances 0,0%, as follows. p(x|w) = 0.5(x|1,) + 0.5(x|2, ) (1) 0 Figure 1: The blue curve is a Gaussian with = -1 and = 0.5 and the red curve is a Gaussian with = 1 and = 0.5. The purple curve is the mixture of the two, as in Equation (1). The purple curve allows us to model a bimodal distribution (two peaks), where now the two most likely values are -1 and 1, with density decreasing from these points. If we sample from this distribution, then we will see points centered around -1 and 1, with a reasonable likelihood for a point between the two (including at zero), and very low likelihood for points outside -2 and 2. where we write the four parameters w = (1, 2, 01, 02) and N(x|, o) = (2)1/20 exp((x )/(20)). It is easy to show that p(x/w) is a valid density, because - [p(x|w)dx = [ 0.5N(x|141, 0}) + 0.5(x|2, o)dx = 0.5 / N(x|, o)dz + 0.5 [N(x|2, 0)dx = 0.5 0.5 1. (2) You set forth to learn this distribution p(x|w). However, now when you take the log-likelihood, you find that the log does not help as much, because the sum gets in the way of the log being applied to the exponentials. Inp(x|w) = In (0.5N(x|,) +0.5N(x|2, )) The log still helps convert the product over samples into a sum, for a given dataset of n iid samples from this distribution D = {x}=1 n n In p(D|w) = ln [[p(x|w) = Inp(xi|w) i=1 i=1 Despite this difficulty, you are determined to learn this distribution, because you are confident it will do a better job of modeling your data. Your goal in this question is to obtain a procedure to estimate w = (1, H2, 01, 02). (a) [20 MARKS] Compute the gradient (partial derivatives) of your negative log likelihood objective c(w) Inp(Dw). Start by computing the gradient of Inp(x; w). To simplify notation, consider defining g(j, ;) = ; exp((xi | Mj)/(20)). (b) [5 MARKS] Write the (first-order) gradient descent update rule for your parameters, using the gradient you compute, assuming you start from current point w and have stepsize nt.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Mathematics Questions!

Identical bricks 0.1m wide, and weighting 2kg are placed on a plate (assume the plate has neligible weight). Calculate the total weight (and force F required to balance the plate). How far from x=0...

Determine whether A is diagonalizable and, if so, find an invertible matrix P and a diagonal matrix D such that P -1 AP = D. 1 1 1 1 1 1 0 2 A = %3D 0 0 2 1 L0 0 0 1

The two beams shown have the same cross section and are joined by a hinge at C. For the loading shown, determine (a) the slope at point A, (b) the deflection at point B. Use E = 200GPa. 3.5 kN A B C...

Course : Analysis of experience plans MAT3378 Exercise: A factory contains a large number of winding machines coils. An analyst studies a certain characteristic of the coils produced by these...

Based on the time recorded for 500 attendees at an amusement park, the average amount of time spent at the park was 6.25 hou $1.45 hours. If 4,000 people attended the park on a given day, what is the...

Question 1. [25 MARKS] Imagine that you would like to predict if your favorite table will be free at your favorite restau- rant. The only additional piece of information you can collect, however, is...

Hi, I am doing a project for derivatives class and I have some questions about a regression and Monte Carlo simulation that I need to come up with. So the goal is to hedge against CDS instruments by...

Jones & Bartlett Learning, LLC. NOT FOR RESALE OR DISTRIBUTION CHAPTER Hot Spot Analysis 10 LEARNING OBJECTIVES C A R R Provide a working definition of a \"hot spot.\" , Be able to explain different...

Providing Quality School-Based Learning and Support Services 239 Chapter 6 Language and literacy support Your core task The core task of almost all TAs is to support students language and literacy...

give a summary of this article please!!! comment on its implications for manager rewards and investors returns. please answer I really need for class!!!! thank you!!!! Portfolio Performance...

Shandler's Maid Company offers home cleaning service. Two recurring transactions for the company are billing customers for services rendered and paying employee salaries. For example, on March 15,...

During the current year, merchandise is sold for $2,450,000. The cost of the merchandise sold is $1,519,000. a. What is the amount of the gross profit? b. Compute the gross profit percentage (gross...

2.40 For the ladder network in Fig. 2.104, find I and R eq. 15 V+ 8 w 102 www ww Rea 6

Contrast debt and preferred stock as a source of financial leverage.

Consider a greedy algorithm for the VERTEX-COVER problem, where we repeatedly choose a vertex with maximum degree, add it to our cover, and then remove it and all its incident edges. Show that this...

Workers who survive a layoff of other employees at their location may suffer from survivor guilt. A study of survivor guilt and its effects used as subjects 120 students who were offered an...

For the data in Problem 5-2, compute a 95 percent confidence interval estimate of the mean difference in response for feed rates of 0.20 and 0.25 in/min. Problem 5-2 An engineer suspects that the...

Financial statement information about four different companies is as follows. Instructions (a) Determine the missing amounts. (b) Prepare the owners equity statement for Alpha Company. (c) Write a...

Design a minimum-mass symmetric three-bar truss (the area of member 1 and that of member 3 are the same) to support a load P, as was shown in Fig. 2.9. The following notation may be used: Pu = P cos...

as gets closertothe H0 valueof0.50. (c) Set = 0.60. Report P(TypeIIerror)for n equal to(i)50,(ii)100,(iii)200,and summarize theimpactof n on P(TypeIIerror).

Fortesting H0: = 0 against Ha: > 0 with = 0.05, use Figure

Usethe ErrorsandPower app at www.artofstat.com/web-apps to investigatetheperformance of significancetests.Setthehypothesesas H0: =