Question: (6 points) We train a neural network on a set of data D, where each training sample di {, t) and input vector is

 (6 points) We train a neural network on a set of data 

(6 points) We train a neural network on a set of data D, where each training sample di {, t) and input vector is In, n = - 1... N together with a corresponding set of target vectors tn. We assume that t has Gaussian distribution. We would like to find parameter w for hypothesis h using the maximum likelihood estimation (ML): hML d(tanh(z)) dz argmax p(D/h) hell hML 772 argmaxp(d, h) hell i=1 Show that solving (1) is equivalent to minimizing the sum of squares error function: (1) M72 = arg min (di - h(ri))" i=1 (4 points) Derive the Error Gradient for a linear unit with a tanh activation func- tion. (Hint: =(1tanh(r)?))

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!