Question: (6 points) We train a neural network on a set of data D, where each training sample di {, t) and input vector is
(6 points) We train a neural network on a set of data D, where each training sample di {, t) and input vector is In, n = - 1... N together with a corresponding set of target vectors tn. We assume that t has Gaussian distribution. We would like to find parameter w for hypothesis h using the maximum likelihood estimation (ML): hML d(tanh(z)) dz argmax p(D/h) hell hML 772 argmaxp(d, h) hell i=1 Show that solving (1) is equivalent to minimizing the sum of squares error function: (1) M72 = arg min (di - h(ri))" i=1 (4 points) Derive the Error Gradient for a linear unit with a tanh activation func- tion. (Hint: =(1tanh(r)?))
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
