Question: (40 points) Softmax classifier gradient. For a softmax classifier, derive the gradient of the log-likelihood.

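(40 points) Softmax classifier gradient. For a softmax classifier, derive the gradient of the log-likelihood. Concretely, assume a classification problem with $c$ classes.

Samples are $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$, where $x^{(j)} \in \mathbb{R}^n$, $y^{(j)} \in \{1, \ldots, c\}$, $j = 1, \ldots, m$.

Parameters are $\theta = \{w_i, b_i\}_{i=1,\ldots,c}$.

The probabilistic model is

$$\Pr\left(y^{(j)} = i \mid x^{(j)}, \theta\right) = \mathrm{softmax}_i\left(x^{(j)}\right), \quad \text{where} \quad \mathrm{softmax}_i(x) = \frac{e^{w_i^T x + b_i}}{\sum_{k=1}^{c} e^{w_k^T x + b_k}}.$$

Derive the log-likelihood $L$ and its gradient w.r.t. the parameters, $\nabla_{w_i} L$ and $\nabla_{b_i} L$, for $i = 1, \ldots, c$.

Note: We can group $w_i$ and $b_i$ into a single vector by augmenting the data vectors with an additional dimension of constant 1. Let

$$\tilde{x} = \begin{bmatrix} x \\ 1 \end{bmatrix}, \qquad \tilde{w}_i = \begin{bmatrix} w_i \\ b_i \end{bmatrix},$$

then $a_i(x) = w_i^T x + b_i = \tilde{w}_i^T \tilde{x}$. This unifies $\nabla_{w_i} L$ and $\nabla_{b_i} L$ into $\nabla_{\tilde{w}_i} L$.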
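A derived answer can be sanity-checked numerically. Assuming the samples are independent, the log-likelihood is $L(\theta) = \sum_{j=1}^{m} \log \mathrm{softmax}_{y^{(j)}}(x^{(j)})$, and differentiating it in the augmented notation gives $\nabla_{\tilde{w}_i} L = \sum_{j=1}^{m} \left(\mathbf{1}[y^{(j)} = i] - \mathrm{softmax}_i(x^{(j)})\right) \tilde{x}^{(j)}$. The sketch below compares that expression against central finite differences on small random data. It is only an illustration: the function names, the toy data, and the 0-based class labels are assumptions made here, not part of the question.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(W, x):
    """a_i(x) = w~_i^T x~ for every class i; x already carries a trailing 1."""
    return [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in W]


def log_likelihood(W, X, y):
    """L = sum_j log softmax_{y_j}(x_j). Labels are 0-based (an assumption)."""
    return sum(math.log(softmax(scores_for(W, x))[label]) for x, label in zip(X, y))


def gradient(W, X, y):
    """grad[i] = sum_j (1[y_j == i] - softmax_i(x_j)) * x~_j."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for x, label in zip(X, y):
        p = softmax(scores_for(W, x))
        for i in range(len(W)):
            coeff = (1.0 if i == label else 0.0) - p[i]
            for k, x_k in enumerate(x):
                grad[i][k] += coeff * x_k
    return grad


def numerical_gradient(W, X, y, eps=1e-6):
    """Central finite differences, used only to cross-check `gradient`."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for k in range(len(W[0])):
            original = W[i][k]
            W[i][k] = original + eps
            up = log_likelihood(W, X, y)
            W[i][k] = original - eps
            down = log_likelihood(W, X, y)
            W[i][k] = original  # restore the parameter
            grad[i][k] = (up - down) / (2 * eps)
    return grad


# Toy data (arbitrary): m = 5 samples, n = 2 features plus the constant 1, c = 3 classes.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0] for _ in range(5)]
y = [rng.randrange(3) for _ in range(5)]
W = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]

analytic = gradient(W, X, y)
numeric = numerical_gradient(W, X, y)
max_gap = max(abs(a - b) for ra, rb in zip(analytic, numeric) for a, b in zip(ra, rb))
print(f"largest analytic-vs-numeric gap: {max_gap:.2e}")
```

Each row of `W` holds one augmented vector $\tilde{w}_i$, so the last column of `analytic` plays the role of $\nabla_{b_i} L$ and the remaining columns that of $\nabla_{w_i} L$. A small gap is consistent with the formula but does not replace the derivation itself.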
