Question: Consider the problem of binary classification where $x = (x_1, \ldots, x_d)^T \in \mathbb{R}^d$, $y \in \{0, 1\}$ and $w = (w_1, \ldots, w_d)^T \in \mathbb{R}^d$ are the input feature vector, the outcome target and the weight vector, respectively. The binary regression model is parameterized as follows: $y \sim \mathrm{Bernoulli}(\hat{y})$ with $\hat{y} = p(y = 1 \mid x, w) = \sigma(z) = \frac{1}{1 + e^{-z}}$, where $z = w^T x = \sum_{j=1}^{d} w_j x_j$.

(a) For a single example $(x^{(i)}, y^{(i)})$, the loss is defined as the negative log likelihood, or cross-entropy loss: $L^{(i)}_{CE}(w) = -\log p(y^{(i)} \mid x^{(i)}, w)$. Recall that $p(y^{(i)} \mid x^{(i)}, w) = \sigma(z^{(i)})^{y^{(i)}} \left(1 - \sigma(z^{(i)})\right)^{1 - y^{(i)}}$ for $y^{(i)} \in \{0, 1\}$. Show that
$$L^{(i)}_{CE}(w) = -y^{(i)} \log \sigma(z^{(i)}) - (1 - y^{(i)}) \log\left(1 - \sigma(z^{(i)})\right).$$

(b) Using the chain rule, show that
$$\frac{\partial}{\partial w_j} L^{(i)}_{CE}(w) = \left(\sigma(z^{(i)}) - y^{(i)}\right) x^{(i)}_j, \qquad j = 1, \ldots, d.$$

(c) Let $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$ be a training set of $m$ independently and identically distributed examples, and let the cross-entropy loss corresponding to $D$ be $L^{(D)}_{CE}(w) = \sum_{i=1}^{m} L^{(i)}_{CE}(w)$. Derive the expression for $\frac{\partial}{\partial w_j} L^{(D)}_{CE}(w)$, $j = 1, \ldots, d$, and hence provide both the stochastic and batch gradient descent update rules.

(d) Provide an interpretation of the results found in (c).
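As orientation before attempting the problem, here is one standard derivation sketch for parts (a)–(c). It uses only the definitions in the question, plus the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ and a learning rate $\eta$ (which the question does not name); it is a sketch of one common route, not an official graded solution.

```latex
% (a) Take the negative log of the Bernoulli likelihood:
\[
L^{(i)}_{CE}(w)
= -\log\!\left[\sigma(z^{(i)})^{y^{(i)}}\,(1-\sigma(z^{(i)}))^{1-y^{(i)}}\right]
= -y^{(i)}\log\sigma(z^{(i)}) - (1-y^{(i)})\log\!\left(1-\sigma(z^{(i)})\right).
\]
% (b) Chain rule with sigma'(z) = sigma(z)(1 - sigma(z)) and dz/dw_j = x_j:
\[
\frac{\partial L^{(i)}_{CE}}{\partial w_j}
= \left(-\frac{y^{(i)}}{\sigma(z^{(i)})} + \frac{1-y^{(i)}}{1-\sigma(z^{(i)})}\right)
  \sigma(z^{(i)})\,(1-\sigma(z^{(i)}))\,x^{(i)}_j
= \left(\sigma(z^{(i)}) - y^{(i)}\right) x^{(i)}_j .
\]
% (c) Gradients add over the i.i.d. examples, giving the update rules
%     (eta denotes the learning rate):
\[
\frac{\partial L^{(D)}_{CE}}{\partial w_j}
= \sum_{i=1}^{m}\left(\sigma(z^{(i)}) - y^{(i)}\right) x^{(i)}_j,
\quad
\text{SGD: } w_j \leftarrow w_j - \eta\left(\sigma(z^{(i)}) - y^{(i)}\right) x^{(i)}_j,
\quad
\text{batch: } w_j \leftarrow w_j - \eta \sum_{i=1}^{m}\left(\sigma(z^{(i)}) - y^{(i)}\right) x^{(i)}_j .
\]
```

For part (d), one common reading of these results: each update adjusts $w_j$ in proportion to the prediction error $\sigma(z^{(i)}) - y^{(i)}$ scaled by the feature value $x^{(i)}_j$, so well-classified examples contribute little, and SGD trades the exact batch gradient for cheap, noisy per-example steps.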
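To make the update rules concrete, below is a minimal NumPy sketch of the batch gradient and both update rules. The function names (`sigmoid`, `ce_grad`, `batch_step`, `sgd_epoch`), the learning-rate default, and the synthetic data are all illustrative assumptions, not anything specified in the question.

```python
# Minimal NumPy sketch of the gradients and update rules from (b)-(c).
# All names and the learning-rate value are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_grad(w, X, y):
    """Batch gradient of L^(D)_CE: sum_i (sigma(z^(i)) - y^(i)) x^(i)."""
    errors = sigmoid(X @ w) - y          # shape (m,): per-example residuals
    return X.T @ errors                  # shape (d,): summed over examples

def batch_step(w, X, y, lr=0.1):
    """One batch gradient-descent update using the full training set."""
    return w - lr * ce_grad(w, X, y)

def sgd_epoch(w, X, y, lr=0.1):
    """One pass of stochastic updates, one example at a time."""
    for i in np.random.permutation(len(y)):
        error = sigmoid(X[i] @ w) - y[i]   # scalar residual for example i
        w = w - lr * error * X[i]          # update along x^(i)
    return w

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (sigmoid(X @ true_w) > rng.random(100)).astype(float)
w = np.zeros(3)
for _ in range(200):
    w = batch_step(w, X, y, lr=0.05)
print("estimated w:", w)
```

Note that `batch_step` sums the per-example gradients exactly as in (c), while `sgd_epoch` applies the single-example rule in shuffled order.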
