Question: [10 points] We have mainly focused on squared loss, but there are other interesting losses in machine learning. Consider the loss function $\ell(z) = \max(0, -z)$. Let $S$ be a training set $(x_1, y_1), \dots, (x_m, y_m)$ where each $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, 1\}$. Consider running stochastic gradient descent (SGD) to find a weight vector $w$ that minimizes $\frac{1}{m} \sum_{i=1}^{m} \ell(y_i \cdot w^{\top} x_i)$. Explain the explicit relationship between this algorithm and the Perceptron algorithm. Recall that for SGD, the update rule when the $i$th example is picked at random is
$$w_{\text{new}} = w_{\text{old}} - \nabla \ell(y_i \, w^{\top} x_i)$$
Note: You do not need to be overly concerned about the non-differentiability of $\ell$ at $0$, so you can ignore this point when calculating the gradient for this problem.
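
Since the loss is piecewise linear, a short derivation makes the connection explicit. Here is a sketch, treating the kink at $z = 0$ as the note permits and taking the step size to be 1 as in the update rule above:

$$
\nabla_w \, \ell(y_i w^{\top} x_i) =
\begin{cases}
-\,y_i x_i & \text{if } y_i w^{\top} x_i < 0, \\
0 & \text{if } y_i w^{\top} x_i > 0,
\end{cases}
\qquad\Longrightarrow\qquad
w_{\text{new}} =
\begin{cases}
w_{\text{old}} + y_i x_i & \text{if } y_i w^{\top} x_i < 0, \\
w_{\text{old}} & \text{otherwise.}
\end{cases}
$$

An update occurs only when example $i$ is misclassified, and that update is $w \leftarrow w + y_i x_i$: exactly the Perceptron rule. SGD on this loss is therefore the Perceptron algorithm with examples visited in random order.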
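A minimal runnable sketch of this equivalence in NumPy (the function name `sgd_perceptron`, the hyperparameters, and the toy data are illustrative assumptions, not part of the problem):

```python
import numpy as np

def sgd_perceptron(X, y, n_epochs=10, seed=0):
    """Run SGD on (1/m) * sum_i max(0, -y_i * w^T x_i) with step size 1.

    The (sub)gradient of the loss at a sampled example i is
        -y_i * x_i   if y_i * w^T x_i < 0   (misclassified)
         0           if y_i * w^T x_i > 0   (correctly classified)
    so every nonzero SGD step is exactly a Perceptron update.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):      # pick examples at random
            # Ties at 0 are treated as mistakes (the usual Perceptron
            # convention); the problem's note lets us ignore this point.
            if y[i] * X[i].dot(w) <= 0:
                w = w + y[i] * X[i]       # SGD step == Perceptron step
            # otherwise the gradient is 0 and w is unchanged
    return w

# Tiny usage example on a linearly separable toy set (made up for illustration):
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = sgd_perceptron(X, y)
print(w, np.sign(X.dot(w)))  # predicted signs should match y
```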