Q2. Gradient Descent (2 pt)
Given $N$ training data points $\{(x_k, y_k)\}$, $k = 1, 2, \dots, N$, with $x_k \in \mathbb{R}^d$ and labels $y_k \in \{-1, 1\}$ (either $-1$ or $1$), we seek a linear discriminant function $f(x_k) = w^\top x_k = \sum_{j=1}^{d} w_j x_{k,j}$ (where $x_{k,j}$ is the feature value of attribute $j$ of a data point $x_k$) optimizing a special loss function $L(z) = e^{-z}$, where $z = y f(x)$.
Let $\eta > 0$ be the learning rate. Please derive the gradient update $\Delta w_k$ for a randomly selected data point $k$ in the stochastic gradient descent (SGD) method.
Hint: Note that SGD randomly picks one data sample $k$ for a gradient update per iteration. We can write $z_k = y_k f(x_k) = y_k \left( \sum_{j=1}^{d} w_j x_{k,j} \right)$, where $x_{k,j}$ is the feature value of attribute $j$ of a data point $x_k$. You need to first write $\Delta w_j$ (the update from the partial derivative with respect to attribute $j$) and then get $\Delta w_k$ (the vector consisting of the updates with respect to the $d$ attributes).
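
A sketch of the standard derivation for this loss, following the hint's two steps and the notation above: for the selected sample $k$, the per-sample loss is $L(z_k) = e^{-z_k}$ with $z_k = y_k \sum_{j=1}^{d} w_j x_{k,j}$. By the chain rule,

\[
\frac{\partial L}{\partial w_j} = \frac{dL}{dz_k} \cdot \frac{\partial z_k}{\partial w_j} = \left(-e^{-z_k}\right) \left(y_k x_{k,j}\right) = -e^{-z_k}\, y_k x_{k,j}.
\]

SGD steps against the gradient, so per attribute $\Delta w_j = -\eta \frac{\partial L}{\partial w_j} = \eta\, e^{-z_k} y_k x_{k,j}$, and stacking the $d$ components gives the vector update

\[
\Delta w_k = \eta\, e^{-y_k w^\top x_k}\, y_k x_k, \qquad w \leftarrow w + \Delta w_k.
\]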
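
The same update can be checked numerically. Below is a minimal NumPy sketch of one SGD step for this exponential loss; the function name sgd_update and the toy data are illustrative assumptions, not part of the original problem.

import numpy as np

def sgd_update(w, x_k, y_k, eta):
    # Margin of the selected sample: z_k = y_k * w^T x_k
    z_k = y_k * np.dot(w, x_k)
    # Gradient of L(z) = exp(-z) w.r.t. w: dL/dw = -exp(-z_k) * y_k * x_k
    grad = -np.exp(-z_k) * y_k * x_k
    # Descent step: w <- w - eta * grad, i.e. w + eta * exp(-z_k) * y_k * x_k
    return w - eta * grad

# Toy usage (hypothetical data): 100 random points in R^5 with +/-1 labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.where(rng.normal(size=100) > 0, 1.0, -1.0)
w = np.zeros(5)
for _ in range(1000):
    k = rng.integers(len(X))                # SGD: pick one sample at random
    w = sgd_update(w, X[k], y[k], eta=0.1)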