4.2(10 points) Derive Gradient
Given a training dataset $S_{\text{training}} = \{(x_i, y_i)\}$, $i = 1, \ldots, n$, we wish to optimize the negative log-likelihood loss $L(w, b)$ of the logistic regression model defined above:
$$L(w, b) = -\sum_{i=1}^{n} \ln p_i \qquad (5)$$
where $p_i = p(y_i \mid x_i)$. The optimal weight vector $w$ and bias $b$ are used to build the logistic regression model:
$$w, b = \operatorname*{arg\,min}_{w,\, b} L(w, b) \qquad (6)$$
In this problem, we attempt to obtain the optimal parameters w and b by using a standard gradient descent algorithm.
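The standard gradient-descent loop mentioned here can be sketched as follows. This is an illustrative sketch, not part of the original problem: it assumes the common $y_i \in \{-1, +1\}$ label convention with $p_i = \sigma(y_i(w^\top x_i + b))$ (the model definition referenced above is not reproduced in this excerpt), and uses the gradient expression from part (a) below together with the analogous bias gradient. Function names and the learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll_loss(w, b, X, y):
    # L(w, b) = -sum_i ln p_i, with p_i = sigmoid(y_i * (w.x_i + b))
    # (assumes the y_i in {-1, +1} formulation of logistic regression)
    p = sigmoid(y * (X @ w + b))
    return -np.sum(np.log(p))

def gradients(w, b, X, y):
    # dL/dw = -sum_i (1 - p_i) y_i x_i ; dL/db = -sum_i (1 - p_i) y_i
    p = sigmoid(y * (X @ w + b))
    coef = (1.0 - p) * y            # per-example factor (1 - p_i) y_i
    return -X.T @ coef, -np.sum(coef)

def gradient_descent(X, y, lr=0.1, steps=1000):
    # Plain (full-batch) gradient descent; lr and steps are illustrative.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        gw, gb = gradients(w, b, X, y)
        w -= lr * gw
        b -= lr * gb
    return w, b
```

On a linearly separable toy set, the loop drives the loss toward zero; a finite-difference check on `gradients` is a quick way to confirm the derivative formula from part (a).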
(a) Please show that
$$\frac{\partial L(w, b)}{\partial w} = -\sum_{i=1}^{n} (1 - p_i)\, y_i x_i.$$
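A sketch of the derivation, assuming the common $y_i \in \{-1, +1\}$ formulation in which $p_i = \sigma(z_i)$ with $z_i = y_i(w^\top x_i + b)$ and $\sigma(z) = 1/(1 + e^{-z})$ (the model definition referenced above is not reproduced in this excerpt):

Since $\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)$ and $\partial z_i / \partial w = y_i x_i$, the chain rule gives
$$\frac{\partial \ln p_i}{\partial w} = \frac{\sigma'(z_i)}{\sigma(z_i)} \cdot \frac{\partial z_i}{\partial w} = \bigl(1 - \sigma(z_i)\bigr)\, y_i x_i = (1 - p_i)\, y_i x_i.$$
Summing over $i$ and negating, since $L(w, b) = -\sum_{i=1}^{n} \ln p_i$, yields
$$\frac{\partial L(w, b)}{\partial w} = -\sum_{i=1}^{n} (1 - p_i)\, y_i x_i.$$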
