Question: Consider an error function $E(w) = 0.05 + (w-3)^2/2$. Different variants of the gradient descent algorithm can be used to minimize this error function with respect to $w$. Assume that $w = 1$ at time $(t-1)$ and that $w = 1.5$ after the update at time $t$. Assume a learning rate and a momentum update rate. What is the value of $w$ at time $(t+1)$ if Nesterov-based gradient descent is used?
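One way to check an answer is to carry out a single Nesterov step numerically. The sketch below is only an illustration under stated assumptions: the learning rate and momentum values did not survive in the question as posted, so `lr=0.1` and `momentum=0.9` are placeholder values, not the question's data. It also assumes the common formulation in which the update is $w_t = w_{t-1} - v_{t-1}$, so the previous velocity can be recovered from the two given weights. The function names `grad_E` and `nesterov_step` are hypothetical.

```python
def grad_E(w):
    """Gradient of E(w) = 0.05 + (w - 3)^2 / 2, i.e. dE/dw = w - 3."""
    return w - 3.0


def nesterov_step(w_prev, w_curr, lr, momentum):
    """One Nesterov accelerated gradient step (assumed formulation).

    v_t     = momentum * v_{t-1} + lr * grad_E(w_t - momentum * v_{t-1})
    w_{t+1} = w_t - v_t
    """
    # Recover the previous velocity from the two known weights,
    # assuming the update rule w_t = w_{t-1} - v_{t-1}.
    v_prev = w_prev - w_curr
    # Evaluate the gradient at the look-ahead point, not at w_curr.
    w_lookahead = w_curr - momentum * v_prev
    v_curr = momentum * v_prev + lr * grad_E(w_lookahead)
    return w_curr - v_curr


# Placeholder hyperparameters: assumptions, not values from the question.
w_next = nesterov_step(w_prev=1.0, w_curr=1.5, lr=0.1, momentum=0.9)
print(w_next)
```

With these placeholder values the look-ahead point is 1.95, the gradient there is -1.05, and the step gives a value of approximately 2.055. Substituting the learning rate and momentum actually stated in the assignment yields the intended answer.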
