Question: Consider the error function E(w) = 0.05 + (w - 3)^2 / 2. Different variants of the gradient descent algorithm can be used to minimize this error function with respect to w. Assume that w = 1 at time (t-1) and that w = 1.5 after the update at time t. Assume a given learning rate and momentum update rate (their values are not shown in the source). What is the value of w at time (t+1) if Nesterov-based gradient descent is used?
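Since the learning rate and momentum values are missing from the question text, the following is only a minimal sketch of one Nesterov step, assuming the common formulation in which the gradient is evaluated at a look-ahead point. The function name `nesterov_step` and the parameter names `lr` and `momentum` are hypothetical, and the gradient E'(w) = w - 3 follows from the error function above.

```python
def grad_E(w: float) -> float:
    """Gradient of E(w) = 0.05 + (w - 3)^2 / 2, i.e. E'(w) = w - 3."""
    return w - 3.0


def nesterov_step(w_prev: float, w_curr: float, lr: float, momentum: float) -> float:
    """One Nesterov momentum update (assumed look-ahead formulation).

    w_prev, w_curr: weights at times (t-1) and t.
    lr, momentum:   learning rate and momentum rate (not given in the question).
    """
    # Velocity implied by the previous update: v_t = w_t - w_(t-1).
    velocity = w_curr - w_prev
    # Look-ahead point where the gradient is evaluated.
    w_lookahead = w_curr + momentum * velocity
    # New velocity: momentum term minus the scaled look-ahead gradient.
    new_velocity = momentum * velocity - lr * grad_E(w_lookahead)
    # Weight at time (t+1).
    return w_curr + new_velocity


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute the values
    # from the original question.
    print(nesterov_step(w_prev=1.0, w_curr=1.5, lr=0.1, momentum=0.9))
```

With w = 1 at (t-1) and w = 1.5 at t, the implied velocity is 0.5; the result at (t+1) then depends entirely on the learning rate and momentum rate supplied, so the placeholder values above should not be read as the answer to the question.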
