Question: In typical gradient descent, we take steps using a constant step size $\eta$, so that:

$$\theta_{t+1} = \theta_t - \eta \nabla f(\theta_t).$$

In the following, assume that $f$ is an arbitrary differentiable function. Grady would like to pick a perfect step size on every step and proposes a new update rule that selects $\eta^*$ to be the value of the step size that decreases the objective as much as possible in the direction $\nabla f(\theta)$, and then uses $\eta^*$ as the step size:

$$\eta^* = \arg\min_{\eta} f\big(\theta_t - \eta \nabla f(\theta_t)\big), \qquad \theta_{t+1} = \theta_t - \eta^* \nabla f(\theta_t).$$

For Grady's rule, what will generally be true?

(a) $f(\theta_t) \ge f(\theta_{t+1})$
(b) $f(\theta_t) \le f(\theta_{t+1})$
(c) cannot say
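To make the proposed rule concrete, here is a minimal Python sketch of one way to try it numerically. Everything in it is an assumption made for illustration and is not part of the question: the test objective `f`, its hand-derived gradient `grad_f`, the starting point, and the use of a coarse grid of candidate step sizes on [0, 2] in place of an exact argmin. The grid deliberately includes a step size of 0, which corresponds to not moving at all, so the best candidate can never be worse than the current point.

```python
import math


def f(theta):
    # Hypothetical non-convex test objective (an assumption, not from the question).
    x, y = theta
    return (x**2 - 1.0) ** 2 + 0.5 * y**2 + 0.3 * math.sin(3.0 * x)


def grad_f(theta):
    # Gradient of the test objective above, derived by hand.
    x, y = theta
    return (4.0 * x * (x**2 - 1.0) + 0.9 * math.cos(3.0 * x), y)


def line_search_step(theta, etas):
    # One step of the proposed rule: choose the candidate step size that gives
    # the smallest objective value along the gradient direction.
    g = grad_f(theta)

    def candidate(eta):
        return tuple(t - eta * gi for t, gi in zip(theta, g))

    best_eta = min(etas, key=lambda eta: f(candidate(eta)))
    return candidate(best_eta), best_eta


def run(theta0, steps=20):
    # Grid approximation of the argmin over step sizes; it includes eta = 0.
    etas = [i * 0.01 for i in range(0, 201)]
    theta = theta0
    values = [f(theta)]
    for _ in range(steps):
        theta, _ = line_search_step(theta, etas)
        values.append(f(theta))
    return values


values = run((2.0, -1.5))
for t, v in enumerate(values[:5]):
    print(t, v)
```

Comparing consecutive entries of `values` shows how the objective changes from one iterate to the next under this rule. An exact one-dimensional minimizer could replace the grid, as long as the step size 0 remains a feasible choice.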
