Question: In typical gradient descent, we take steps using a constant step size $\eta$, so that:

$$\theta^{(t+1)} = \theta^{(t)} - \eta \nabla f(\theta^{(t)})$$

In the following, assume that $f$ is an arbitrary differentiable function. Grady would like to pick a perfect step size on every step and proposes a new update rule that selects $\eta^*$ to be the value of step size that decreases the objective as much as possible in the direction $\nabla f(\theta)$, and then uses $\eta^*$ as the step size:

$$\eta^* = \arg\min_{\eta} f\big(\theta^{(t)} - \eta \nabla f(\theta^{(t)})\big), \qquad \theta^{(t+1)} = \theta^{(t)} - \eta^* \nabla f(\theta^{(t)})$$

For Grady's rule, what will generally be true?
(a) $f(\theta^{(t)})$ […] $f(\theta^{(t+1)})$
(b) $f(\theta^{(t)})$ […] $f(\theta^{(t+1)})$
(c) cannot say
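
To get a feel for Grady's rule, here is a minimal sketch in Python, writing `theta` for the current point and `eta` for the step size. The objective `f`, its gradient `grad_f`, the starting point, and the grid of candidate step sizes `ETAS` are all assumptions chosen for illustration. None of them come from the question, and the finite grid only approximates the exact argmin. The point to notice is that `eta = 0` is one of the candidates, and it reproduces the current point, so the chosen step cannot give a larger objective value than the one we started from.

```python
# Sketch of a Grady-style update: choose the step size by a 1-D search each step.
# Assumptions (not from the question): the objective f, its gradient, and a
# finite grid of candidate step sizes standing in for the exact argmin.

def f(theta):
    # Hypothetical objective: an elongated quadratic bowl.
    x, y = theta
    return x * x + 10.0 * y * y


def grad_f(theta):
    # Gradient of the hypothetical objective above.
    x, y = theta
    return (2.0 * x, 20.0 * y)


def line_search_step(theta, etas):
    """Return (eta_star, theta_next) minimizing f along -grad_f(theta) over etas."""
    g = grad_f(theta)

    def candidate(eta):
        return tuple(t - eta * gi for t, gi in zip(theta, g))

    eta_star = min(etas, key=lambda eta: f(candidate(eta)))
    return eta_star, candidate(eta_star)


# Candidate step sizes 0.0, 0.001, ..., 1.0; eta = 0 is deliberately included.
ETAS = [i / 1000.0 for i in range(1001)]


def run(theta0, steps=10):
    """Apply the update repeatedly and record the objective after each step."""
    theta, values = theta0, [f(theta0)]
    for _ in range(steps):
        _, theta = line_search_step(theta, ETAS)
        values.append(f(theta))
    return values


values = run((5.0, 1.0))
# Because eta = 0 is always a candidate, no step can increase f.
monotone = all(b <= a for a, b in zip(values, values[1:]))
```

In this sketch `monotone` comes out true for any choice of `f`, because the argument relies only on `eta = 0` being available, not on the shape of the objective. The same reasoning applies to the exact rule whenever the minimum over $\eta$ is attained: the minimized value is at most the value at $\eta = 0$, which is $f(\theta^{(t)})$.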
