Ex 5.5: Function optimization. Consider the function f(x, y) = x^2 + 20y^2 shown in Figure 5.63a. Begin by solving for the following (a worked sketch follows this list):
- Calculate ∇f, i.e., the gradient of f.
- Evaluate the gradient at x = -20, y = 5.
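As a short worked sketch of this part (straightforward partial derivatives of the given function):

```latex
% Gradient of f(x, y) = x^2 + 20 y^2
\nabla f(x, y) =
\begin{pmatrix} \partial f / \partial x \\ \partial f / \partial y \end{pmatrix}
=
\begin{pmatrix} 2x \\ 40y \end{pmatrix},
\qquad
\nabla f(-20, 5) = \begin{pmatrix} -40 \\ 200 \end{pmatrix}.
```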
Implement some of the common gradient descent optimizers, which should take you from the starting point x = -20, y = 5 to near the minimum at x = 0, y = 0. Try each of the following optimizers (a code sketch of all three follows below):
- Standard gradient descent.
- Gradient descent with momentum, starting with the momentum term as μ = 0.99.
- Adam, starting with decay rates of β1 = 0.9 and β2 = 0.999.
Play around with the learning rate α. For each experiment, plot how x and y change over time, as shown in Figure 5.63b.
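One minimal NumPy sketch of the three update rules, using the hyperparameter values given above; the helper names (f, grad_f, gradient_descent, momentum, adam) are illustrative choices, not prescribed by the exercise:

```python
import numpy as np

def f(p):
    """f(x, y) = x^2 + 20 y^2, with p = [x, y]."""
    return p[0] ** 2 + 20.0 * p[1] ** 2

def grad_f(p):
    """Analytic gradient: (2x, 40y)."""
    return np.array([2.0 * p[0], 40.0 * p[1]])

def gradient_descent(p0, lr, steps):
    """Standard gradient descent; returns the trajectory of iterates."""
    p = np.array(p0, dtype=float)
    traj = [p.copy()]
    for _ in range(steps):
        p = p - lr * grad_f(p)
        traj.append(p.copy())
    return np.array(traj)

def momentum(p0, lr, steps, mu=0.99):
    """Gradient descent with a heavy-ball momentum term mu."""
    p = np.array(p0, dtype=float)
    v = np.zeros_like(p)
    traj = [p.copy()]
    for _ in range(steps):
        v = mu * v - lr * grad_f(p)
        p = p + v
        traj.append(p.copy())
    return np.array(traj)

def adam(p0, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam with bias-corrected first and second moment estimates."""
    p = np.array(p0, dtype=float)
    m = np.zeros_like(p)
    v = np.zeros_like(p)
    traj = [p.copy()]
    for t in range(1, steps + 1):
        g = grad_f(p)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
        traj.append(p.copy())
    return np.array(traj)
```

Each helper returns the full trajectory as a (steps + 1, 2) array, so the x and y columns can be plotted directly against the step index.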
How do the optimizers behave differently? Is there a single learning rate that makes all the optimizers converge towards x = 0, y = 0 in under 200 steps? Does each optimizer monotonically trend towards x = 0, y = 0?

[Figure 5.63 Function optimization: (a) the contour plot of f(x, y) = x^2 + 20y^2, with the function being minimized at (0, 0); (b) ideal gradient descent optimization that quickly converges towards the minimum at x = 0, y = 0.]
Would batch normalization help in this case?
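One possible driver for these experiments, assuming the helpers sketched above: it runs each optimizer for 200 steps from (-20, 5) and plots the x and y trajectories with matplotlib. The learning rate shown is only a starting guess to tune.

```python
import matplotlib.pyplot as plt

start, steps, lr = [-20.0, 5.0], 200, 0.01  # lr is a starting guess; play with it

# Run each optimizer and keep its trajectory of (x, y) iterates.
runs = {
    "gradient descent": gradient_descent(start, lr, steps),
    "momentum (mu=0.99)": momentum(start, lr, steps),
    "adam (b1=0.9, b2=0.999)": adam(start, lr, steps),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for name, traj in runs.items():
    axes[0].plot(traj[:, 0], label=name)  # x over time
    axes[1].plot(traj[:, 1], label=name)  # y over time
axes[0].set(xlabel="step", ylabel="x")
axes[1].set(xlabel="step", ylabel="y")
axes[0].legend()
plt.show()
```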
Note: the following exercises were suggested by Matt Deitke.
