Question:

Consider a 2-dimensional weight space, and two different error functions:

E_1([w_1, w_2]) = w_1^2 + 4 w_2^2 - 97 w_1 + 13 w_2
E_2([w_1, w_2]) = 1000 w_1^2 + 10 w_2^2 + 7 w_1 - 3 w_2

If you optimize each of these using batch gradient descent, with the learning rate set as high as you can without the system oscillating, what is the highest learning rate you can use for each of these? What is each of their rates of convergence? Which is faster? Please show the relevant formula and your calculations, and draw a diagram.
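
One way to set this up (a sketch, not a verified expert answer): for a quadratic error whose Hessian has eigenvalues lambda_i, each batch gradient descent step multiplies the error along eigendirection i by (1 - eta * lambda_i). Keeping every factor non-negative (no oscillation) requires eta <= 1/lambda_max, and at that step size the slowest direction shrinks by a factor of 1 - lambda_min/lambda_max per step. The Python sketch below applies this to the two error functions as reconstructed above (the squared terms on the first two coefficients of each function are an assumption recovered from the garbled text) and checks the numbers with a short gradient-descent run.

```python
# Sketch: stability threshold and convergence rate of batch gradient descent
# on the two reconstructed quadratic error functions. The coefficients below
# are assumptions recovered from the garbled question text.
import numpy as np

def analyse(name, quad_coeffs, lin_coeffs):
    """For E(w) = a*w1^2 + b*w2^2 + c*w1 + d*w2, report eta_max and the slow-direction factor."""
    a, b = quad_coeffs
    H = np.diag([2.0 * a, 2.0 * b])      # Hessian of the quadratic error
    lam_max, lam_min = H.diagonal().max(), H.diagonal().min()
    eta = 1.0 / lam_max                  # largest step size with no oscillation
    factor = 1.0 - eta * lam_min         # per-step error factor along the slow direction
    print(f"{name}: eta_max ~ {eta:.6g}, convergence factor ~ {factor:.4f} per step")
    return eta, np.array(lin_coeffs, dtype=float), H

def run_gd(eta, lin, H, steps=50):
    """Batch gradient descent on E(w) = 0.5 w^T H w + lin^T w, starting at the origin."""
    w = np.zeros(2)
    w_star = -np.linalg.solve(H, lin)    # exact minimiser for comparison
    for _ in range(steps):
        grad = H @ w + lin               # full-batch gradient
        w = w - eta * grad
    return np.linalg.norm(w - w_star)

# E_1([w1, w2]) = w1^2 + 4*w2^2 - 97*w1 + 13*w2   (reconstruction)
eta1, lin1, H1 = analyse("E1", (1, 4), (-97, 13))
# E_2([w1, w2]) = 1000*w1^2 + 10*w2^2 + 7*w1 - 3*w2   (reconstruction)
eta2, lin2, H2 = analyse("E2", (1000, 10), (7, -3))

print("E1 distance to optimum after 50 steps:", run_gd(eta1, lin1, H1))
print("E2 distance to optimum after 50 steps:", run_gd(eta2, lin2, H2))
```

Under that reconstruction, E_1 has Hessian diag(2, 8), so eta_max = 1/8 = 0.125 and the slow direction shrinks by 0.75 per step, while E_2 has Hessian diag(2000, 20), so eta_max = 1/2000 = 0.0005 and the factor is 0.99 per step; E_1 therefore converges much faster.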
