Question: You run gradient descent for 1 5 iterations with = 0 . 3 and compute w decrease after each iteration. You find that the value

You run gradient descent for 15 iterations with =0.3 and compute w decrease after each iteration. You find that the value of w decrease slowly and is still decreasing after 15 iterations. Based on this, which of the following conclusions seems most possible?Rather than the current value of , itd be more promising to try a smaller value of (say =0.1)Rather than use the current value of , itd be more promising to try a large value of (say =1.0)=0.3 is an effective choice of learning rate

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!