Question: You run gradient descent for 15 iterations with α = 0.3 and compute J(w) after each iteration. You find that the value of J(w) decreases slowly and is still decreasing after 15 iterations. Based on this, which of the following conclusions seems most plausible?

- Rather than use the current value of α, it would be more promising to try a smaller value of α.
- Rather than use the current value of α, it would be more promising to try a larger value of α.
- α = 0.3 is an effective choice of learning rate.
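The behavior the question describes can be reproduced on a toy problem. Below is a minimal sketch, not the course's solution: it assumes the one-dimensional cost J(w) = w², an arbitrary starting point w0 = 10, and illustrative learning rates that are not taken from the question (on this toy cost the thresholds for "slow" and "diverging" differ from the original problem's). It shows that a too-small α makes J(w) decrease slowly and still be decreasing after 15 iterations, while a too-large α makes J(w) increase.

```python
def gradient_descent(alpha, iterations=15, w0=10.0):
    """Run gradient descent on the toy cost J(w) = w**2, whose gradient is 2*w."""
    w = w0
    history = []
    for _ in range(iterations):
        w = w - alpha * 2 * w      # gradient step: w <- w - alpha * dJ/dw
        history.append(w * w)      # record J(w) after this iteration
    return history

# Compare J(w) under a too-small, a moderate, and a too-large learning rate.
# These alpha values are illustrative choices, not values from the question.
for alpha in (0.01, 0.3, 1.05):
    costs = gradient_descent(alpha)
    trend = "still decreasing" if costs[-1] < costs[-2] else "not decreasing"
    print(f"alpha={alpha:<5} J after 15 iterations = {costs[-1]:12.4f} ({trend})")
```

The general principle the question tests: if J(w) decreases on every iteration but only slowly, the learning rate is most likely smaller than necessary, so trying a larger value of α is the more promising adjustment; if J(w) ever increases from one iteration to the next, α is too large.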
