Question: You have implemented a simple SGD optimizer in assignment-1. In practice, it is common to use a momentum term in SGD for better convergence.
Specifically, we introduce a new velocity term $v_t$ and the update rule is as follows:

$$v_t = \beta v_{t-1} - \alpha \frac{\partial L}{\partial w}$$
$$w = w + v_t$$

where $\beta$ denotes the momentum coefficient and $\alpha$ denotes the learning rate.
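A minimal sketch of this update in plain Python may help make the rule concrete. It assumes the velocity starts at zero and that the scaled gradient is subtracted inside the velocity, so that adding the velocity to the weights moves downhill. The class name `MomentumSGD`, the parameter names `lr` and `momentum`, and the toy loss L(w) = 0.5 * w^2 are illustrative assumptions, not the assignment's actual API.

```python
class MomentumSGD:
    """Hypothetical helper: v = momentum * v - lr * grad, then w = w + v."""

    def __init__(self, params, lr=0.1, momentum=0.9):
        self.params = params                  # list of floats, updated in place
        self.lr = lr                          # learning rate
        self.momentum = momentum              # momentum coefficient
        self.velocity = [0.0] * len(params)   # assumed initial velocity of zero

    def step(self, grads):
        """Apply one update given dL/dw for each parameter."""
        for i, g in enumerate(grads):
            # Decay the previous velocity and add the negative scaled gradient.
            self.velocity[i] = self.momentum * self.velocity[i] - self.lr * g
            # Move the parameter along the velocity.
            self.params[i] += self.velocity[i]


# Toy problem (assumed for illustration): L(w) = 0.5 * w^2, so dL/dw = w.
w = [1.0]
opt = MomentumSGD(w, lr=0.1, momentum=0.9)
for _ in range(200):
    opt.step([w[0]])
```

With `momentum=0.0` the velocity reduces to the plain SGD step, which is one way to check such an implementation against a momentum-free optimizer.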
