Question: In Stochastic Gradient Descent (SGD) with batch size $b$ and momentum $\beta$, the step taken at time $t$ is

$w_{t+1} = w_t - \eta\, m_t, \qquad m_t = \beta\, m_{t-1} + g_t,$

where $g_t = \frac{1}{b} \sum_{i \in \mathcal{B}_t} \nabla \ell_i(w_t)$ is the batch gradient and $\nabla \ell_i$ is the gradient of the $i$-th datapoint. Let us now expand the recursion of the momentum:

$m_t = \beta\, m_{t-1} + g_t = \beta\,(\beta\, m_{t-2} + g_{t-1}) + g_t = \dots = \sum_{k=0}^{t} \beta^k g_{t-k}.$

If we assume that the gradients do not change significantly from one SGD step to the next, we can further simplify this expression to

$m_t \approx g_t \sum_{k=0}^{t} \beta^k \approx \frac{g_t}{1-\beta}.$

One final simplification is to remove the rounding due to the batching.
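The momentum expansion described above can be checked numerically. Below is a minimal sketch, assuming a scalar momentum factor `beta` and random per-step gradients; the variable names are illustrative, not taken from the original question:

```python
import numpy as np

# The recursion  m_t = beta * m_{t-1} + g_t  unrolls to
#   m_t = sum_{k=0}^{t} beta^k * g_{t-k}.
beta = 0.9
rng = np.random.default_rng(0)
grads = rng.normal(size=(50, 3))  # g_0 ... g_49, each 3-dimensional

# Recursive form, starting from m_{-1} = 0
m_rec = np.zeros(3)
for g in grads:
    m_rec = beta * m_rec + g

# Unrolled form: sum over k of beta^k * g_{t-k}
t = len(grads) - 1
m_unrolled = sum(beta**k * grads[t - k] for k in range(t + 1))
assert np.allclose(m_rec, m_unrolled)

# If the gradient is (approximately) constant, the geometric series gives
# m_t = g * (1 - beta^{t+1}) / (1 - beta)  ->  g / (1 - beta)  for large t
g_const = np.array([1.0, -2.0, 0.5])
m = np.zeros(3)
for _ in range(500):
    m = beta * m + g_const
assert np.allclose(m, g_const / (1 - beta))
print("momentum expansion checks pass")
```

This is why momentum with factor $\beta$ is often said to amplify the effective step size by roughly $1/(1-\beta)$ once the gradients are slowly varying.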
