Question:

In Stochastic Gradient Descent (SGD) with batch size $B$ and momentum $\mu$, the step taken at time $t$ is:

$$\Delta_t = \mu\,\Delta_{t-1} - \eta\, g_t, \qquad g_t = \frac{1}{B}\sum_{i \in \mathcal{B}_t} \nabla \ell_i(\theta_t),$$

where $g_t$ is the batch gradient and $\nabla \ell_i$ is the gradient of the $i$-th data-point. Let's now expand the recursion of momentum:

$$\Delta_t = -\eta \sum_{k=0}^{t} \mu^{t-k}\, g_k.$$

If we assume that the gradients do not change significantly from one SGD step to the next ($g_k \approx g_t$ for all $k \le t$), we can further simplify this expression to

$$\Delta_t \approx -\eta\, g_t \sum_{k=0}^{t} \mu^{k} \approx -\frac{\eta}{1-\mu}\, g_t.$$

One final simplification is to remove the rounding due to the batching.
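To make the approximation concrete, here is a minimal numerical sketch, assuming the standard momentum update $\Delta_t = \mu\,\Delta_{t-1} - \eta\, g_t$ and hypothetical values $\eta = 0.1$, $\mu = 0.9$ (the symbols $\eta$, $\mu$, $g$ are assumed notation, not taken from the original page): when the gradient stays constant across steps, the momentum step converges to $-\frac{\eta}{1-\mu}\, g$, i.e. an effective learning rate of $\eta/(1-\mu)$.

```python
import numpy as np

# Assumed notation (not from the original page): eta = learning rate,
# mu = momentum coefficient, g = a batch gradient held constant across steps.
eta, mu = 0.1, 0.9
g = np.array([1.0, -2.0, 0.5])      # constant "batch gradient"

delta = np.zeros_like(g)            # momentum buffer Delta_t
for _ in range(200):                # iterate Delta_t = mu * Delta_{t-1} - eta * g_t
    delta = mu * delta - eta * g

effective = -eta / (1.0 - mu) * g   # predicted limit: -(eta / (1 - mu)) * g
print("momentum step after many iterations:", delta)
print("predicted -eta/(1-mu) * g:          ", effective)
# The two vectors agree, which is the geometric-series simplification above.
```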
