Question (Distributed SGD): Assume we are training a model with distributed SGD on m = 6 machines. For n data points and minibatch size b on the master node, how many distributed rounds do we need to finish E epochs? Assume the time for a worker to process a minibatch is t_p seconds, each communication from master to workers or from workers to master takes t_c seconds, and every worker adds t_a seconds to the aggregation time at the master. Hint: remember to include the communication time from master to workers and from workers to master!
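The round count depends only on n, b, and the number of epochs: each round consumes one minibatch of size b, so one epoch takes ceil(n / b) rounds. A minimal sketch, using illustrative placeholder values (n = 768, b = 32, E = 10; the question's actual numbers are not shown here):

```python
import math

def distributed_rounds(n, b, epochs):
    """Each distributed round processes one minibatch of size b,
    so one pass over n data points needs ceil(n / b) rounds."""
    return epochs * math.ceil(n / b)

# Placeholder values for illustration only.
print(distributed_rounds(n=768, b=32, epochs=10))  # -> 240
```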
How long does it take to train on the entire dataset for E epochs?
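Per round, the wall-clock time stacks up as: one master-to-workers communication (t_c), the parallel minibatch computation (t_p), one workers-to-master communication (t_c), and an aggregation step that grows linearly with the worker count (m * t_a). Total training time is then rounds * (t_p + 2*t_c + m*t_a). A sketch with illustrative placeholder values (t_p = 10 s, t_c = 2 s, t_a = 1 s; not the question's actual numbers):

```python
import math

def training_time(n, b, epochs, m, t_p, t_c, t_a):
    """Per round: master->workers send (t_c), parallel minibatch
    computation on the workers (t_p), workers->master send (t_c),
    and aggregation that grows linearly with m workers (m * t_a)."""
    rounds = epochs * math.ceil(n / b)
    per_round = t_c + t_p + t_c + m * t_a
    return rounds * per_round

# Placeholder values for illustration only.
print(training_time(n=768, b=32, epochs=10, m=6, t_p=10, t_c=2, t_a=1))  # -> 4800
```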
Beyond how many workers does the communication and aggregation time dominate, i.e., become strictly greater than, the gradient-computation time?
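The threshold falls out of the per-round breakdown above: communication plus aggregation is 2*t_c + m*t_a, and it strictly exceeds the computation time t_p once m > (t_p - 2*t_c) / t_a. A sketch finding the smallest such integer m, again with illustrative placeholder times:

```python
import math

def dominance_threshold(t_p, t_c, t_a):
    """Smallest worker count m such that communication plus
    aggregation time (2*t_c + m*t_a) strictly exceeds the
    gradient-computation time t_p.

    Solve 2*t_c + m*t_a > t_p  =>  m > (t_p - 2*t_c) / t_a."""
    bound = (t_p - 2 * t_c) / t_a
    m = math.floor(bound) + 1  # smallest integer strictly above the bound
    return max(m, 1)           # need at least one worker

# Placeholder values for illustration only: bound = (10 - 4) / 1 = 6,
# so communication/aggregation strictly dominates from m = 7 workers on.
print(dominance_threshold(t_p=10, t_c=2, t_a=1))  # -> 7
```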