(Distributed SGD) Assume we're training a model with distributed SGD with m = 6 machines. For n = 76,800 data points and mini-batch size b = 1,536 on the master node, how many distributed rounds do we need to finish 10 epochs? Assume the workers take 4 s to process a mini-batch, each communication (master to workers, or workers to master) takes 1.5 s, and every worker adds 0.015 s to the aggregation time at the master. Hint: remember to include the communication time both from master to workers and from workers to master!
(1) How long does it take to train on the entire dataset for 10 epochs?
(2) After how many workers does the communication and aggregation time dominate (become strictly greater than) the gradient computation time?
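All three quantities follow directly from the numbers given in the problem. A minimal Python sketch, assuming each round consists of one master-to-workers broadcast, one mini-batch computation, one workers-to-master upload, and the per-worker aggregation at the master:

```python
# Parameters from the problem statement
n = 76_800                # data points
b = 1_536                 # mini-batch size per round
m = 6                     # workers
epochs = 10
t_compute = 4.0           # seconds to process a mini-batch at the workers
t_comm = 1.5              # seconds per one-way communication
t_agg_per_worker = 0.015  # aggregation time added per worker at the master

# Rounds: one mini-batch per round, so n / b rounds per epoch
rounds = (n // b) * epochs
print(rounds)  # → 500

# (1) Per-round time: master->workers + compute + workers->master + aggregation
t_round = t_comm + t_compute + t_comm + m * t_agg_per_worker   # ≈ 7.09 s
total_time = rounds * t_round                                  # ≈ 3545 s
print(t_round, total_time)

# (2) Communication + aggregation dominates compute when
#     2 * t_comm + m * t_agg_per_worker > t_compute
#     i.e. 3 + 0.015 m > 4  =>  m > 66.67  =>  m = 67 workers
m_threshold = 67
assert 2 * t_comm + m_threshold * t_agg_per_worker > t_compute
assert 2 * t_comm + (m_threshold - 1) * t_agg_per_worker <= t_compute
```

Note the asymmetry between the two parts: in (1) the aggregation term uses the given m = 6, while in (2) m is the unknown being solved for.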
