A data analytics problem is solved on a single-node development setup with a dataset D. The program has been parallelized to the extent of 83% using various data partitioning and distributed processing techniques. Please answer the following questions in this context. [Marks: 5]

(a) Now this program needs to be moved to a staging cluster with 100 nodes with the same dataset. What is the theoretical speed-up possible?

(b) What is the theoretical speed-up possible if we could add any number of nodes / processing elements to the system but leave the dataset unchanged?

(c) Given the success of the project, we now wish to solve a larger problem using more data and possibly a more accurate but more computationally intensive algorithm. However, we can't parallelize any further beyond 90%. What is the theoretical speed-up we can achieve on a 100-node system?

Provide your answers rounded to the first decimal place.
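One way to approach the arithmetic is sketched below. It is not given in the question itself and rests on two assumptions: that parts (a) and (b) are fixed-size problems, for which Amdahl's law S = 1 / ((1 - p) + p/N) applies, and that part (c), where the problem grows with the system, is meant to be read with Gustafson's law S = (1 - p) + p*N. The function names are illustrative only. Since the question does not say which model part (c) expects, the Amdahl figure for p = 0.90 is printed as well for comparison.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Fixed-size speed-up with parallel fraction p on n nodes (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p: float) -> float:
    """Upper bound on Amdahl speed-up as the node count grows without limit."""
    return 1.0 / (1.0 - p)


def gustafson_speedup(p: float, n: int) -> float:
    """Scaled speed-up with parallel fraction p on n nodes (Gustafson's law)."""
    return (1.0 - p) + p * n


if __name__ == "__main__":
    # (a) 83% parallel, 100 nodes, same dataset
    print(f"(a) {amdahl_speedup(0.83, 100):.1f}")      # 5.6
    # (b) 83% parallel, unlimited nodes, same dataset
    print(f"(b) {amdahl_limit(0.83):.1f}")             # 5.9
    # (c) 90% parallel, 100 nodes, larger problem (assumed Gustafson)
    print(f"(c) {gustafson_speedup(0.90, 100):.1f}")   # 90.1
    # (c) under the fixed-size Amdahl reading, for comparison
    print(f"(c, Amdahl) {amdahl_speedup(0.90, 100):.1f}")  # 9.2
```

Under these assumptions the rounded results are 5.6 for (a), 5.9 for (b), and 90.1 for (c) with the scaled-speed-up reading (9.2 if part (c) is instead treated as a fixed-size problem).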
