Consider the following two versions of a program to add two vectors

Question:

Consider the following two versions of a program to add two vectors:

a. The program on the left executes on a uniprocessor. Suppose each of the code lines L2, L4, and L6 takes one processor clock cycle to execute. For simplicity, ignore the time required for the other lines of code. Initially, all arrays are loaded in main memory and the short program fragment is in the instruction cache. How many clock cycles are required to execute this program?
b. The program on the right is written to execute on a multiprocessor with M processors. We partition the looping operations into M sections with … elements per section. DOALL declares that all M sections are executed in parallel. The result of this program is M partial sums. Assume that k clock cycles are needed for each interprocessor communication operation via the shared memory, and that therefore the addition of each partial sum requires k cycles. An l-level binary adder tree can merge all the partial sums, where …. How many cycles are needed to produce the final sum?
c. Suppose … elements in the array and …. What is the speedup achieved by using the multiprocessor? Assume …. What percentage is this of the theoretical speedup of a factor of 256?
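The cycle counting in parts (a) through (c) can be sketched as a small calculator. Because the two program listings are not reproduced here, this sketch assumes that L2 and L6 are loop bodies executed once per vector element and that L4 executes once. It also assumes that each of the M sections holds N/M elements, that the binary adder tree has log2 M levels, that each tree level costs k cycles, and that the theoretical speedup equals M. The function names and the parameter n (vector length) are hypothetical, and this is not presented as the textbook's official solution.

```python
# Hypothetical cycle model for the vector-sum problem (a sketch under stated assumptions).


def uniprocessor_cycles(n: int) -> int:
    """Part (a): assumes L2 and L6 each run once per element (n times)
    and L4 runs once, at one clock cycle apiece."""
    return n + 1 + n  # L2 loop + L4 + L6 loop


def multiprocessor_cycles(n: int, m: int, k: int) -> int:
    """Part (b): assumes n/m elements per section and a log2(m)-level
    binary adder tree whose levels each cost k cycles."""
    if m < 1 or (m & (m - 1)) != 0 or n % m != 0:
        raise ValueError("sketch assumes m is a power of two that divides n")
    section = n // m               # assumed elements per section
    levels = m.bit_length() - 1    # log2(m), exact because m is a power of two
    local = 2 * section + 1        # every processor runs its L2/L4/L6 work concurrently
    merge = k * levels             # tree levels run one after another
    return local + merge


def speedup(n: int, m: int, k: int) -> tuple[float, float]:
    """Part (c): returns (speedup, percentage of the assumed ideal speedup m)."""
    s = uniprocessor_cycles(n) / multiprocessor_cycles(n, m, k)
    return s, 100.0 * s / m
```

Calling `speedup(n, m, k)` with the values given in part (c) returns both the speedup and its percentage of the ideal. The main modeling choice is charging k cycles per tree level rather than per addition, on the assumption that the additions within one level proceed in parallel.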
