a. (10) Which operation(s) in the loop can NOT be parallelized? Hint: these will be the operation(s) that depend on the result of that operation from the previous loop iteration. Write your answers in your solutions document.
b. (10) Given your answer from part a, what is the best-case CPE for the loop as currently written? Assume that float addition has a latency of 3 cycles, float multiplication has a latency of 5 cycles, and all integer operations have a latency of 1 cycle. Hint: the best-case CPE will be the latency of the slowest of the operation(s) you identified in part a. Write your answers in your solutions document.
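
The hints in parts a and b can be made concrete with a small sketch. The loop in the problem statement is cut off, so the code below is an assumption: a plain inner-product loop that accumulates u[i] * v[i] into sum. The names inner_sketch and latency_bound are hypothetical and are not part of the assignment. The comments mark which operations depend on the previous iteration, and latency_bound applies the rule from the hint in part b.

```c
#include <stddef.h>

/* Hypothetical illustration, NOT the assignment's code: the original loop
 * body is truncated, so this assumes a plain accumulate-into-sum loop. */
void inner_sketch(const float *u, const float *v, int length, float *dest)
{
    int i;
    float sum = 0.0f;
    /* i++ reads the i written by the previous iteration:
     * a loop-carried integer addition. */
    for (i = 0; i < length; i++) {
        /* u[i] * v[i] needs only i, not an earlier product, so multiplies
         * from different iterations are independent and can overlap. */
        float prod = u[i] * v[i];
        /* sum + prod needs the sum written by the previous iteration:
         * a loop-carried float addition. */
        sum = sum + prod;
    }
    *dest = sum;
}

/* Rule from the hint in part b: the best-case CPE is the largest latency
 * among the loop-carried operations. Returns 0 if the list is empty. */
int latency_bound(const int *carried_latencies, size_t n)
{
    int worst = 0;
    for (size_t k = 0; k < n; k++) {
        if (carried_latencies[k] > worst) {
            worst = carried_latencies[k];
        }
    }
    return worst;
}
```

If the real loop has this shape, the loop-carried operations are the float addition into sum (3 cycles) and the integer increment of i (1 cycle), so latency_bound would return 3. The 5-cycle multiply is not on a loop-carried chain under this assumption, so it would not set the bound.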



2. [40] Suppose we've got a procedure that computes the inner product of two arrays u and v. Consider the following C code: void inner (float *u, float *v, int length, float *dest) { int i; float sum = 0.0f; for (i = 0; i
