Question: 2 - Consider the following code, which represents the operation Y = a x + Y for a vector length of 1 0 0 .
Consider the following code, which represents the operation for a vector length of Assume
the pipeline latencies shown below and a cycle delay branch that is resolved in the ID stage. In addition, the
pipeline uses branch forwarding that forwards the result of an ALU operation from the MEM stage to the ID
stage ie MEMtoID forwarding
Latencies of FP operations
a Show how this loop would execute without any scheduling. Maximize the performance of this code by
applying both instruction reordering also known as pipeline scheduling and delay branch techniques.
Ignoring the startup delays and assuming the loop executes times, determine the number of cycles
required to execute the code before and after the optimizations. Do not be concerned about what happens
after the loop.
b Unroll the loop once ie make two copies to schedule it without stalls and show the instruction schedule.
Again, assuming the loop executes times, determine the number of cycles required to execute the code
before and after unrolling.
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
