Question: 2 - Consider the following code, which represents the operation Y = ax + Y for a vector length of 100.

2- Consider the following code, which represents the operation Y = ax + Y for a vector length of 100. Assume
the pipeline latencies shown below and a 1-cycle delayed branch that is resolved in the ID stage. In addition, the
pipeline uses branch forwarding, which forwards the result of an ALU operation from the MEM stage to the ID
stage (i.e., MEM-to-ID forwarding).
Latencies of FP operations
(a) Show how this loop would execute without any scheduling. Then maximize the performance of this code by
applying both instruction reordering (also known as pipeline scheduling) and delayed-branch techniques.
Ignoring the startup delays and assuming the loop executes 100 times, determine the number of cycles
required to execute the code before and after the optimizations. Do not be concerned about what happens
after the loop.
(b) Unroll the loop once (i.e., make two copies) to schedule it without stalls and show the instruction schedule.
Again, assuming the loop executes 100 times, determine the number of cycles required to execute the code
before and after unrolling.
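
For orientation, here is a minimal C sketch of what the loop computes and of what "unroll the loop once" means at the source level. It is not the assembly code the question refers to, and the function names `daxpy` and `daxpy_unrolled2` are hypothetical, chosen only for illustration. The unrolled version assumes an even vector length, which holds for 100.

```c
#include <stddef.h>

/* One element per iteration: y[i] = a * x[i] + y[i]. */
void daxpy(double a, const double *x, double *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Unrolled once: two copies of the loop body per iteration.
   Assumes n is even (true for the vector length of 100 in the question). */
void daxpy_unrolled2(double a, const double *x, double *y, size_t n) {
    for (size_t i = 0; i < n; i += 2) {
        y[i]     = a * x[i]     + y[i];
        y[i + 1] = a * x[i + 1] + y[i + 1];
    }
}
```

With n = 100 the unrolled loop runs 50 iterations instead of 100, so the index update and branch execute half as often, and the two independent copies of the body give the scheduler more instructions to place in stall slots. The actual cycle counts asked for in (a) and (b) depend on the FP latency table, which is not reproduced here.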
