Question: What is the speedup of using your code from 4.29.4 instead of the original code with a 2-issue static superscalar processor? Assume that the loop
What is the speedup of using your code from 4.29.4 instead of the original code with a 2-issue static superscalar processor? Assume that the loop has many (e.g., 1,000,000) iterations.
Exercise 4.29.4
Unroll this loop once and schedule it for a 2-issue static superscalar processor. Assume that the loop always executes an even number of iterations. You can use registers R10 through R20 when changing the code to eliminate dependences.
In this exercise, we consider the execution of a loop in a statically scheduled superscalar processor. To simplify the exercise, assume that any combination of instruction types can execute in the same cycle, e.g., in a 3-issue superscalar, the three instructions can be 3 ALU operations, 3 branches, 3 load/store instructions, or any combination of these instructions. Note that this only removes a resource constraint, but data and control dependences must still be handled correctly. Problems in this exercise refer to the following loop:
a. b. Loop: Loop: ADDI R1, R1,4 LW R2,0 (R1) LW R3,16(R1) ADD R2, R2, R1 ADD R2, R2, R3 BEQ R2, zero, Loop LW R1,0 (R1) AND R1, R1, R2 LW R2,0 (R2) BEQ R1,zero, Loop Loop
Step by Step Solution
3.34 Rating (169 Votes )
There are 3 Steps involved in it
Lets assume the following The original loop without optimization takes T O To cycles to complete The ... View full answer
Get step-by-step solutions from verified subject matter experts
