What is the speedup of using your code from 4.29.4 instead of the original code with a

Question:

What is the speedup of using your code from 4.29.4 instead of the original code with a pipelined (1-issue) processor? Assume that the loop has many (e.g., 1,000,000) iterations.

Exercise 4.29.4

Unroll this loop once and schedule it for a 2-issue static
superscalar processor. Assume that the loop always executes an even number of iterations. You can use registers R10 through R20 when changing the code to eliminate dependences.

In this exercise, we consider the execution of a loop in a statically scheduled superscalar processor. To simplify the exercise, assume that any combination of instruction types can execute in the same cycle, e.g., in a 3-issue superscalar, the three instructions can be 3 ALU operations, 3 branches, 3 load/store instructions, or any combination of these instructions. Note that this only removes a resource constraint, but data and control dependences must still be handled correctly. Problems in this exercise refer to the following loop: a. b. Loop: Loop: ADDI R1, R1,4 LW R2,0 (R1) LW R3,16(R1) ADD R2, R2, R1 ADD R2, R2, R3 BEQ R2, zero, Loop LW