Question:
For the following problem, assume a 5-stage pipelined processor with forwarding and hardware interlocking. Also, assume branch resolution is in the Execute stage.
Consider the code below:
addi $t2, $t1, 60
loop:
lw $t4, 0($t1)
lw $t5, 4($t1)
xor $t6, $t4, $t5
sw $t6, 8($t1)
addi $t1, $t1, 12
bne $t1, $t2, loop
(a) How many loop iterations does the above code execute?
(b) Identify the data dependencies in the above code.
(c) Draw the pipeline execution diagram for the first two iterations of the above code when an assume-not-taken branching scheme without a branch delay slot is used.
(d) How many clock cycles are required to execute the above code to completion when an assume-not-taken branching scheme without a branch delay slot is used?
(e) Modify the code to take advantage of a branch delay slot. How many clock cycles are required to execute your modified code to completion when an assume-not-taken branching scheme with a branch delay slot is used?
(f) How many clock cycles are required to execute your modified code, assuming a 100% correct branch predictor in the decode stage in addition to the branch delay slot?
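As a sanity check for part (a), the register arithmetic that controls the loop can be traced in a few lines of Python. This is only a sketch under stated assumptions: the function name count_iterations and its parameters are made up for illustration, the starting value of $t1 is assumed to be 0 (the problem does not give one, and the count does not depend on it), and memory and the pipeline are not modeled at all.

```python
def count_iterations(t1_start=0, offset=60, stride=12):
    """Trace only the $t1/$t2 arithmetic of the loop and count trips through it.

    Hypothetical helper: t1_start is an assumed initial value of $t1.
    Note: if offset were not a multiple of stride, $t1 would never equal $t2
    and this trace would not terminate.
    """
    t1 = t1_start
    t2 = t1 + offset              # addi $t2, $t1, 60
    iterations = 0
    while True:
        # lw, lw, xor, sw touch only $t4-$t6 and memory, so they are skipped here
        t1 += stride              # addi $t1, $t1, 12
        iterations += 1
        if t1 == t2:              # bne $t1, $t2, loop falls through when equal
            break
    return iterations


iterations = count_iterations()
# One addi before the loop plus six instructions per trip through the loop body.
dynamic_instructions = 1 + 6 * iterations
print(iterations, dynamic_instructions)   # prints "5 31"
```

The dynamic instruction count is a useful starting point for parts (d) through (f), where stall and branch-penalty cycles are added on top of it according to the pipeline assumptions given in the problem.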
