(30 points) Consider the following instruction sequence running on the five-stage pipelined
processor:
beq x11, x12, Label
sd x15, 0(x23)
ld x15, 0(x24)
add x11, x6, x12
sub x11, x13, x12
Assume x11x12.
Note that in the following questions, structural hazards are considered only in (a) and (b).
(a)(10 points) Assume the processor predicts each branch instruction to be not taken. If we only
have one memory (for both instructions and data), there is a structural hazard every time we
need to fetch an instruction in the same cycle in which another instruction accesses data. To
guarantee that the processor works correctly, this structural hazard must always be resolved in
favor of the instruction that accesses data. In other words, there is a hazard detection unit in
the IF stage, and if a structural hazard occurs, the instruction in the IF stage needs to stall for
that cycle. What is the total execution time of this instruction sequence? We have learned
that data hazards can be eliminated by adding NOPs to the code, so can you do the same with
this structural hazard and why?
(b)(5 points) Assume we use the same processor in (a). What is the minimum number of cycles
you can achieve by adjusting the order of the instructions without losing the correctness?
Also give the new sequence of instructions after re-ordering.
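
The IF-stall rule described in (a) can be sketched as a small cycle counter. This is only a minimal sketch, not a reference solution: the name `total_cycles` and the boolean encoding of the instructions are hypothetical, and it assumes the branch falls through with a correct not-taken prediction and that no data hazards add stalls.

```python
def total_cycles(accesses_data):
    """Count cycles on a five-stage pipeline with one shared memory.

    accesses_data[i] is True if instruction i reads or writes data
    memory in its MEM stage (ld/sd). Assumption of this sketch: the
    only stalls are IF stalls caused by the shared memory.
    """
    mem_busy = set()  # cycles in which some instruction uses memory in MEM
    cycle = 0         # cycle in which the previous instruction was fetched
    last_wb = 0
    for uses_mem in accesses_data:
        cycle += 1
        while cycle in mem_busy:     # data access wins, so IF waits
            cycle += 1
        if uses_mem:
            mem_busy.add(cycle + 3)  # IF, ID, EX, then MEM
        last_wb = cycle + 4          # WB is four cycles after IF
    return last_wb


# beq, sd, ld, add, sub in program order
original = [False, True, True, False, False]
print(total_cycles(original))
```

Passing a reordered list gives the cycle count of a candidate schedule for (b). The sketch does not check that a reordering preserves program behavior, so that still has to be argued separately.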
(c)(5 points) Assuming Stall on Branch (i.e., wait until the branch outcome is determined before
fetching the next instruction), what speedup is achieved on this instruction sequence if branch
outcomes are determined in the ID stage, relative to the execution where branch outcomes are
determined in the MEM stage?
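
The stall-on-branch comparison in (c) can be sketched as follows. This is a hedged illustration: the helper `stall_on_branch_cycles` is a hypothetical name, and it assumes the instruction after a branch is fetched in the cycle after the branch leaves the resolving stage, with no other stalls in the sequence.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stall_on_branch_cycles(num_instructions, num_branches, resolve_stage):
    """Total cycles when fetch waits for every branch outcome."""
    # Resolving in ID costs 1 bubble per branch, in EX 2, in MEM 3.
    bubbles_per_branch = STAGES.index(resolve_stage)
    fill = len(STAGES) - 1  # cycles needed to drain the last instruction
    return num_instructions + fill + num_branches * bubbles_per_branch


# Five instructions, one of which (beq) is a branch.
cycles_id = stall_on_branch_cycles(5, 1, "ID")
cycles_mem = stall_on_branch_cycles(5, 1, "MEM")
speedup = cycles_mem / cycles_id
print(cycles_id, cycles_mem, speedup)
```

Because the clock period is the same in both cases, the speedup here is just the ratio of the two cycle counts.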
(d)(5 points) Assume the processor predicts each branch instruction to be not taken. Also assume
each individual pipeline stage of IF, ID, EX, MEM, and WB has the latency of 210 ps, 160 ps,
220 ps, 180 ps, and 100 ps, respectively. If we change load/store instructions to use a
register (without an offset) as the address, these instructions no longer need to use the ALU.
As a result, MEM and EX stages can be overlapped and the pipeline has only 4 stages.
Assuming this change does not affect the clock period, what speedup is achieved in this
instruction sequence compared to the original five-stage one?
(e)(5 points) Given the pipeline stage latencies in (d), repeat the speedup calculation of (d) by
considering the (possible) change in the clock period as follows. When EX and MEM are
done in a single stage (called EX/MEM stage), most of their work can be done in parallel. As
a result, the EX/MEM stage now has a latency that is the larger of the original two, plus 25 ps
needed for the work that could not be done in parallel.
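
The arithmetic behind (d) and (e) can be sketched as below. The clock period is taken as the slowest stage latency. The helper names `merged_latencies` and `exec_time_ps` are hypothetical, and the sketch assumes the sequence runs with no stall cycles (a correct not-taken prediction and no data hazards), which should be checked against the actual branch outcome.

```python
FIVE_STAGE_PS = {"IF": 210, "ID": 160, "EX": 220, "MEM": 180, "WB": 100}


def merged_latencies(latencies, overhead_ps=25):
    """Merge EX and MEM into one stage: max of the two plus overhead."""
    merged = {k: v for k, v in latencies.items() if k not in ("EX", "MEM")}
    merged["EX/MEM"] = max(latencies["EX"], latencies["MEM"]) + overhead_ps
    return merged


def exec_time_ps(latencies, num_instructions, stall_cycles=0):
    """Execution time = clock period * (instructions + stages - 1 + stalls)."""
    clock = max(latencies.values())
    cycles = num_instructions + len(latencies) - 1 + stall_cycles
    return clock * cycles


four_stage = merged_latencies(FIVE_STAGE_PS)
t5 = exec_time_ps(FIVE_STAGE_PS, 5)

# (d): four stages, clock period unchanged from the five-stage design
t4_same_clock = max(FIVE_STAGE_PS.values()) * (5 + 4 - 1)
# (e): four stages, clock period set by the merged EX/MEM stage
t4_new_clock = exec_time_ps(four_stage, 5)

print(t5 / t4_same_clock, t5 / t4_new_clock)
```

If the branch turns out to be mispredicted, the extra flush cycles can be passed through `stall_cycles` for the five-stage case, and added by hand to the four-stage counts.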