From the link below:
https://www.chegg.com/homework-help/questions-and-answers/431-exercise-compare-performance-1-issue-2-issue-processors-taking-account-program-transfo-q25269311
I pasted the answer here in italics for convenience.
Code with stall introduced:
MOV X5, XZR
B ENT
TOP:
LSL X10, X5, #3
ADD X11, X1, X10
LDUR X12, [X11, #0]
LDUR X13, [X11, #8]
SUB X14, X12, X13
ADD X15, X2, X10
STUR X14, [X15, #0]
ADDI X5, X5, #2
ENT: CMP X5, X6
B.NE TOP
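For readers who want to see what the loop computes, here is a minimal Python sketch of its semantics. It rests on assumptions not stated in the post: X1 and X2 hold the base addresses of two arrays (called `a` and `b` here), X5 is the index `i`, X6 is the bound `j`, and the second load writes X13, which is what the SUB reads. The function name `run_loop` is hypothetical.

```python
def run_loop(a, b, j):
    """Mirror the assembly loop under the register mapping assumed above."""
    i = 0                     # MOV X5, XZR
    while i != j:             # ENT: CMP X5, X6 / B.NE TOP
        x12 = a[i]            # LDUR X12, [X11, #0]
        x13 = a[i + 1]        # LDUR X13, [X11, #8]  (assumed destination register)
        b[i] = x12 - x13      # SUB X14, X12, X13 / STUR X14, [X15, #0]
        i += 2                # ADDI X5, X5, #2
    return b


result = run_loop([10, 3, 7, 2], [0, 0, 0, 0], 4)
print(result)
```

Because the exit test is `i != j` and `i` advances by 2, the sketch (like the assembly) only terminates when `j` is even and non-negative.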
On a one-issue machine, the code above (with its stalls) requires 12 cycles per loop iteration.
On a 2-issue machine it requires 11 cycles per loop iteration.
=> Speedup = 12/11 ≈ 1.09, which is a good improvement
The first instruction of iteration 1 (the LSL) is fetched in cycle 5, and the first instruction of iteration 2 is fetched in cycle 16, so the code takes 16 - 5 = 11 cycles per iteration on the 2-issue machine.
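The arithmetic in the quoted answer can be checked with a short Python sketch. The cycle counts are taken directly from the answer; the variable names are made up for illustration, and nothing here verifies the pipeline diagram itself.

```python
# Figures quoted in the answer above (not independently derived).
CYCLES_ONE_ISSUE = 12    # cycles per iteration on the 1-issue machine
FIRST_FETCH_ITER1 = 5    # cycle in which iteration 1 fetches its LSL
FIRST_FETCH_ITER2 = 16   # cycle in which iteration 2 fetches its LSL

# Steady-state cycles per iteration = distance between consecutive iteration starts.
cycles_two_issue = FIRST_FETCH_ITER2 - FIRST_FETCH_ITER1

# Speedup = old time / new time for the same amount of work.
speedup = CYCLES_ONE_ISSUE / cycles_two_issue

print(cycles_two_issue, round(speedup, 2))  # prints: 11 1.09
```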
Why is there a second stall in the answer? Doesn't forwarding eliminate the need for a stall between the CMP and B.NE instructions?
Thank you.
