Consider the following ARMv8 assembly code:

Loop: LDUR X1, [X2, #0]
      LDUR X4, [X3, #0]
      MUL  X1, X1, X4
      ADD  X5, X1, X5
      ADD  X2, X2, #8
      ADD  X3, X3, #8
      SUB  X8, X7, X2
      CBNZ X8, Loop

A) Assume standard data forwarding from arithmetic instructions. Remember that data cannot be forwarded from an LDUR to the immediately following arithmetic instruction. Also, two stall cycles are needed after a conditional branch to determine whether the loop will be repeated. Show how the instructions flow through the standard 5-stage pipeline: IF (instruction fetch), ID (instruction decode), EX (execute), MEM (data memory access), WB (write back results). How many cycles are needed to complete one iteration?

B) Can you reorder the code to minimize the number of cycles needed? How many cycles are needed to execute one iteration after reordering?

Please answer A and B neatly. Include all steps and as much detail as possible; I will mark as helpful.
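The stall rules in the question can be checked mechanically. The following is only a minimal Python sketch under modeling assumptions that are not in the original post: a single-issue, in-order pipeline; an arithmetic result can be forwarded to the EX stage of the very next instruction; a loaded value needs one extra cycle (one bubble if the consumer follows immediately); and the branch penalty is a flat two cycles. The names `count_stalls`, `cycles_per_iteration`, and the `REORDERED` schedule are hypothetical, and the reordering shown is just one candidate, not necessarily the intended answer.

```python
# Each instruction is (opcode, destination register or None, source registers).
ORIGINAL = [
    ("LDUR", "X1", ["X2"]),
    ("LDUR", "X4", ["X3"]),
    ("MUL",  "X1", ["X1", "X4"]),
    ("ADD",  "X5", ["X1", "X5"]),
    ("ADD",  "X2", ["X2"]),
    ("ADD",  "X3", ["X3"]),
    ("SUB",  "X8", ["X7", "X2"]),
    ("CBNZ", None, ["X8"]),
]

# Hypothetical reordering: independent ADDs are moved between each
# producer and its consumer so that no load is used immediately.
REORDERED = [
    ("LDUR", "X1", ["X2"]),
    ("LDUR", "X4", ["X3"]),
    ("ADD",  "X2", ["X2"]),
    ("MUL",  "X1", ["X1", "X4"]),
    ("ADD",  "X3", ["X3"]),
    ("SUB",  "X8", ["X7", "X2"]),
    ("ADD",  "X5", ["X1", "X5"]),
    ("CBNZ", None, ["X8"]),
]

LOADS = {"LDUR"}


def count_stalls(program):
    """Count data-hazard bubbles for one pass through the loop body."""
    ready = {}    # register -> earliest EX cycle in which a consumer may use it
    ex_prev = 2   # so the first instruction executes in cycle 3 (IF=1, ID=2)
    stalls = 0
    for op, dest, srcs in program:
        earliest = ex_prev + 1  # EX slot if there were no hazard
        ex = max([earliest] + [ready.get(r, 0) for r in srcs])
        stalls += ex - earliest
        if dest is not None:
            # Assumption: ALU results forward to the next EX (+1 cycle);
            # load data is only available after MEM (+2 cycles).
            ready[dest] = ex + (2 if op in LOADS else 1)
        ex_prev = ex
    return stalls


def cycles_per_iteration(program, branch_penalty=2):
    """Steady-state issue slots per iteration: instructions + bubbles + branch stalls."""
    return len(program) + count_stalls(program) + branch_penalty


if __name__ == "__main__":
    print("original :", count_stalls(ORIGINAL), "stall(s),",
          cycles_per_iteration(ORIGINAL), "cycles")
    print("reordered:", count_stalls(REORDERED), "stall(s),",
          cycles_per_iteration(REORDERED), "cycles")
```

Under this model the original order has one load-use bubble (between the second LDUR and the MUL) and takes 11 cycles per iteration, while the candidate reordering has none and takes 10. Counting conventions differ: if an iteration is measured from the first IF to the last WB instead of in steady state, the 4 pipeline-fill cycles would be added to these figures.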
