Question: You are given the following DAXPY loop, which computes the operation Y = aX + Y on vectors of a fixed length, where a is a scalar and X and Y are vectors. The loop is implemented with the following assembly code:
DADDIU R R #    ; R upper bound for X
LD FR           ; F Xi
MUL.D F F F     ; F a Xi
LD FR           ; F Yi
ADD.D F F F     ; F a Xi Yi
SD R F          ; Store Yi
DADDIU R R #    ; Increment X index
DADDIU R R #    ; Increment Y index
DSLTU R R R     ; Test: continue loop?
BNEZ R foo      ; Loop if needed
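For orientation, the loop body can be written in C roughly as follows. This is a minimal sketch, not part of the original problem: the function name `daxpy` and its parameters are assumptions chosen for illustration, and the comments map each statement to the instruction mnemonics in the listing only loosely.

```c
#include <stddef.h>

/* Scalar DAXPY sketch: y[i] = a * x[i] + y[i] for every i in [0, n).
 * The name and signature are illustrative assumptions. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double xi = x[i];      /* LD:    load X(i)                 */
        double prod = a * xi;  /* MUL.D: a * X(i)                  */
        double yi = y[i];      /* LD:    load Y(i)                 */
        y[i] = prod + yi;      /* ADD.D, then SD: store new Y(i)   */
    }
}
```

Each iteration depends only on its own X(i) and Y(i), which is what makes it possible to unroll the loop and interleave work from different iterations.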
Assume the following:
The functional unit latencies are given as:
FP multiply: cycles
FP add: cycles
FP store: cycles
Integer operations and loads: cycles
Results are fully bypassed.
The branch has a cycle delay and resolves in the ID stage.
Tasks:
Unroll the loop as many times as necessary to schedule it without stalls, collapsing the loop overhead instructions.
Provide the instruction schedule.
Determine the execution time per element of the result.
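The idea behind the tasks above can be sketched in C. This is a hedged illustration, not the solution: the unroll factor `UNROLL = 4` is a hypothetical value picked only for the example (the factor actually required depends on the stated latencies), and the names `daxpy_unrolled` and `cycles_per_element` are assumptions. The sketch shows the overhead being collapsed to one index update and one loop test per unrolled iteration, and the per-element time being the cycle count of one unrolled iteration divided by the number of elements it produces.

```c
#include <stddef.h>

/* Hypothetical unroll factor, chosen only for illustration. */
enum { UNROLL = 4 };

/* DAXPY with the body replicated UNROLL times. The four multiplies are
 * independent, so a scheduler can place them between the loads and the
 * dependent adds. The index is updated once per unrolled iteration. */
void daxpy_unrolled(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;
    for (; i + UNROLL <= n; i += UNROLL) {
        double p0 = a * x[i + 0];
        double p1 = a * x[i + 1];
        double p2 = a * x[i + 2];
        double p3 = a * x[i + 3];
        y[i + 0] = p0 + y[i + 0];
        y[i + 1] = p1 + y[i + 1];
        y[i + 2] = p2 + y[i + 2];
        y[i + 3] = p3 + y[i + 3];
    }
    /* Cleanup loop for lengths that are not a multiple of UNROLL. */
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Execution time per element: cycles taken by one unrolled iteration
 * divided by the number of elements that iteration produces. */
double cycles_per_element(unsigned cycles_per_iteration,
                          unsigned elements_per_iteration)
{
    return (double)cycles_per_iteration / (double)elements_per_iteration;
}
```

For example, if a stall-free schedule of one unrolled iteration took 12 cycles and produced 4 elements, `cycles_per_element(12, 4)` would give 3 cycles per element; these figures are made up to show the arithmetic and are not derived from the problem's latencies.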
