Question: we will use the following loop: for (k=n; k>=0; k--) x[k] = y[k+1] 2.0 * y[k]; If we assume: R1 contains the address of the

we will use the following loop: for (k=n; k>=0; k--) x[k] = y[k+1] 2.0 * y[k];

If we assume:

R1 contains the address of the nth element of y

R2 contains the address of the nth element of x

F0 contains 2.0

The above code could be written as the following MIPS assembly code:

Loop: L.D F2, 8(R1)

L.D F4, 0(R1)

MULT.D F6, F4, F0

SUB.D F8, F2, F6

S.D F8, 0(R2)

SUBI R1, R1, #8 #same as DADDUI R1, R1, #-8

SUBI R2, R2, #8

BNEZ R2, Loop

Using the following table for instruction latencies:

Instruction/Operation Type

Latency in Clock Cycles

Double Load

1

Double Store

0

Integer Op

0

FP Multiply

4

FP Add/Subtract

2

(1) Show the (cycle) schedule, including stalls of the unmodified loop on a fully pipelined machine. Cycle Instruction/stall

(2) Unroll the loop 3 times (for a total of 3 iterations) but do not reschedule the instructions. Ignore the branch delay slot. Do not delete any instructions other than loop overhead instructions. Assume n is a multiple of 3. Cycle Instruction/stall

(3) Unroll the loop 3 times (for a total of 3 iterations) and reschedule the instructions to reduce the number of stalls. Ignore the branch delay slot. Do not delete any instructions other than loop overhead instructions. Assume n is a multiple of 3. Cycle Instruction/stall

(4) What is the speedup of the unrolled loop in (2) from the unmodified case in (1)? What is the speedup of the unrolled and scheduled loop in (3) from the unmodified case in (1). Please show your calculation.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!