Question: The following C code generates a double precision floating point Fibonacci sequence: D[0] = 1.0; D[1]

The following C code generates a double-precision floating-point Fibonacci sequence:
D[0] = 1.0;
D[1] = 1.0;
for (j = 2; j < 10002; j++)
    D[j] = D[j-1] + D[j-2];
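For readers who want to run the fragment, here is a minimal sketch that wraps it in a function. The names `fill_fib` and `first_infinite` are hypothetical helpers introduced only for illustration, and the sketch assumes IEEE 754 doubles. Under that assumption the values should overflow to +infinity well before index 10001, which does not change the instruction stream the question asks about.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Fill D[0..n-1] with the sequence from the fragment.
   fill_fib is a hypothetical helper name, not part of the original code. */
void fill_fib(double *D, size_t n)
{
    assert(D != NULL && n >= 2);
    D[0] = 1.0;
    D[1] = 1.0;
    for (size_t j = 2; j < n; j++)
        D[j] = D[j - 1] + D[j - 2];
}

/* Return the index of the first element equal to +/-infinity,
   or n if every element is finite. Also a hypothetical helper. */
size_t first_infinite(const double *D, size_t n)
{
    for (size_t j = 0; j < n; j++)
        if (isinf(D[j]))
            return j;
    return n;
}
```

Calling `fill_fib(D, 10002)` on an array of 10002 doubles reproduces the loop in the question; `first_infinite(D, 10002)` then reports where the sequence stops being finite.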
The MIPS code corresponding to the above fragment is:
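      li.d  $f0, 1.0          # $f0 = 1.0
      s.d   $f0, 0($a0)       # D[0] = 1.0
      s.d   $f0, 8($a0)       # D[1] = 1.0
      li    $s0, 80016        # byte offset just past D[10001] (10002 * 8)
      add   $s1, $a0, $s0     # $s1 = end address
      addi  $s2, $a0, 16      # $s2 = &D[2]
loop: l.d   $f0, -16($s2)     # $f0 = D[j-2]
      l.d   $f2, -8($s2)      # $f2 = D[j-1]
      add.d $f4, $f0, $f2     # $f4 = D[j-2] + D[j-1]
      s.d   $f4, 0($s2)       # D[j] = $f4
      addi  $s2, $s2, 8       # advance to &D[j+1]
      bne   $s2, $s1, loop    # repeat until $s2 reaches the end address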
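The constants in the MIPS code follow from the 8-byte element size, and the same arithmetic gives the number of loop iterations that any cycle count has to be multiplied by. The sketch below only checks that arithmetic; `ELEM_BYTES`, `byte_offset`, and `trip_count` are hypothetical names, and the 8-byte size is taken from the strides in the assembly.

```c
#include <assert.h>

enum { ELEM_BYTES = 8 };  /* one double, as implied by the 8-byte strides */

/* Byte offset of D[j] from the base address held in $a0. */
long byte_offset(long j)
{
    return j * ELEM_BYTES;
}

/* Number of loop iterations when the pointer in $s2 walks from
   start_offset to end_offset in steps of one element. */
long trip_count(long start_offset, long end_offset)
{
    assert(end_offset >= start_offset);
    assert((end_offset - start_offset) % ELEM_BYTES == 0);
    return (end_offset - start_offset) / ELEM_BYTES;
}
```

With these helpers, `byte_offset(10002)` gives 80016 (the value loaded into `$s0`), `byte_offset(2)` gives 16 (the initial offset in `$s2`), and `trip_count(16, 80016)` gives 10000 iterations, one per value of j from 2 to 10001.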
Instructions that take more than one cycle have the following added latencies (in cycles) for double-precision processing: add.d: 2, li.d: 1 (ALU delays/hazards), l.d: 4 (data read buffer delay/hazard), s.d: 3 (data write buffer delay/hazard).
a) How many cycles does it take to execute this code if every hazard stalls the next instruction?
b) Reorder the code to reduce stalls. Now, how many cycles does it take to execute this code?
c) When an instruction in a later iteration of a loop depends on a data value produced in an earlier iteration of the same loop, we say that there is a loop-carried dependence between iterations of the loop. Identify the loop-carried dependences in the above code. Identify the dependent program variable and the assembly-level registers. You can ignore the loop induction variable j.
d) Rewrite the code by using registers to carry the data between iterations of the loop (as opposed to storing and reloading the data from main memory). Show where this code stalls and calculate the number of cycles required to execute. Note that for this problem you will need to use the assembler pseudo-instruction "mov.d rd, rs", which writes the value of floating-point register rs into floating-point register rd. Assume that mov.d executes in a single cycle.
e) Unroll and optimize the loop above so that each unrolled loop handles three iterations of the original loop. Show where this code stalls and calculate the number of cycles required to execute.
f) Unrolling works nicely because we happen to want a multiple of four iterations. What happens if the number of iterations is not known at compile time? How can we efficiently handle a number of iterations that is not a multiple of the number of iterations per unrolled loop?
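The transformations asked for in parts (d) through (f) can be illustrated at the C level before writing any assembly. The sketch below is one possible shape, not the expected MIPS answer, and it says nothing about cycle counts. The function names are hypothetical. `fib_memory` is the loop as given, `fib_carried` keeps the last two values in scalars (standing in for floating-point registers), and `fib_unrolled3` computes three elements per trip and uses a cleanup loop for the 0, 1, or 2 elements left over when the trip count is not a multiple of three.

```c
#include <assert.h>
#include <stddef.h>

/* Reference: reloads D[j-1] and D[j-2] from memory every iteration. */
void fib_memory(double *D, size_t n)
{
    assert(n >= 2);
    D[0] = 1.0;
    D[1] = 1.0;
    for (size_t j = 2; j < n; j++)
        D[j] = D[j - 1] + D[j - 2];
}

/* Part (d) idea: carry the two most recent values in scalars. */
void fib_carried(double *D, size_t n)
{
    assert(n >= 2);
    double a = 1.0, b = 1.0;      /* a = D[j-2], b = D[j-1] */
    D[0] = a;
    D[1] = b;
    for (size_t j = 2; j < n; j++) {
        double c = a + b;         /* D[j] */
        D[j] = c;
        a = b;                    /* these copies play the role of the */
        b = c;                    /* register moves in part (d)        */
    }
}

/* Parts (e) and (f) idea: three elements per trip, then a cleanup loop. */
void fib_unrolled3(double *D, size_t n)
{
    assert(n >= 2);
    double a = 1.0, b = 1.0;      /* a = D[j-2], b = D[j-1] */
    D[0] = a;
    D[1] = b;
    size_t j = 2;
    for (; j + 3 <= n; j += 3) {
        double c = a + b;         /* D[j]   */
        double d = b + c;         /* D[j+1] */
        double e = c + d;         /* D[j+2] */
        D[j] = c;
        D[j + 1] = d;
        D[j + 2] = e;
        a = d;                    /* becomes D[j-2] for the next trip */
        b = e;                    /* becomes D[j-1] for the next trip */
    }
    for (; j < n; j++) {          /* cleanup: at most two elements */
        double c = a + b;
        D[j] = c;
        a = b;
        b = c;
    }
}
```

All three functions should fill the array identically, since they perform the same additions in the same order. The cleanup loop is one common way to handle a trip count that is only known at run time; running the leftover iterations before the unrolled loop instead is an equally valid choice.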
