Question:

Consider a MIPS 5-stage pipeline whose execution stage consists of one integer unit, one FP multiplier, one FP adder, and one FP divider, as shown below. Integer operations take 1 clock cycle, FP/integer multiply takes 7 clock cycles, FP add takes 4 clock cycles, and FP divide takes 25 clock cycles. An FP load behaves like an integer load.

[Figure: pipeline with IF and ID, followed by parallel EX units (integer, FP/integer multiply, FP adder, FP divider), followed by MEM and WB]

Loop:   L.D      F0, 0(R2)
        L.D      F4, 0(R2)
        MULT.D   F0, F0, F4
        ADD.D    F2, F0, F2
        ADDI     R2, R2, #8
        ADDI     R3, R3, #8
        SUB      R5, R4, R2
        (branch) R5, Loop

Assume that the initial value of R4 is R2 + 792.

(a) Show the timing of this instruction sequence for the 5-stage MIPS pipeline with forwarding. Assume that branches are handled by flushing the pipeline. If all memory references hit in the cache, how many cycles does this loop take? (Hint: Use an Excel spreadsheet, listing the instructions in one column and showing the timing of each instruction across its row.)

(b) Assuming the pipeline has a single-cycle delayed branch slot and normal forwarding, schedule the instructions in the loop, including the branch delay slot. You may reorder the instructions and modify the individual instruction operands, but do not undertake loop transformations that change the number or opcode of the instructions in the loop. Compute the number of cycles needed to execute the entire loop.

(c) Can you now transform this loop by unrolling so that the number of stalls is reduced? Is it possible to eliminate all stalls? You may now reorder instructions, use forwarding, and use the branch delay slot. Compute the total number of cycles needed to execute the loop.
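Each part asks for the cycle count of the whole loop, so it helps to pin down the iteration count first. The sketch below is only an illustration under stated assumptions: it assumes the loop-closing branch is taken while R5 = R4 - R2 is nonzero (the branch opcode is not shown in the question), and that R2 advances by 8 bytes per iteration as in the listing. The function names and the per-iteration cycle figure passed to `total_cycles` are hypothetical placeholders, not results.

```python
def loop_iterations(offset_bytes: int = 792, stride_bytes: int = 8) -> int:
    """Number of loop iterations until R5 = R4 - R2 reaches zero.

    Assumption: the branch loops back while R5 != 0, and R2 grows by
    stride_bytes each iteration while R4 stays fixed at R2_initial + offset_bytes.
    """
    if stride_bytes <= 0 or offset_bytes % stride_bytes != 0:
        # Otherwise R5 would never hit exactly zero under this assumption.
        raise ValueError("offset must be a positive multiple of the stride")
    return offset_bytes // stride_bytes


def total_cycles(cycles_per_iteration: int, iterations: int, extra_cycles: int = 0) -> int:
    """Total cycles for the loop given a per-iteration cost.

    cycles_per_iteration must come from your own timing diagram;
    extra_cycles is a hypothetical knob for one-time pipeline fill/drain.
    """
    return cycles_per_iteration * iterations + extra_cycles


if __name__ == "__main__":
    n = loop_iterations()
    print(n)  # 792 / 8 iterations
```

Once the timing diagram for one iteration is worked out, multiplying its steady-state cycle count by this iteration count (plus any one-time fill or drain cycles) gives the total for the loop.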
