Question:

Problem #2

Consider an in-order pipelined RISC architecture with a branch delay cycle. The architecture has pipelined functional units with the following number of execution cycles (note that these are not latencies between instructions):

  Floating point multiply: 4 cycles
  Floating point divider:  8 cycles
  Floating point adder:    2 cycles
  Integer operations:      1 cycle
  Memory load/store:       3 cycles

Assume that there is no delay between an integer operation and a dependent branch instruction.

The following code computes a portion of a filter operation. Assume that R1 contains a pointer to the beginning of a window of floating-point numbers and constants. Further, R2 contains a pointer to an output array. Let R3 be the size of the window, and let F0 and F1 be constants. Finally, assume that F5 is initialized to a bias constant outside the loop.

  FILTER: LDF    F3, 0(R1)
          MULTF  F10, F3, F0
          LDF    F4, 4(R1)
          ADDF   F11, F4, F10
          DIVF   F12, F11, F1
          STF    0(R2), F12
          ADDI   R1, R1, #8
          ADDI   R2, R2, #4
          SUBI   R3, R3, #1
          BNE    R3, FILTER
          NOP

(a) How many cycles does each iteration take, without rearranging the code? Indicate stalls in the above code.

(b) Rearrange the code (no unrolling) to determine the lowest number of cycles per iteration.
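
One way to approach part (a) is to tabulate, instruction by instruction, the earliest cycle in which each one can begin executing. The Python sketch below does that under a simplified model that is an assumption, not something the problem states: single issue, in order, full forwarding, a result usable by a consumer exactly `EXEC_CYCLES` cycles after its producer starts executing, and no structural or write-back conflicts. The names `EXEC_CYCLES`, `LOOP`, and `schedule` are hypothetical, chosen only for this illustration.

```python
# Execution cycles per functional unit, taken from the problem statement.
EXEC_CYCLES = {
    "LDF": 3, "STF": 3,      # memory load/store
    "MULTF": 4,              # floating point multiply
    "DIVF": 8,               # floating point divide
    "ADDF": 2,               # floating point add
    "ADDI": 1, "SUBI": 1,    # integer operations
    "BNE": 1, "NOP": 1,      # branch and delay-slot filler (assumed 1 cycle)
}

# Each entry: (opcode, destination register or None, source registers).
LOOP = [
    ("LDF",   "F3",  ["R1"]),
    ("MULTF", "F10", ["F3", "F0"]),
    ("LDF",   "F4",  ["R1"]),
    ("ADDF",  "F11", ["F4", "F10"]),
    ("DIVF",  "F12", ["F11", "F1"]),
    ("STF",   None,  ["R2", "F12"]),
    ("ADDI",  "R1",  ["R1"]),
    ("ADDI",  "R2",  ["R2"]),
    ("SUBI",  "R3",  ["R3"]),
    ("BNE",   None,  ["R3"]),
    ("NOP",   None,  []),
]


def schedule(program):
    """Return (opcode, start_cycle, stall_cycles) for each instruction.

    Assumed model: an instruction starts one cycle after its predecessor
    unless a source register is not ready yet, in which case it waits.
    """
    ready = {}   # register -> first cycle in which a consumer may start
    cycle = 0    # start cycle of the previous instruction
    rows = []
    for op, dest, srcs in program:
        earliest = cycle + 1
        start = max([earliest] + [ready.get(reg, 0) for reg in srcs])
        rows.append((op, start, start - earliest))
        if dest is not None:
            ready[dest] = start + EXEC_CYCLES[op]
        cycle = start
    return rows


if __name__ == "__main__":
    for op, start, stalls in schedule(LOOP):
        print(f"{op:6s} starts in cycle {start:2d} after {stalls} stall(s)")
```

Reordering the entries of `LOOP` and rerunning `schedule` is a quick way to experiment with part (b), keeping in mind that moving an instruction past an `ADDI` that updates its base register also requires adjusting the memory offset. Under a different forwarding or write-back assumption the cycle counts will differ, so the output should be checked against the pipeline model used in the course.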
