2.

(a) Suppose a particular program contains 20% branches. One branch delay slot needs to be usefully filled to avoid the branch penalty. Assuming that the compiler can fill 62% of the delay slots and that 85% of the instructions executed in the branch delay slots are useful, calculate the CPI. Assume that the ideal CPI is 1 and ignore all other hazards. The penalty incurred if a slot is not filled with a useful instruction is 1 cycle. CPI = ____?
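A minimal sketch of the arithmetic, assuming the usual model in which every branch pays the 1-cycle penalty unless its slot holds a useful instruction, so the fraction of slots doing useful work is the fill rate times the usefulness rate. The function name `cpi_single_slot` is illustrative only.

```python
def cpi_single_slot(branch_frac, fill_rate, useful_rate, penalty=1.0, ideal_cpi=1.0):
    """CPI with one delay slot, assuming the penalty is paid whenever the slot is not usefully filled."""
    useful = fill_rate * useful_rate  # fraction of delay slots that do useful work
    return ideal_cpi + branch_frac * (1.0 - useful) * penalty


cpi_a = cpi_single_slot(0.20, 0.62, 0.85)
print(f"CPI = {cpi_a:.4f}")  # 1 + 0.2 * (1 - 0.527) -> prints "CPI = 1.0946"
```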

(b) A pipelined processor uses the delayed branch technique. You are asked to recommend one of two possible designs for this processor. In the first, the processor has a 5-stage pipeline and one delay slot; in the second, it has a 6-stage pipeline with two delay slots. In each case, the pipeline will not stall if the delay slots are filled with useful instructions. Assume that 20% of the executed instructions are branch instructions and that the compiler has an 80% success rate in filling the first (or only) delay slot with a useful instruction. In addition, for the second alternative, the compiler is able to usefully fill the second delay slot 25% of the time. Calculate the CPI and the speedup over the non-pipelined processor for each case. Assume an ideal CPI of 1. The choice with the higher speedup is the better one.

For the 5-stage pipeline: CPI = ____? Speedup = ____?

For the 6-stage pipeline: CPI = ____? Speedup = ____?
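A minimal sketch of one common way to set this up, assuming (1) each delay slot that is not usefully filled costs one stall cycle per branch, and (2) the non-pipelined processor runs at the same clock and spends one cycle per stage, so speedup = pipeline depth / pipelined CPI. Both assumptions, and the helper names, are illustrative rather than taken from the question.

```python
def cpi_delay_slots(branch_frac, fill_rates, ideal_cpi=1.0):
    """Assumes each delay slot that is not usefully filled costs one stall cycle per branch."""
    stalls_per_branch = sum(1.0 - r for r in fill_rates)
    return ideal_cpi + branch_frac * stalls_per_branch


def speedup_vs_unpipelined(depth, cpi):
    """Assumes the unpipelined machine spends `depth` cycles per instruction at the same clock."""
    return depth / cpi


for depth, rates in [(5, [0.80]), (6, [0.80, 0.25])]:
    cpi = cpi_delay_slots(0.20, rates)
    print(f"{depth}-stage: CPI = {cpi:.2f}, speedup = {speedup_vs_unpipelined(depth, cpi):.2f}")
# prints "5-stage: CPI = 1.04, speedup = 4.81"
# prints "6-stage: CPI = 1.19, speedup = 5.04"
```

Under these assumptions the deeper pipeline wins on speedup even though its CPI is higher, because the extra stage raises the ideal speedup from 5 to 6.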

(c) A MIPS-like pipelined processor has the following five stages: IF, ID, EX, MEM, and WB. When executing conditional branches, the condition is checked in the ID stage, while the branch target address is calculated in the EX stage. Assume that the processor supports data forwarding and delayed branches; you will need to decide on the number of delay slots. You are given the following program segment:

S1: SUB R1, R2, R3 /* R1 is the destination */
S2: ADD R4, Rs, Rs /* R4 is the destination */
S3 to Sn-1: Instructions not modifying R1 or R4
Sn: BZ R4, 100(R1) /* Branch to 100 + (R1) if R4 = 0 */

How many delay slots should you include in the design to optimize the performance of the above program segment, assuming that n > 5? What would be the total number of cycles required to execute this program segment after the delay slots are usefully filled?

Number of delay slots = ____?

Number of cycles = n + ____?

Can we reduce the number of cycles for n = 3 by doing some rescheduling? Yes or No?
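A rough sketch of the counting argument, assuming the branch is resolved only once both its condition (ID) and its target address (EX) are known, so every instruction fetched behind it until EX completes is a delay slot, and assuming that with all slots usefully filled and no stalls a k-stage pipeline finishes n instructions in n + k - 1 cycles. It deliberately ignores the data hazards on R1 and R4, which are what the rescheduling question for n = 3 is about. The stage list and helper names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def delay_slots_needed(resolve_stage):
    """Instructions fetched after the branch while it moves from IF to the end of `resolve_stage`."""
    return STAGES.index(resolve_stage)  # IF is index 0, so "EX" gives 2


def total_cycles(n, depth=len(STAGES)):
    """Cycles for n instructions on an ideal pipeline: depth - 1 fill cycles plus one per instruction."""
    return n + depth - 1


slots = delay_slots_needed("EX")  # the target address is only ready after EX
print(slots, total_cycles(10))    # n = 10 is an arbitrary example; prints "2 14"
```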
