Exercise 4.35 This exercise is intended to help you better understand the relationship between ISA design and pipelining. Problems in this exercise assume that we have a multiple-issue pipelined processor with the following number of pipeline stages, instructions issued per cycle, stage in which branch outcomes are resolved, and branch predictor accuracy:

      Pipeline depth   Issue width   Branches execute in stage   Branch predictor accuracy   Branches as a % of instructions
a.    10               4             7                           80%                         20%
b.    25               2             17                          92%                         25%

4.35.1 [5] <4.8, 4.13> Control hazards can be eliminated by adding branch delay slots. How many delay slots must follow each branch if we want to eliminate all control hazards in this processor?
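
One way to reason about 4.35.1 is to count the instructions fetched after a branch before its outcome is known. The sketch below is not part of the exercise and is not an official solution. The function names are hypothetical, and it assumes the outcome becomes available at the end of the resolving stage, so that fetch is redirected in the following cycle. It covers two cases: the branch is the last instruction in its fetch group (the simplification that 4.35.2 allows), or it is the first (the worst case).

```c
/* Hypothetical helpers for illustration; names are not from the exercise.
 * Assumption: a branch fetched in cycle 1 is resolved at the end of cycle
 * resolve_stage, so fetch is redirected in cycle resolve_stage + 1. */

/* Instructions fetched after the branch when it is the LAST instruction
 * in its fetch group: (resolve_stage - 1) further fetch cycles, each
 * bringing in issue_width instructions. */
int delay_slots_branch_last(int resolve_stage, int issue_width)
{
    return (resolve_stage - 1) * issue_width;
}

/* Worst case: the branch is the FIRST instruction in its fetch group, so
 * issue_width - 1 more instructions follow it in the same cycle. */
int delay_slots_branch_first(int resolve_stage, int issue_width)
{
    return delay_slots_branch_last(resolve_stage, issue_width)
           + (issue_width - 1);
}
```

Under these assumptions, processor a (branches resolved in stage 7, issue width 4) gives 24 and 27, and processor b (stage 17, issue width 2) gives 32 and 33. Which count applies depends on whether the branch may appear anywhere in its fetch group.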

4.35.2 [10] <4.8, 4.13> What is the speed-up that would be achieved by using four branch delay slots to reduce control hazards in this processor? Assume that there are no data dependences between instructions and that all four delay slots can be filled with useful instructions without increasing the number of executed instructions. To make your computations easier, you can also assume that the mispredicted branch instruction is always the last instruction to be fetched in a cycle, i.e., no instructions that are in the same pipeline stage as the branch are fetched from the wrong path.
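
The speed-up in 4.35.2 can be estimated with a simple cycles-per-instruction model. The sketch below is one possible model, not the textbook's solution, and its function names are hypothetical. It assumes an otherwise ideal pipeline that completes issue_width instructions per cycle. It assumes a misprediction wastes resolve_stage - 1 fetch cycles when the branch is the last instruction in its group. It also assumes each useful delay slot recovers 1 / issue_width of a wasted cycle.

```c
/* Hypothetical model for illustration; not an official solution.
 * Assumptions: ideal CPI is 1 / issue_width; a mispredicted branch wastes
 * (resolve_stage - 1) fetch cycles; useful delay-slot instructions turn
 * delay_slots / issue_width of those cycles into useful work. */
double cycles_per_instr(int resolve_stage, int issue_width,
                        double accuracy, double branch_frac,
                        int delay_slots)
{
    double penalty = (resolve_stage - 1)
                     - (double)delay_slots / issue_width;
    if (penalty < 0.0) {
        penalty = 0.0; /* delay slots cannot recover more than was lost */
    }
    return 1.0 / issue_width
           + branch_frac * (1.0 - accuracy) * penalty;
}

/* Speed-up from adding delay_slots useful delay slots per branch. */
double delay_slot_speedup(int resolve_stage, int issue_width,
                          double accuracy, double branch_frac,
                          int delay_slots)
{
    double base = cycles_per_instr(resolve_stage, issue_width,
                                   accuracy, branch_frac, 0);
    double with_slots = cycles_per_instr(resolve_stage, issue_width,
                                         accuracy, branch_frac,
                                         delay_slots);
    return base / with_slots;
}
```

With these assumptions, delay_slot_speedup(7, 4, 0.80, 0.20, 4) evaluates to 0.49 / 0.45, roughly 1.09, and delay_slot_speedup(17, 2, 0.92, 0.25, 4) evaluates to 0.82 / 0.78, roughly 1.05. A different treatment of the misprediction penalty would change these figures.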

4.35.3 [10] <4.8, 4.13> Repeat Exercise 4.35.2, but now assume that 10% of executed branches have all four delay slots filled with useful instructions, 20% have only three useful instructions in delay slots (the fourth delay slot is a nop), 30% have only two useful instructions in delay slots, and 40% have no useful instructions in their delay slots.

The remaining four problems in this exercise refer to the following C loop:

a. for (i = 0; i != j; i++) {
       b[i] = a[i];
   }

b. for (i = 0; a[i] != a[i+1]; i++) {
       c++;
   }

4.35.4 [10] <4.8, 4.13> Translate this C loop into MIPS instructions, assuming that our ISA requires one delay slot for every branch. Try to fill delay slots with non-nop instructions when possible. You can assume that variables a, b, c, i, and j are kept in registers $1, $2, $3, $4, and $5.

4.35.5 [10] <4.7, 4.13> Repeat Exercise 4.35.4 for a processor that has two delay slots for every branch.

4.35.6 [10] <4.10, 4.13> How many iterations of your loop from Exercise 4.35.4 can be “in flight” within this processor’s pipeline? We say that an iteration is “in flight” when at least one of its instructions has been fetched and has not yet been committed.
