We compare the performance of three dynamically scheduled processor architectures on a simple piece of code computing Y = (X × Y) + Z, where X, Y, and Z are double-precision (8-byte) floating-point vectors. The loop body can be compiled as follows:
```
LOOP:  L.D    F0,0(R1)     // X[i] loaded in F0
       L.D    F2,0(R2)     // Y[i] loaded in F2
       L.D    F4,0(R3)     // Z[i] loaded in F4
       MUL.D  F6,F2,F0     // Multiply X by Y
       ADD.D  F8,F6,F4     // Add Z
       ADDI   R1,R1,#8     // update address registers
       ADDI   R2,R2,#8
       ADDI   R3,R3,#8
       S.D    F8,-8(R2)    // store in Y[i]
       BNE    R4,R2,LOOP   // (R4)-8 points to the last element of Y
```
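For reference, what the loop computes can be sketched as a purely functional Python model (it says nothing about timing). The byte offsets `r1`, `r2`, `r3`, `r4` stand in for the address registers, each vector is assumed, hypothetically, to start at offset 0, and the name `run_loop` is illustrative only.

```python
def run_loop(x: list[float], y: list[float], z: list[float]) -> list[float]:
    """Functional model of the loop: Y[i] = X[i] * Y[i] + Z[i], updated in place.

    Hypothetical assumption: each vector starts at byte offset 0, so the
    offsets r1, r2, r3 mirror R1, R2, R3 and advance by 8 (one double).
    """
    if not (len(x) == len(y) == len(z)) or not y:
        raise ValueError("X, Y, Z must be non-empty and of equal length")
    r1 = r2 = r3 = 0
    r4 = 8 * len(y)               # R4 - 8 is the offset of the last element of Y
    while True:                   # do-while: the body runs before BNE is tested
        f0 = x[r1 // 8]           # L.D F0,0(R1)
        f2 = y[r2 // 8]           # L.D F2,0(R2)
        f4 = z[r3 // 8]           # L.D F4,0(R3)
        f6 = f2 * f0              # MUL.D F6,F2,F0
        f8 = f6 + f4              # ADD.D F8,F6,F4
        r1 += 8                   # ADDI R1,R1,#8
        r2 += 8                   # ADDI R2,R2,#8
        r3 += 8                   # ADDI R3,R3,#8
        y[(r2 - 8) // 8] = f8     # S.D F8,-8(R2): R2 has already advanced
        if r2 == r4:              # BNE R4,R2,LOOP falls through when equal
            return y
```

The `-8` in the store's offset matches the model's `r2 - 8`: by the time S.D executes, the ADDI has already moved R2 to the next element.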
The initial values in R1, R2, and R3 are such that the three registers never hold equal values during the entire execution. (This is important for memory disambiguation.) The architectures are given in Figures 3.15, 3.23, and 3.27, and the same parameters apply. Branch BNE is always predicted taken, except in Tomasulo without speculation, where branches are not predicted at all and stall in the dispatch stage until their outcome is known.
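One simple, conservative form of memory disambiguation can be sketched as follows: a load may access the cache only if every older store that has not yet written the cache has a known address that differs from the load's. This is an illustrative policy, not necessarily the exact rule used by the architectures in Figures 3.15, 3.23, and 3.27 (real designs may also forward store data to a matching load). The function name is hypothetical, and `None` stands for a store whose address has not been computed yet.

```python
from typing import Optional, Sequence


def load_may_access_cache(load_addr: int,
                          pending_store_addrs: Sequence[Optional[int]]) -> bool:
    """Conservative disambiguation check (hypothetical helper).

    pending_store_addrs: addresses of older stores that have not yet written
    the cache; None means that store's address is still unknown.
    The load proceeds only if all such addresses are known and differ.
    """
    return all(addr is not None and addr != load_addr
               for addr in pending_store_addrs)
```

For example, with hypothetical addresses, if the store to Y[i] at 0x1000 is still pending when the next iteration loads Y[i+1] from 0x1008, the load may proceed; it must wait while that store's address is still unknown.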
Keep in mind the following important rules (whenever they apply):
- Instructions are always fetched, decoded, and dispatched in process order;
- In speculative architectures, instructions always retire in process order;
- In speculative architectures, stores must wait until they reach the top of the ROB before they can issue to cache.
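The in-order retirement rule can be sketched as a small Python function. It assumes, hypothetically (check the retire width in the figures' parameters), that at most `width` instructions retire per clock, and that `ready[i]` is the first clock in which instruction i could retire on its own, for example the clock after its result appears on the CDB. Because a store may go to the cache only once it is at the top of the ROB, this sketch simply treats a store's cache write as happening in its retire clock.

```python
from typing import Sequence


def retire_times(ready: Sequence[int], width: int = 1) -> list[int]:
    """In-order retirement sketch.

    ready[i]: first clock in which instruction i (in process order) could
    retire by itself. Returns the clock in which each instruction retires,
    given that no instruction retires before an older one and at most
    `width` instructions retire per clock (hypothetical parameter).
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    retired = []
    clock = 0        # clock of the most recent retirement
    used = 0         # retirement slots already used in that clock
    for r in ready:
        t = max(r, clock)            # never retire ahead of an older instruction
        if t == clock and used >= width:
            t += 1                   # this clock's retire slots are full
        if t != clock:
            clock, used = t, 0       # move on to a new retire clock
        retired.append(t)
        used += 1
    return retired
```

For instance, `retire_times([5, 3, 9, 9])` gives `[5, 6, 9, 10]`: the second instruction is ready in clock 3 but must wait for the first, and the fourth cannot share clock 9 with the third when only one instruction retires per clock.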
Tomasulo algorithm, no speculation. Please fill in a table like the one below, clock by clock, for the first iteration of the loop. Each entry should be the clock number in which the event occurs, starting with clock 1. Add comments as you see fit (this helps us understand your thinking).
| Instruction | Dispatch | Issue | Exec start | Exec complete | Cache | CDB | Comments |
|---|---|---|---|---|---|---|---|
| I1 L.D F0, 0(R1) | | | | | | | |
| I2 L.D F2, 0(R2) | | | | | | | |
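The core timing rule of Tomasulo's algorithm, namely that an instruction waits in its reservation station until every source operand it is tagged to has been broadcast on the CDB, can be sketched as below. Every latency is a hypothetical placeholder, not a parameter from Figure 3.15. The sketch also merges issue into execution start, folds a load's address computation and cache access into one latency, assumes a single CDB with oldest-first arbitration, and ignores reservation-station and functional-unit limits. It shows how dependencies propagate clock by clock; it does not produce the answer to the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    text: str
    dest: Optional[str]        # register written via the CDB (None: store/branch)
    srcs: tuple[str, ...]      # registers read
    latency: int               # HYPOTHETICAL execution latency in clocks


def schedule(instrs: list[Instr]) -> list[dict]:
    """Simplified Tomasulo timing (no speculation, single CDB, unlimited RSs/FUs)."""
    producer = {}              # register -> index of its latest producer (renaming)
    cdb_used = set()           # clocks in which the single CDB is already taken
    rows = []
    for i, ins in enumerate(instrs):
        dispatch = i + 1                       # in order, one per clock, from clock 1
        start = dispatch + 1                   # earliest start: clock after dispatch
        for reg in ins.srcs:
            if reg in producer:                # tagged operand: wait for its CDB broadcast
                start = max(start, rows[producer[reg]]["cdb"] + 1)
        done = start + ins.latency - 1
        cdb = None
        if ins.dest is not None:
            cdb = done + 1
            while cdb in cdb_used:             # older instructions already hold their slots
                cdb += 1
            cdb_used.add(cdb)
            producer[ins.dest] = i             # later readers now wait on this tag
        rows.append({"instr": ins.text, "dispatch": dispatch,
                     "start": start, "done": done, "cdb": cdb})
    return rows


# First iteration of the loop; all latencies are HYPOTHETICAL placeholders.
LOOP_BODY = [
    Instr("L.D F0,0(R1)", "F0", ("R1",), 2),
    Instr("L.D F2,0(R2)", "F2", ("R2",), 2),
    Instr("L.D F4,0(R3)", "F4", ("R3",), 2),
    Instr("MUL.D F6,F2,F0", "F6", ("F2", "F0"), 4),
    Instr("ADD.D F8,F6,F4", "F8", ("F6", "F4"), 2),
    Instr("ADDI R1,R1,#8", "R1", ("R1",), 1),
    Instr("ADDI R2,R2,#8", "R2", ("R2",), 1),
    Instr("ADDI R3,R3,#8", "R3", ("R3",), 1),
    Instr("S.D F8,-8(R2)", None, ("F8", "R2"), 1),
    Instr("BNE R4,R2,LOOP", None, ("R4", "R2"), 1),
]

if __name__ == "__main__":
    for row in schedule(LOOP_BODY):
        print(row)
```

Under these placeholder latencies, the third ADDI is ready to broadcast in clock 10 but loses the CDB to the older MUL.D and slips to clock 11, which is the kind of CDB conflict the CDB column is meant to capture.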
Tomasulo algorithm with speculation. Please fill in a table like the one below, clock by clock, for the first iteration of the loop. Each entry should be the clock number in which the event occurs, starting with clock 1. Note that, unlike in Tomasulo without speculation, stores cannot access the cache until they reach the top of the ROB. Also, branches are now predicted taken.
| Instruction | Dispatch | Issue | Exec start | Exec complete | Cache | CDB | Retire | Comments |
|---|---|---|---|---|---|---|---|---|
| I1 L.D F0, 0(R1) | | | | | | | | |
| I2 L.D F2, 0(R2) | | | | | | | | |
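The two branch policies differ mainly in when the next iteration can begin dispatching. A minimal sketch, assuming hypothetically that one instruction is dispatched per clock and that dispatch resumes in the clock after the branch outcome is known: without prediction, the first instruction of iteration 2 waits for BNE to resolve; with predict-taken (and a correct prediction), it dispatches in the clock right after BNE. The function name and parameters are illustrative, and misprediction recovery is not modeled.

```python
def next_iteration_dispatch(bne_dispatch: int, bne_resolved: int,
                            predict_taken: bool) -> int:
    """Clock in which the first instruction of the next iteration dispatches.

    Hypothetical model: one instruction per clock; without prediction the
    branch holds dispatch until its outcome is known (bne_resolved), and
    dispatch resumes in the following clock. A correct prediction is assumed.
    """
    if predict_taken:
        return bne_dispatch + 1                  # continue down the predicted path
    return max(bne_dispatch, bne_resolved) + 1   # stall until the outcome is known
```

For example, if BNE reaches dispatch in clock 10 and resolves in clock 11, the next L.D dispatches in clock 11 with prediction and in clock 12 without it.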
