Question: We want to study several instruction-level parallelism techniques. We are given the following benchmark program, assuming R1 is initialized to 0, and R6, R7, R8, R9 and F10 contain constant non-zero values:

Loop: LD    F12, 0(R6)
      DIVD  F14, F12, F10
      LD    F16, 0(R7)
      ADDD  F16, F14, F16
      LD    F17, 0(R8)
      MULTD F18, F17, F16
      SD    0(R9), F18
      ADDI  R6, R6, #4
      ADDI  R7, R7, #4
      ADDI  R8, R8, #4
      ADDI  R9, R9, #4
      ADDI  R1, R1, #1
      SUBI  R2, R1, #1000
      BNEQZ R2, Loop

Assuming a single-issue scalar architecture, the available hardware resources and their respective latencies are given below:

FU type          #FUs   #EX cycles
integer          2      1
branch           1      1
load             3      2
store            2      2
FP adder         2      7
FP multiplier    1      5
FP divider       1      24
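To make the data dependences in the loop body concrete, here is a minimal Python sketch that lists the read-after-write (RAW) dependences within one iteration and adds up the EX cycles along the longest dependent chain. The tuple encoding of the instructions and the helper names `raw_dependences` and `longest_chain` are illustrative assumptions, not part of the exercise. The sketch counts execution latency only: it ignores issue and write-back cycles, the number of functional units, and any dependence through memory between the store and later loads.

```python
# Illustrative encoding (an assumption): (opcode, destination or None, sources).
LOOP_BODY = [
    ("LD",    "F12", ["R6"]),
    ("DIVD",  "F14", ["F12", "F10"]),
    ("LD",    "F16", ["R7"]),
    ("ADDD",  "F16", ["F14", "F16"]),
    ("LD",    "F17", ["R8"]),
    ("MULTD", "F18", ["F17", "F16"]),
    ("SD",    None,  ["R9", "F18"]),
    ("ADDI",  "R6",  ["R6"]),
    ("ADDI",  "R7",  ["R7"]),
    ("ADDI",  "R8",  ["R8"]),
    ("ADDI",  "R9",  ["R9"]),
    ("ADDI",  "R1",  ["R1"]),
    ("SUBI",  "R2",  ["R1"]),
    ("BNEQZ", None,  ["R2"]),
]

# EX cycles taken from the functional-unit table above.
EX_CYCLES = {
    "LD": 2, "SD": 2, "ADDD": 7, "MULTD": 5, "DIVD": 24,
    "ADDI": 1, "SUBI": 1, "BNEQZ": 1,
}


def raw_dependences(body):
    """Return (producer, consumer, register) triples for RAW dependences
    inside a single iteration (loop-carried dependences are not tracked)."""
    last_writer = {}
    deps = []
    for i, (_, dest, srcs) in enumerate(body):
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


def longest_chain(body, deps):
    """Largest sum of EX cycles along any RAW chain in one iteration."""
    finish = [EX_CYCLES[op] for op, _, _ in body]
    # Producers always precede consumers, so visiting in consumer order
    # guarantees each producer's finish time is final when it is read.
    for producer, consumer, _ in sorted(deps, key=lambda d: d[1]):
        candidate = finish[producer] + EX_CYCLES[body[consumer][0]]
        finish[consumer] = max(finish[consumer], candidate)
    return max(finish)


if __name__ == "__main__":
    deps = raw_dependences(LOOP_BODY)
    for producer, consumer, reg in deps:
        print(f"{LOOP_BODY[producer][0]} -> {LOOP_BODY[consumer][0]} via {reg}")
    print("longest EX chain:", longest_chain(LOOP_BODY, deps))
```

Under these assumptions the longest chain is LD, DIVD, ADDD, MULTD, SD, that is 2 + 24 + 7 + 5 + 2 = 40 EX cycles, and it is dominated by the divide. The address updates and the loop counter are independent of that chain, which is the kind of work a dynamically scheduled machine could overlap with it.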

a) Draw the hardware organization to implement dynamic scheduling with the Tomasulo algorithm. Do you expect an improved execution time compared to T1, T2 and T3? (Hint: do not perform any computations, answer from the theoretical point of view)
