Question: We compare the performance of three dynamically scheduled processor architectures on a simple piece of code computing Y = (X*Y) + Z, where X, Y, and Z are vectors of double-precision (8-byte) floating-point numbers. The loop body can be compiled as follows:

LOOP: L.D    F0,0(R1)     // X[i] loaded in F0
      L.D    F2,0(R2)     // Y[i] loaded in F2
      L.D    F4,0(R3)     // Z[i] loaded in F4
      MUL.D  F6,F2,F0     // Multiply X by Y
      ADD.D  F8,F6,F4     // Add Z
      ADDI   R1,R1,#8     // update address registers
      ADDI   R2,R2,#8
      ADDI   R3,R3,#8
      S.D    F8,-8(R2)    // store in Y[i]
      BNE    R4,R2,LOOP   // (R4)-8 points to the last element of Y
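For reference, the arithmetic performed by the loop can be sketched in Python. This is only an illustration of the semantics, not part of the exercise: the function name `vector_multiply_add` is hypothetical, and the 8-byte address arithmetic on R1, R2, and R3 is modeled by a plain list index.

```python
def vector_multiply_add(x, y, z):
    """Return the new Y after the loop: Y[i] = X[i] * Y[i] + Z[i]."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("X, Y, and Z must have the same length")
    result = list(y)
    for i in range(len(result)):
        f0 = x[i]        # L.D   F0,0(R1)
        f2 = result[i]   # L.D   F2,0(R2)
        f4 = z[i]        # L.D   F4,0(R3)
        f6 = f2 * f0     # MUL.D F6,F2,F0
        f8 = f6 + f4     # ADD.D F8,F6,F4
        result[i] = f8   # S.D   F8,-8(R2)
    return result
```

Each iteration has a dependence chain of loads, then MUL.D, then ADD.D, then S.D, which is what the timing tables in the exercise have to respect.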

The initial values in R1, R2, and R3 are such that the three registers never hold equal values at any point during the execution. (This is important for memory disambiguation.) The architectures are given in Figures 3.15, 3.23, and 3.27, and the same parameters apply. The BNE branch is always predicted taken (except in Tomasulo without speculation, where branches are not predicted at all and stall in the dispatch stage until their outcome is known).
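The condition on the address registers can be checked mechanically for a given set of starting values. The sketch below is an assumption-laden illustration: the function name `registers_stay_distinct`, the example addresses, and the choice to test after every individual ADDI are all hypothetical and not taken from the exercise.

```python
def registers_stay_distinct(r1, r2, r3, iterations, stride=8):
    """Check that R1, R2, and R3 never hold equal values during the run."""
    regs = [r1, r2, r3]
    if len(set(regs)) < 3:
        return False
    for _ in range(iterations):
        # The loop updates the registers one at a time: ADDI R1, R2, R3.
        for k in range(3):
            regs[k] += stride
            if len(set(regs)) < 3:
                return False
    return True
```

For example, `registers_stay_distinct(0x1000, 0x2000, 0x3000, 100)` holds, whereas starting values of `0x1000` and `0x1008` for R1 and R2 collide as soon as R1 is incremented.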

Keep in mind the following important rules (whenever they apply):

Instructions are always fetched, decoded, and dispatched in process order;

In speculative architectures, instructions always retire in process order;

In speculative architectures, stores must wait until they reach the top of the ROB before they can issue to cache.
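The two rules for speculative architectures can be sketched with a small reorder buffer (ROB) model. This is a minimal illustration under stated assumptions: the entry layout (`name`, `done`) and the function names are hypothetical, and the sketch ignores retire bandwidth and all latencies, which are defined by the figures referenced in the exercise.

```python
from collections import deque

def retire_ready(rob):
    """Retire completed entries from the head of the ROB, in order."""
    retired = []
    # An entry behind an unfinished head cannot retire, even if it is done.
    while rob and rob[0]["done"]:
        retired.append(rob.popleft()["name"])
    return retired

def store_may_access_cache(rob, name):
    """A store may issue to cache only when it is the oldest ROB entry."""
    return bool(rob) and rob[0]["name"] == name
```

With a ROB holding an unfinished ADD.D, a finished ADDI, and the S.D, nothing retires and the store must wait. Once ADD.D completes, ADD.D and ADDI retire in order and the store reaches the top of the ROB.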

Tomasulo algorithm, no speculation. Please fill in a table like the one given below, clock by clock, for the first iteration of the loop. Each entry should be the clock number when the event occurs, starting with clock 1. Add comments as you see fit. (This helps us understand your thinking.)

	Dispatch	Issue	Exec start	Exec complete	Cache	CDB	Comments
I1 L.D F0, 0(R1)
I2 L.D F2, 0(R2)

Tomasulo algorithm with speculation. Please fill in a table like the one given below, clock by clock, for the first iteration of the loop. Each entry should be the clock number when the event occurs, starting with clock 1. Note that, contrary to Tomasulo with no speculation, stores cannot execute in cache until they reach the top of the ROB. Also, branches are now predicted taken.

	Dispatch	Issue	Exec start	Exec comp.	Cache	CDB	Retire	Comments
I1 L.D F0, 0(R1)
I2 L.D F2, 0(R2)
