In this exercise, we will look at how variations on Tomasulo’s algorithm perform when running the loop
Question:
In this exercise, we will look at how variations on Tomasulo’s algorithm perform when running the loop from Exercise 3.14. The functional units (FUs) are described in the table below. Assume the following:
■ Functional units are not pipelined.
■ There is no forwarding between functional units; results are communicated by the common data bus (CDB).
■ The execution stage (EX) does both the effective address calculation and the memory access for loads and stores. Thus, the pipeline is IF/ID/IS/EX/WB.
■ Loads require one clock cycle.
■ The issue (IS) and write-back (WB) result stages each require one clock cycle.
■ There are five load buffer slots and five store buffer slots.
■ Assume that the Branch on Not Equal to Zero (BNEZ) instruction requires one clock cycle.
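The assumptions above can be sketched as a small timing model. This is a rough illustration only, not the book's solution: the instruction sequence, the unit names ("int", "fpmul", "fpadd") and the latencies below are hypothetical stand-ins, because the loop from Exercise 3.14 and the latency table are not reproduced here. The sketch also simplifies: it assumes a reservation station is always free, gives older instructions priority for a unit, and treats a unit as free as soon as EX ends.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str                    # label shown in the output
    fu: str                      # functional unit the instruction occupies
    latency: int                 # EX cycles (hypothetical value, not from the exercise)
    dest: Optional[str] = None   # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read


def schedule(program):
    """Return issue, EX start, EX end, WB and stall cycles per instruction."""
    fu_free = {}      # unit -> first cycle in which it is free again
    reg_ready = {}    # register -> first cycle a consumer may start EX
    cdb_busy = set()  # cycles in which the CDB is already taken
    rows = []
    for i, ins in enumerate(program):
        issue = i + 1  # single issue, in order, one instruction per cycle
        # No forwarding: an operand is usable the cycle after its producer's WB.
        operands = max((reg_ready.get(r, 0) for r in ins.srcs), default=0)
        # EX starts once the instruction has issued, its operands have
        # arrived, and the (non-pipelined) unit is idle.
        start = max(issue + 1, operands, fu_free.get(ins.fu, 0))
        end = start + ins.latency - 1
        fu_free[ins.fu] = end + 1
        wb = end + 1  # WB takes one cycle
        if ins.dest is not None:
            while wb in cdb_busy:  # one result on the CDB per cycle
                wb += 1
            cdb_busy.add(wb)
            reg_ready[ins.dest] = wb + 1
        rows.append({"name": ins.name, "issue": issue, "ex_start": start,
                     "ex_end": end, "wb": wb, "stalls": start - (issue + 1)})
    return rows


# Hypothetical three-instruction dependency chain (not the loop from Exercise 3.14).
program = [
    Instr("L.D   F2,0(R1)", "int", 1, "F2", ("R1",)),
    Instr("MUL.D F4,F2,F0", "fpmul", 4, "F4", ("F2", "F0")),
    Instr("ADD.D F6,F4,F8", "fpadd", 2, "F6", ("F4", "F8")),
]
rows = schedule(program)
for row in rows:
    print(row["name"], row["issue"], row["ex_start"], row["wb"], row["stalls"])
```

Here "stalls" counts the cycles an instruction waits between issuing and entering its first EX cycle. Substituting the real loop and latencies, and repeating the body for three iterations, would give a first approximation of the timing that can be checked by hand.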
a.) For this problem use the single-issue Tomasulo MIPS pipeline of Figure 3.6 with the pipeline latencies from the table above. Show the number of stall cycles for each instruction and what clock cycle each instruction begins execution (i.e., enters its first EX cycle) for three iterations of the loop. How many cycles does each loop iteration take? Report your answer in the form of a table with the following column headers:
■ Iteration (loop iteration number)
■ Instruction
■ Issues (cycle when instruction issues)
Computer Organization and Design: The Hardware/Software Interface
ISBN: 978-0124077263
5th edition
Authors: David A. Patterson, John L. Hennessy