Question: Latency and Throughput Bounds
A superscalar processor with four functional units can perform integer multiplication (mul) and floating-point multiplication (fmul) with the following timings, measured in clock cycles:
Instruction   Latency   Issue   Capacity
mul           3         1       2
fmul          5         5       2
For the purposes of the calculations, assume that the cycle time is 250 ps, i.e., an equivalent scalar machine would run at a peak rate of 4 GIPS.
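As a quick check of the stated peak rate, the conversion from cycle time to instruction rate can be sketched as below. The function name `peak_gips` is made up for illustration, and the sketch assumes a scalar machine that completes at most one instruction per cycle.

```python
def peak_gips(cycle_time_ps: float, instructions_per_cycle: float = 1.0) -> float:
    """Peak rate in GIPS for a cycle time given in picoseconds.

    Assumption: 'scalar' means at most one instruction completes per cycle.
    """
    # 1 ns = 1000 ps, so cycles per nanosecond equals the clock rate in GHz.
    cycles_per_ns = 1000.0 / cycle_time_ps
    return cycles_per_ns * instructions_per_cycle


print(peak_gips(250))  # 4.0
```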
(a) [2 marks] In the case of mul, a new calculation can be issued every clock cycle (I = 1), but each one takes 3 clock cycles to complete (L = 3). Does it make sense that the issue time is strictly less than the latency? Why or why not? Would it make sense if the issue time were strictly greater than the latency? Why or why not?
(b) [2 marks] Compute the latency bound and throughput bound for mul and fmul. Express your answers in cycles per instruction (CPI) and GIPS.
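The question does not define the two bounds, so the sketch below assumes the usual textbook definitions: the latency bound is L cycles per instruction (each result must wait for the previous one), and the throughput bound is I / C cycles per instruction (C units, each accepting a new operation every I cycles). The names `UNITS`, `bounds`, and `gips` are illustrative only.

```python
from fractions import Fraction

CLOCK_GHZ = 4  # from the 250 ps cycle time

# (latency, issue, capacity), copied from the table in the question
UNITS = {"mul": (3, 1, 2), "fmul": (5, 5, 2)}


def bounds(latency, issue, capacity):
    """Return (latency bound, throughput bound) in CPI.

    Assumed definitions: latency bound = L, throughput bound = I / C.
    """
    return Fraction(latency), Fraction(issue, capacity)


def gips(cpi):
    """Convert cycles per instruction to GIPS at the assumed clock rate."""
    return float(CLOCK_GHZ / cpi)


for name, (lat, iss, cap) in UNITS.items():
    lat_cpi, thr_cpi = bounds(lat, iss, cap)
    print(f"{name}: latency bound {float(lat_cpi)} CPI = {gips(lat_cpi):.2f} GIPS, "
          f"throughput bound {float(thr_cpi)} CPI = {gips(thr_cpi):.2f} GIPS")
```

Under these assumptions the script reports 3 CPI (1.33 GIPS) and 0.5 CPI (8 GIPS) for mul, and 5 CPI (0.8 GIPS) and 2.5 CPI (1.6 GIPS) for fmul.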
(c) [1 mark] Consider a program that does nothing but a long sequence of multiplication instructions. If there are no data dependencies among the multiplications, i.e., they could be run in any order with maximum parallelism, how quickly could multiplications be performed? Express your answer using GIPS. Give one answer for integer mul and one for floating-point fmul.
(d) [1 mark] If the program instead contained one long critical path, i.e., one linear, data-dependent sequence of multiplications such as:
x <- x * a
x <- x * b
x <- x * c
x <- x * d
x <- x * e
x <- x * f ...
how quickly could multiplications be performed? Again, give two answers: one for mul, and one for fmul.
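One way to see the difference between the independent case in (c) and the dependent chain in (d) is a small scheduling simulation. This is a minimal sketch of a simplified machine model, not a description of any real processor: it assumes each of the `capacity` units accepts a new operation every `issue` cycles, that a result is available `latency` cycles after issue, and that nothing else limits the machine. The function name `cycles_to_finish` is hypothetical.

```python
def cycles_to_finish(n_ops, latency, issue, capacity, dependent):
    """Cycle at which the last of n_ops multiplications completes.

    Simplified model (an assumption, not from the question): greedy
    assignment of each operation to the unit that frees up first.
    """
    unit_free = [0] * capacity  # earliest cycle each unit can accept an op
    prev_done = 0               # cycle at which the previous result is ready
    finish = 0
    for _ in range(n_ops):
        u = min(range(capacity), key=lambda k: unit_free[k])
        start = unit_free[u]
        if dependent:
            # x <- x * a must wait for the previous value of x
            start = max(start, prev_done)
        unit_free[u] = start + issue
        prev_done = start + latency
        finish = max(finish, prev_done)
    return finish


N = 1000
for name, (lat, iss, cap) in {"mul": (3, 1, 2), "fmul": (5, 5, 2)}.items():
    for dependent in (False, True):
        cpi = cycles_to_finish(N, lat, iss, cap, dependent) / N
        kind = "dependent chain" if dependent else "independent"
        print(f"{name} ({kind}): {cpi:.3f} CPI, {4 / cpi:.2f} GIPS")
```

For long sequences the independent case approaches I / C cycles per multiplication and the dependent chain approaches L cycles per multiplication, within this model.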
