Question: 4.9 Performance gain with vector processing

A vector supercomputer has special instructions that perform arithmetic operations on vectors. For example, vector-multiply on vectors A and B of length 64 is equivalent to 64 independent multiplications A[i] × B[i]. Assume that the machine has a CPI of 2 on all scalar arithmetic instructions. Vector arithmetic on vectors of length m takes 8 + m cycles, where 8 is the start-up/wind-down overhead for the pipeline that allows one arithmetic operation to be initiated in every clock cycle; thus, vector-multiply takes 72 clock cycles for vectors of length 64. Consider a program with only arithmetic instructions (i.e., ignore all else), with half of these instructions involving scalar operands and half involving vector operands.

a. What is the speedup achieved if the average vector length is 16?
b. What is the break-even vector length for this machine (the average vector length that results in equal or greater performance due to vector processing)?
c. What is the required average vector length to achieve a speedup of 1.8?
Step by Step Solution
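One way to work the three parts is sketched below. It rests on an interpretation the problem does not spell out: without vector instructions, each length-m vector operation is assumed to be emulated by m scalar instructions at 2 cycles each, with no loop or memory overhead. For one scalar plus one vector instruction, that gives 2 + 2m cycles without vector hardware and 2 + (8 + m) = 10 + m cycles with it, so speedup = (2 + 2m) / (10 + m). The function names are illustrative only.

```python
from fractions import Fraction

SCALAR_CPI = 2        # cycles per scalar arithmetic instruction (given)
VECTOR_OVERHEAD = 8   # start-up/wind-down cycles per vector instruction (given)


def speedup(m):
    """Speedup for an equal mix of scalar and length-m vector instructions.

    Assumption: without vector hardware, a vector instruction costs
    m scalar instructions, i.e. SCALAR_CPI * m cycles.
    """
    m = Fraction(m)
    without_vector = SCALAR_CPI + SCALAR_CPI * m      # 2 + 2m
    with_vector = SCALAR_CPI + VECTOR_OVERHEAD + m    # 2 + (8 + m)
    return without_vector / with_vector


def length_for_speedup(target):
    """Solve (2 + 2m) / (10 + m) = target for m.

    Rearranged: m * (2 - s) = 10 * s - 2. Pass decimals as strings
    (e.g. "1.8") so Fraction stays exact.
    """
    s = Fraction(target)
    return (s * (SCALAR_CPI + VECTOR_OVERHEAD) - SCALAR_CPI) / (SCALAR_CPI - s)


print(f"a. speedup at m = 16: {float(speedup(16)):.3f}")           # 34/26
print(f"b. break-even length: {length_for_speedup(1)}")            # speedup = 1
print(f"c. length for speedup 1.8: {length_for_speedup('1.8')}")
```

Under this model the answers are a speedup of 34/26 ≈ 1.31 for part a, a break-even length of 8 for part b, and an average length of 80 for part c. Because the speedup approaches 2 as m grows, any target of 2 or more is unreachable on this machine.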
