Given the following C code:
low = 0;
VL = (n % MVL); /* find odd-size piece using modulo op % */
for (j = 0; j <= (n/MVL); j=j+1) { /* outer loop */
    for (i = low; i < (low+VL); i=i+1) /* runs for length VL */
        Y[i] = a * X[i] + Y[i]; /* main operation */
    low = low + VL; /* start of next vector */
    VL = MVL; /* reset the length to maximum vector length */
}
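For reference, the fragment above can be wrapped into a self-contained scalar C function to check that strip mining touches every element exactly once. This is only a sketch: the function name `daxpy_strip_mined`, the `double` element type, and the fixed `MVL` of 8 (chosen to match the 8-element vector registers in the assumptions) are illustrative choices, not part of the original question.

```c
#include <stddef.h>

/* Assumed maximum vector length: matches the 8-element vector registers. */
#define MVL 8

/* Strip-mined DAXPY: Y = a*X + Y over n elements, in pieces of at most MVL.
 * The function name and the double element type are illustrative assumptions. */
void daxpy_strip_mined(size_t n, double a, const double *X, double *Y)
{
    size_t low = 0;
    size_t VL = n % MVL;                      /* odd-size first piece */

    for (size_t j = 0; j <= n / MVL; j++) {   /* outer loop over pieces */
        for (size_t i = low; i < low + VL; i++)
            Y[i] = a * X[i] + Y[i];           /* one vector's worth of work */
        low += VL;                            /* start of next piece */
        VL = MVL;                             /* all later pieces are full length */
    }
}
```

With n = 11, the first piece covers elements 0 to 2 (11 % 8 = 3) and the second covers elements 3 to 10. When n is a multiple of MVL, the first piece is empty and the remaining n/MVL pieces are all full length.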
Translate the code using our DLX vector instruction set. Assume:
Vector registers of length 8
Load unit has a startup of L clocks
Adder unit has a startup of A clocks
Multiplier unit has a startup of M clocks
For vectors of length N, compute the number of clock cycles needed to execute the inner loop (the vector operations), first for normal execution and then with chaining of loads, stores, additions, and multiplications allowed. How much speedup do we achieve with chaining?
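One way to set up the comparison is a small cycle-count calculator. The sketch below rests on assumptions that the question does not state, so treat it as one possible model rather than the expected answer:

- the inner loop maps to five vector instructions: LV, MULTSV, LV, ADDV, SV
- there is a single load/store unit, and a store has the same startup L as a load
- a vector instruction of length N costs its startup plus N clocks
- without chaining, the five instructions run strictly one after another
- with chaining, each LV chains into the arithmetic instruction that consumes it, and SV runs on its own

The names `Cycles` and `daxpy_cycles` are hypothetical.

```c
/* Cycle counts for the DAXPY inner loop under the simplified model
 * described above. The model itself is an assumption, not given in the question. */
typedef struct {
    long unchained;
    long chained;
} Cycles;

Cycles daxpy_cycles(long N, long L, long A, long M)
{
    Cycles c;

    /* No chaining: LV, MULTSV, LV, ADDV, SV execute back to back,
     * each costing (startup + N). Store startup is assumed equal to L. */
    c.unchained = (L + N) + (M + N) + (L + N) + (A + N) + (L + N);

    /* Chaining: LV->MULTSV and LV->ADDV each behave like one longer pipeline
     * (startups add, but the N element cycles overlap); SV runs alone. */
    c.chained = (L + M + N) + (L + A + N) + (L + N);

    return c;
}
```

Under this model the totals are 3L + M + A + 5N clocks without chaining and 3L + M + A + 3N with chaining, so the speedup is (3L + M + A + 5N) / (3L + M + A + 3N). A model that lets the second load overlap the multiply, or that chains the store as well, would give different totals.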
