This problem is slightly different from the one I found, and I don't understand how the difference changes the answer. Please help with this problem.
Suppose we wish to write a procedure that computes the inner product of two vectors u and v. An abstract version of the function has a CPE of 14-18 with x86-64 for different types of integer and floating-point data. By doing the same sort of transformations we did to transform the abstract program combine1 into the more efficient combine4, we get the following code:
/* Inner product. Accumulate in temporary */
void inner4(vec_ptr u, vec_ptr v, data_t *dest)
{
    long i;
    long length = vec_length(u);
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = (data_t) 0;
    for (i = 0; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }
    *dest = sum;
}
Our measurements show that this function has CPEs of 1.50 for integer data and 3.00 for floating-point data. For data type double, the x86-64 assembly code for the inner loop is as follows:
Inner loop of inner4. data_t = double, OP = *
udata in %rbp, vdata in %rax, sum in %xmm0
i in %rcx, limit in %rbx
.L15:                                      loop:
  vmovsd 0(%rbp,%rcx,8), %xmm1             get udata[i]
  vmulsd (%rax,%rcx,8), %xmm1, %xmm1       multiply by vdata[i]
  vaddsd %xmm1, %xmm0, %xmm0               add to sum
  addq $1, %rcx                            increment i
  cmpq %rbx, %rcx                          compare i to limit
  jne .L15                                 if !=, goto loop
Figure 5.12
| Operation | Integer latency | Integer issue | Integer capacity | Floating-point latency | Floating-point issue | Floating-point capacity |
| --- | --- | --- | --- | --- | --- | --- |
| Addition | 1 | 1 | 4 | 3 | 1 | 1 |
| Multiplication | 3 | 1 | 1 | 5 | 1 | 2 |
| Division | 3-30 | 3-30 | 1 | 3-15 | 3-15 | 1 |
Assume that the functional units have the characteristics listed in Figure 5.12.
A. Diagram how this instruction sequence would be decoded into operations
and show how the data dependencies between them would create a critical
path of operations, in the style of Figures 5.13 and 5.14 (data-flow graph).
B. For data type double, what lower bound on the CPE is determined by the
critical path?
C. Assuming similar instruction sequences for the integer code as well, what
lower bound on the CPE is determined by the critical path for integer data?
D. Explain how the two floating-point versions can have CPEs of 3.00, even
though the multiplication operation requires 5 clock cycles.
