Question: This problem is slightly different from the version I found, and I don't understand how the difference changes the solution. Please help with this problem.


Suppose we wish to write a procedure that computes the inner product of two vectors u and v. An abstract version of the function has a CPE of 14-18 with x86-64 for different types of integer and floating-point data. By doing the same sort of transformations we did to transform the abstract program combine1 into the more efficient combine4, we get the following code:

/* Inner product. Accumulate in temporary */
void inner4(vec_ptr u, vec_ptr v, data_t *dest)
{
    long i;
    long length = vec_length(u);
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = (data_t) 0;

    for (i = 0; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }
    *dest = sum;
}
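To experiment with inner4 outside the textbook's framework, a minimal self-contained sketch might look like the following. The question does not show the vector header, so the vec_rec layout and the helpers new_vec and free_vec are assumptions made for illustration only; data_t is fixed to double here, although the exercise also considers integer types.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: data_t is double; change the typedef to try integer data. */
typedef double data_t;

/* Assumption: a minimal stand-in for the vector type, which the
   question does not show. */
typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Hypothetical helper: allocate a zero-filled vector, or NULL on failure. */
vec_ptr new_vec(long len)
{
    assert(len >= 0);
    vec_ptr v = malloc(sizeof(vec_rec));
    if (!v)
        return NULL;
    v->len = len;
    v->data = len > 0 ? calloc((size_t) len, sizeof(data_t)) : NULL;
    if (len > 0 && !v->data) {
        free(v);
        return NULL;
    }
    return v;
}

/* Hypothetical helper: release a vector created by new_vec. */
void free_vec(vec_ptr v)
{
    if (v) {
        free(v->data);
        free(v);
    }
}

long vec_length(vec_ptr v) { return v->len; }
data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Inner product. Accumulate in temporary */
void inner4(vec_ptr u, vec_ptr v, data_t *dest)
{
    long i;
    long length = vec_length(u);
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = (data_t) 0;

    for (i = 0; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }
    *dest = sum;
}
```

Note that the only value carried from one iteration to the next is sum; each product udata[i] * vdata[i] depends only on values loaded from memory.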

Our measurements show that this function has CPEs of 1.50 for integer data and 3.00 for floating-point data. For data type double, the x86-64 assembly code for the inner loop is as follows:

Inner loop of inner4. data_t = double, OP = *
udata in %rbp, vdata in %rax, sum in %xmm0
i in %rcx, limit in %rbx

.L15:                                      loop:
  vmovsd 0(%rbp,%rcx,8), %xmm1               Get udata[i]
  vmulsd (%rax,%rcx,8), %xmm1, %xmm1         Multiply by vdata[i]
  vaddsd %xmm1, %xmm0, %xmm0                 Add to sum
  addq   $1, %rcx                            Increment i
  cmpq   %rbx, %rcx                          Compare i to limit
  jne    .L15                                If !=, goto loop

Figure 5.12

                 Integer                      Floating point
Operation        Latency  Issue  Capacity     Latency  Issue  Capacity
Addition         1        1      4            3        1      1
Multiplication   3        1      1            5        1      2
Division         3-30     3-30   1            3-15     3-15   1

Assume that the functional units have the characteristics listed in Figure 5.12.

A. Diagram how this instruction sequence would be decoded into operations and show how the data dependencies between them would create a critical path of operations, in the style of Figures 5.13 and 5.14 (a data-flow graph).

B. For data type double, what lower bound on the CPE is determined by the critical path?

C. Assuming similar instruction sequences for the integer code as well, what lower bound on the CPE is determined by the critical path for integer data?

D. Explain how the two floating-point versions can have CPEs of 3.00, even though the multiplication operation requires 5 clock cycles.
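
One way to approach parts B through D, offered as a sketch rather than an official solution: in the loop shown above, the only register that carries a value from one iteration to the next through an arithmetic operation with multi-cycle latency is %xmm0 (sum), and it is updated only by the add. The multiply reads udata[i] and vdata[i] from memory and does not depend on the previous iteration, so successive multiplies can overlap in the pipelined multiplier. Under that reading, the critical-path bound is the add latency from Figure 5.12. The helper names below are hypothetical and exist only to make the arithmetic explicit.

```c
#include <assert.h>

/* Latencies in clock cycles, copied from Figure 5.12. */
enum {
    INT_ADD_LAT = 1,
    INT_MUL_LAT = 3,
    FP_ADD_LAT  = 3,
    FP_MUL_LAT  = 5
};

/* Hypothetical helper: sum the latencies of the operations that form
   the loop-carried dependency chain. That total is the critical-path
   lower bound on CPE when one element is processed per iteration. */
int critical_path_bound(const int *chain_latencies, int n)
{
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += chain_latencies[i];
    return total;
}

/* Assumption: only the add lies on the chain through sum. The multiply
   feeds the chain but does not depend on the previous iteration. */
int inner4_bound_double(void)
{
    int chain[] = { FP_ADD_LAT };
    return critical_path_bound(chain, 1);
}

int inner4_bound_int(void)
{
    int chain[] = { INT_ADD_LAT };
    return critical_path_bound(chain, 1);
}
```

Under these assumptions the bound is 3 cycles per element for double, which agrees with the measured CPE of 3.00, and 1 cycle per element for integer data. The measured integer CPE of 1.50 is above that bound, which suggests that something other than the critical path limits the integer version.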
