Question: HW 4-1 (43 points)

Suppose we wish to write a procedure that computes the inner product of two vectors \(u\) and \(v\). An abstract version of the function has a CPE of 14-18 with x86-64 for different types of integer and floating-point data. Doing the same sort of transformations as in the text to get from the program combine1 to the more efficient combine4, we get the following code:

```
typedef float data_t;
#include "vec.h"

/* Inner product. Accumulate in a local temporary. */
void inner4(vec_ptr u, vec_ptr v, data_t *dest) {
    long i;
    long length = vec_length(u);
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = (data_t) 0;

    for (i = 0; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }
    *dest = sum;   /* Single store after the loop */
}
```
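The listing depends on `vec.h`, which is not shown here. To try the function in isolation, a minimal sketch along the following lines may help. The `vec_rec` layout and the `new_vec` helper are assumptions made for illustration, not the course's actual header, and `data_t` is set to `double` to match the assembly discussed in this problem.

```c
#include <assert.h>
#include <stdlib.h>

typedef double data_t;   /* assumption: double, to match the assembly listing */

/* Hypothetical stand-in for vec.h: a length plus a heap-allocated array. */
typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Allocate a zero-filled vector of len elements; returns NULL on failure. */
vec_ptr new_vec(long len) {
    assert(len >= 0);
    vec_ptr v = malloc(sizeof(vec_rec));
    if (!v)
        return NULL;
    v->len = len;
    v->data = len > 0 ? calloc((size_t) len, sizeof(data_t)) : NULL;
    if (len > 0 && !v->data) {
        free(v);
        return NULL;
    }
    return v;
}

long vec_length(vec_ptr v) { return v->len; }
data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Same body as inner4 in the problem statement. */
void inner4(vec_ptr u, vec_ptr v, data_t *dest) {
    long i;
    long length = vec_length(u);
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = (data_t) 0;

    for (i = 0; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }
    *dest = sum;
}
```

To use it, allocate two vectors with `new_vec`, fill them through `get_vec_start`, and pass the address of a `data_t` result to `inner4`.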

Our measurements show that this function has a CPE of 1.50 for integer data and 3.00 for floating-point data. For data type double, the x86-64 assembly code for the inner loop (produced on our virtual machine with flags -O2, -mavx2, and -S) is as follows:

```
# Inner loop of inner4. data_t = double. OP = *.
# udata in %rbp, vdata in %rax, sum in %xmm0, i in %rcx, limit in %rbx
.L15:                                    # loop:
  vmovsd  0(%rbp,%rcx,8), %xmm1          # Get udata[i]
  vmulsd  (%rax,%rcx,8), %xmm1, %xmm1    # Multiply by vdata[i]
  vaddsd  %xmm1, %xmm0, %xmm0            # Add to sum
  addq    $1, %rcx                       # Increment i
  cmpq    %rbx, %rcx                     # Compare i:limit
  jne     .L15                           # If !=, goto loop
```

The new details of floating-point assembly code are pretty fully captured by just looking at Figures 3.45, 3.46, and 3.49 with their captions.

Assume that the functional units have the latencies and issue times given in Figure 5.12 (and in the course notes).

- Diagram how this instruction sequence would be decoded into operations, and show how the data dependencies between them would create a critical path of operations. This process of diagramming is illustrated in Figures 5.13 (dpb-sequential.pptx (live.com)), 5.14 (Figure: dpb-flow.pptx (live.com) and Figure: dpb-flow-abstract.pptx (live.com)), and 5.15 (Figure: dpb-flow-multiple.pptx (live.com)); you can draw just a diagram in the style of 5.14 ( ), but do add identification of where the critical path is. (25 points.)

- For data type double, what lower bound on the CPE is determined by the critical path? Give a numerical value and an explanation. (6 points.)

- Assuming similar instruction sequences for the integer code as well, what lower bound on the CPE is determined by the critical path for integer data? Give a numerical value and an explanation. (6 points.)

- Explain how the floating-point version can have a CPE of 3.00 even though the multiplication operation requires 5 cycles. (6 points.)
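One way to approach the critical-path questions is to look only at the operations whose result feeds the same operation in the next iteration. The sketch below assumes the Figure 5.12 latencies are 3 cycles for floating-point addition, 5 cycles for floating-point multiplication, and 1 cycle for integer addition; check these against the figure before relying on them. In the loop shown, the load and the `vmulsd` depend only on `i`, so the only loop-carried floating-point dependency is the `vaddsd` that updates `%xmm0`.

\[
\text{CPE} \;\ge\; \frac{\text{latency of the loop-carried chain per iteration}}{\text{elements processed per iteration}}
\]

\[
\text{double: } \text{CPE} \ge \frac{L_{\text{fp add}}}{1} = \frac{3}{1} = 3.0,
\qquad
\text{integer: } \text{CPE} \ge \frac{L_{\text{int add}}}{1} = \frac{1}{1} = 1.0
\]

Under these assumptions, the 5-cycle multiplications are not on the chain: successive multiplies are independent of each other and can overlap in the pipelined multiplier, which is consistent with the measured floating-point CPE of 3.00.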

4-2 (27 points)

- Write a version of the inner product procedure described in the previous problem that uses five-way loop unrolling (\(5 \times 1\); no parallelism). (15 points.)

For x86-64, our measurements of the unrolled version give a CPE of 1.07 for integer data but still 3.01 for floating-point data.

- Explain why any version of any inner product procedure (even with parallelism) cannot achieve a CPE less than 1.00. (6 points.)

- Explain why the performance for floating-point data did not improve with loop unrolling. (6 points.)
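For reference, \(5 \times 1\) unrolling might look roughly like the sketch below. It is one possible shape, not a model answer: the name `inner5x1`, the `vec_rec` layout, and the `new_vec` helper are assumptions standing in for the unseen `vec.h`, and `data_t` is taken to be `double`.

```c
#include <assert.h>
#include <stdlib.h>

typedef double data_t;   /* assumption: double; long or float also work */

/* Hypothetical stand-in for vec.h. */
typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Allocate a zero-filled vector of len elements; returns NULL on failure. */
vec_ptr new_vec(long len) {
    assert(len >= 0);
    vec_ptr v = malloc(sizeof(vec_rec));
    if (!v)
        return NULL;
    v->len = len;
    v->data = len > 0 ? calloc((size_t) len, sizeof(data_t)) : NULL;
    if (len > 0 && !v->data) {
        free(v);
        return NULL;
    }
    return v;
}

long vec_length(vec_ptr v) { return v->len; }
data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Inner product with 5x1 unrolling: five elements per iteration,
   one accumulator, additions performed strictly in order. */
void inner5x1(vec_ptr u, vec_ptr v, data_t *dest) {
    long i;
    long length = vec_length(u);
    long limit = length - 4;          /* last index where i+4 is in range */
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = (data_t) 0;

    /* Main loop: combine five elements at a time. */
    for (i = 0; i < limit; i += 5) {
        sum = sum + udata[i]     * vdata[i];
        sum = sum + udata[i + 1] * vdata[i + 1];
        sum = sum + udata[i + 2] * vdata[i + 2];
        sum = sum + udata[i + 3] * vdata[i + 3];
        sum = sum + udata[i + 4] * vdata[i + 4];
    }

    /* Finish any remaining elements one at a time. */
    for (; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }
    *dest = sum;
}
```

Note that every addition still reads the `sum` produced by the previous one. Unrolling this way cuts the loop overhead (increment, compare, branch) per element, but it leaves the single chain of additions on `sum` unchanged.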
