Question: Use the following code fragment: In this exercise, we look at how software techniques can extract instruction-level parallelism (ILP) in a common

Use the following code fragment:

In this exercise, we look at how software techniques can extract instruction-level parallelism (ILP) in a common vector loop. The following loop is the so-called DAXPY loop (double-precision a * x plus Y) and is the central operation in Gaussian elimination. The following code implements the DAXPY operation (x and Y are arrays with 100 elements). Initially, x1 is set to the base address of array x and x2 is set to the base address of Y; x4 contains 800 to represent the 100 * 8 bytes of arrays x and Y.
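As a point of reference, the DAXPY operation described above can be sketched in Python. This only illustrates the arithmetic and the byte-offset bookkeeping; it is not the assembly fragment the exercise refers to. The function name `daxpy`, the constants, and the upward-counting byte offset are assumptions made for this sketch.

```python
ELEMENT_BYTES = 8                            # one double-precision value
NUM_ELEMENTS = 100                           # array length stated in the exercise
ARRAY_BYTES = NUM_ELEMENTS * ELEMENT_BYTES   # 800, the value the exercise places in x4


def daxpy(a, x, y):
    """Compute y[i] = a * x[i] + y[i] for every element, updating y in place."""
    offset = 0  # byte offset from the array base (assumed to count upward)
    while offset < len(x) * ELEMENT_BYTES:
        i = offset // ELEMENT_BYTES          # element index for this byte offset
        y[i] = a * x[i] + y[i]
        offset += ELEMENT_BYTES              # advance to the next double
    return y
```

For example, `daxpy(2.0, [1.0, 2.0], [3.0, 4.0])` leaves `y` as `[5.0, 8.0]`.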

Assume the functional unit latencies as shown in the following table. Assume a one-cycle delayed branch that resolves in the ID stage. Assume that results are fully bypassed (data forwarding).

Assume a single-issue pipeline. Reorder code as necessary to minimize stalls. Remember to use the latencies given in the table above. How many cycles are needed to complete one iteration?

Unroll the loop as many times as necessary to schedule it without any stalls, collapsing the loop overhead instructions. How many times must the loop be unrolled? Show the instruction schedule. What is the execution time per element of the result (or per iteration time)?
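To make the idea of unrolling with collapsed loop overhead concrete, here is a minimal Python sketch. The unroll factor of 4 is purely hypothetical: the factor the exercise actually requires depends on the latency table, which is not reproduced here. The helper `cycles_per_element` is likewise an assumed name that simply divides the length of one unrolled iteration by the number of elements it produces.

```python
def daxpy_unrolled_by_4(a, x, y):
    """DAXPY with the loop body replicated 4 times (hypothetical factor)."""
    n = len(x)
    if n % 4 != 0:
        raise ValueError("this sketch assumes the length is a multiple of 4")
    # One index update and one loop test now cover four elements,
    # which is what "collapsing the loop overhead" refers to.
    for i in range(0, n, 4):
        y[i] = a * x[i] + y[i]
        y[i + 1] = a * x[i + 1] + y[i + 1]
        y[i + 2] = a * x[i + 2] + y[i + 2]
        y[i + 3] = a * x[i + 3] + y[i + 3]
    return y


def cycles_per_element(cycles_per_unrolled_iteration, unroll_factor):
    """Execution time per result element for a stall-free unrolled schedule."""
    return cycles_per_unrolled_iteration / unroll_factor
```

Because the independent copies of the body can be interleaved, a scheduler has more instructions available to fill the slots that would otherwise be stalls.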

Assume that the initial value of 5 2 + 396 (the loop is repeated 99 times).

Show the timing of this instruction sequence for the 5-stage RISC pipeline without any forwarding or bypassing hardware, but assuming that a register read and a write can occur in the same clock cycle (for example, when an instruction writes back its result to a register in cycle n, another instruction can read that register in the same cycle n). Assume that the branch instruction causes 2 stall cycles if the branch is taken and zero if it is not taken. Show the flow for one iteration and compute the number of cycles needed to complete one iteration, then compute the total number of cycles needed to complete all 99 iterations.
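The hazard rule for this case can be sketched as a small calculation. Without forwarding, a dependent instruction has to sit in ID no earlier than the cycle in which its producer is in WB. The function names below are assumptions for illustration, and `total_cycles` adopts one possible counting convention (every iteration except the last pays the taken-branch stalls, and the final instruction needs extra cycles to drain the pipeline); other conventions give slightly different totals.

```python
# Cycle offsets of each stage relative to an instruction's IF cycle.
IF, ID, EX, MEM, WB = range(5)


def stalls_without_forwarding(distance):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    Assumes no forwarding, and that a register written in WB can be read
    by an instruction in ID during the same cycle.
    """
    # Unstalled, the consumer reaches ID at offset distance + ID;
    # it must not get there before the producer's WB.
    return max(0, WB - (distance + ID))


def total_cycles(iterations, issue_cycles_per_iteration, taken_branch_stalls, drain_cycles):
    """Total cycles under the counting convention described above."""
    return (iterations * issue_cycles_per_iteration
            + (iterations - 1) * taken_branch_stalls
            + drain_cycles)
```

Here `issue_cycles_per_iteration` counts the instructions in the loop body plus their data-hazard stalls, which must be read off the pipeline diagram for the actual code fragment.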

Show the timing of this instruction sequence for the 5-stage RISC pipeline with full forwarding and bypassing hardware. Remember that you need a stall after a load if the next instruction needs the value read from memory. Assume that the branch instruction causes 2 stall cycles if the branch is taken and zero if it is not taken. Show the flow for one iteration and compute the number of cycles needed to complete one iteration, then compute the total number of cycles needed to complete all 99 iterations.
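With full forwarding, the only data stall left in the 5-stage pipeline is the load-use case mentioned above. A minimal sketch of that rule follows; the function name is hypothetical, and the model assumes every consumer needs its operand at the start of EX, which ignores special cases such as store data or branch operands.

```python
# Cycle offsets of each stage relative to an instruction's IF cycle.
IF, ID, EX, MEM, WB = range(5)


def stalls_with_forwarding(producer_is_load, distance):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    Assumes full forwarding: an ALU result is available at the end of EX,
    a loaded value at the end of MEM, and the consumer needs it entering EX.
    """
    ready_stage = MEM if producer_is_load else EX
    # The value is usable from the cycle after ready_stage; unstalled,
    # the consumer enters EX at offset distance + EX.
    return max(0, (ready_stage + 1) - (distance + EX))
```

An instruction directly after a load that uses the loaded value gets one stall, while a dependent instruction directly after an ALU operation gets none.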

High-performance processors have very deep pipelines (more than 15 stages). For this problem, imagine that you have a 10-stage pipeline in which every stage of the 5-stage pipeline has been split in two (that is, we have two instruction fetch stages, say IF1 and IF2, two decode stages, D1 and D2, etc.). The only catch is that, for data forwarding, data can be forwarded only from the end of a pair of stages (the second execute or second memory stage) to the beginning of the pair of stages where they are needed. So, data are forwarded from the output of the second execute stage to the input of the first execute stage, still causing a 1-cycle delay. Show the timing of this instruction sequence for the 10-stage RISC pipeline with full forwarding and bypassing hardware. Assume the branch causes 4 stalls if it is taken and zero if it is not taken. How many cycles does this loop take to complete one iteration, and how many cycles to complete all 99 iterations?
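The forwarding constraint of the 10-stage pipeline can be checked with a small calculation. The stage names after D2 (`EX1`, `EX2`, `MEM1`, `MEM2`, `WB1`, `WB2`) and the function name are assumptions made for this sketch, as is the simplification that every consumer needs its operand at the start of the first execute stage.

```python
# Each stage of the 5-stage pipeline split in two; names after D2 are assumed.
STAGES = ["IF1", "IF2", "D1", "D2", "EX1", "EX2", "MEM1", "MEM2", "WB1", "WB2"]
OFFSET = {name: cycle for cycle, name in enumerate(STAGES)}


def stalls_10_stage(producer_is_load, distance):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    Results are forwarded only from the end of the second execute stage
    (ALU results) or the second memory stage (loads) to the start of EX1.
    """
    ready = OFFSET["MEM2"] if producer_is_load else OFFSET["EX2"]
    # Unstalled, the consumer enters EX1 at offset distance + OFFSET["EX1"].
    return max(0, (ready + 1) - (distance + OFFSET["EX1"]))
```

Under these assumptions a dependent instruction directly after an ALU operation waits one cycle, matching the 1-cycle delay stated in the problem, and one directly after a load waits three.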
