1) (25 points)

a) What is the baseline performance (in cycles, per loop iteration) of the code sequence in Figure 1 if no new instruction's execution could be initiated until the previous instruction's execution had completed? Ignore front-end fetch and decode. Assume that execution does not stall for lack of the next instruction, but only one instruction per cycle can be issued. Assume the branch is taken and that there is a one-cycle branch delay slot. (In the following code, you may assume Rx is register x1 and Ry is register x2.)

Figure 1: Code and latencies for question 1
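Under the part a) model nothing overlaps, so the per-iteration cost is just a sum. A minimal sketch, assuming Figure 1 gives each operation's latency beyond a single cycle: every instruction then occupies 1 + extra cycles. The `Instr` record, `baseline_cycles`, and the five-instruction loop are hypothetical stand-ins, not the contents of Figure 1. Whether the unfilled delay slot is already counted in the branch's own latency depends on how the figure states it, so it is left as a separate `delay_slot_cycles` argument.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str       # assembly text, for display only
    extra: int = 0  # latency beyond one cycle (assumed convention of the figure)


def baseline_cycles(body: list[Instr], delay_slot_cycles: int = 0) -> int:
    """Cycles per loop iteration when no instruction may start before the
    previous one has finished: every instruction occupies 1 + extra cycles.

    delay_slot_cycles charges an unfilled branch delay slot separately, for the
    case where the branch's own extra latency does not already include it.
    """
    return sum(1 + ins.extra for ins in body) + delay_slot_cycles


# Hypothetical loop body and latencies (NOT Figure 1).
example = [
    Instr("fld    f2,0(x5)", extra=3),
    Instr("fadd.d f4,f2,f0", extra=2),
    Instr("fsd    f4,0(x5)", extra=1),
    Instr("addi   x5,x5,8"),
    Instr("bne    x5,x6,Loop", extra=1),
]
print(baseline_cycles(example))  # 4 + 3 + 2 + 1 + 2 = 12
```

Substituting the instructions and latencies from Figure 1 into `example` gives the part a) figure directly.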

b) Considering true data dependencies and functional unit latencies, reorder (schedule) the instructions to improve the performance of the code in Figure 1. Calculate the required cycles per iteration of the loop.
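A candidate ordering for part b) can be checked by simulating an in-order, single-issue pipeline that stalls only on true (read-after-write) dependences: a consumer may issue no earlier than 1 + extra cycles after its producer. This is a sketch under stated assumptions: latencies are "beyond one cycle" values, the loop ends with the taken branch, and the branch's extra latency stands for the unfilled delay slot. `Instr`, `issue_cycles`, `cycles_per_iteration`, and both loop bodies are hypothetical, not Figure 1.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str              # assembly text, for display only
    dst: str | None        # register written, or None (stores, branches)
    srcs: tuple[str, ...]  # registers read
    extra: int = 0         # latency beyond one cycle


def issue_cycles(body: list[Instr]) -> list[int]:
    """Issue cycle of each instruction on an in-order, single-issue pipeline
    that stalls only for true (read-after-write) dependences."""
    ready: dict[str, int] = {}  # register -> first cycle a reader may issue
    next_slot = 0               # earliest cycle for the next instruction
    issued = []
    for ins in body:
        t = max([next_slot] + [ready.get(r, 0) for r in ins.srcs])
        issued.append(t)
        if ins.dst is not None:
            ready[ins.dst] = t + 1 + ins.extra
        next_slot = t + 1
    return issued


def cycles_per_iteration(body: list[Instr]) -> int:
    """Cycles until the next iteration can start, assuming the last instruction
    is the taken branch and its extra latency is the unfilled delay slot."""
    return issue_cycles(body)[-1] + 1 + body[-1].extra


# Hypothetical loop (NOT Figure 1).
original = [
    Instr("fld    f2,0(x5)",   "f2", ("x5",),      3),
    Instr("fadd.d f4,f2,f0",   "f4", ("f2", "f0"), 2),
    Instr("fsd    f4,0(x5)",   None, ("f4", "x5"), 1),
    Instr("addi   x5,x5,8",    "x5", ("x5",)),
    Instr("bne    x5,x6,Loop", None, ("x5", "x6"), 1),
]
# Pointer bump hoisted into the load-use stall; the store displacement becomes -8.
reordered = [
    original[0],
    original[3],
    original[1],
    Instr("fsd    f4,-8(x5)",  None, ("f4", "x5"), 1),
    original[4],
]
print(issue_cycles(original), cycles_per_iteration(original))    # [0, 4, 7, 8, 9] 11
print(issue_cycles(reordered), cycles_per_iteration(reordered))  # [0, 1, 4, 7, 8] 10
```

In this hypothetical loop, hoisting the pointer increment into the load-use stall saves one cycle per iteration, at the price of rewriting the store's displacement from 0 to -8 because the store now sees the already-incremented pointer.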

c) Using different registers to prevent name dependencies, hand-unroll two iterations of the loop in your reordered code obtained from (b). Calculate the required cycles per iteration of the loop.
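Part c) is largely mechanical. A minimal sketch, assuming a hypothetical instruction record with one displacement/immediate field: the second copy of the per-element work is given fresh registers through a rename map, which removes the anti- and output dependences that reusing f2 and f4 would create, and its displacements are shifted to reach the next element; a single pointer bump and branch then serve both copies. `Instr`, `rename`, `unroll_by_two`, and the loop body are illustrations, not Figure 1.

```python
from __future__ import annotations
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str | None          # register written, or None
    srcs: tuple[str, ...]    # registers read
    imm: int | None = None   # load/store displacement or addi immediate


def rename(ins: Instr, regmap: dict[str, str], shift: int) -> Instr:
    """Copy ins with its registers renamed via regmap and any displacement shifted."""
    dst = None if ins.dst is None else regmap.get(ins.dst, ins.dst)
    srcs = tuple(regmap.get(r, r) for r in ins.srcs)
    imm = None if ins.imm is None else ins.imm + shift
    return replace(ins, dst=dst, srcs=srcs, imm=imm)


def unroll_by_two(work: list[Instr], regmap: dict[str, str],
                  stride: int, ptr: str, limit: str) -> list[Instr]:
    """Two copies of the per-element work, then one pointer bump and one branch.

    work must hold only loads, stores and register-only arithmetic (no pointer
    bump or branch), so only loads and stores carry an imm. If regmap maps every
    register that work writes to a register work never touches, the two copies
    share no name dependences. Displacements in the second copy are shifted by
    stride to reach the next element. ptr must not appear in regmap: both
    copies address memory through it.
    """
    second = [rename(ins, regmap, stride) for ins in work]
    overhead = [Instr("addi", ptr, (ptr,), 2 * stride),
                Instr("bne", None, (ptr, limit))]
    return list(work) + second + overhead


# Hypothetical per-element work (NOT Figure 1): load, add f0, store.
work = [
    Instr("fld", "f2", ("x5",), 0),
    Instr("fadd.d", "f4", ("f2", "f0")),
    Instr("fsd", None, ("f4", "x5"), 0),
]
body = unroll_by_two(work, {"f2": "f12", "f4": "f14"}, stride=8, ptr="x5", limit="x6")
for ins in body:
    print(ins)
```

Cycles per iteration for part c) are then the cycle count of the eight-instruction body divided by two, since each pass now does two iterations' worth of work.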

d) Now, reorder (schedule) the unrolled code obtained from (c). Calculate the required cycles per iteration of the loop.
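For part d), greedy list scheduling is one common way to find a good order mechanically. A minimal sketch for an in-order, single-issue pipeline, under these assumptions: RAW edges cost 1 + extra cycles, WAW edges are kept conservatively at the same cost, WAR edges and same-address memory accesses only fix the order, and nothing moves below the branch. Memory operands with different text (such as `0(x5)` and `8(x5)`) are assumed not to overlap, which holds in the example because every access uses x5 before the bump and the displacements differ by a full 8-byte element. Each cycle, the ready instruction with the longest latency-weighted path to the end of the body issues; if none is ready, the cycle is a stall. `Instr`, `min_distance`, `list_schedule`, and the unrolled loop are hypothetical, not Figure 1.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str                 # assembly text, for display only
    dst: str | None           # register written, or None
    srcs: tuple[str, ...]     # registers read
    extra: int = 0            # latency beyond one cycle
    addr: str | None = None   # memory operand such as "8(x5)"; None if not a load/store
    store: bool = False
    branch: bool = False


def min_distance(a: Instr, b: Instr) -> int | None:
    """Smallest issue-cycle gap if b (later in program order) must stay after a,
    or None if the two may be reordered freely."""
    if a.dst is not None and a.dst in b.srcs:   # true (RAW) dependence
        return 1 + a.extra
    if a.dst is not None and a.dst == b.dst:    # output (WAW) dependence, kept conservatively
        return 1 + a.extra
    if b.dst is not None and b.dst in a.srcs:   # anti (WAR) dependence
        return 1
    if a.addr is not None and a.addr == b.addr and (a.store or b.store):
        return 1                                # same memory location
    if b.branch:                                # nothing moves below the branch
        return 1
    return None


def list_schedule(body: list[Instr]) -> tuple[list[int], int]:
    """Greedy list scheduling for an in-order, single-issue pipeline.

    Returns the new order (indices into body) and the cycles for one pass of
    the body, charging the branch's extra latency as the unfilled delay slot.
    """
    n = len(body)
    dist: dict[tuple[int, int], int] = {}
    for i in range(n):
        for j in range(i + 1, n):
            d = min_distance(body[i], body[j])
            if d is not None:
                dist[(i, j)] = d
    # Priority: longest latency-weighted path from each instruction to the end.
    prio = [0] * n
    for i in reversed(range(n)):
        prio[i] = max([d + prio[j] for (a, j), d in dist.items() if a == i], default=0)
    issue: dict[int, int] = {}
    order: list[int] = []
    t = 0
    while len(order) < n:
        ready = [j for j in range(n) if j not in issue and all(
            i in issue and t >= issue[i] + d for (i, k), d in dist.items() if k == j)]
        if ready:  # otherwise this cycle is a stall
            pick = max(ready, key=lambda j: (prio[j], -j))
            issue[pick] = t
            order.append(pick)
        t += 1
    last = order[-1]
    return order, issue[last] + 1 + body[last].extra


# Hypothetical two-way unrolled loop (NOT Figure 1); x5 is the array pointer.
body = [
    Instr("fld    f2,0(x5)",   "f2",  ("x5",),       3, addr="0(x5)"),
    Instr("fadd.d f4,f2,f0",   "f4",  ("f2", "f0"),  2),
    Instr("fsd    f4,0(x5)",   None,  ("f4", "x5"),  1, addr="0(x5)", store=True),
    Instr("fld    f12,8(x5)",  "f12", ("x5",),       3, addr="8(x5)"),
    Instr("fadd.d f14,f12,f0", "f14", ("f12", "f0"), 2),
    Instr("fsd    f14,8(x5)",  None,  ("f14", "x5"), 1, addr="8(x5)", store=True),
    Instr("addi   x5,x5,16",   "x5",  ("x5",)),
    Instr("bne    x5,x6,Loop", None,  ("x5", "x6"),  1, branch=True),
]
order, cycles = list_schedule(body)
for k in order:
    print(body[k].text)
print(cycles, cycles / 2)  # cycles per unrolled pass, per original iteration
```

For this hypothetical body the scheduler issues both loads, then both adds, then both stores, and needs 12 cycles per unrolled pass, or 6 per original iteration. The pointer bump stays at the bottom because the sketch never rewrites displacements, so it cannot hoist the bump above the loads and stores the way a hand schedule can.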
