Exercise 5: Pipeline Hazards and Performance
Instruction Sequence (with sw and lw location changed):
beq r2, r1, Label # Branch to Label if r2 == r1 (assume not equal)
add r4, r6, r2 # Add r6 and r2, store result in r4
slt r5, r8, r2 # Set r5 to 1 if r8 < r2, else 0
sw r14, 16(r3) # Store word in memory at address r3+16
lw r12, 12(r3) # Load word from memory at address r3+12
Questions:
1. What is the total execution time of this instruction sequence in a 5-stage pipeline that has only one memory, shared by instruction fetches and data accesses? Can you resolve the structural hazard by adding NOPs?
2. Change the load/store instructions (lw, sw) to use a register (without an offset) as the address. Assuming this change does not affect clock cycle time, what speedup is achieved in this instruction sequence compared to the original?
3. What speedup is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in the EX stage?
4. Repeat the speedup calculation from question 2, but now take into account the (possible) change in clock cycle time when EX and MEM are done in a single stage.
5. Assume the latency of the ID stage increases by 50% and the latency of the EX stage decreases by 10 ps. What is the speedup achieved in this case?
6. What is the new clock cycle time and execution time of this instruction sequence if the beq address computation is moved to the MEM stage? What is the speedup from this change, assuming the latency of the EX stage is reduced by 20 ps?
7. Given the sequence of instructions and the use of beq, indicate where NOPs should be inserted to avoid data hazards (if any), assuming no forwarding and a 5-stage pipeline.
8. What is the clock cycle time in a pipelined processor and in a non-pipelined processor, using the given stage latencies? Consider the impact of each instruction on execution time and performance.
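For question 8 in particular, the comparison reduces to two clocking rules: a pipelined clock must accommodate the slowest stage, while a non-pipelined (single-cycle) clock must accommodate all stages in sequence. Below is a minimal Python sketch of that arithmetic, using hypothetical stage latencies since the exercise's actual values are not reproduced here.

# Hypothetical stage latencies in ps; substitute the values given in the exercise.
stage_latency = {"IF": 200, "ID": 120, "EX": 150, "MEM": 190, "WB": 100}

# Pipelined: the clock must fit the slowest stage.
t_pipe = max(stage_latency.values())
# Non-pipelined: the clock must fit all stages back to back.
t_single = sum(stage_latency.values())

n_instructions = 5                      # beq, add, slt, sw, lw
fill_cycles = len(stage_latency) - 1    # cycles to fill the pipeline

exec_pipe = (n_instructions + fill_cycles) * t_pipe   # ignores stall cycles from hazards
exec_single = n_instructions * t_single

print(f"pipelined clock = {t_pipe} ps, non-pipelined clock = {t_single} ps")
print(f"pipelined time = {exec_pipe} ps, non-pipelined time = {exec_single} ps")
print(f"speedup = {exec_single / exec_pipe:.2f}")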
Exercise 6: Cache and Memory Performance Evaluation
You are tasked with analyzing the performance of a CPU with the following configuration:
Address Space: 32-bit addresses (4GB addressable memory).
Cache Configuration:
o Cache Size: 512 KB
o Cache Line (Block) Size: 128 bytes
o Cache Associativity: 8-way set associative
o Write Policy: Write-back
o Write Allocation: Write-allocate (on write miss, load the block into the cache)
o Replacement Policy: Least Recently Used (LRU)
Main Memory: 4GB of memory.
Part 1: Cache Organization
1. Determine the number of blocks in the cache.
2. Determine the number of sets in the cache.
3. Determine the number of bits used for the block offset, index, and tag.
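These three quantities follow directly from the parameters listed above; a minimal Python sketch of the arithmetic:

import math

cache_size = 512 * 1024                          # 512 KB
block_size = 128                                 # bytes per block
ways = 8                                         # 8-way set associative
addr_bits = 32

blocks = cache_size // block_size                # number of cache blocks
sets = blocks // ways                            # number of sets
offset_bits = int(math.log2(block_size))         # bits addressing a byte within a block
index_bits = int(math.log2(sets))                # bits selecting a set
tag_bits = addr_bits - index_bits - offset_bits  # remaining bits form the tag

print(blocks, sets, offset_bits, index_bits, tag_bits)   # 4096 512 7 9 16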
Part 2: Cache Access Sequence
The CPU generates the following sequence of memory accesses (in hexadecimal):
0x00000000, 0x00000800, 0x00001000, 0x00002000, 0x00003000, 0x00004000, 0x00008000, 0x00010000, 0x00020000, 0x00030000,
0x00040000, 0x00050000, 0x00060000, 0x00070000, 0x00080000, 0x00090000, 0x000A0000, 0x000B0000, 0x000C0000, 0x000D0000
For each of the memory accesses, determine if it results in a cache hit or a cache miss, and simulate the cache replacement process (using the LRU policy).
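One way to carry out this simulation is with a short script. The sketch below models each set as an ordered collection of tags (most recently used last) and treats every access as a read of the 8-way, 512 KB, 128-byte-block cache described above.

from collections import OrderedDict

BLOCK_SIZE, WAYS, SETS = 128, 8, 512   # from the configuration above

accesses = [
    0x00000000, 0x00000800, 0x00001000, 0x00002000, 0x00003000,
    0x00004000, 0x00008000, 0x00010000, 0x00020000, 0x00030000,
    0x00040000, 0x00050000, 0x00060000, 0x00070000, 0x00080000,
    0x00090000, 0x000A0000, 0x000B0000, 0x000C0000, 0x000D0000,
]

# Each set is an OrderedDict of tags; the most recently used tag sits at the end.
cache = [OrderedDict() for _ in range(SETS)]

for addr in accesses:
    block = addr // BLOCK_SIZE
    index = block % SETS
    tag = block // SETS
    s = cache[index]
    if tag in s:                       # hit: refresh LRU position
        s.move_to_end(tag)
        result = "hit"
    else:                              # miss: insert, evicting the LRU tag if the set is full
        if len(s) == WAYS:
            s.popitem(last=False)
        s[tag] = True
        result = "miss"
    print(f"0x{addr:08X}: {result} (set {index}, tag 0x{tag:X})")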
Part 3: Virtual Memory (Page Table Simulation)
Assume that the CPU uses paging for virtual memory with the following configuration:
Page Size: 4KB
Virtual Address Space: 32-bit, so the total virtual memory size is 4GB.
Physical Address Space: 32-bit, so the total physical memory size is also 4GB.
1. Determine the number of pages in virtual memory and the number of frames in physical memory.
2. Simulate the translation of virtual addresses to physical addresses for each memory access.
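Both counts come from dividing the 32-bit address space by the 4 KB page size. The translation itself splits a virtual address into a 20-bit page number and a 12-bit offset; the sketch below uses a hypothetical identity page table, since the exercise does not specify the actual virtual-to-physical mapping.

PAGE_SIZE = 4 * 1024                         # 4 KB pages
num_pages = 2**32 // PAGE_SIZE               # 1,048,576 virtual pages
num_frames = 2**32 // PAGE_SIZE              # 1,048,576 physical frames

# Hypothetical identity mapping, for illustration only.
page_table = {vpn: vpn for vpn in range(256)}

def translate(va):
    vpn = va >> 12                           # upper 20 bits: virtual page number
    offset = va & 0xFFF                      # lower 12 bits: offset within the page
    frame = page_table[vpn]                  # a missing entry would be a page fault
    return (frame << 12) | offset

print(hex(translate(0x00003000)))            # 0x3000 under the identity mapping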
Part 4: Performance Analysis
1. Calculate the cache hit ratio and miss penalty based on the cache access sequence.
2. Calculate the page fault rate and determine the effective memory access time (EMAT) considering a page fault penalty of 120 cycles.
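A common way to combine these figures is an effective-access-time formula of the form EMAT = hit time + miss rate x miss penalty, applied once at the cache level and again for page faults. The sketch below uses placeholder rates and penalties (only the 120-cycle page fault penalty comes from the exercise); substitute the hit ratio from Part 2 and the fault rate from your Part 3 simulation.

def effective_access_time(hit_time, miss_rate, miss_penalty):
    # Every access pays the hit time; misses additionally pay the penalty.
    return hit_time + miss_rate * miss_penalty

# Placeholder values in cycles; replace with the measured results.
cache_emat = effective_access_time(hit_time=1, miss_rate=0.45, miss_penalty=50)
total_emat = effective_access_time(hit_time=cache_emat, miss_rate=0.01, miss_penalty=120)
print(cache_emat, total_emat)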
Exercise 3:
A CPU produces the following sequence of read addresses in hexadecimal:
A4, 78, 30, C0, 7C, F8, 18, A4, 88, 70, 18, D4, 30, 7C
The word size is 32 bits.
Assume an 8-word cache that is initially empty.
Implement a Least Recently Used (LRU) replacement policy.
For each of the following cache types, determine whether each address produces a hit or a miss:
Direct Mapping
Fully Associative
Two-way set-associative
Task:
1. Fill in the table with Address (Hex), Address (Binary), Direct Mapping, Fully Associative, and 2-Way Set Associative (an address-breakdown sketch follows this list).
2. Sketch the cache after processing all addresses and note replacements.
3. Compare the hit ratio for each cache type.
4. Discuss how changing the cache design to use 2 words per block would affect the hit/miss behavior.
5. Explain the impact of miss penalty on the system performance.
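To help fill in the binary and index columns of the table in task 1, the short sketch below splits each address, assuming byte addressing with one 32-bit word per block (the exercise states the word size but not a multi-word block size, so a 4-byte block is assumed).

# 8-word cache: direct mapped -> 8 lines (3 index bits);
# 2-way set associative -> 4 sets (2 index bits); fully associative -> tag only.
addresses = [0xA4, 0x78, 0x30, 0xC0, 0x7C, 0xF8, 0x18,
             0xA4, 0x88, 0x70, 0x18, 0xD4, 0x30, 0x7C]

for addr in addresses:
    word = addr >> 2               # strip the 2-bit byte offset
    dm_line = word % 8             # line index under direct mapping
    sa_set = word % 4              # set index under 2-way set associativity
    print(f"0x{addr:02X} = {addr:08b}  DM line {dm_line}, 2-way set {sa_set}")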