Exercise 5: Pipeline Hazards and Performance
Instruction Sequence (with sw and lw location changed):
beq r2, r1, Label # Branch to Label if r2 == r1 (assume not equal)
add r4, r6, r2 # Add r6 and r2, store result in r4
slt r5, r8, r2 # Set r5 to 1 if r8 < r2, else 0
sw r14, 16(r3) # Store word in memory at address r3+16
lw r12, 12(r3) # Load word from memory at address r3+12
Questions:
1. What is the total execution time of this instruction sequence in a 5-stage pipeline that has only one memory, shared by instruction fetches and data accesses? Can you resolve the structural hazard by adding NOPs?
2. Change the load/store instructions (lw, sw) to use a register (without an offset) as the address. Assuming this change does not affect clock cycle time, what speedup is achieved in this instruction sequence compared to the original?
3. What speedup is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in the EX stage?
4. Repeat the speedup calculation from question 2, but now take into account the (possible) change in clock cycle time when EX and MEM are done in a single stage.
5. Assume the latency of the ID stage increases by 50% and the latency of the EX stage decreases by 10 ps. What is the speedup achieved in this case?
6. What is the new clock cycle time and execution time of this instruction sequence if the beq address computation is moved to the MEM stage? What is the speedup from this change, assuming the latency of the EX stage is reduced by 20 ps?
7. Given the sequence of instructions and the use of beq, indicate where NOPs should be inserted to avoid data hazards (if any), assuming no forwarding and a 5-stage pipeline.
8. What is the clock cycle time in a pipelined processor and in a non-pipelined processor, using the given stage latencies? Consider the impact of each instruction on execution time and performance.
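For question 8 in particular, the comparison reduces to two clocking rules: a pipelined clock must accommodate the slowest stage, while a non-pipelined (single-cycle) clock must accommodate all stages in sequence. Below is a minimal Python sketch of that arithmetic, using hypothetical stage latencies since the exercise's actual values are not reproduced here.

# Hypothetical stage latencies in ps; substitute the values given in the exercise.
stage_latency = {"IF": 200, "ID": 120, "EX": 150, "MEM": 190, "WB": 100}

# Pipelined: the clock must fit the slowest stage.
t_pipe = max(stage_latency.values())
# Non-pipelined: the clock must fit all stages back to back.
t_single = sum(stage_latency.values())

n_instructions = 5                      # beq, add, slt, sw, lw
fill_cycles = len(stage_latency) - 1    # cycles to fill the pipeline

exec_pipe = (n_instructions + fill_cycles) * t_pipe   # ignores stall cycles from hazards
exec_single = n_instructions * t_single

print(f"pipelined clock = {t_pipe} ps, non-pipelined clock = {t_single} ps")
print(f"pipelined time = {exec_pipe} ps, non-pipelined time = {exec_single} ps")
print(f"speedup = {exec_single / exec_pipe:.2f}")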
Exercise 6: Cache and Memory Performance Evaluation
You are tasked with analyzing the performance of a CPU with the following configuration:
Address Space: 32-bit addresses (4GB addressable memory).
Cache Configuration:
o Cache Size: 512 KB
o Cache Line (Block) Size: 128 bytes
o Cache Associativity: 8-way set associative
o Write Policy: Write-back
o Write Allocation: Write-allocate (on write miss, load the block into the cache)
o Replacement Policy: Least Recently Used (LRU)
Main Memory: 4GB of memory.
Part 1: Cache Organization
1. Determine the number of blocks in the cache.
2. Determine the number of sets in the cache.
3. Determine the number of bits used for the block offset, index, and tag.
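These three quantities follow directly from the parameters listed above; a minimal Python sketch of the arithmetic:

import math

cache_size = 512 * 1024                          # 512 KB
block_size = 128                                 # bytes per block
ways = 8                                         # 8-way set associative
addr_bits = 32

blocks = cache_size // block_size                # number of cache blocks
sets = blocks // ways                            # number of sets
offset_bits = int(math.log2(block_size))         # bits addressing a byte within a block
index_bits = int(math.log2(sets))                # bits selecting a set
tag_bits = addr_bits - index_bits - offset_bits  # remaining bits form the tag

print(blocks, sets, offset_bits, index_bits, tag_bits)   # 4096 512 7 9 16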
Part 2: Cache Access Sequence
The CPU generates the following sequence of memory accesses (in hexadecimal):
0x00000000, 0x00000800, 0x00001000, 0x00002000, 0x00003000, 0x00004000, 0x00008000, 0x00010000, 0x00020000, 0x00030000,
0x00040000, 0x00050000, 0x00060000, 0x00070000, 0x00080000, 0x00090000, 0x000A0000, 0x000B0000, 0x000C0000, 0x000D0000
For each of the memory accesses, determine if it results in a cache hit or a cache miss, and simulate the cache replacement process (using the LRU policy).
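One way to carry out this simulation is with a short script. The sketch below models each set as an ordered collection of tags (most recently used last) and treats every access as a read of the 8-way, 512 KB, 128-byte-block cache described above.

from collections import OrderedDict

BLOCK_SIZE, WAYS, SETS = 128, 8, 512   # from the configuration above

accesses = [
    0x00000000, 0x00000800, 0x00001000, 0x00002000, 0x00003000,
    0x00004000, 0x00008000, 0x00010000, 0x00020000, 0x00030000,
    0x00040000, 0x00050000, 0x00060000, 0x00070000, 0x00080000,
    0x00090000, 0x000A0000, 0x000B0000, 0x000C0000, 0x000D0000,
]

# Each set is an OrderedDict of tags; the most recently used tag sits at the end.
cache = [OrderedDict() for _ in range(SETS)]

for addr in accesses:
    block = addr // BLOCK_SIZE
    index = block % SETS
    tag = block // SETS
    s = cache[index]
    if tag in s:                       # hit: refresh LRU position
        s.move_to_end(tag)
        result = "hit"
    else:                              # miss: insert, evicting the LRU tag if the set is full
        if len(s) == WAYS:
            s.popitem(last=False)
        s[tag] = True
        result = "miss"
    print(f"0x{addr:08X}: {result} (set {index}, tag 0x{tag:X})")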
Part 3: Virtual Memory (Page Table Simulation)
Assume that the CPU uses paging for virtual memory with the following configuration:
Page Size: 4KB
Virtual Address Space: 32-bit, so the total virtual memory size is 4GB.
Physical Address Space: 32-bit, so the total physical memory size is also 4GB.
1. Determine the number of pages in virtual memory and the number of frames in physical memory.
2. Simulate the translation of virtual addresses to physical addresses for each memory access.
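Both counts come from dividing the 32-bit address space by the 4 KB page size. The translation itself splits a virtual address into a 20-bit page number and a 12-bit offset; the sketch below uses a hypothetical identity page table, since the exercise does not specify the actual virtual-to-physical mapping.

PAGE_SIZE = 4 * 1024                         # 4 KB pages
num_pages = 2**32 // PAGE_SIZE               # 1,048,576 virtual pages
num_frames = 2**32 // PAGE_SIZE              # 1,048,576 physical frames

# Hypothetical identity mapping, for illustration only.
page_table = {vpn: vpn for vpn in range(256)}

def translate(va):
    vpn = va >> 12                           # upper 20 bits: virtual page number
    offset = va & 0xFFF                      # lower 12 bits: offset within the page
    frame = page_table[vpn]                  # a missing entry would be a page fault
    return (frame << 12) | offset

print(hex(translate(0x00003000)))            # 0x3000 under the identity mapping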
Part 4: Performance Analysis
1. Calculate the cache hit ratio and miss penalty based on the cache access sequence.
2. Calculate the page fault rate and determine the effective memory access time (EMAT) considering a page fault penalty of 120 cycles.
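A common way to combine these figures is an effective-access-time formula of the form EMAT = hit time + miss rate x miss penalty, applied once at the cache level and again for page faults. The sketch below uses placeholder rates and penalties (only the 120-cycle page fault penalty comes from the exercise); substitute the hit ratio from Part 2 and the fault rate from your Part 3 simulation.

def effective_access_time(hit_time, miss_rate, miss_penalty):
    # Every access pays the hit time; misses additionally pay the penalty.
    return hit_time + miss_rate * miss_penalty

# Placeholder values in cycles; replace with the measured results.
cache_emat = effective_access_time(hit_time=1, miss_rate=0.45, miss_penalty=50)
total_emat = effective_access_time(hit_time=cache_emat, miss_rate=0.01, miss_penalty=120)
print(cache_emat, total_emat)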
Exercise 3:
A CPU produces the following sequence of read addresses in hexadecimal:
A4, 78, 30, C0, 7C, F8, 18, A4, 88, 70, 18, D4, 30, 7C
The word size is 32 bits.
Assume an 8-word cache that is initially empty.
Implement a Least Recently Used (LRU) replacement policy.
For each of the following cache types, determine whether each address produces a hit or a miss:
Direct Mapping
Fully Associative
Two-way set-associative
Task:
1. Fill in the table with Address (Hex), Address (Binary), Direct Mapping, Fully Associative, and 2-Way Set Associative (an address-breakdown sketch follows this list).
2. Sketch the cache after processing all addresses and note replacements.
3. Compare the hit ratio for each cache type.
4. Discuss how changing the cache design to use 2 words per block would affect the hit/miss behavior.
5. Explain the impact of miss penalty on the system performance.
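To help fill in the binary and index columns of the table in task 1, the short sketch below splits each address, assuming byte addressing with one 32-bit word per block (the exercise states the word size but not a multi-word block size, so a 4-byte block is assumed).

# 8-word cache: direct mapped -> 8 lines (3 index bits);
# 2-way set associative -> 4 sets (2 index bits); fully associative -> tag only.
addresses = [0xA4, 0x78, 0x30, 0xC0, 0x7C, 0xF8, 0x18,
             0xA4, 0x88, 0x70, 0x18, 0xD4, 0x30, 0x7C]

for addr in addresses:
    word = addr >> 2               # strip the 2-bit byte offset
    dm_line = word % 8             # line index under direct mapping
    sa_set = word % 4              # set index under 2-way set associativity
    print(f"0x{addr:02X} = {addr:08b}  DM line {dm_line}, 2-way set {sa_set}")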