Question: I was solving exercise 4.17.6 from chapter 4 of Computer Organization and Design by Patterson and Hennessy (4th edition).
The problem
The exercise gives the instruction mix (the percentage of executed instructions of each type) and the latencies of the individual pipeline stages in two tables.
We can convert all load/store instructions into register-based (no offset) and put the memory access in parallel with the ALU. Assume that the latency of the new EX/MEM stage is equal to the longer of their latencies. This change requires many existing LW/SW instructions to be converted into two-instruction sequences. If this is needed for 50% of these instructions, what is the overall speedup achieved by changing from the 5-stage pipeline to the 4-stage pipeline where EX and MEM are done in parallel?
The solution given
The latency of the pipelined datapath is unchanged (the maximum stage latency does not change). The clock cycle time of the single-cycle datapath is the sum of logic latencies for the four stages (IF, ID, WB, and the combined EX + MEM stage). We have: The number of instructions increases for the 4-stage pipeline, so the speedup is below 1 (there is a slowdown): 
Doubt
I think this is wrong. The solution only works out the increase in the number of instructions and takes its reciprocal: 1/1.15 = 0.87. However, it neglects the longer clock cycle (from 200 ps to 215 ps) and the reduced stage count (from 5 to 4). I think it should be (5*200)/(4*215*1.15) = 1.01, so there is actually a small speedup. Am I right?
Instruction mix for the two variants (a and b) of the exercise:

     ADD   BEQ   LW    SW
a.   40%   30%   25%   5%
b.   60%   10%   20%   10%
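To make the two competing figures easy to check, here is a minimal Python sketch of the arithmetic. It assumes variant a of the instruction mix (LW 25%, SW 5%), since that is the reading that gives the factor 1.15 used in the doubt. The 200 ps and 215 ps clock values are taken from the doubt as stated. All function names are hypothetical and only for illustration.

```python
# Sketch of the arithmetic in the question; names are illustrative only.

def instruction_growth(lw_share, sw_share, converted_fraction):
    """Relative instruction count when a fraction of loads/stores
    is replaced by two-instruction sequences."""
    return 1.0 + converted_fraction * (lw_share + sw_share)


def speedup_from_count_only(growth):
    """Figure from the quoted solution: only the instruction count changes."""
    return 1.0 / growth


def speedup_from_stage_product(old_stages, old_clock_ps,
                               new_stages, new_clock_ps, growth):
    """Figure proposed in the doubt: ratio of (stages * clock),
    scaled by the instruction growth."""
    return (old_stages * old_clock_ps) / (new_stages * new_clock_ps * growth)


# Assumption: variant a of the mix (LW 25%, SW 5%), 50% of them converted.
growth = instruction_growth(0.25, 0.05, 0.5)

print(round(growth, 2))                                               # 1.15
print(round(speedup_from_count_only(growth), 2))                      # 0.87
print(round(speedup_from_stage_product(5, 200, 4, 215, growth), 2))   # 1.01
```

When comparing the two, keep in mind that stages times clock period is the latency of a single instruction. A full pipeline without stalls completes roughly one instruction per cycle, so total execution time scales with instruction count times clock period, not with the number of stages.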
