Question:

This exercise is intended to help you understand the cost/complexity/performance trade-offs of forwarding in a pipelined processor. Problems in this exercise refer to pipelined datapaths from Figure 4.45. These problems assume that, of all the instructions executed in a processor, the following fraction of these instructions have a particular type of RAW data dependence. The type of RAW data dependence is identified by the stage that produces the result (EX or MEM) and the instruction that consumes the result (1st instruction that follows the one that produces the result, 2nd instruction that follows, or both). We assume that the register write is done in the first half of the clock cycle and that register reads are done in the second half of the cycle, so "EX to 3rd" and "MEM to 3rd" dependences are not counted because they cannot result in data hazards. Also, assume that the CPI of the processor is 1 if there are no data hazards.
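The claim that dependences to the 3rd following instruction cannot cause hazards can be checked with a small timing model. This is a minimal sketch, assuming the classic five-stage pipeline (IF, ID, EX, MEM, WB) with no forwarding; the function name and the cycle numbering are illustrative and not part of the exercise.

```python
def stalls_without_forwarding(distance):
    """Stall cycles for a consumer `distance` instructions after the producer.

    Assumption: five stages (IF, ID, EX, MEM, WB), no forwarding.
    The producer is fetched in cycle 1, so it writes the register file in
    cycle 5 (first half of the cycle). The consumer reads its operands in
    ID (second half of the cycle), so a read in cycle 5 already sees the
    new value.
    """
    producer_wb_cycle = 5
    # The consumer is fetched in cycle 1 + distance and decoded one cycle later.
    consumer_id_cycle = distance + 2
    return max(0, producer_wb_cycle - consumer_id_cycle)


for distance in (1, 2, 3):
    print(f"dependence to instruction +{distance}: "
          f"{stalls_without_forwarding(distance)} stall cycle(s)")
```

Under these assumptions the 1st following instruction waits 2 cycles, the 2nd waits 1 cycle, and the 3rd reads the register in the same cycle it is written, so it never stalls.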

Figure 4.45

Pipelined datapath diagram (image not reproduced). Instruction labels: lw $10, 20($1); sub $11, $2, $3; add $12, $3, $4; lw $13, 24($1); add $14, $5, $6. Stages: Instruction fetch, Instruction decode, Execution, Memory, Write-back.

1. If we use no forwarding, what fraction of cycles are we stalling due to data hazards?

2. If we use full forwarding (forward all results that can be forwarded), what fraction of cycles are we stalling due to data hazards?

3. Let us assume that we cannot afford to have the three-input Muxes that are needed for full forwarding. We have to decide if it is better to forward only from the EX/MEM pipeline register (next-cycle forwarding) or only from the MEM/WB pipeline register (two-cycle forwarding). Which of the two options results in fewer data stall cycles?

4. For the given hazard probabilities and pipeline stage latencies, what is the speedup achieved by adding full forwarding to a pipeline that had no forwarding?

5. What would be the additional speedup (relative to a processor with forwarding) if we added time-travel forwarding that eliminates all data hazards? Assume that the yet-to-be-invented time-travel circuitry adds 100 ps to the latency of the full-forwarding EX stage.

6. Repeat 4.12.3 but this time determine which of the two options results in shorter time per instruction.
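For the forwarding questions (2 and 3), one way to organize the work is a table of stall cycles per dependence type for each forwarding scheme, weighted by how often each type occurs. The sketch below is hedged: the stall tables are one reading of the five-stage pipeline timing (for example, that a load result is never held in the EX/MEM register), and the fractions are hypothetical placeholders, not the exercise's table. Dependences consumed by both following instructions are left out to keep the tables small.

```python
# Stall cycles keyed by (producing stage, consuming instruction).
# ASSUMPTION: these tables are derived from the usual five-stage timing,
# they are not given in the exercise text.
STALLS = {
    "full": {
        ("EX", "1st"): 0, ("MEM", "1st"): 1,   # load-use still costs 1 cycle
        ("EX", "2nd"): 0, ("MEM", "2nd"): 0,
    },
    "EX/MEM only": {
        ("EX", "1st"): 0, ("MEM", "1st"): 2,   # load data never sits in EX/MEM
        ("EX", "2nd"): 1, ("MEM", "2nd"): 1,   # must wait for the register file
    },
    "MEM/WB only": {
        ("EX", "1st"): 1, ("MEM", "1st"): 1,   # wait until the value reaches MEM/WB
        ("EX", "2nd"): 0, ("MEM", "2nd"): 0,
    },
}


def stalls_per_instruction(fractions, scheme):
    """Average stall cycles per instruction for one forwarding scheme."""
    table = STALLS[scheme]
    return sum(frac * table[dep] for dep, frac in fractions.items())


def stall_fraction(fractions, scheme):
    """Fraction of all cycles that are stalls, given a base CPI of 1."""
    stalls = stalls_per_instruction(fractions, scheme)
    return stalls / (1.0 + stalls)


# HYPOTHETICAL fractions for illustration only.
HYPOTHETICAL_FRACTIONS = {
    ("EX", "1st"): 0.10, ("MEM", "1st"): 0.10,
    ("EX", "2nd"): 0.05, ("MEM", "2nd"): 0.05,
}

for scheme in STALLS:
    print(f"{scheme}: "
          f"{stalls_per_instruction(HYPOTHETICAL_FRACTIONS, scheme):.2f} stalls/instr, "
          f"{stall_fraction(HYPOTHETICAL_FRACTIONS, scheme):.1%} of cycles stalled")
```

Which of the two partial schemes wins depends on the mix: frequent load-to-next-instruction dependences penalize EX/MEM-only forwarding, while frequent ALU-to-next-instruction dependences penalize MEM/WB-only forwarding.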

Pipeline stage latencies:

IF: 150 ps
ID: 100 ps
EX (no FW): 120 ps
EX (full FW): 150 ps
EX (FW from EX/MEM only): 140 ps
EX (FW from MEM/WB only): 130 ps
MEM: 120 ps
WB: 100 ps
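Questions 4 to 6 combine stall counts with clock cycle time. A rough sketch follows, assuming the clock period equals the slowest stage and that the latencies read as IF 150 ps, ID 100 ps, MEM 120 ps, WB 100 ps, with EX at 120 ps (no FW), 150 ps (full FW), 140 ps (EX/MEM only) and 130 ps (MEM/WB only). That column-to-value mapping is an assumed reading of the table, and the stall counts used in the example are hypothetical.

```python
# Latencies in picoseconds. ASSUMPTION: this is one reading of the latency table.
COMMON_STAGES_PS = {"IF": 150, "ID": 100, "MEM": 120, "WB": 100}
EX_LATENCY_PS = {
    "none": 120,
    "full": 150,
    "EX/MEM only": 140,
    "MEM/WB only": 130,
}


def cycle_time_ps(scheme, extra_ex_ps=0):
    """Clock period = latency of the slowest stage for the given scheme."""
    ex = EX_LATENCY_PS[scheme] + extra_ex_ps
    return max(max(COMMON_STAGES_PS.values()), ex)


def time_per_instruction_ps(scheme, stalls_per_instr, extra_ex_ps=0):
    """Average time per instruction: CPI (1 + stalls) times the clock period."""
    return (1.0 + stalls_per_instr) * cycle_time_ps(scheme, extra_ex_ps)


def speedup(old_time, new_time):
    return old_time / new_time


# HYPOTHETICAL stall counts per instruction, for illustration only.
t_none = time_per_instruction_ps("none", 0.50)
t_full = time_per_instruction_ps("full", 0.10)
# Time-travel forwarding: no stalls, but 100 ps added to the full-forwarding EX stage.
t_time_travel = time_per_instruction_ps("full", 0.0, extra_ex_ps=100)

print(f"full forwarding vs none: {speedup(t_none, t_full):.2f}x")
print(f"time travel vs full forwarding: {speedup(t_full, t_time_travel):.2f}x")
```

With this reading of the table, IF (150 ps) sets the clock for every real forwarding option, so those options differ only in CPI, while the extra 100 ps of the time-travel circuitry makes EX the critical stage and lengthens the clock.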

Step by Step Solution


1. Dependences to the 1st next instruction result in 2 stall cycles, and the stall is also 2 cycles i...
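The counting rule in this step can be turned into a stall fraction as follows. This is only a sketch: the visible part of the answer gives 2 stall cycles for a dependence to the 1st next instruction, while the 1-cycle stall for a 2nd-only dependence and the 2-cycle stall for a dependence to both are assumptions based on the pipeline timing. The fractions are hypothetical and do not come from the exercise.

```python
# Stall cycles with no forwarding, keyed by which following instruction
# consumes the result. ASSUMPTION: "2nd" = 1 and "both" = 2 follow from the
# five-stage timing; only the "1st" value is stated in the answer text.
NO_FW_STALLS = {"1st": 2, "2nd": 1, "both": 2}


def no_forwarding_stall_fraction(fraction_by_consumer):
    """Fraction of cycles spent stalling, assuming CPI is 1 without hazards."""
    stalls = sum(frac * NO_FW_STALLS[consumer]
                 for consumer, frac in fraction_by_consumer.items())
    return stalls / (1.0 + stalls)


# HYPOTHETICAL fractions of instructions with each kind of dependence.
hypothetical = {"1st": 0.20, "2nd": 0.10, "both": 0.10}
print(f"{no_forwarding_stall_fraction(hypothetical):.1%} of cycles are stalls")
```

The division by 1 + stalls reflects that every instruction costs one base cycle plus its stall cycles, so the stall fraction is stalls divided by total cycles per instruction.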

