5. Starting Some Static (Scheduling) (20 points): Consider the 2-way superscalar processor we covered in class: a five-stage pipeline that can issue one ALU or branch instruction along with one load or store instruction every cycle. Suppose that the branch delay penalty is two cycles and that we handle control hazards with branch delay slots (since the penalty is two cycles and this is a 2-way superscalar processor, that means four instructions must be placed in delay slots). The processor has full forwarding hardware and is a VLIW machine.

How long would the following code take to execute on this processor, assuming the loop is executed 200 times? Assume the pipeline is initially empty, and give the time taken up to the completed execution of the instruction sequence shown here. First, schedule (i.e., reorder) the code using the table below to reduce the total number of cycles required (but don't unroll it... yet).

Total # of cycles for 200 iterations:

(Hint: schedule the code for one iteration first, then figure out how long it will take the processor to run 200 iterations of this scheduled code.)

Loop: lw   $t0, 0($s0)
      lw   $t1, 0($t0)
      add  $t1, $s1, $t1
      sw   $t1, 0($t0)     # you may assume that this store never goes to the same address as the first load
      addi $s0, $s0, 4
      bne  $s0, $s2, Loop

Cycle | 1st Issue Slot (ALU or Branch) | 2nd Issue Slot (LW or SW)
------+--------------------------------+--------------------------
1     |                                |
2     |                                |
3     |                                |
4     |                                |
5     |                                |
6     |                                |
7     |                                |
8     |                                |
9     |                                |
10    |                                |
11    |                                |
12    |                                |
13    |                                |
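
The hint's two steps (schedule one iteration, then scale to 200 iterations) can be sketched in Python. The names `Instr`, `check_schedule`, and `cycles_for` are hypothetical, and the timing constants are assumptions about a classic five-stage pipeline with full forwarding rather than facts stated in the question: a load's result is assumed usable two bundles after the load (one bubble), an ALU result in the very next bundle, instructions in one VLIW bundle read their registers before any of them writes, and the last bundle finishes four cycles after it issues. The checker only verifies slot types, branch placement, and read-after-write distances; it does not check WAR/WAW hazards, memory ordering, or that the reordered code still computes the same result. The six-bundle schedule at the bottom is one candidate under these assumptions, not an official answer.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Assumed timing for a classic 5-stage pipeline with full forwarding:
LOAD_USE_DISTANCE = 2     # a load's result is usable two bundles later (one bubble)
ALU_USE_DISTANCE = 1      # an ALU result is usable in the very next bundle
BRANCH_DELAY_BUNDLES = 2  # two delay-slot bundles (four slots) follow the branch
PIPELINE_DEPTH = 5        # IF, ID, EX, MEM, WB


@dataclass(frozen=True)
class Instr:
    text: str
    kind: str                   # "alu", "branch", "load" or "store"
    dests: Tuple[str, ...] = ()
    srcs: Tuple[str, ...] = ()


Bundle = Tuple[Optional[Instr], Optional[Instr]]  # (ALU/branch slot, load/store slot)
NOP: Optional[Instr] = None


def check_schedule(bundles: Sequence[Bundle]) -> list:
    """List slot, branch-placement and RAW-latency problems (empty list = none found)."""
    problems = []
    for cycle, (alu, mem) in enumerate(bundles, start=1):
        if alu is not None and alu.kind not in ("alu", "branch"):
            problems.append(f"cycle {cycle}: {alu.text} is not allowed in the ALU/branch slot")
        if mem is not None and mem.kind not in ("load", "store"):
            problems.append(f"cycle {cycle}: {mem.text} is not allowed in the load/store slot")

    branch_cycles = [c for c, (alu, _) in enumerate(bundles, start=1)
                     if alu is not None and alu.kind == "branch"]
    if branch_cycles != [len(bundles) - BRANCH_DELAY_BUNDLES]:
        problems.append("the loop branch must be followed by exactly two delay-slot bundles")

    # Lay out two back-to-back iterations; checking only the second one also covers
    # loop-carried dependences such as $s0 (written by addi, read by the next lw).
    n = len(bundles)
    twice = list(bundles) * 2
    for i in range(n, 2 * n):
        for instr in twice[i]:
            if instr is None:
                continue
            for reg in instr.srcs:
                # Only earlier bundles count: a VLIW bundle reads registers before it writes.
                for j in range(i - 1, -1, -1):
                    producer = next((p for p in twice[j] if p is not None and reg in p.dests), None)
                    if producer is None:
                        continue
                    need = LOAD_USE_DISTANCE if producer.kind == "load" else ALU_USE_DISTANCE
                    if i - j < need:
                        problems.append(f"{instr.text} issued {i - j} bundle(s) after "
                                        f"{producer.text}; needs {need}")
                    break
    return problems


def cycles_for(iterations: int, bundles_per_iteration: int) -> int:
    """Cycles from the first fetch until the last bundle leaves WB (pipeline starts empty)."""
    return iterations * bundles_per_iteration + (PIPELINE_DEPTH - 1)


# One candidate single-iteration schedule (an illustration, not an official answer).
lw_t0 = Instr("lw $t0, 0($s0)", "load", ("$t0",), ("$s0",))
addi = Instr("addi $s0, $s0, 4", "alu", ("$s0",), ("$s0",))
lw_t1 = Instr("lw $t1, 0($t0)", "load", ("$t1",), ("$t0",))
bne = Instr("bne $s0, $s2, Loop", "branch", (), ("$s0", "$s2"))
add = Instr("add $t1, $s1, $t1", "alu", ("$t1",), ("$s1", "$t1"))
sw = Instr("sw $t1, 0($t0)", "store", (), ("$t1", "$t0"))

schedule = [
    (NOP, lw_t0),
    (addi, NOP),   # $s0 was already read by the lw in the previous bundle
    (NOP, lw_t1),
    (bne, NOP),
    (add, NOP),    # delay slot 1
    (NOP, sw),     # delay slot 2
]

if __name__ == "__main__":
    print(check_schedule(schedule) or "no problems found")
    print(cycles_for(200, len(schedule)))
```

Under these assumptions the script prints `no problems found` and `1204` (200 × 6 + 4). A course that counts cycles differently, for example stopping at the last issue instead of the last write-back, would arrive at a different total.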

Now unroll the loop once to make two copies of the loop body. Schedule it again and record the total # of cycles for 200 iterations:

Cycle | 1st Issue Slot (ALU or Branch) | 2nd Issue Slot (LW or SW)
------+--------------------------------+--------------------------
1     |                                |
2     |                                |
3     |                                |
4     |                                |
5     |                                |
6     |                                |
7     |                                |
8     |                                |
9     |                                |
10    |                                |
11    |                                |
12    |                                |
13    |                                |
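
For the unrolled version, a second hypothetical Python sketch writes out the two-copy loop body and generalizes the cycle count to (200 / copies) × B + 4, where B is the number of bundles in the scheduled unrolled body. It assumes `$t2` and `$t3` are free to use as extra temporaries (giving each copy its own registers lets the scheduler interleave the copies), that the iteration count is a multiple of the unroll factor so no cleanup loop is needed, and the same four-cycle pipeline drain as before. Note that the question's no-alias guarantee covers only the first load; moving the second copy's `lw $t3, 0($t2)` above the first copy's store additionally assumes those two addresses never match.

```python
def unroll(copies: int, stride: int = 4) -> list:
    """Return the loop body unrolled `copies` times, one MIPS line per list entry.

    Hypothetical helper: copy k gets its own temporaries ($t{2k}, $t{2k+1}) so the
    copies are independent, reads the word `stride * k` bytes past $s0, and a single
    addi advances $s0 past all copies before the loop branch.
    """
    lines = ["Loop:"]
    for k in range(copies):
        ptr, val = f"$t{2 * k}", f"$t{2 * k + 1}"
        lines += [
            f"    lw   {ptr}, {stride * k}($s0)",
            f"    lw   {val}, 0({ptr})",
            f"    add  {val}, $s1, {val}",
            f"    sw   {val}, 0({ptr})",
        ]
    lines += [f"    addi $s0, $s0, {stride * copies}", "    bne  $s0, $s2, Loop"]
    return lines


def total_cycles(iterations: int, copies: int, bundles_per_body: int, depth: int = 5) -> int:
    """Issue (iterations / copies) scheduled bodies back to back, then drain the pipeline."""
    if iterations % copies:
        raise ValueError("iterations must be a multiple of the unroll factor")
    return (iterations // copies) * bundles_per_body + (depth - 1)


if __name__ == "__main__":
    print("\n".join(unroll(2)))
```

`total_cycles` is only the arithmetic; B has to come from an actual schedule of the unrolled body. For instance, if the two-copy body fit in 8 bundles, it would give 100 × 8 + 4 = 804 cycles.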
