Consider the following computation loop X_{i+1} = a X_i - b Y_i, which is the inner loop in a...
Question:
Consider the following computation loop X_{i+1} = a X_i - b Y_i, which is the inner loop in a numerical algorithmic process. For numerical convergence, this loop is supposed to run for a large number of iterations. The constants a and b are initialized in float registers.

    Loop: LD    F0,0(R1)     // Load
          MULTF F0,F0,F2     // Multiply
          LD    F4,0(R2)     // Load
          MULTF F4,F4,F6     // Multiply
          SUBF  F0,F0,F4     // Subtract
          SD    0(R1),F0     // Store
          SUBI  R1,R1,8      // Decrement
          SUBI  R2,R2,8      // Decrement
          BNEQZ R1,Loop      // Branch not equal

Assume the pipeline data-dependent latencies between instructions (WRITE to READ operands) are given by the following table. For example, LD F0,0(R1) and MULTF F0,F0,F2 have a RAW dependency with a latency (load slot) of 1 slot. Also, the machine uses float arithmetic units (ADDF and MULTF) which are pipelined and embedded in the instruction pipeline. You may consider 4-stage float pipes. Furthermore, delayed branching is available.

    WRITE Instruction    READ Instruction     Latency in Cycles
    FP ALU OP            Another FP ALU OP    3
    FP ALU OP            Store double         2
    Load double          FP ALU OP            1
    Load double          Store double         0

a) Unroll the above loop as many times as necessary to schedule it without any delays, collapsing the loop overhead instructions.
b) Briefly show the scheduling of the entire loop iterations.
c) Comment if there are other unrollings feasible.
Expert Answer:
a) Unrolling the loop. Load the initial values of Xi and Y: LD F0,0(R1); LD F4,0(R2). Unroll the loop 2 time...
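To see why unrolling is needed at all, the latency table from the question can be applied to the original, unscheduled loop body. The following is only a rough sketch under stated assumptions, not the expert's method: it models a single-issue, in-order pipeline, treats any producer/consumer pair missing from the table (such as SUBI followed by BNEQZ) as latency 0, and ignores the branch delay slot. The names `LATENCY`, `KIND`, `issue_cycles`, and `stall_count` are hypothetical helpers introduced for illustration.

```python
# Latencies from the question's table: (writer kind, reader kind) -> cycles.
LATENCY = {
    ("fpalu", "fpalu"): 3,
    ("fpalu", "store"): 2,
    ("load", "fpalu"): 1,
    ("load", "store"): 0,
}

# Opcode -> instruction kind. Integer ops are not in the table;
# this sketch ASSUMES they add no latency.
KIND = {
    "LD": "load", "SD": "store",
    "MULTF": "fpalu", "ADDF": "fpalu", "SUBF": "fpalu",
    "SUBI": "int", "BNEQZ": "int",
}

# Original loop body as (opcode, destination register, source registers).
LOOP = [
    ("LD",    "F0", ("R1",)),
    ("MULTF", "F0", ("F0", "F2")),
    ("LD",    "F4", ("R2",)),
    ("MULTF", "F4", ("F4", "F6")),
    ("SUBF",  "F0", ("F0", "F4")),
    ("SD",    None, ("F0", "R1")),
    ("SUBI",  "R1", ("R1",)),
    ("SUBI",  "R2", ("R2",)),
    ("BNEQZ", None, ("R1",)),
]


def issue_cycles(program):
    """Return the issue cycle of each instruction, in program order."""
    last_writer = {}  # register -> (issue cycle, kind) of its latest writer
    next_free = 0     # earliest cycle at which the next instruction may issue
    cycles = []
    for op, dst, srcs in program:
        kind = KIND[op]
        earliest = next_free
        for reg in srcs:
            if reg in last_writer:
                w_cycle, w_kind = last_writer[reg]
                # Pairs absent from the table are assumed to need no stall.
                lat = LATENCY.get((w_kind, kind), 0)
                earliest = max(earliest, w_cycle + 1 + lat)
        cycles.append(earliest)
        if dst is not None:
            last_writer[dst] = (earliest, kind)
        next_free = earliest + 1
    return cycles


def stall_count(program):
    """Stall cycles = total cycles used minus the number of instructions."""
    cycles = issue_cycles(program)
    return cycles[-1] + 1 - len(program)


if __name__ == "__main__":
    print(issue_cycles(LOOP))  # [0, 2, 3, 5, 9, 12, 13, 14, 15]
    print(stall_count(LOOP))   # 7
```

Under these assumptions the nine instructions occupy 16 cycles, so one iteration loses 7 cycles to stalls (1 after each load, 3 before SUBF, 2 before SD). Those are the slots that unrolling and rescheduling aim to fill with independent work from other iterations.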
Related Book For
Computer Architecture A Quantitative Approach
ISBN: 978-0123704900
4th edition
Authors: John L. Hennessy, David A. Patterson