Questions and Answers of Computer Organization Design
Can we generate exception control signals in EX instead of in ID? Explain how this will work or why it will not work, using the “BNE R4,R5,Label” instruction and these pipeline stage latencies as
Given this breakdown of execution cycles in the processor with direct support for the ADDM instruction, what speedup is achieved by replacing this instruction with a 3-instruction sequence (LW, ADD,
For which kinds of instructions (if any) is this resource on the critical path? The remaining three problems in this exercise refer to the following logic block (resource) in the datapath:
For which MIPS instruction(s) are both of these signals set to 1? The remaining problems in this exercise refer to the following signals from Figure 4.48 (signal table, rows a and b, not reproduced).
If LD/ST address computation can overflow, can we delay overflow exception detection into the MEM stage? Use the given store instruction to explain what happens. The remaining three problems in this
What is the speedup of going from a 1-issue processor to a 2-issue processor from Figure 4.69? Use your code from 4.28.1 for both 1-issue and 2-issue, and assume that 1,000,000 iterations of the loop
For the ALU and the two add units, what are their data input values? The remaining problems in this exercise assume that data memory is all zeros and that the processor’s registers have the
What is the critical path for an MIPS load (LD) instruction? Different execution units and blocks of digital logic have different latencies (time needed to do their work). In Figure 4.2 there are
In what fraction of all cycles is the input of the sign-extend circuit needed? What is this circuit doing in cycles in which its input is not needed? For the remaining problems in this exercise,
What would be the additional speedup (relative to a processor with forwarding) if we added time-travel forwarding that eliminates all data hazards? Assume that the yet-to-be-invented time-travel
Assuming there are no stalls or hazards, what is the utilization of the write-register port of the “Registers” unit? The remaining problems in this exercise assume that instructions executed by
Repeat 4.8.1, but now the fault to test for is whether the “Jump” control signal has this fault. Problem 4.8.1: Let us assume that processor testing is done by filling the PC, registers, and data
Repeat 4.37.4, but now assume that we only want to support ADD instructions. Exercise 4.37.4: Given these latencies for individual elements of the datapath, compare clock cycle times of the single-cycle
If we assume forwarding will be implemented when we design the hazard detection unit, but then we forget to actually implement forwarding, what are the final register values after this instruction
What is the speedup achieved by adding this improvement? When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance
We can eliminate the MemRead control signal and have the data memory be read in every cycle, i.e., we can permanently have MemRead=1. Explain why the processor still functions correctly after this
If there is no forwarding, what new inputs and output signals do we need for the hazard detection unit in Figure 4.60? Using this instruction sequence as an example, explain why each signal is
What is the speedup of using your code from 4.29.4 instead of the original code with a 2-issue static superscalar processor? Assume that the loop has many (e.g., 1,000,000) iterations. Exercise
For the datapath from Figure 4.24, draw the logic diagram for the part of the control unit that implements just the first signal. Assume that we only need to support LW, SW, BEQ, ADD, and J (jump)
What is the cost of your implementation from 4.3.2? Problem 4.3.2: Show how this block can be implemented. Use only AND, OR, NOT, and D Flip-Flops. Cost and latency of digital logic depends on the kinds
Repeat 4.39.4, but this time the goal is to minimize energy spent per instruction while increasing the clock cycle time by no more than 10%. Exercise 4.39.4: It is often possible to sacrifice some speed
Add NOP instructions to this code to eliminate hazards if there is ALU-ALU forwarding only (no forwarding from the MEM to the EX stage). In this exercise, we examine how data dependences affect
For the given code, what is the speedup achieved by moving branch execution into the ID stage? Explain your answer. In your speedup calculation, assume that the additional comparison in the ID stage
If you can speed up the generation of control signals, but the cost of the entire processor increases by $1 for each 5ps improvement of a single control signal, which control signals would you
Repeat 4.30.4, but for a 4-issue processor. What conclusion can you draw about the importance of good branch prediction when the issue width of the processor is increased? Exercise 4.30.4: For a 2-issue
With the 2-bit predictor, what speedup would be achieved if we could convert half of the branch instructions in a way that replaced each branch instruction with two ALU instructions? Assume that
Given these pipeline stage latencies, repeat the speedup calculation from 4.14.3, taking into account the (possible) change in clock cycle time. Assume that the latency ID stage increases by 50% and
Assuming that there are many free registers available, rename the MIPS version of this loop to eliminate as many data dependences as possible between instructions in the same iteration of the loop.
What is the accuracy of your predictor from 4.24.4 if it is given a repeating pattern that is the exact opposite of this one? Exercise 4.24.4: Design a predictor that would achieve a perfect accuracy if
If this instruction already exists in a legacy ISA, explain how it would be executed in a modern processor like AMD Barcelona. In this exercise, we examine how the ISA affects pipeline design.
Compare cost/performance ratios for the two circuits you designed in 4.5.1 and 4.5.2. For this problem, performance of a circuit is the inverse of the time needed to perform a 32-bit
What is the speedup of executing branches 1 stage earlier in a 4-issue processor? The remaining problems in this exercise assume the following pipeline depth and that the branch outcome is determined
For the remaining three problems in this exercise, unless the problem specifies otherwise, assume the following statistics about what percentage of instructions are branches, predictor accuracy, and
How often (as a percentage of all cycles) do we have a cycle in which all five pipeline stages are doing useful work? The remaining three problems in this exercise refer to the following loop. Assume
In vectored exception handling, the table of exception handler addresses is in data memory at a known (fixed) address. Change the pipeline to implement this exception handling mechanism. Repeat
Repeat 4.34.2 for your extended datapath from 4.34.4. Exercise 4.34.2: Describe the requirements of forwarding and hazard detection units for your datapath from 4.34.1. What needs to be done to support
Repeat 4.35.4 for a processor that has two delay slots for every branch. Exercise 4.35.4: Translate this C loop into MIPS instructions, assuming that our ISA requires one delay slot for every branch. Try
We can convert all load/store instructions into register-based (no offset) and put the memory access in parallel with the ALU. What is the clock cycle time if this is done in the single-cycle and in
Repeat 4.36.5, but now assume that ADDM was supported by adding a pipeline stage. When ADDM is translated, this extra stage can be removed and, as a result, half of the existing data stalls are
Assuming that each Mux has a latency of 40ps, determine how much time the control unit has to generate the flush signals. Which signal is the most critical? The remaining three problems in this
One of these signals goes back through the pipeline. Which signal is it? Is this a time-travel paradox? Explain. The remaining problems in this exercise refer to the following signals from Figure
Assuming that we only support BEQ and ADD instructions, discuss how changes in the given latency of this resource affect the cycle time of the processor. Assume that the latencies of other resources
For debugging, it is useful to be able to detect when a particular value is written to a particular memory address. We want to add two new registers, WADDR and WVAL. The processor should trigger an
Repeat 4.28.5, but this time assume that in the 2-issue processor one of the instructions to be executed in a cycle can be of any kind, and the other must be a non-memory instruction. Exercise
What are the values of all inputs for the “Registers” unit? The remaining problems in this exercise assume that data memory is all zeros and that the processor’s registers have the following
If we can improve the latency of one of the given datapath components by 10%, which component should it be? What is the speedup from this improvement? For the remaining problems in this exercise,
What is the critical path for an MIPS BEQ instruction? Different execution units and blocks of digital logic have different latencies (time needed to do their work). In Figure 4.2 there are seven
Repeat 4.19.3 but this time determine which of the two options results in shorter time per instruction. Problem 4.19.3: Let us assume that we cannot afford to have three-input Muxes that are needed for
Instead of a single-cycle organization, we can use a multicycle organization where each instruction takes multiple cycles but one instruction finishes before another is fetched. In this organization,
Using a single test described in 4.8.1, we can test for faults in several different signals, but typically not all of them. Describe a series of tests to look for this fault in all Mux outputs (every
If it costs $1 to reduce the latency of a single component of the datapath by 1ps, what would it cost to reduce the clock cycle time by 20% in the single-cycle and in the pipelined design? The
For the design described in 4.20.5, add NOPs to this instruction sequence to ensure correct execution in spite of missing support for forwarding. Problem 4.20.5: If we assume forwarding will be
Compare the cost/performance ratio with and without this improvement. When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the
If an idle unit spends 10% of the power it would spend if it were active, what is the energy spent by the instruction memory in each cycle? What percentage of the overall energy spent by the
For the new hazard detection unit from 4.21.5, specify which output signals it asserts in each of the first five cycles during the execution of this code. Problem 4.21.5: If there is no forwarding, what
What is the speedup of using your code from 4.29.4 instead of the original code with a pipelined (1-issue) processor? Assume that the loop has many (e.g., 1,000,000) iterations. Exercise 4.29.4: Unroll
Change your design to minimize the latency, then to minimize the cost. Compare the cost and latency of these two optimized designs. Cost and latency of digital logic depends on the kinds of basic
Repeat 4.39.5, but now assume that energy consumption is reduced by a factor of X² when latency is made X times longer. What are the power savings compared to what you computed for 4.39.2? Exercise
Repeat 4.9.5, but now implement both of these signals. Problem 4.9.5: For the datapath from Figure 4.24, draw the logic diagram for the part of the control unit that implements just the first signal.
Using the first branch instruction in the given code as an example, describe the forwarding support that must be added to support branch execution in the ID stage. Compare the complexity of this new
What is the total execution time of this instruction sequence with only ALU-ALU forwarding? What is the speedup over a no-forwarding pipeline? In this exercise, we examine how data dependences affect
What fraction of the cost was saved in your circuit from 4.4.3 by implementing these two control signals together instead of separately? Problem 4.4.3: When multiple logic expressions are implemented, it
Repeat 4.30.5, but now assume that the 4-issue processor has 50 pipeline stages. Assume that each of the original 5 stages is broken into 10 new stages, and that branches are executed in the first of
If the processor is already too expensive, instead of paying to speed it up as we did in 4.10.5, we want to minimize its cost without further slowing it down. If you can use slower logic to implement
Assuming stall-on-branch and no delay slots, what is the new clock cycle time and execution time of this instruction sequence if BEQ address computation is moved to the MEM stage? What is the speedup
Some branch instructions are much more predictable than others. If we know that 80% of all executed branch instructions are easy-to-predict loop-back branches that are always predicted correctly,
Which cache design is better for each of these benchmarks? Use data to support your conclusion. Both Barcelona and Nehalem are chip multiprocessors (CMPs), having multiple cores and their caches on a
For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a hit or a miss, assuming the
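A minimal sketch of the tag/index split for the cache described in this question (direct-mapped, 16 one-word blocks). The address list is hypothetical, since the exercise's reference stream is not reproduced in this listing, and the cache is assumed to start empty. The function name `simulate_direct_mapped` is likewise made up for illustration.

```python
def simulate_direct_mapped(word_addresses, num_blocks=16):
    """Return (binary address, tag, index, outcome) for each word address.

    Assumes one-word blocks (so there are no block-offset bits) and a
    cache that starts empty. Both are assumptions, not taken from the text.
    """
    cache = {}  # index -> tag currently stored in that block
    results = []
    for addr in word_addresses:
        index = addr % num_blocks   # low log2(num_blocks) bits of the address
        tag = addr // num_blocks    # remaining high bits
        outcome = "hit" if cache.get(index) == tag else "miss"
        cache[index] = tag          # on a miss, the new block replaces the old one
        results.append((format(addr, "08b"), tag, index, outcome))
    return results


# Hypothetical word addresses, not the ones from the exercise.
for row in simulate_direct_mapped([3, 19, 3, 3]):
    print(row)
```

Addresses 3 and 19 share index 3 but carry different tags, so they evict each other; only the final repeated access to 3 hits.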
The change in 4.17.5 requires many existing LW/SW instructions to be converted into two-instruction sequences. If this is needed for 50% of these instructions, what is the overall speedup achieved by
How many iterations of your loop from 4.35.4 can be “in flight” within this processor’s pipeline? We say that an iteration is “in flight” when at least one of its instructions has been
What is the best page size if using a modern disk with a 3 ms latency and 100 MB/s transfer rate? Explain why future servers are likely to have larger pages. For a high-performance system such as a
List the possible values of the given cache block for a correct cache coherence protocol implementation. List at least one more possible value of the block if the protocol doesn’t ensure cache
What is the cache line size (in words)? For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache.
Buffers are employed between different levels of memory hierarchy to reduce access latency. For this given configuration, list the possible buffers needed between L1 and L2 caches, as well as L2
Assuming that the L1 hit time determines the cycle times for P1 and P2, what are their respective clock rates? In this exercise, we will look at the different ways capacity affects overall
Assume a 64 KB direct-mapped cache with a 32-byte line. What is the miss rate for the address stream above? How is this miss rate sensitive to the size of the cache or the working set? How would you
Using the references from Exercise 5.3, show the final cache contents for a three-way set associative cache with two-word blocks and a total size of 24 words. Use LRU replacement. For each reference
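One way to check an answer to this question by hand is a small LRU simulator. The sketch below assumes word addressing and an initially empty cache; with two-word blocks, three ways, and 24 words in total, that gives 24 / (2 × 3) = 4 sets. The reference stream is hypothetical (the Exercise 5.3 references are not reproduced here), and `simulate_set_associative` is an illustrative name.

```python
from collections import OrderedDict


def simulate_set_associative(word_addresses, ways=3, words_per_block=2,
                             total_words=24):
    """Simulate an LRU set-associative cache; return (outcomes, final tags per set)."""
    num_sets = total_words // (words_per_block * ways)  # 24 / (2 * 3) = 4
    # Each set maps tag -> True, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for addr in word_addresses:
        block = addr // words_per_block  # drop the block-offset bit
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # mark as most recently used
            outcomes.append("hit")
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
            outcomes.append("miss")
    return outcomes, [list(s) for s in sets]


# Hypothetical word addresses that all map to set 0.
outcomes, final_sets = simulate_set_associative([0, 1, 8, 16, 24, 0])
print(outcomes)
print(final_sets)
```

Setting `ways` equal to the number of blocks and leaving a single set turns the same sketch into a fully associative cache.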
Assuming both client and server are involved in the process, first name the client and server systems. Where can caches be placed to speed up the process? In this exercise we consider memory
What is the best page size if entries now become 128 bytes? For a high-performance system such as a B-tree index for a database, the page size is determined mainly by the data size and disk
Given the address stream in the table, and the initial TLB and page table states shown above, show the final state of the system. Also list for each reference if it is a hit in the TLB, a hit in the
How many 32-bit integers can be stored in a 16-byte cache line? In this exercise we look at memory locality properties of matrix computation. The following code is written in C, where elements within
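The first part of this question is plain arithmetic: a 32-bit integer occupies 4 bytes, so a 16-byte line holds 16 / 4 = 4 of them. A tiny sketch, with an illustrative function name:

```python
def ints_per_cache_line(line_bytes=16, int_bits=32):
    """Number of whole integers of int_bits width that fit in one cache line."""
    bytes_per_int = int_bits // 8
    return line_bytes // bytes_per_int


print(ints_per_cache_line())  # 4
```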
For a single-level page table, how many page table entries (PTEs) are needed? How much physical memory is needed for storing the page table? In this exercise, we will examine space/time optimizations
Assuming an LRU replacement policy, how many hits does this address sequence exhibit? In this exercise, we will examine how replacement policies impact miss rate. Assume a 2-way set associative cache
What would happen for the given operation sequence for shadow page table and nested page table, respectively? To support multiple virtual machines, two levels of memory virtualization are needed. Each
What should happen if the processor issues a request that hits in the cache while a block is being written back to main memory from the write buffer? In this exercise, we will explore the control unit
For a snooping protocol, list a valid operation sequence on each processor/cache to finish the above read/write operations. Cache coherence concerns the views of multiple processors on a given cache
Shared cache latency increases with the CMP size. Choose the best design if the shared cache latency doubles. Off-chip bandwidth becomes the bottleneck as the number of CMP cores increases. Choose
For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with two-word blocks and a total size of 8 blocks. Also list if each reference is a hit
How many entries does the cache have? For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache.
Describe the procedure of handling an L1 write-miss, considering the component involved and the possibility of replacing a dirty block. Recall that we have two write policies and write allocation
Re-compute the miss rate when the cache line size is 16 bytes, 64 bytes, and 128 bytes. What kind of locality is this workload exploiting? Media applications that play audio or video files are part of
Using the references from Exercise 5.3, show the final cache contents for a fully associative cache with one-word blocks and a total size of 8 words. Use LRU replacement. For each reference identify
What is the AMAT for P1 and P2? In this exercise, we will look at the different ways capacity affects overall performance. In general, cache access time is proportional to capacity. Assume that main
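The standard single-level formula behind this question is AMAT = hit time + miss rate × miss penalty. The sketch below applies it to made-up numbers; the actual P1 and P2 parameters and the main memory latency are not reproduced in this listing, so every value here is an assumption.

```python
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time for a single cache level, in nanoseconds."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical processors: a small fast cache versus a larger slower one.
p_small = amat(hit_time_ns=1.0, miss_rate=0.10, miss_penalty_ns=100.0)
p_large = amat(hit_time_ns=2.0, miss_rate=0.05, miss_penalty_ns=100.0)
print(p_small, p_large)
```

With these invented figures the larger cache wins despite its longer hit time, because the lower miss rate saves more time than the slower hit costs.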
Design a memory hierarchy for the system. Show the typical size and latency at various levels of the hierarchy. What is the relationship between cache size and its access latency? In this exercise we
Based on 5.9.1, what is the best page size if pages are half full? Exercise 5.9.1: What is the best page size if entries now become 128 bytes? For a high-performance system such as a B-tree index for a
References to which variables exhibit temporal locality? In this exercise we look at memory locality properties of matrix computation. The following code is written in C, where elements within the
Repeat Exercise 5.10.1, but this time use 16 KB pages instead of 4 KB pages. What would be some of the advantages of having a larger page size? What are some of the disadvantages? Exercise 5.10.1: Given
Using a multilevel page table can reduce the physical memory consumption of page tables, by only keeping active PTEs in physical memory. How many levels of page tables will be needed in this case?
Assuming an MRU (most recently used) replacement policy, how many hits does this address sequence exhibit? In this exercise, we will examine how replacement policies impact miss rate. Assume a 2-way
Assuming an x86-based 4-level page table in both guest and nested page table, how many memory references are needed to service a TLB miss for native vs. nested page table? To support multiple virtual
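A rough way to count the references in this question, assuming no page-walk caches or nested TLB help: a native walk reads one entry per level. In a nested walk, each of the n guest page-table pointers, plus the final guest physical address, must itself be translated through the m-level nested table, on top of the n guest entry reads. That gives (n + 1) × m + n references. The function names are illustrative.

```python
def native_walk_refs(levels=4):
    """Memory references for a TLB miss with a native multi-level page table."""
    return levels


def nested_walk_refs(guest_levels=4, nested_levels=4):
    """Memory references for a TLB miss under nested paging.

    Assumes no page-walk caching: every guest physical address (one per
    guest level, plus the final data address) needs a full nested walk.
    """
    nested_walks = (guest_levels + 1) * nested_levels
    guest_entry_reads = guest_levels
    return nested_walks + guest_entry_reads


print(native_walk_refs(), nested_walk_refs())  # 4 24
```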
Showing 100 - 200 of 1060