Questions and Answers of Computer Organization Design
I/O accesses often have a large impact on overall system performance. Calculate the CPI of a machine using the performance characteristics above, assuming a non-virtualized system. Calculate the CPI
What should happen if the processor issues a request that misses in the cache while a block is being written back to main memory from the write buffer? In this exercise, we will explore the control
What are the best-case and worst-case numbers of cache misses needed to execute the listed read/write instructions? Cache coherence concerns the views of multiple processors on a given cache block.
Discuss the pros and cons of shared vs. private L2 caches for single-threaded, multi-threaded, and multiprogrammed workloads, and reconsider them if on-chip L3 caches are present. Both Barcelona and
You are asked to optimize a cache design for the given references. There are three direct-mapped cache designs possible, all with a total of 8 words of data: C1 has 1-word blocks, C2 has 2-word
What is the ratio between total bits required for such a cache implementation over the data storage bits? For a direct-mapped cache design with a 32-bit address, the following bits of the address are
For a multilevel exclusive cache configuration (a block can only reside in one of the L1 and L2 caches), describe the procedure of handling an L1 write-miss, considering the component involved and
“Prefetching” is a technique that leverages predictable address patterns to speculatively bring in additional cache lines when a particular cache line is accessed. One example of prefetching is a
Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 and P2? Which processor is faster? In this exercise, we will look at the different ways capacity affects overall
Using the references from Exercise 5.3, what is the miss rate for a fully associative cache with two-word blocks and a total size of 8 words, using LRU replacement? What is the miss rate using MRU
What are the units of data transfers between hierarchies? What is the relationship between the data location, data size, and transfer latency? In this exercise we consider memory hierarchies for
Show the final contents of the TLB if it is 2-way set associative. Also show the contents of the TLB if it is direct mapped. Discuss the importance of having a TLB to high performance. How would
References to which variables exhibit spatial locality? In this exercise we look at memory locality properties of matrix computation. The following code is written in C, where elements within the same
An inverted page table can be used to further optimize space and time. How many PTEs are needed to store the page table? Assuming a hash table implementation, what are the common case and worst case
Simulate a random replacement policy by flipping a coin. For example, “heads” means to evict the first block in a set and “tails” means to evict the second block in a set. How many hits does
Among TLB miss rate, TLB miss latency, page fault rate, and page fault handler latency, which metrics are more important for shadow page table? Which are important for nested page table? To support
Compare and contrast the ideas of virtual memory and virtual machines. How do the goals of each compare? What are the pros and cons of each? List a few cases where virtual memory is desired, and a
Design a finite state machine to enable the use of a write buffer. In this exercise, we will explore the control unit for a cache controller for a processor with a write buffer. Use the finite state
List the possible values of C and D for an implementation that ensures both consistency assumptions on page 538. Memory consistency concerns the views of multiple data items. The following table shows
Assume both benchmarks have a base CPI of 1 (ideal L2 cache). If having non-blocking cache improves the average number of concurrent L2 misses from 1 to 2, how much performance improvement does this
Calculate the total number of bits required for the cache listed in the table, assuming a 32-bit address. Given that total size, find the total size of the closest direct-mapped cache with 16-word
How many blocks are replaced? Starting from power on, the following byte-addressed cache references are recorded. Address 0 4 16 132 232 160 1024 30 140 3100 180 2180
For a write-through, write-allocate cache, what are the minimum read and write bandwidths (measured in bytes per cycle) needed to achieve a CPI of 2? Consider the following program and cache behaviors.
For 64 KB data caches with varying set associativities, what are the miss rates broken down by miss types (cold, capacity, and conflict misses) for each benchmark? For the problems below, use data
What is the optimal block size for a miss latency of 20×B cycles? Cache block size (B) can affect both miss rate and miss latency. Assuming a 1-CPI machine with an average of 1.35 references (both
What is the AMAT for P1 with the addition of an L2 cache? Is the AMAT better or worse with the L2 cache? For the next three problems, we will consider the addition of an L2 cache to P1 to presumably
Calculate the CPI for the processor in the table using: 1) only a first level cache, 2) a second level direct-mapped cache, and 3) a second level eight way set associative cache. How do these numbers
How many 16-byte cache lines are needed to store all 32-bit matrix elements being referenced? Locality is affected by both the reference order and data layout. The same computation can also be written
Repeat 4.24.4, but now your predictor should be able to eventually (after a warm-up period during which it can make wrong predictions) start perfectly predicting both this pattern and its opposite.
What is the speedup achieved by adding this new instruction? In your calculation, assume that the CPI of the original program (without the new instruction) is 1. The last problem in this exercise
In the rest of this exercise, we assume that the following basic digital logic elements are available, and that their latency and cost are as follows: The time given for a D-element is its setup time.
What is the speedup of executing branches 1 stage earlier in an 8-issue processor? Discuss the difference between this result and the result from 4.32.5. Exercise 4.32.5: What is the speedup of
What should the branch prediction accuracy be if we are willing to have a speedup of 0.5 (one half) relative to the same processor with an ideal branch predictor? For the remaining three problems in
At the start of the cycle in which we fetch the first instruction of the third iteration of this loop, what is stored in the IF/ID register? The remaining three problems in this exercise refer to the
Repeat 4.34.3 for your extended datapath from 4.34.4. Exercise 4.34.3: What needs to be done to support undefined instruction exceptions in your datapath from 4.34.1? Note that the undefined instruction
Calculate the CPI for the system listed above assuming that there are no accesses to I/O. What is the CPI if the VMM performance impact doubles? If it is cut in half? If a virtual machine software
Communication bandwidth and server processing bandwidth are two important factors to consider when designing a memory hierarchy. How can the bandwidths be improved? What is the cost of improving
What are the reuse time thresholds for these three technology generations? Keeping "frequently used" (or "hot") pages in DRAM can save disk accesses, but how do we determine the exact meaning of
Given the parameters in the table above, calculate the total page table size for a system running 5 applications that utilize half of the memory available. There are several parameters that impact the
Under what scenarios would entry 2’s valid bit be set to zero? The following table shows the contents of a 4-entry TLB. Entry-ID 1 2 3 4 Valid 1 0 1 1 VA
Which address should be evicted at each replacement to maximize the number of hits? How many hits does this address sequence exhibit if you follow this “optimal” policy? In this exercise, we will
Discusses virtualization under the assumption that the virtualized system is running the same ISA as the underlying hardware. However, one possible use of virtualization is to emulate non-native
For a benchmark with native execution CPI of 1, what are the CPI numbers if using shadow page tables vs. NPT (assuming only page table virtualization overhead)? The following table shows parameters
Assume new generations of processors double the number of cores every 18 months. To maintain the same level of per-core performance, how much more off-chip memory bandwidth is needed for a 2012
List at least one more possible pair of values for C and D if such assumptions are not maintained. Memory consistency concerns the views of multiple data items. The following table shows two
Generate a series of read requests that have a lower miss rate on a 2 KB 2-way set associative cache than the cache listed in the table. Identify one possible solution that would make the cache
What is the hit ratio? Starting from power on, the following byte-addressed cache references are recorded. Address 0 4 16 132 232 160 1024 30 140 3100 180 2180
Select the set associativity to be used by a 64 KB L1 data cache shared by both benchmarks. If the L1 cache has to be directly mapped, select the set associativity for the 1 MB L2 cache. For the
For a write-back, write-allocate cache, assuming 30% of replaced data cache blocks are dirty, what are the minimal read and write bandwidths needed for a CPI of 2? Consider the following program and
What is the optimal block size for a miss latency of 24+B cycles? Cache block size (B) can affect both miss rate and miss latency. Assuming a 1-CPI machine with an average of 1.35 references (both
Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 with the addition of an L2 cache? For the next three problems, we will consider the addition of an L2 cache to P1 to
It is possible to have an even greater cache hierarchy than two levels. Given the processor above with a second level, direct-mapped cache, a designer wants to add a third level cache that takes 50
What are the reuse time thresholds if we keep using the same 4K page size? What’s the trend here? Keeping "frequently used" (or "hot") pages in DRAM can save disk accesses, but how do we determine
Now consider multiple clients simultaneously accessing the server. Will such scenarios improve the spatial and temporal locality? In this exercise we consider memory hierarchies for various
Given the parameters in the table above, calculate the total page table size for a system running 5 applications that utilize half of the memory available, given a two level page table approach with
References to which variables exhibit temporal locality? Locality is affected by both the reference order and data layout. The same computation can also be written below in Matlab, which differs from
What happens when an instruction writes to VA page 30? When would a software managed TLB be faster than a hardware managed TLB? The following table shows the contents of a 4-entry TLB.
Describe why it is difficult to implement a cache replacement policy that is optimal for all address sequences. In this exercise, we will examine how replacement policies impact miss rate. Assume a
What techniques can be used to reduce page table shadowing induced overhead? The following table shows parameters for a shadow paging system. TLB Misses per 1000 Instructions 0.2 NPT TLB
For various combinations of write policies and write allocation policies, which combinations make the protocol implementation simpler? Memory consistency concerns the views of multiple data items. The
Consider the entire memory hierarchy. What kinds of optimizations can improve the number of concurrent misses? Both Barcelona and Nehalem are chip multiprocessors (CMPs), having multiple cores and
What is inherently different between these two classes of workload when run on these multi-core systems? Benchmarking is a field of study that involves identifying representative workloads to run on
Of the peripherals listed in the table, which could cause coherency problems with cache contents? What criteria determine if coherency issues must be addressed? Direct Memory Access (DMA) allows
Calculate the MTBF for each of the devices in the table. Mean Time Between Failures (MTBF), Mean Time To Replacement (MTTR), and Mean Time To Failure (MTTF) are useful metrics for evaluating the
Outline how an interrupt from each of the devices listed in the table would be handled. Section 6.6 defines an eight-step process for handling interrupts. The Cause and Status registers together
Given that your company operates a global search engine with a large disk farm, does upgrading to either RAID 0 or RAID 1 make economic sense given that your income model is based on the number of
For the devices listed in the table, identify I/O interfaces and classify them in terms of their behavior and partner. Figure 6.2 describes numerous I/O devices in terms of their behavior, partner,
Configure the Sun Fire x4150 to provide 10 terabytes of storage for a processor array of 1000 processors running bioinformatics simulations. Your configuration should minimize power consumption while
Given only the original problem parameters, would you recommend upgrading to either RAID 0 or RAID 1 assuming individual disk parameters remain the same in the previous table? For disks in the table
Calculate the average time to read or write a 1024-byte sector for each FLASH memory listed in the table. Explore the nature of FLASH memory by answering the questions related to performance for FLASH
Give an example in the miss rate table where higher set associativity actually increases miss rate. Construct a cache configuration and reference stream to demonstrate this. For the problems below,
List the final state of the cache, with each valid entry represented as a record of . Starting from power on, the following byte-addressed cache references are recorded. Address 0 4 16 132 232 160 1024
The formula shown on page 457 shows the typical method to index a direct-mapped cache, specifically (Block address) modulo (Number of blocks in the cache). Assuming a 32-bit address and 1024 blocks
For constant miss latency, what is the optimal block size? Cache block size (B) can affect both miss rate and miss latency. Assuming a 1-CPI machine with an average of 1.35 references (both
What are the minimal bandwidths needed to achieve the performance of CPI=1.5? Consider the following program and cache behaviors. a. b. Data Reads per 1000 Instructions 250 200 Data Writes per 1000
Which processor is faster, now that P1 has an L2 cache? If P1 is faster, what miss rate would P2 need in its L1 cache to match P1’s performance? If P2 is faster, what miss rate would P1 need in its
In older processors such as the Intel Pentium or Alpha 21264, the second level of cache was external (located on a different chip) from the main processor and the first level cache. While this allowed
Give an example of where the cache can provide out-of-date data. How should the cache be designed to mitigate or avoid such issues? In this exercise we consider memory hierarchies for various
What other factors can be changed to keep using the same page size (thus avoiding software rewrite)? Discuss their likeliness with current technology and cost trends. Keeping "frequently used" (or
References to which variables exhibit spatial locality? Locality is affected by both the reference order and data layout. The same computation can also be written below in Matlab, which differs from C
A cache designer wants to increase the size of a 4 KB virtually indexed, physically tagged cache. Given the page size listed in the table above, is it possible to make a 16 KB direct-mapped cache,
What happens when an instruction writes to VA page 200? The following table shows the contents of a 4-entry TLB. Entry-ID 1 2 3 4 Valid 1 0 1 1 VA
Assume you could make a decision upon each memory reference whether or not you want the requested address to be cached. What impact could this have on miss rate? In this exercise, we will examine how
What techniques can be used to reduce NPT induced overhead? The following table shows parameters for a shadow paging system. TLB Misses per 1000 Instructions 0.2 NPT TLB Miss Latency 200 cycles Page
Calculate annual failure rate (AFR) for disks in the table. Measurements and statistics provided by storage vendors must be carefully interpreted to gain meaningful predictions about their system
As we move towards solid state drives constructed from FLASH memory, what will change about disk read times assuming that the data transfer rate stays constant? FLASH memory is one of the first true
What would be the most appropriate bus type (synchronous or asynchronous) for handling communications between a CPU and the peripherals listed in the table? I/O can be performed either synchronously
Select an appropriate bus (FireWire, USB, PCI, or SATA) for the peripherals listed in the table. Explain why the bus selected is appropriate. Among the most common bus types used in practice today are
Describe device polling. Would each application in the table be appropriate for communication using polling techniques? Explain. Communicating with I/O devices is achieved using combinations of
When an interrupt is detected, the Status register is saved and all but the highest priority interrupt is disabled. Why are low-priority interrupts disabled? Why is the status register saved prior to
For the applications listed in the table, outline a design for commands implementing command driven communication. Identify commands and their interaction with the device. Communicating with I/O
For each application in the table, does I/O performance dominate system performance? Metrics for I/O performance may vary dramatically from application to application. Where the number of transactions
For each application in the table, define characteristics that a set of benchmarks should exhibit when evaluating an I/O subsystem. Benchmarks play an important role in evaluating and selecting
Calculate the new RAID 3 parity value P’ for data in lines a and b in the table. RAID 3, RAID 4, and RAID 5 all use a parity system to protect blocks of data. Specifically, a parity block is
RAID 0 uses striping to force parallel access among many disks. Why does striping improve disk performance? For each of the activities listed in the table, will striping help better achieve their
Find the maximum sustained I/O rate for random reads and writes. Ignore disk conflicts and assume the RAID controller is not the bottleneck. Follow the same approach as outlined in Section 6.10
Calculate the average time to read or write a 1024-byte sector for each disk listed in the table. Average and minimum times for reading and writing to storage devices are common measurements used to
For the application listed above, identify runtime characteristics for an operational system. Choose characteristics that will support evaluation similar to that performed for Exercise 6.16. Data from
For each application, would decreasing the sector size during reads and writes improve performance? Explain your answer. Ultimately, storage system design requires consideration of usage scenarios as
For each application, would increasing disk rotation speed improve performance? Explain your answer. Ultimately, storage system design requires consideration of usage scenarios as well as disk
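One of the questions listed above quotes the usual way to index a direct-mapped cache: (Block address) modulo (Number of blocks in the cache), with a 32-bit address and 1024 blocks. The following is only a minimal sketch of that formula. The function name `direct_mapped_index` and the 4-byte block size are assumptions made for illustration, since the listing cuts off before the block size is stated.

```python
def direct_mapped_index(byte_address, num_blocks=1024, block_bytes=4):
    """Split a byte address into (tag, index) for a direct-mapped cache.

    num_blocks=1024 follows the question text; block_bytes=4 is an
    assumed value chosen only for this illustration.
    """
    # Drop the byte offset within the block to get the block address.
    block_address = byte_address // block_bytes
    # The formula from the question: block address modulo number of blocks.
    index = block_address % num_blocks
    # The remaining upper bits identify which block currently occupies the slot.
    tag = block_address // num_blocks
    return tag, index


if __name__ == "__main__":
    for address in (0, 132, 4096):
        tag, index = direct_mapped_index(address)
        print(f"address {address}: tag {tag}, index {index}")
```

Under these assumed parameters, byte addresses 0 and 4096 map to the same index with different tags, which is the situation that produces a conflict miss in a direct-mapped cache.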