Questions and Answers of Computer Organization Design

Would each application benefit from a solid state FLASH drive given that cost is a design factor? FLASH memory is one of the first true competitors for traditional disk drives. Explore the
Calculate the minimum time to read or write a 512-byte sector for each FLASH memory listed in the table. Explore the nature of FLASH memory by answering the questions related to performance for FLASH
What problems would long, synchronous busses cause for connections between a CPU and the peripherals listed in the table? I/O can be performed either synchronously or asynchronously. Explore the
Assume that annual failure rate varies over the lifetime of disks in the previous table. Specifically, assume that AFR is three times as high in the 1st month of operation and doubles every year
Use online or library resources and summarize the communication structure for each bus type. Identify what the bus controller does and where the control physically is. Among the most common bus types
Describe interrupt driven communication. For each application in the table, if polling is inappropriate, explain how interrupt driven techniques could be used. Communicating with I/O devices is
For the interfaces identified in the previous problem, estimate their data rate. Figure 6.2 describes numerous I/O devices in terms of their behavior, partner, and data rate. However, these
Prioritize interrupts from the devices listed in each table row. Section 6.6 defines an eight-step process for handling interrupts. The Cause and Status registers together provide information on the
Recommend a backup and data archiving system for the disk array from 6.20.1. Compare and contrast disk, tape, and online backup capabilities. Use Internet and library resources to identify potential
Of the peripherals listed in the table, which would benefit from DMA? What criteria determine if DMA is appropriate? Direct Memory Access (DMA) allows devices to access memory directly rather than
For each application in the table, is I/O performance best measured using raw data throughput? Metrics for I/O performance may vary dramatically from application to application. Where the number of
Using online or library resources, identify a set of standard benchmarks for applications in the table. Why do standard benchmarks help? Benchmarks play an important role in evaluating and selecting
Calculate the new RAID 4 parity value P’ for data in lines a and b in the table. RAID 3, RAID 4, and RAID 5 all use a parity system to protect blocks of data. Specifically, a parity block is
RAID 1 mirrors data among several disks. Assuming that inexpensive disks have lower MTBF than expensive disks, how can redundancy using inexpensive disks result in a system with lower MTBF? Use the
Assume we are configuring a Sun Fire x4150 server as described in Section 6.10. Determine if a configuration of 8 disks presents an I/O bottleneck. Repeat for configurations of 16, 4, and 2 disks. The
Calculate the availability for each of the devices in the table. Mean Time Between Failures (MTBF), Mean Time To Replacement (MTTR), and Mean Time To Failure (MTTF) are useful metrics for evaluating
Calculate the minimum time to read or write a 2048-byte sector for each disk listed in the table. Average and minimum times for reading and writing to storage devices are common measurements used to
For the application listed above, find a server available in the marketplace that you feel would be appropriate for running the application. Before evaluating the server, identify reasons why it was
Figure 6.6 shows that FLASH memory read and write access times increase as FLASH memory gets larger. Is this unexpected? What factors cause this? Explore the nature of FLASH memory by answering the
For each application, would increasing disk rotation speed improve system performance given that MTTF is decreased? Explain your answer. Ultimately, storage system design requires consideration of
Would each application be inappropriate for a solid state FLASH drive given that cost is NOT a design factor? FLASH memory is one of the first true competitors for traditional disk drives. Explore the
What problems would asynchronous busses cause for connections between a CPU and the peripherals listed in the table? I/O can be performed either synchronously or asynchronously. Explore the
Assume that disks with lower failure rates are more expensive. Specifically, disks are available at a higher cost that will start doubling their failure rate in year 8 rather than year 5. How much
Outline limitations of each of the bus types. Explain why those limitations must be taken into consideration when using the bus. Among the most common bus types used in practice today are FireWire
For the applications listed in the table, outline a design for memory mapped communication. Identify reserved memory locations and outline their contents. Communicating with I/O devices is achieved
For the interfaces identified in the previous problem, determine whether data rate or operation rate is the best performance measurement. Figure 6.2 describes numerous I/O
For each application in the table, is I/O performance best measured using the number of transactions processed? Metrics for I/O performance may vary dramatically from application to application. Where
Does it make sense to evaluate an I/O subsystem outside the larger system it is a part of? How about evaluating a CPU? Benchmarks play an important role in evaluating and selecting peripheral devices.
Is RAID 3 or RAID 4 more efficient? Are there reasons why RAID 3 would be preferable to RAID 4? RAID 3, RAID 4, and RAID 5 all use a parity system to protect blocks of data. Specifically, a parity block
Like RAID 1, RAID 3 provides higher data availability. Explain the trade-off between RAID 1 and RAID 3. Would each of the applications listed in the table benefit from RAID 3 over RAID 1? RAID is
Determine if the PCI bus, DIMM, or the Front Side Bus presents an I/O bottleneck. Use the same parameters and assumptions used in Section 6.10. The emergence of web servers for ecommerce, online
Using metrics similar to those used in Chapter 6 and Exercise 6.16, assess the server you identified in 6.17.2 in comparison to the Sun Fire x4150 server evaluated in Exercise 6.16. Which would you
For each disk in the table, determine the dominant factor for performance. Specifically, if you could make an improvement to any aspect of the disk, what would you choose? If there is no dominant
What happens to availability as the MTTR approaches 0? Is this a realistic situation? Mean Time Between Failures (MTBF), Mean Time To Replacement (MTTR), and Mean Time To Failure (MTTF) are useful
What happens if the interrupt enable bit of the Cause register is not set when handling an interrupt? What value could the interrupt mask value take to accomplish the same thing? Section 6.6 defines
Describe what problems could occur when mixing DMA and virtual memory. Which of the peripherals in the table could introduce such problems? How can they be avoided? Direct Memory Access (DMA) allows
RAID 4 and RAID 5 use roughly the same mechanism to calculate and store parity for data blocks. How does RAID 5 differ from RAID 4 and for what applications would RAID 5 be more efficient? RAID 3,
Is there a relationship between the performance measures from the previous two problems and choosing whether to use polling or interrupt driven communication? What about the choice of using memory
Explain why real systems tend to use benchmarks or real applications to assess actual performance. The emergence of web servers for ecommerce, online storage, and communication has made disk servers
What happens to availability as the MTTR gets very high, i.e., a device is difficult to repair? Does this imply the device has low availability? Mean Time Between Failures (MTBF), Mean Time To
Most interrupt handling systems are implemented in the operating system. What hardware support could be added to make interrupt handling more efficient? Contrast your solution to potential hardware
Identify a standard benchmark set that would be useful for comparing the server you identified in 6.17.2 with the Sun Fire x4150. Exercise 6.17.2: For the application listed above, find a server
Does it make sense to define I/O subsystems that use a combination of memory mapping and command driven communication? Explain your answer. Communicating with I/O devices is achieved using
RAID 4 and RAID 5 speed improvements grow with respect to RAID 3 as the size of the protected block grows. Why is this the case? Is there a situation where RAID 4 and RAID 5 would be no more
In some interrupt handling implementations, an interrupt causes an immediate jump to an interrupt vector. Instead of a Cause register where each interrupt sets a bit, each interrupt has its own
Develop an equation that computes how many links in the n-cube (where n is the order of the cube) can fail and we can still guarantee an unbroken link will exist to connect any node in the
Describe the scenario where none of the philosophers ever eats (i.e., starvation). What sequence of events leads up to this problem? The dining philosopher's problem is a classic
What are all the possible resulting values of w, x, y, and z? For each possible outcome, explain how we might arrive at those values. You will need to examine all possible interleavings of
For a 4 CPU MIMD machine, show the sequence of MIPS instructions that you would execute on each CPU. What is the speedup for this MIMD machine? We would like to execute the loop below as efficiently
Consider proposed implementations of a systolic array (you can find these on the Internet or in technical publications). Then attempt to program the loop provided in Exercise 7.14 using this MISD
If we have P CPUs in the system, with T nodes in the CCNUMA system, with each CPU having C memory blocks stored in it, and we maintain a byte of coherency information in each cache line, provide an
For a system that maintains coherency using cache-based block status, describe the inter-node traffic that will be generated as each of the 4 cores writes to a unique address, after which each
Describe how you will construct warps for the SAXP loop to exploit the 8 cores provided in a single multiprocessor. Assume we want to execute the DAXP loop shown on page 651 in MIPS assembly on the
Using the “template” SDK sample as a starting point, write a CUDA program to perform the following vector operations: Submit code for each program that demonstrates each operation and verifies the
Now consider which of these activities is already exploiting some form of parallelism (e.g., brushing multiple teeth at the same time, versus one at a time, carrying one book at a time to school,
If on average we need to access memory once every 75 cycles, what is the impact on our application? On a CC-NUMA system, the cost of accessing non-local memory can limit our ability to utilize
Consider the following binary search algorithm (a classic divide and conquer algorithm) that searches for a value X in a sorted N-element array A and returns the index of the matched entry: Assume that
Assume that you have Y cores on a multi-core processor to run MergeSort. Assuming that Y is much smaller than length(m), express the speedup factor you might expect to obtain for values of Y and
How many cycles does it take for all instructions in a single iteration of the above loop to execute? Consider the following piece of C code: Instructions have the following associated latencies (in
Assume that we are going to compute C on both a single core shared memory machine and a 4-core shared-memory machine. Compute the speedup we would expect to obtain on the 4-core machine, ignoring any
Your job is to cook 3 cakes as efficiently as possible. Assuming that you only have one oven large enough to hold one cake, one large bowl, one cake pan, and one mixer, come up with a schedule to
Repeat 7.6.1, assuming that updates to C incur a cache miss due to false sharing when consecutive elements in a row (i.e., index i) are updated. Exercise 7.6.1: Assume that we are going to compute C
Describe how we can solve this problem by introducing the concept of a priority. But can we guarantee that we will treat all the philosophers fairly? Explain. The dining philosopher's problem is a
Compare the resiliency to failure of n-cube to a fully connected interconnection network. Plot a comparison of reliability as a function of the added number of links for the two topologies. Refer to
Discuss what changes may be necessary in future multi-core CPU platforms in order to better match the resource demands placed on these systems. For instance, can multi-threading play an effective
How could you make the execution more deterministic so that only one set of values is possible? Consider the following portions of two different programs running at the same time on four processors in
For an 8-wide SIMD machine (i.e., 8 parallel SIMD functional units), write an assembly program using your own SIMD extensions to MIPS to execute the loop. Compare the number of instructions
If each directory entry maintains a byte of information for each CPU, if our CC-NUMA system has S memory blocks, and the system has T nodes, provide an equation that expresses the amount of memory
In terms of the Roofline Model, how dependent will the results you obtain when running these benchmarks be on the amount of sharing and synchronization present in the workload used? Benchmarking is
Discuss the similarities and differences between an MISD and SIMD machine. Answer this question in terms of data-level parallelism. A systolic array is an example of an MISD machine. A systolic array
For a directory-based coherency mechanism, describe the internode traffic generated when executing the same code pattern. Considering the CC-NUMA system described in Exercise 7.8, assume that the
Next, consider which of the activities could be carried out concurrently (e.g., eating breakfast and listening to the news). For each of your activities, describe which other activity could be paired
When an instruction in a later iteration of a loop depends upon a data value produced in an earlier iteration of the same loop, we say that there is a loop carried dependence between iterations of
If you have GPU hardware available, complete a performance analysis of your program, examining the computation time for the GPU and a CPU version of your program for a range of vector sizes. Explain any
Next, assume that Y is equal to N. How would this affect your conclusions in your previous answer? If you were tasked with obtaining the best speedup factor possible (i.e., strong scaling), explain
Next, assume that Y is equal to length(m). How would this affect your conclusions in your previous answer? If you were tasked with obtaining the best speedup factor possible (i.e., strong scaling),
If on average we need to access memory once every 50 cycles, what is the impact on our application? On a CC-NUMA system, the cost of accessing non-local memory can limit our ability to utilize
Assume now that you have three bowls, 3 cake pans and 3 mixers. How much faster is the process now that you have additional resources? You are trying to bake 3 blueberry pound cakes. Cake ingredients
Does the CPU relinquish control of memory when DMA is active? For example, can a peripheral simply communicate with memory directly, avoiding the CPU completely? Direct Memory Access (DMA) allows
Repeat 6.19.2 for a large disk farm operated by an online backup company. Does upgrading to either RAID 0 or RAID 1 make economic sense given that your income model is based on the availability of
Competing vendors for the systems you identified in 6.20.2 have offered to allow you to evaluate their systems on site. Identify the benchmarks you will use to determine which system is best for your
We can implement requests to the waiter as either a queue of requests or as a periodic retry of a request. With a queue, requests are handled in the order they are received. The problem with using
How would you fix the false sharing issue that can occur? Matrix multiplication plays an important role in a number of applications. Two matrices can only be multiplied if the number of columns of the
The latency of the interconnect network plays a large role in the efficiency of message passing systems. How fast does the interconnect need to be in order to obtain any speedup from using the
Compare the cake-making task to computing 3 iterations of a loop on a parallel computer. Identify data-level parallelism and task-level parallelism in the cake-making loop. You are trying to bake 3
Estimate how much less time it would take to carry out these activities if you tried to carry out as many tasks in parallel as possible. First, write down a list of your daily activities that you
Assume now that you have two friends that will help you cook, and that you have a large oven that can accommodate all three cakes. How will this change the schedule you arrived at in 7.5.1 above? You
If on average we need to access memory once every 100 cycles, what is the impact on our application? On a CC-NUMA system, the cost of accessing non-local memory can limit our ability to utilize
Loop unrolling was described in Chapter 4. Apply loop unrolling to this loop and then consider running this code on a 2-node distributed memory message passing system. Assume that we are going to use
First, write down a list of your daily activities that you typically do on a weekday. For instance, you might get out of bed, take a shower, get dressed, eat breakfast, dry your hair, brush your
Find all hazards in this instruction sequence for a 5-stage pipeline with and then without forwarding. Problems in this exercise refer to the following instruction sequences: a. b. ADD R1, R2,
Control hazards can be eliminated by adding branch delay slots. How many delay slots must follow each branch if we want to eliminate all control hazards in this processor? This exercise is intended to
What is the total latency of an LW instruction in a pipelined and non-pipelined processor? In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this
This exercise is intended to help you understand the cost/complexity/performance trade-offs of forwarding in a pipelined processor. Problems in this exercise refer to pipelined datapaths from Figure
This exercise explores some of the tradeoffs involved in pipelining, such as clock cycle time and utilization of hardware resources. The first three problems in this exercise refer to the following
If the loop exits after executing only two iterations, draw a pipeline diagram for your MIPS code from 4.28.1 executed on a 2-issue processor shown in Figure 4.69. Assume the processor has perfect
In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor fetches the following
Repeat 4.27.1, but this time assume that the instruction in the delay slot also causes a hardware error exception when it is in MEM stage. Exercise 4.27.1: Assume that this branch is correctly predicted
The first three problems in this exercise refer to the execution of the following instruction in the pipelined datapath from Figure 4.51, and assume the following clock cycle time, ALU latency, and
Consider a datapath similar to the one in Figure 4.11, but for a processor that only has one type of instruction: unconditional PC-relative branch. What would the cycle time be for this
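
Several of the questions above turn on two short formulas: availability computed from MTTF and MTTR, and the RAID 4 parity update after a small write. The sketch below is only an illustration and not a worked solution from the text. It assumes the usual definitions, Availability = MTTF / (MTTF + MTTR) and P' = P XOR D XOR D' (old parity, old data, new data). The function names and sample values are hypothetical.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a device is in service, assuming
    Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def raid4_new_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """Parity after a small write, assuming P' = P XOR D XOR D'.

    Only the block being written and the parity block are read;
    the other data blocks in the stripe are not touched.
    """
    return old_parity ^ old_data ^ new_data


if __name__ == "__main__":
    # Hypothetical device: fails on average every 99 hours, takes 1 hour to replace.
    print(availability(99.0, 1.0))

    # Hypothetical two-block stripe: d0 is overwritten, d1 is left unchanged.
    d0, d1 = 0b1010, 0b0110
    parity = d0 ^ d1
    new_d0 = 0b1111
    print(bin(raid4_new_parity(parity, d0, new_d0)))
```

As MTTR shrinks toward 0 the first function approaches 1, and as MTTR grows large relative to MTTF it approaches 0, which is the behavior the availability questions ask about.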
