- Implement the logic equations of Exercise B.43 as a PLA. Exercise B.43: We wish to add a yellow light to our traffic light example on page B-68. We will do this by changing the clock to run at 0.25 Hz
- Write down the next-state and output-function tables for the traffic light controller described in Exercise B.41. Exercise B.41: We wish to add a yellow light to our traffic light example on page B-68.
- A Gray code is a sequence of binary numbers with the property that no more than 1 bit changes in going from one element of the sequence to another. For example, here is a 3-bit binary Gray code: 000,
- Construct a 3-bit counter using three D flip-flops and a selection of gates. The inputs should consist of a signal that resets the counter to 0, called reset, and a signal to increment the counter,
- Assign state numbers to the states of the finite-state machine you constructed for Exercise B.37 and write a set of logic equations for each of the outputs, including the next-state bits. Exercise
- A friend would like you to build an “electronic eye” for use as a fake security device. The device consists of three lights lined up in a row, controlled by the outputs Left, Middle, and Right,
- Figure B.8.8 on page B-55 illustrates the implementation of the register file for the MIPS datapath. Pretend that a new register file is to be built, but that there are only two registers and only
- Quite often, you would expect that given a timing diagram containing a description of changes that take place on a data input D and a clock input C (as in Figures B.8.3 and B.8.6 on pages B-52 and
- There are times when we want to add a collection of numbers together. Suppose you wanted to add four 4-bit numbers (A, B, E, F) using 1-bit full adders. Let’s ignore carry lookahead for now. You
- Instead of thinking of an adder as a device that adds two numbers and then links the carries together, we can think of the adder as a hardware device that can add three inputs together (ai, bi, ci)
- Show a truth table for a multiplexor (inputs A, B, and S; output C), using don’t cares to simplify the table where possible.
- Give an algorithm for constructing the sum-of-products representation for an arbitrary logic equation consisting of AND, OR, and NOT. The algorithm should be recursive and should not construct the
- Prove that a two-input multiplexor is also universal by showing how to build the NAND (or NOR) gate using a multiplexor.
- Prove that the NAND gate is universal by showing how to build the AND, OR, and NOT functions using a two-input NAND gate.
- Prove that the NOR gate is universal by showing how to build the AND, OR, and NOT functions using a two-input NOR gate.
- One logic function that is used for a variety of purposes (including within adders and to compute parity) is exclusive OR. The output of a two-input exclusive OR function is true only if exactly one
- Show that there are 2^n entries in a truth table for a function with n inputs.
- Write and test a MIPS assembly language program to compute and print the first 100 prime numbers. A number n is prime if no numbers except 1 and n divide it evenly. You should implement two
- Using SPIM, write and test a program that reads in a positive integer using the SPIM system calls. If the integer is not positive, the program should terminate with the message “Invalid Entry”;
- The simple exception handler always jumps back to the instruction following the exception. This works fine unless the instruction that causes the exception is in the delay slot of a branch. In that
- Is it ever safe for a user program to use registers $k0 or $k1?
- AMD has recently announced that they will be integrating a graphics processing unit with their x86 cores in a single package, though with different clocks for each of the cores. This is an example of
- Download the CUDA Toolkit and SDK from http://www.nvidia.com/object/cuda_get.html. Make sure to use the “emurelease” (Emulation Mode) version of the code (you will not need actual NVIDIA hardware
- A systolic array is an example of an MISD machine. A systolic array is a pipeline network or “wavefront” of data processing elements. Each of these elements does not need a program counter since
- Virtualization software is being aggressively deployed to reduce the costs of managing today’s high performance servers. Companies like VMWare, Microsoft and IBM have all developed a range of
- What is 5ED4 - 07A4 when these values represent unsigned 16-bit hexadecimal numbers? The result should be written in hexadecimal. Show your work.
- Assume that for a given program 70% of the executed instructions are arithmetic, 10% are load/store, and 20% are branch. 1. Given this instruction mix and the assumption that an arithmetic instruction
- Write a program in MIPS assembly language to convert an ASCII number string containing positive and negative integer decimal strings, to an integer. Your program should expect register $a0 to hold
- Translate function f into MIPS assembly language. If you need to use registers $t0 through $t7, use the lower numbered registers first. Assume the function declaration for func is “int f(int a, int
- Translate the following C code to MIPS. Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A
- Show how the value 0xabcdef12 would be arranged in memory of a little-endian and a big-endian machine. Assume the data is stored starting at address 0.
- Translate 0xabcdef12 into decimal.
- For each MIPS instruction, show the value of the opcode (OP), source register (RS), and target register (RT) fields. For the I-type instructions, show the value of the immediate field, and for the
- Translate the following C code to MIPS assembly code. Use a minimum number of instructions. Assume that the values of a, b, i, and j are in registers $s0, $s1, $t0, and $t1, respectively. Also,
- Rewrite the loop from Exercise 2.29 to reduce the number of MIPS instructions executed. Exercise 2.29: Translate the following loop into C. Assume that the C-level integer i is held in register $t1, $s2
- For each function call, show the contents of the stack after the function call is made. Assume the stack pointer is originally at address 0x7ffffffc, and follow the register conventions as
- Write the MIPS assembly code to implement the following C code: lock(lk); shvar=max(shvar,x); unlock(lk); Assume that the address of the lk variable is in $a0, the address of the shvar variable is in
- Repeat Exercise 2.43, but this time use ll/sc to perform an atomic update of the shvar variable directly, without using lock() and unlock(). Note that in this problem there is no variable lk. Exercise
- What is 5ED4 - 07A4 when these values represent signed 16-bit hexadecimal numbers stored in sign-magnitude format? The result should be written in hexadecimal. Show your work.
- What is 4365 - 3412 when these values represent unsigned 12-bit octal numbers? The result should be written in octal. Show your work.
- Assume 151 and 214 are signed 8-bit decimal integers stored in two’s complement format. Calculate 151 - 214 using saturating arithmetic. The result should be written in decimal. Show your work.
- Assume 151 and 214 are unsigned 8-bit integers. Calculate 151 + 214 using saturating arithmetic. The result should be written in decimal. Show your work.
- In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor fetches the following
- Consider the following three CPU organizations: CPU SS: A 2-core superscalar microprocessor that provides out-of-order issue capabilities on 2 function units (FUs). Only a single thread can run on
- In addition to the basic laws we discussed in this section, there are two important theorems, called DeMorgan’s theorems: Prove DeMorgan’s theorems with a truth table of the form
- Implement the four-input odd-parity function with a PLA.
- Implement the four functions described in Exercise B.11 using a PLA. Exercise B.11: Assume that X consists of 3 bits, x2 x1 x0. Write four logic functions that are true if and only if ■ X contains only
- Assume that X consists of 3 bits, x2 x1 x0, and Y consists of 3 bits, y2 y1 y0. Write logic functions that are true if and only if ■ X < Y, where X and Y are thought of as unsigned binary
- Implement a switching network that has two data inputs (A and B), two data outputs (C and D), and a control input (S). If S equals 1, the network is in pass-through mode, and C should equal A, and D
- When a program is adapted to run on multiple processors in a multiprocessor system, the execution time on each processor is comprised of computing time and the overhead time required for locked
- Prove that the two equations for E in the example starting on page B-7 are equivalent by using DeMorgan’s theorems and the axioms shown on page B-7.
- Construct the truth table for a four-input odd-parity function (see page B-65 for a description of parity).
- The Verilog code on page B-53 is for a D flip-flop. Show the Verilog code for a D latch.
- Write the equations for the carry-lookahead logic for a 64-bit adder using the new notation from Exercise B.26 and using 16-bit adders as building blocks. Include a drawing similar to Figure B.6.3 in
- First, show the block organization of the 16-bit carry save adders to add these 16 terms, as shown in Figure B.14.1. Assume that the time delay through each 1-bit adder is 2T. Calculate the time of
- Rewrite the equations on page B-44 for a carry-lookahead logic for a 16-bit adder using a new notation. First, use the names for the CarryIn signals of the individual bits of the adder. That is, use
- A simple check for overflow during addition is to see if the CarryIn to the most significant bit is not the same as the CarryOut of the most significant bit. Prove that this check is the same as
- The ALU supported set on less than (slt) using just the sign bit of the adder. Let’s try a set on less than operation using the values -7_ten and 6_ten. To make it simpler to follow the example,
- Section 3.3 presents basic operation and possible implementations of multipliers. A basic unit of such implementations is a shift-and-add unit. Show a Verilog implementation for this unit. Show how
- Given the following logic diagram for an accumulator, write down the Verilog module implementation of it. Assume a positive edge-triggered register and asynchronous Rst. In Adder 16 16 Out Load Clk
- Write down a Verilog module implementation of a 2-to-4 decoder (and/or encoder).
- What is the function implemented by the following Verilog modules: module FUNC1 (I0, I1, S, out); input I0, I1; input S; output out; out = S ? I1 : I0; endmodule module FUNC2 (out,ctl, clk,reset);
- Derive the product-of-sums representation for E shown on page B-11 starting with the sum-of-products representation. You will need to use DeMorgan’s theorems.
- Assume that X consists of 3 bits, x2 x1 x0. Write four logic functions that are true if and only if ■ X contains only one 0 ■ X contains an even number of 0s ■ X when interpreted as an unsigned
- Implement the four-input odd-parity function with AND and OR gates using bubbled inputs and outputs.
- Assume a quad-core computer system can process database queries at a steady state rate of requests per second. Also assume that each transaction takes, on average, a fixed amount of time to process.
- In future systems, we expect to see heterogeneous computing platforms constructed out of heterogeneous CPUs. We have begun to see some appear in the embedded processing market in systems that contain
- When performing computations on sparse matrices, latency in the memory hierarchy becomes much more of a factor. Sparse matrices lack the spatial locality in the data stream typically found in
- Benchmarking is a field of study that involves identifying representative workloads to run on specific computing platforms in order to be able to objectively compare performance of one system to
- Refer to Figure 6.14b, which shows an n-cube interconnect topology of order 3 that interconnects 8 nodes. One attractive feature of an n-cube interconnection network topology is its ability to
- We would like to execute the loop below as efficiently as possible. We have two different machines, a MIMD machine and a SIMD machine. for (i=0; i<2000; i++) for (j=0; j<3000; j++) X_array[i][j]
- The dining philosopher’s problem is a classic problem of synchronization and concurrency. The general problem is stated as philosophers sitting at a round table doing one of two things: eating or
- Consider the following portions of two different programs running at the same time on four processors in a symmetric multicore processor (SMP). Assume that before this code is run, both x and y are
- Matrix multiplication plays an important role in a number of applications. Two matrices can only be multiplied if the number of columns of the first matrix is equal to the number of rows in the
- Consider the following recursive mergesort algorithm (another classic divide and conquer algorithm). Mergesort was first described by John von Neumann in 1945. The basic idea is to divide an unsorted
- Consider the following piece of C code: for (j=2; j<1000; j++) D[j] = D[j-1] + D[j-2]; The MIPS code corresponding to the above fragment is: Instructions have the following
- Many computer applications involve searching through a set of data and sorting the data. A number of efficient searching and sorting algorithms have been devised in order to reduce the runtime of
- You are trying to bake 3 blueberry pound cakes. Cake ingredients are as follows: 1 cup butter, softened; 1 cup sugar; 4 large eggs; 1 teaspoon vanilla extract; 1/2 teaspoon salt; 1/4 teaspoon nutmeg; 1 1/2 cups
- First, write down a list of your daily activities that you typically do on a weekday. For instance, you might get out of bed, take a shower, get dressed, eat breakfast, dry your hair, brush your
- In this exercise we show the definition of a web server log and examine code optimizations to improve log processing speed. The data structure for the log is defined as follows:
- Chip multiprocessors (CMPs) have multiple cores and their caches on a single chip. CMP on-chip L2 cache design has interesting trade-offs. The following table shows the miss rates and hit latencies
- Cache coherence concerns the views of multiple processors on a given cache block. The following data shows two processors and their read/write operations on two different words of a cache block X
- In this exercise, we will explore the control unit for a cache controller for a processor with a write buffer. Use the finite state machine found in Figure 5.40 as a starting point for designing your
- One of the biggest impediments to widespread use of virtual machines is the performance overhead incurred by running a virtual machine. Listed below are various performance parameters and application
- To support multiple virtual machines, two levels of memory virtualization are needed. Each virtual machine still controls the mapping of virtual address (VA) to physical address (PA), while the
- In this exercise, we will examine how replacement policies impact miss rate. Assume a 2-way set associative cache with 4 blocks. To solve the problems in this exercise, you may find it helpful to
- In this exercise, we will examine space/time optimizations for page tables. The following list provides parameters of a virtual memory system. 1. For a single-level page table, how many page table
- As described in Section 5.7, virtual memory uses a page table to track the mapping of virtual addresses to physical addresses. This exercise shows how this table must be updated as addresses are
- For a high-performance system such as a B-tree index for a database, the page size is determined mainly by the data size and disk performance. Assume that on average a B-tree index page is 70% full
- This exercise examines the single error correcting, double error detecting (SEC/DED) Hamming code. 1. What is the minimum number of parity bits required to protect a 128-bit word using the SEC/DED
- Mean Time Between Failures (MTBF), Mean Time To Replacement (MTTR), and Mean Time To Failure (MTTF) are useful metrics for evaluating the reliability and availability of a storage resource. Explore
- This exercise examines the impact of different cache designs, specifically comparing associative caches to the direct-mapped caches from Section 5.4. For these exercises, refer to the address stream
- In this exercise, we will look at the different ways capacity affects overall performance. In general, cache access time is proportional to capacity. Assume that main memory accesses take 70 ns and
- Based on your answers to 3.35 and 3.36, does (3.41796875 × 10^-3 × 6.34765625 × 10^-3) × 1.05625 × 10^2 = 3.41796875 × 10^-3 × (6.34765625 × 10^-3 × 1.05625 × 10^2)?
- If the bit pattern 0x0C000000 is placed into the Instruction Register, what MIPS instruction will be executed?
- What decimal number does the bit pattern 0x0C000000 represent if it is a two’s complement integer? An unsigned integer?
- Media applications that play audio or video files are part of a class of workloads called “streaming” workloads; i.e., they bring in large amounts of data but do not reuse much
- Recall that we have two write policies and write allocation policies, and their combinations can be implemented either in L1 or L2 cache. Assume the following choices for L1 and L2 caches:
- For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache. 1. What is the cache block size (in words)? 2. How many entries does the cache
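
A few of the fully stated arithmetic exercises in the list above (the unsigned hexadecimal and octal subtractions, the conversion of 0xabcdef12 to decimal, and the unsigned saturating addition) can be sanity-checked mechanically once the work has been done by hand. The following is a minimal Python sketch, not a reference solution; the helper names `sub_unsigned` and `sat_add_unsigned` are hypothetical and do not come from the exercises.

```python
def sub_unsigned(a: int, b: int, bits: int) -> int:
    """Subtract b from a and wrap the result into an unsigned field of `bits` bits."""
    mask = (1 << bits) - 1
    return (a - b) & mask


def sat_add_unsigned(a: int, b: int, bits: int) -> int:
    """Add two unsigned values, clamping at the largest value that fits in `bits` bits."""
    return min(a + b, (1 << bits) - 1)


if __name__ == "__main__":
    # 5ED4 - 07A4 as unsigned 16-bit hexadecimal values
    print(format(sub_unsigned(0x5ED4, 0x07A4, 16), "04X"))
    # 4365 - 3412 as unsigned 12-bit octal values
    print(format(sub_unsigned(0o4365, 0o3412, 12), "o"))
    # 0xabcdef12 interpreted as an unsigned integer, printed in decimal
    print(0xABCDEF12)
    # 151 + 214 with unsigned 8-bit saturating arithmetic
    print(sat_add_unsigned(151, 214, 8))
```

The script only confirms final values; the exercises still ask for the borrow-by-borrow or digit-by-digit work to be shown.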