Question:

The performance of a snooping cache-coherent multiprocessor depends on many detailed implementation issues that determine how quickly a cache responds with data in an exclusive or M state block. In some implementations, a CPU read miss to a cache block that is exclusive in another processor's cache is faster than a miss to a block in memory. This is because caches are smaller, and thus faster, than main memory. Conversely, in some implementations, misses satisfied by memory are faster than those satisfied by caches. This is because caches are generally optimized for "front side" or CPU references, rather than "back side" or snooping accesses.
For the multiprocessor illustrated in Figure 4.37, consider the execution of a sequence of operations on a single CPU where
• CPU read and write hits generate no stall cycles.
• CPU read and write misses generate N_memory and N_cache stall cycles if satisfied by memory and cache, respectively.
• CPU write hits that generate an invalidate incur N_invalidate stall cycles.
• A writeback of a block, either due to a conflict or another processor's request to an exclusive block, incurs an additional N_writeback stall cycles.
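The cost rules above can be sketched as a small Python function. This is a minimal sketch, assuming each reference has already been classified by hand (plain hit, invalidating write hit, memory-satisfied miss, or cache-satisfied miss) from the cache state in Figure 4.37, and that stalls are summed across all processors. The names `Outcome`, `Latencies`, and `stall_cycles` are hypothetical, not part of the exercise.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    """How one CPU reference is resolved (classified by hand, not simulated)."""
    HIT = auto()             # read or write hit: no stall
    INVALIDATE_HIT = auto()  # write hit that must invalidate other copies
    MEMORY_MISS = auto()     # read/write miss satisfied by memory
    CACHE_MISS = auto()      # read/write miss satisfied by another cache


@dataclass(frozen=True)
class Latencies:
    """Per-implementation stall-cycle parameters (hypothetical container)."""
    n_memory: int
    n_cache: int
    n_invalidate: int
    n_writeback: int


def stall_cycles(outcome: Outcome, writebacks: int, lat: Latencies) -> int:
    """Stall cycles charged to one reference under the cost rules.

    `writebacks` counts the block writebacks this reference triggers:
    a dirty victim evicted on a conflict, or a remote owner writing back
    an exclusive block in response to this request. Each one adds
    n_writeback cycles on top of the base cost.
    """
    if writebacks < 0:
        raise ValueError("writebacks must be non-negative")
    base = {
        Outcome.HIT: 0,
        Outcome.INVALIDATE_HIT: lat.n_invalidate,
        Outcome.MEMORY_MISS: lat.n_memory,
        Outcome.CACHE_MISS: lat.n_cache,
    }[outcome]
    return base + writebacks * lat.n_writeback
```

For instance, P1's read of 110 in the worked example (a miss satisfied by P0's cache, plus P0's writeback) under Implementation 1 is `stall_cycles(Outcome.CACHE_MISS, 1, Latencies(100, 70, 15, 10))`, which is 70 + 10 = 80.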
Consider two implementations with different performance characteristics summarized in Figure 4.38.
Consider the following sequence of operations assuming the initial cache state in Figure 4.37. For simplicity, assume that the second operation begins after the first completes (even though they are on different processors):
P1: read 110
P15: read 110
For Implementation 1, the first read generates 80 stall cycles because the read is satisfied by P0's cache: P1 stalls for 70 cycles (N_cache) while it waits for the block, and P0 stalls for 10 cycles (N_writeback) while it writes the block back to memory in response to P1's request. Because P0 has already written the block back, the second read by P15 is satisfied by memory and generates 100 stall cycles (N_memory). This sequence therefore generates a total of 180 stall cycles.
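The arithmetic of this worked example can be checked for both implementations with a short script. It hard-codes the classification stated in the text (P1's read is a cache-satisfied miss with one owner writeback; P15's read is a memory-satisfied miss) and the latencies from Figure 4.38; the names `LATENCIES` and `example_sequence_stalls` are illustrative only.

```python
# Figure 4.38 latencies: (N_memory, N_cache, N_invalidate, N_writeback)
LATENCIES = {
    "Implementation 1": (100, 70, 15, 10),
    "Implementation 2": (100, 130, 15, 10),
}


def example_sequence_stalls(lat):
    """Total stall cycles for 'P1: read 110' then 'P15: read 110'."""
    n_memory, n_cache, _n_invalidate, n_writeback = lat
    # P1: read 110 -> block is exclusive in P0's cache, so P0's cache supplies
    # it (N_cache for P1) and P0 writes the block back (N_writeback for P0).
    p1_read = n_cache + n_writeback
    # P15: read 110 -> memory now holds the block, so memory supplies it.
    p15_read = n_memory
    return p1_read + p15_read


for name, lat in LATENCIES.items():
    print(name, example_sequence_stalls(lat))
```

Running it prints `Implementation 1 180`, matching the text, and `Implementation 2 240`, since Implementation 2 charges 130 + 10 cycles for the cache-satisfied read.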
For the following sequences of operations, how many stall cycles are generated by each implementation?
a. P0: read 120
   P0: read 128
   P0: read 130
b. P0: read 100
   P0: write 108
   P0: write 130
c. P1: read 120
   P1: read 128
   P1: read 130
d. P1: read 100
   P1: write 108
   P1: write 130


Figure 4.38 Snooping coherence latencies.

Parameter       Implementation 1   Implementation 2
N_memory              100                100
N_cache                70                130
N_invalidate           15                 15
N_writeback            10                 10
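These latencies make the question's opening point concrete: Implementation 1 is the kind in which a cache supplies an exclusive block faster than memory (N_cache < N_memory), and Implementation 2 is the kind in which memory is faster. A minimal sketch comparing the requester-visible latencies, with the illustrative names `FIG_4_38` and `faster_source`:

```python
# Figure 4.38 snooping coherence latencies, in stall cycles.
FIG_4_38 = {
    "Implementation 1": {"N_memory": 100, "N_cache": 70, "N_invalidate": 15, "N_writeback": 10},
    "Implementation 2": {"N_memory": 100, "N_cache": 130, "N_invalidate": 15, "N_writeback": 10},
}


def faster_source(params):
    """Which source serves a miss faster, as seen by the requesting CPU."""
    if params["N_cache"] < params["N_memory"]:
        return "cache"
    if params["N_cache"] > params["N_memory"]:
        return "memory"
    return "tie"


for name, params in FIG_4_38.items():
    print(f"{name}: {faster_source(params)} is faster "
          f"(N_cache={params['N_cache']} vs N_memory={params['N_memory']})")
```

The owner's writeback (N_writeback) is charged on top of N_cache for a cache-satisfied miss, which only widens Implementation 2's gap.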

Step by Step Solution

Step 1

a. P0: read 120: read miss, satisfied by memory. P0: read 128: read mis...
