The performance of a snooping cache-coherent multiprocessor depends on many detailed implementation issues that determine how quickly …
Question:
For the multiprocessor illustrated in Figure 4.37, consider the execution of a sequence of operations on a single CPU where:
• CPU read and write hits generate no stall cycles.
• CPU read and write misses generate N_memory and N_cache stall cycles if satisfied by memory and by another processor's cache, respectively.
• CPU write hits that generate an invalidate incur N_invalidate stall cycles.
• A writeback of a block, due either to a conflict or to another processor's request for an exclusive block, incurs an additional N_writeback stall cycles.
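Taken together, these rules describe an additive cost model. As a sketch, with the counts n_mem, n_cache, n_inv, and n_wb introduced here purely for illustration (they are not defined in the exercise), the total stall time of a sequence can be written as:

```latex
\[
\text{Stall cycles} = n_{\text{mem}}\,N_{\text{memory}} + n_{\text{cache}}\,N_{\text{cache}}
  + n_{\text{inv}}\,N_{\text{invalidate}} + n_{\text{wb}}\,N_{\text{writeback}}
\]
```

Here n_mem and n_cache count misses satisfied by memory and by another cache, n_inv counts write hits that must send an invalidate, and n_wb counts block writebacks. Hits that need no bus action contribute nothing.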
Consider two implementations with different performance characteristics summarized in Figure 4.38.
For example, consider the following sequence of operations, assuming the initial cache state in Figure 4.37. For simplicity, assume that the second operation begins after the first completes (even though they are on different processors):
P1: read 110
P15: read 110
For Implementation 1, the first read generates 80 stall cycles because the read is satisfied by P0's cache: P1 stalls for 70 cycles while it waits for the block, and P0 stalls for 10 cycles while it writes the block back to memory in response to P1's request. Because P0 has written the block back, the second read by P15 is satisfied by memory and generates 100 stall cycles. The sequence therefore generates a total of 80 + 100 = 180 stall cycles.
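The arithmetic in the worked example can be checked with a small cost model. This is a minimal sketch, assuming the Implementation 1 latencies implied by the example itself (N_cache = 70, N_writeback = 10, N_memory = 100). N_invalidate is not stated in the text, so it is left as a parameter, and the names `Outcome`, `Latencies`, and `stall_cycles` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    """How a single CPU operation is satisfied (hypothetical classification)."""
    HIT = auto()               # read or write hit needing no bus action: no stall
    MISS_FROM_MEMORY = auto()  # miss satisfied by memory
    MISS_FROM_CACHE = auto()   # miss satisfied by another processor's cache
    INVALIDATE = auto()        # write hit that must send an invalidate


@dataclass(frozen=True)
class Latencies:
    n_memory: int
    n_cache: int
    n_invalidate: int
    n_writeback: int


def stall_cycles(ops, lat):
    """Sum stall cycles over a list of (outcome, causes_writeback) pairs.

    causes_writeback marks an operation that forces some cache (the requester
    on a conflict, or the owner of an exclusive block) to write the block
    back, which adds n_writeback cycles to the sequence total.
    """
    base = {
        Outcome.HIT: 0,
        Outcome.MISS_FROM_MEMORY: lat.n_memory,
        Outcome.MISS_FROM_CACHE: lat.n_cache,
        Outcome.INVALIDATE: lat.n_invalidate,
    }
    total = 0
    for outcome, causes_writeback in ops:
        total += base[outcome]
        if causes_writeback:
            total += lat.n_writeback
    return total


# Implementation 1 values implied by the worked example. n_invalidate is not
# given in the text and does not affect this sequence, so 0 is a placeholder.
impl1 = Latencies(n_memory=100, n_cache=70, n_invalidate=0, n_writeback=10)

example = [
    (Outcome.MISS_FROM_CACHE, True),    # P1: read 110, supplied by P0, which writes back
    (Outcome.MISS_FROM_MEMORY, False),  # P15: read 110, supplied by memory
]
print(stall_cycles(example, impl1))  # 180
```

Answering parts a through d with this model still requires classifying each operation against the initial cache state in Figure 4.37 and plugging in the latencies from Figure 4.38, neither of which is reproduced here.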
For the following sequences of operations, how many stall cycles are generated by each implementation?
a. P0: read 120
   P0: read 128
   P0: read 130
b. P0: read 100
   P0: write 108
   P0: write 130
c. P1: read 120
   P1: read 128
   P1: read 130
d. P1: read 100
   P1: write 108
   P1: write 130
Figure 4.38 Snooping coherence latencies.
Related book: Computer Architecture: A Quantitative Approach, 4th edition
Authors: John L. Hennessy, David A. Patterson
ISBN: 978-0123704900