Question:

Imagine you have created a web service that runs very well (responds within 100 ms latency) 99% of the time, and has performance issues 1% of the time (maybe the CPU went into a lower power state and the response took 1000 ms, etc.).

a. Your service grows popular, and you now have 100 servers, and your computation has to touch all of these servers to handle each user request. What percentage of the time is your query likely to have a slow response across the 100 servers?
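
A minimal sketch of the arithmetic for part (a). It assumes each server is slow independently with probability 1%, and that a request is slow whenever at least one of the servers it touches is slow. Under those assumptions a request is fast with probability 0.99^100 ≈ 0.366, so about 63.4% of requests would be slow. The function name `prob_slow_request` is only illustrative.

```python
def prob_slow_request(p_fast: float, n_servers: int) -> float:
    """Probability that at least one of n_servers is slow.

    Assumes each server is fast with probability p_fast, independently,
    and that one slow server makes the whole request slow.
    """
    return 1.0 - p_fast ** n_servers


if __name__ == "__main__":
    # Two nines per server, 100 servers touched per request.
    print(f"slow requests with 100 servers: {prob_slow_request(0.99, 100):.1%}")
```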

b. Instead of a “two nines” (99%) single-server latency SLA, how many “nines” do we need for the single-server latency SLA so that the cluster latency SLA has bad latencies only 10% of the time or less?
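
For part (b), under the same assumption (independent servers, one slow server makes the request slow), the cluster is fast only when all n servers are fast, so the per-server fast probability p must satisfy p^n ≥ 0.9, i.e. p ≥ 0.9^(1/n). The sketch below computes that bound and the smallest whole number of nines that meets it; the helper names are hypothetical.

```python
def min_per_server_fast_prob(n_servers: int, max_slow_fraction: float) -> float:
    """Smallest per-server probability p with p**n_servers >= 1 - max_slow_fraction.

    Assumes independent servers and that one slow server makes the request slow.
    """
    return (1.0 - max_slow_fraction) ** (1.0 / n_servers)


def nines_needed(n_servers: int, max_slow_fraction: float) -> int:
    """Smallest whole number of nines k such that 1 - 10**-k meets the bound above."""
    p_min = min_per_server_fast_prob(n_servers, max_slow_fraction)
    k = 1
    while 1.0 - 10.0 ** -k < p_min:
        k += 1
    return k


if __name__ == "__main__":
    p_min = min_per_server_fast_prob(100, 0.10)
    k = nines_needed(100, 0.10)
    slow_at_k = 1.0 - (1.0 - 10.0 ** -k) ** 100
    print(f"per-server fast probability needed: {p_min:.3%}")
    print(f"whole nines needed: {k}; slow fraction at that level: {slow_at_k:.1%}")
```

With n = 100 the bound is p ≥ about 0.99895, so three nines (99.9%) are enough: 0.999^100 ≈ 0.905, leaving roughly 9.5% slow requests.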

c. How do the answers to parts (a) and (b) change if we have 2000 servers?
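
For part (c), the same two formulas can be re-run with n = 2000, again assuming servers are slow independently and a single slow server makes the whole request slow. This is a sketch; the function names are illustrative only.

```python
def slow_fraction(p_fast: float, n_servers: int) -> float:
    # Fraction of requests that hit at least one slow server (independence assumed).
    return 1.0 - p_fast ** n_servers


def nines_needed(n_servers: int, max_slow_fraction: float) -> int:
    # Smallest whole k such that (1 - 10**-k) ** n_servers >= 1 - max_slow_fraction.
    p_min = (1.0 - max_slow_fraction) ** (1.0 / n_servers)
    k = 1
    while 1.0 - 10.0 ** -k < p_min:
        k += 1
    return k


if __name__ == "__main__":
    for n in (100, 2000):
        k = nines_needed(n, 0.10)
        print(
            f"n={n}: fast probability at two nines = {0.99 ** n:.2e}, "
            f"nines needed for <=10% slow = {k}, "
            f"slow at {k} nines = {slow_fraction(1.0 - 10.0 ** -k, n):.1%}"
        )
```

With 2000 servers at two nines, essentially every request is slow (only about 2 in a billion are fast). The per-server bound becomes p ≥ 0.9^(1/2000) ≈ 0.999947, so four nines no longer suffice (0.9999^2000 ≈ 0.82, about 18% slow) and five nines are needed in whole-nines terms (about 2% slow). Strictly, a per-server level of about 99.995% already meets the 10% target.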

d. Section 6.4 (page 452) discusses “tail-tolerant” designs. What kind of design optimizations would you need to make in your web service?
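
Part (d) is open-ended. One widely used tail-tolerant technique is the hedged request: send the request to one replica, and if it has not answered within a short delay (for example, a high percentile of its normal latency), send a duplicate to a second replica and use whichever reply arrives first. Below is a minimal asyncio sketch of that idea. The simulated replicas, the 50 ms hedge delay, and the names `hedged_request` and `make_simulated_replica` are all hypothetical and stand in for a real RPC layer.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical transport: an async function that sends the request to one replica.
SendFn = Callable[[int], Awaitable[str]]


def make_simulated_replica(slow_ids: set) -> SendFn:
    """Simulated replicas: ~10 ms normally, 1 s for replicas listed in slow_ids
    (standing in for the occasional 1000 ms response in the question)."""
    async def send(replica_id: int) -> str:
        await asyncio.sleep(1.0 if replica_id in slow_ids else 0.01)
        return f"replica {replica_id}"
    return send


async def hedged_request(send: SendFn, replicas: Sequence[int], hedge_after: float = 0.05) -> str:
    """Ask replicas[0]; if it has not answered within hedge_after seconds,
    also ask replicas[1] and return whichever answer arrives first."""
    primary = asyncio.create_task(send(replicas[0]))
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done or len(replicas) < 2:
        return await primary  # fast path, or no replica to hedge to
    backup = asyncio.create_task(send(replicas[1]))
    done, pending = await asyncio.wait({primary, backup}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # abandon the slower duplicate to save work
    return done.pop().result()


if __name__ == "__main__":
    # Replica 0 is forced to be slow, so the hedged backup should win.
    send = make_simulated_replica(slow_ids={0})
    print(asyncio.run(hedged_request(send, [0, 1])))
```

With replica 0 forced to be slow, the call returns replica 1's answer after roughly 60 ms instead of waiting the full second. The backup is delayed rather than sent immediately so that only the small fraction of slow requests pays for duplicated work.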