Refer to the attachment and:
CASE STUDY 5
Fail Away with Dynamo, Bigtable, and Cassandra

As you learned in Case Study 1, Amazon.com processed more than 306 order items per second on its peak day of the 2012 holiday sales season. To do that, it processed customer transactions on tens of thousands of servers. With that many computers, failure is inevitable. Even if the probability of any one server failing is .0001, the likelihood that not one out of 10,000 of them fails is .9999 raised to the 10,000 power, which is about .37. Thus, for these assumptions the likelihood of at least one failure is 63 percent. For reasons that go beyond the scope of this discussion, the likelihood of failure is actually much greater.

Amazon.com must be able to thrive, even in the presence of such constant failure. Or, as Amazon.com engineers stated: "Customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados."

The only way to deal with such failure is to replicate the data on multiple servers. When a customer stores a Wish List, for example, that Wish List needs to be stored on different, geographically separated servers. Then, when (notice when, not if) a server with one copy of the Wish List fails, Amazon.com applications obtain it from another server.

Such data replication solves one problem but introduces another. Suppose that the customer's Wish List is stored on servers A, B, and C and server A fails. While server A is down, server B or C can provide a copy of the Wish List, but if the customer changes it, that Wish List can only be rewritten to servers B and C. It cannot be written to A, because A is not running. When server A comes back into service, it will have the old copy of the Wish List. The next day, when the customer reopens his or her Wish List, two different versions exist: the most recent one on servers B and C and an older one on server A. The customer wants the most current one. How can Amazon.com ensure that it will be delivered? Keep in mind that 15.6 million orders are being shipped while this goes on.

None of the current relational DBMS products was designed for problems like this. Consequently, Amazon.com engineers developed Dynamo, a specialized data store for reliably processing massive amounts of data on tens of thousands of servers. Dynamo provides an always-open experience for Amazon.com's retail customers; Amazon.com also sells Dynamo store services to others via its S3 Web Services product offering.

Meanwhile, Google was encountering similar problems that could not be met by commercially available relational DBMS products. In response, Google created Bigtable, a data store for processing petabytes of data on hundreds of thousands of servers. Bigtable supports a richer data model than Dynamo, which means that it can store a greater variety of data structures.

Both Dynamo and Bigtable are designed to be elastic; this term means that the number of servers can dynamically increase and decrease without disrupting performance.

In 2007, Facebook encountered similar data storage problems: massive amounts of data, the need to be elastically scalable, tens of thousands of servers, and high volumes of traffic. In response to this need, Facebook began development on Cassandra, a data store that provides storage capabilities like Dynamo with a richer data model like Bigtable. [11, 12] Initially, Facebook used Cassandra to power its Inbox Search. By 2008, Facebook realized that it had a bigger project on its hands than it wanted and gave the source code to the open source community. As of 2012, Cassandra is used by Facebook, Twitter, Digg, Reddit, Cisco, and many others.

Cassandra, by the way, is a fascinating name for a data store. In Greek mythology, Cassandra was so beautiful that Apollo fell in love with her and gave her the power to see the future. Alas, Apollo's love was unrequited and he cursed her so that no one would ever believe her predictions. The name was apparently a slam at Oracle.

Cassandra is elastic and fault-tolerant; it supports massive amounts of data on thousands of servers and provides durability, meaning that once data is committed to the data store, it won't be lost, even in the presence of failure. One of the most interesting characteristics of Cassandra is that clients (meaning the programs that run Facebook, Twitter, etc.) can select the level of consistency that they need. If a client requests that all servers always be current, Cassandra will ensure that that happens, but performance will be slow. At the other end of the trade-off spectrum, clients can require no consistency, whereby performance is maximized. In between, clients can require that a majority of the servers that store a data item be consistent.

Cassandra's performance is vastly superior to relational DBMS products. In one comparison, Cassandra was found to be 2,500 times faster than MySQL for write operations and 23 times faster for read operations on massive amounts of data on hundreds of thousands of possibly failing computers!

11. DeCandia, et al., "Dynamo: Amazon's Highly Available Key-Value Store," Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
12. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, "Bigtable: A Distributed Storage System for Structured Data," OSDI 2006: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November 2006, http://labs.google.com/papers/bigtable.html.

QUESTIONS

5-5. Clearly, Dynamo, Bigtable, and Cassandra are critical technology to the companies that created them. Why did they allow their employees to publish academic papers about them? Why did they not keep them as proprietary secrets?

5-6. What do you think this movement means to the existing DBMS vendors? How serious is the NoSQL threat? Justify your answer. What responses by existing DBMS vendors would be sensible?

5-7. Is it a waste of your time to learn about the relational model and Microsoft Access? Why or why not?

5-8. Given what you know about AllRoad Parts, should it use a relational DBMS, such as Oracle Database or MySQL, or should it use Cassandra?

5-9. Suppose that AllRoad decides to use a NoSQL solution, but a battle emerges among the employees in the IT department. One faction wants to use Cassandra, but another faction wants to use a different NoSQL data store, named MongoDB (www.mongodb.org). Assume that you're Kelly, and Lucas asks for your opinion about how he should proceed. How do you respond?
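
The failure arithmetic in the case (.9999 raised to the 10,000 power is about .37, so at least one failure has a likelihood of about 63 percent) can be checked with a few lines of Python. This is a minimal sketch that assumes, as the case does, that servers fail independently with the same probability; the function name is made up for illustration.

```python
def prob_at_least_one_failure(p_fail: float, n_servers: int) -> float:
    """Probability that at least one of n_servers fails.

    Assumes every server fails independently with probability p_fail,
    which is the simplifying assumption used in the case study.
    """
    prob_no_failure = (1.0 - p_fail) ** n_servers
    return 1.0 - prob_no_failure


# Figures from the case: p = .0001 per server, 10,000 servers.
p_none = (1.0 - 0.0001) ** 10_000
print(round(p_none, 2))                                      # 0.37
print(round(prob_at_least_one_failure(0.0001, 10_000), 2))   # 0.63
```

Raising the server count or the per-server failure probability pushes the result toward 1, which is why the case treats failure as a matter of when, not if.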
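
The Wish List scenario with servers A, B, and C can be simulated to show how a stale copy appears and one way it could be resolved. The sketch below is an illustration only: it stamps each write with a single increasing version number and has reads return the newest copy and repair older ones. The class and function names are hypothetical, and this is a simplification, not Dynamo's actual mechanism (the Dynamo paper describes vector clocks for versioning).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One server's copy of a Wish List (hypothetical model)."""
    name: str
    up: bool = True
    version: int = 0
    value: list = field(default_factory=list)


def write(replicas, new_value):
    """Write new_value to every running replica with a higher version number."""
    live = [r for r in replicas if r.up]
    if not live:
        raise RuntimeError("no replica is running")
    next_version = max(r.version for r in live) + 1
    for r in live:
        r.version, r.value = next_version, list(new_value)
    return next_version


def read(replicas):
    """Return the newest copy among running replicas and repair stale ones."""
    live = [r for r in replicas if r.up]
    if not live:
        raise RuntimeError("no replica is running")
    newest = max(live, key=lambda r: r.version)
    for r in live:
        if r.version < newest.version:  # stale copy: bring it up to date
            r.version, r.value = newest.version, list(newest.value)
    return list(newest.value)


a, b, c = Replica("A"), Replica("B"), Replica("C")
servers = [a, b, c]

write(servers, ["tent"])            # all three servers store version 1
a.up = False                        # server A fails
write(servers, ["tent", "stove"])   # only B and C receive version 2
a.up = True                         # A returns with the old copy
stale_copy = list(a.value)          # ["tent"]

latest = read(servers)              # newest version wins; A is repaired
print(stale_copy, latest, a.value)
```

The point of the sketch is the ordering problem itself: without some version information attached to each copy, a reader has no way to tell which of the two Wish Lists is the current one.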
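
The consistency trade-off described for Cassandra (all servers, a majority, or none) can be expressed as the number of replica acknowledgments a client waits for. The sketch below is a rough model under stated assumptions: the enum and function names are invented for illustration and are not Cassandra's API, and "no consistency" is modeled as waiting for a single acknowledgment, on the assumption that a write must reach at least one server to be stored at all.

```python
from enum import Enum


class Consistency(Enum):
    """Hypothetical labels for the three levels named in the case."""
    NONE = "none"
    MAJORITY = "majority"
    ALL = "all"


def required_acks(level: Consistency, n_replicas: int) -> int:
    """How many replicas must acknowledge before the operation completes."""
    if level is Consistency.ALL:
        return n_replicas
    if level is Consistency.MAJORITY:
        return n_replicas // 2 + 1
    return 1  # assumption: at least one server must accept the data


def can_complete(level: Consistency, n_replicas: int, n_up: int) -> bool:
    """True if enough replicas are running to satisfy the requested level."""
    return n_up >= required_acks(level, n_replicas)


# A data item stored on three servers, one of which is down.
for level in Consistency:
    print(level.name, required_acks(level, 3), can_complete(level, 3, 2))
```

With three replicas and one down, the strictest level cannot complete while the other two can, which is the availability cost of demanding that every server be current. Waiting for more acknowledgments also takes longer, which matches the case's remark that full consistency is slow.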