Question: Assume that there are three node configurations with different disk capabilities: an HDD-based node, an SSD-based node, and a node with no local disk. Furthermore, assume that the cloud vendor will place any data that you request on the local disk (if any) before your task starts, and will do so for free.

Nodes with no local disk cost $0.30 per hour. The network has a throughput of 0.1 GB/s per node. The HDD-based node has a throughput of 0.2 GB/s, a storage capacity of 2 TB, and costs $0.50 per hour. The SSD-based node has a throughput of 1 GB/s, a storage capacity of 0.5 TB, and costs $1.50 per hour.

Assume the dataset volume is 1 PB. The business needs of your organization dictate that the map phase of a MapReduce job needs to achieve an I/O throughput of at least 500 GB/s. (The aggregate I/O throughput is the sum of the throughput across all nodes.)

[24 points] For each configuration, how many nodes are needed at minimum to meet this goal, and what is the total cost?

No local disk configuration:
HDD configuration:
SSD configuration:

[6 points] Pick the configuration with the least cost. At what price per hour (if any) would the other two configurations become cost competitive?
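
The arithmetic behind this question can be sketched in a few lines of Python. This is one possible reading, not a verified answer key, and it rests on several assumptions that the question does not state outright: a node with no local disk reads at the network rate (0.1 GB/s); a node with a local disk reads at its disk rate; the whole dataset must fit on the local disks of the chosen nodes, so capacity can force more nodes than throughput alone; 1 PB is taken as 1000 TB; and "total cost" is taken as the cluster cost per hour. The names `CONFIGS`, `nodes_needed`, `results`, and `break_even` are illustrative only.

```python
from fractions import Fraction
from math import ceil

DATASET_TB = 1000   # 1 PB, assuming decimal units (1 PB = 1000 TB)
TARGET_GBPS = 500   # required aggregate map-phase throughput in GB/s

# name: (per-node throughput in GB/s, local capacity in TB or None, price in $/hour)
# Fractions keep the divisions exact, so ceil() never rounds up by accident.
CONFIGS = {
    "no_disk": (Fraction("0.1"), None, Fraction("0.30")),  # limited by the network
    "hdd": (Fraction("0.2"), Fraction(2), Fraction("0.50")),
    "ssd": (Fraction(1), Fraction("0.5"), Fraction("1.50")),
}


def nodes_needed(throughput, capacity):
    """Minimum node count that meets the throughput target and, for
    disk-based nodes, also holds the full dataset (an assumption)."""
    by_throughput = ceil(TARGET_GBPS / throughput)
    by_capacity = ceil(DATASET_TB / capacity) if capacity is not None else 0
    return max(by_throughput, by_capacity)


# name -> (node count, cluster cost in $/hour)
results = {}
for name, (throughput, capacity, price) in CONFIGS.items():
    n = nodes_needed(throughput, capacity)
    results[name] = (n, n * price)

cheapest = min(results, key=lambda k: results[k][1])

# Per-node hourly price at which each other configuration matches the
# cheapest cluster's hourly cost, holding its node count fixed.
break_even = {
    name: results[cheapest][1] / n
    for name, (n, _) in results.items()
    if name != cheapest
}

for name, (n, cost) in results.items():
    print(f"{name}: {n} nodes, ${float(cost):.2f} per hour")
print(f"cheapest: {cheapest}")
for name, price in break_even.items():
    print(f"{name} breaks even at ${float(price):.3f} per node-hour")
```

Under these assumptions the SSD configuration is bound by capacity rather than throughput, which is why its node count exceeds 500 GB/s divided by 1 GB/s. If cost is instead measured per job rather than per hour, the run time (dataset size divided by aggregate throughput) would also need to be factored in, and the comparison may change.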


