Question: Assume that there are three node configurations with different disk capabilities: an HDD-based node, an SSD-based node, and a node with no local disk. Furthermore,

Assume that there are three node configurations with different disk capabilities: an HDD-based node, an SSD-based node, and a node with no local disk. Furthermore, assume that the cloud vendor will place any data that you request in the local disk (if any) before your task starts, and will do so for free. Nodes with no local disk cost $0.30 per hour. The network has throughput of 0.1 GB/s per node. The HDD-based node has throughput of 0.2 GB/s, a storage capacity of 2 TB and costs $0.50 per hour. The SSD-based node has throughput of 1 GB/s, a storage capacity of 0.5 TB and costs $1.50 per hour. Assume the dataset volume is 1 PB. The business needs of your organization dictate that the map phase of a MapReduce needs to achieve I/O throughput of at least 500 GB/sec. (The aggregate I/O throughput is the sum of the throughput across all nodes.) [24 points] For each configuration, how many nodes are needed at minimum to meet this goal and what is the total cost? No local disk configuration: HDD configuration: SSD configuration: [6 points] Pick the configuration with the least cost. At what price per hour (if any) would the other two configurations become cost competitive? Assume that there are three node configurations with different disk capabilities: an HDD-based node, an SSD-based node, and a node with no local disk. Furthermore, assume that the cloud vendor will place any data that you request in the local disk (if any) before your task starts, and will do so for free. Nodes with no local disk cost $0.30 per hour. The network has throughput of 0.1 GB/s per node. The HDD-based node has throughput of 0.2 GB/s, a storage capacity of 2 TB and costs $0.50 per hour. The SSD-based node has throughput of 1 GB/s, a storage capacity of 0.5 TB and costs $1.50 per hour. Assume the dataset volume is 1 PB. The business needs of your organization dictate that the map phase of a MapReduce needs to achieve I/O throughput of at least 500 GB/sec. (The aggregate I/O throughput is the sum of the throughput across all nodes.) [24 points] For each configuration, how many nodes are needed at minimum to meet this goal and what is the total cost? No local disk configuration: HDD configuration: SSD configuration: [6 points] Pick the configuration with the least cost. At what price per hour (if any) would the other two configurations become cost competitive
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
