1) Consider a Hadoop job that processes an input data file whose size equals 88 disk blocks (88 distinct blocks; you can assume the HDFS replication factor is set to 1). The mapper in this job requires 2 minutes to read and fully process a single block of data. The reducer requires 1 second (not 1 minute) to produce an answer for one key's worth of values, and there are 6000 distinct keys in total (the mappers generate many key-value pairs, but keys only occur in the 1-6000 range, for 6000 unique entries). Assume that each node runs a reducer and that the keys are distributed evenly across nodes.
a) How long will it take to complete the job if you had only one Hadoop worker node? For the sake of simplicity, assume that only one mapper and only one reducer are created on each node.
b) 30 Hadoop worker nodes?
c) 50 Hadoop worker nodes?
d) 100 Hadoop worker nodes?
e) Would changing the replication factor have any effect on your answers for a-d? You can ignore the network transfer costs as well as the possibility of node failure.
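
A minimal Python sketch of the timing arithmetic, assuming the map and reduce phases run one after the other and each phase finishes when its slowest node does (the constant names and the job_minutes helper are illustrative, not part of the original question):

import math

BLOCKS = 88                 # input file spans 88 HDFS blocks
MAP_MIN_PER_BLOCK = 2       # minutes for a mapper to process one block
KEYS = 6000                 # distinct keys reaching the reducers
REDUCE_SEC_PER_KEY = 1      # seconds for a reducer to handle one key

def job_minutes(nodes):
    # One mapper per node: the map phase runs in ceil(BLOCKS / nodes)
    # waves, each wave lasting MAP_MIN_PER_BLOCK minutes.
    map_min = math.ceil(BLOCKS / nodes) * MAP_MIN_PER_BLOCK
    # One reducer per node with keys split evenly: the busiest reducer
    # handles ceil(KEYS / nodes) keys at REDUCE_SEC_PER_KEY seconds each.
    reduce_min = math.ceil(KEYS / nodes) * REDUCE_SEC_PER_KEY / 60
    return map_min + reduce_min

for n in (1, 30, 50, 100):
    print(f"{n:>3} node(s): {job_minutes(n):6.2f} minutes")

For 1, 30, 50, and 100 nodes this prints 276.00, 9.33, 6.00, and 3.00 minutes respectively. Under the stated assumptions (network transfer costs and node failures ignored), changing the replication factor does not alter these figures.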
