Question:

2. Consider that you are given a file containing the annotated form of the Ramayana, which runs to 900 MB as a text file. The Ramayana is broken into 7 major Kaandas (books), namely the Baala, Ayodhya, Aranya, Kishkindha, Sundara, Yuddha, and Uttara kaandas. Unfortunately, the sentences of this text file are not in order and are jumbled up. However, each sentence of the text is stored in the format <kaanda>, <sentence> (assume there are no commas in the sentence). For example, a short section is of the form

Yuddha, Hanuman set off to bring back the Sanjeevani plant

You have been asked to process this file with MapReduce on Hadoop v2. Answer the following questions, providing justifications. Credit will be awarded only if the justification is right.

a) Write MR pseudo code to identify the number of sentences in each kaanda. Identify the intermediate keys and final keys.
b) How many mappers and reducers would be used for processing this? Will a combiner help to improve the performance?
c) One of the mappers is progressing slowly. How does the Hadoop YARN framework respond to this?
Step by Step Solution
The student's question involves processing the text file of the Ramayana, which is divided into seven books (Kaandas), using the MapReduce parad...
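For parts (a) and (b), the counting job can be sketched as a small local simulation in Python. This is not Hadoop API code: the function names (`map_line`, `reduce_counts`, `run_job`, `estimated_map_tasks`) are illustrative assumptions, and the shuffle is imitated with an in-memory dictionary. The intermediate key is the kaanda name with value 1, and the final key is again the kaanda name with the total sentence count. The map-task estimate assumes one input split per HDFS block at the Hadoop 2 default block size of 128 MB.

```python
import math
from collections import defaultdict


def map_line(line):
    """Map: emit (kaanda, 1) for each '<kaanda>, <sentence>' record."""
    # The sentence contains no commas, so the first comma ends the kaanda name.
    kaanda, sep, _sentence = line.partition(",")
    if sep and kaanda.strip():
        yield kaanda.strip(), 1


def reduce_counts(kaanda, counts):
    """Reduce: sum the counts for one key (also usable as the combiner)."""
    return kaanda, sum(counts)


def run_job(lines):
    """Imitate the shuffle: group intermediate pairs by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


def estimated_map_tasks(file_mb, split_mb=128):
    """One map task per input split; split_mb=128 is an assumed default."""
    return math.ceil(file_mb / split_mb)


if __name__ == "__main__":
    sample = [
        "Yuddha, Hanuman set off to bring back the Sanjeevani plant",
        "Baala, Rama was born in Ayodhya",
        "Yuddha, Rama fought Ravana",
    ]
    print(run_job(sample))
    print(estimated_map_tasks(900))
```

Because summation is associative and commutative, the same reduce function can serve as a combiner, so each mapper would send at most seven partial sums across the network instead of one pair per sentence. With only seven distinct keys, at most seven reducers can receive any work, and the actual number is whatever the job configuration sets.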
