2. Consider that you are given a file containing the annotated form of the Ramayana which...
Question:
2. Consider that you are given a file containing the annotated form of the Ramayana, which runs into 900MB as a text file. The Ramayana is broken into 7 major Kaandas (books), namely the Baala, Ayodhya, Aranya, Kishkindha, Sundara, Yuddha, and Uttara kaandas. Unfortunately, the sentences of this text file are not in order and are jumbled up. However, each sentence of the text is stored in the format <kaandaname>, <sentence> (assume there are no commas in the sentence). For example, a short section is of the form

Yuddha, Hanuman set off to bring back the Sanjeevani plant

You have been asked to process this file with MapReduce using Hadoop v2. Answer the following questions, providing justifications. Credit will be awarded only if the justification is right.
a) Write MR pseudocode to identify the number of sentences in each kaanda. Identify the intermediate keys and final keys.
b) How many mappers and reducers would be used for processing this? Will a combiner help to improve the performance?
c) One of the mappers is progressing slowly. How does the Hadoop YARN framework respond to this?
Expert Answer:
The student's question involves processing the text file of the Ramayana, which is divided into seven books (Kaandas), using the MapReduce parad...
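As an illustration of part (a) of the question, here is a minimal sketch of the counting logic in plain Python. It simulates the map, shuffle, and reduce phases in a single process and does not use the Hadoop API. The function names (`map_sentence`, `reduce_counts`, `run_job`) and the sample sentences other than the one quoted in the question are made up for the example. Under this reading of the task, the intermediate key is the kaanda name with a value of 1, and the final key is again the kaanda name with the total sentence count as its value. Because summation is associative and commutative, the same reduce function could plausibly also serve as a combiner.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_sentence(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (kaanda, 1) for a line of the form '<kaandaname>, <sentence>'."""
    # The question guarantees the sentence has no commas, so the first comma
    # separates the kaanda name from the sentence text.
    kaanda, sep, _sentence = line.partition(",")
    kaanda = kaanda.strip()
    if sep and kaanda:  # skip lines that do not match the expected format
        yield kaanda, 1


def reduce_counts(kaanda: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum the 1s emitted for one kaanda."""
    return kaanda, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate shuffle/sort in memory: group map output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_sentence(line):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; only the first line comes from the question.
    sample = [
        "Yuddha, Hanuman set off to bring back the Sanjeevani plant",
        "Baala, placeholder sentence one",
        "Yuddha, placeholder sentence two",
    ]
    print(run_job(sample))
```

In a real Hadoop job the grouping done by `run_job` is performed by the framework's shuffle and sort, so only the map and reduce functions would be written by hand.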
Related Book For
Systems Analysis And Design
ISBN: 978-1119496489
7th Edition
Authors: Alan Dennis, Barbara Wixom, Roberta M. Roth