Question: Please answer parts (c) and (d) only, as soon as possible. Thank you.

Consider a small cluster with 20 machines: 19 DataNodes and 1 NameNode. Each node in the cluster has a total of 2 Terabytes of hard disk space and 2 Gigabytes of main memory available. The cluster uses a block size of 64 Megabytes (MB) and a replication factor of 3. The master maintains 100 bytes of metadata for each 64 MB block.

(a) Let's upload the file wiki_dump.xml (with a size of 600 Megabytes) to HDFS. Explain what effect this upload has on the number of occupied HDFS blocks.

(b) Figure 1 shows an excerpt of wiki_dump.xml's structure. Explain the relationship between an HDFS block, an InputSplit and a record based on this excerpt.

[Figure 1: Excerpt of wiki_dump.xml. Each Wikipedia page is stored within its own element; the excerpt shows one element of 80.2 MB and one of 0.6 MB. The element with id EN3234 contains 80.2 Megabytes of textual content.]

(c) You are the only user of the cluster and write a Hadoop job to extract information from wiki_dump.xml. You want to speed up the job by testing different block size configurations: besides the existing 64 MB configuration, you also consider 32 MB and 128 MB block sizes. Which configuration do you think will lead to the fastest job execution? Explain why.

(d) Let us assume no files are currently stored on HDFS. You are given 100 million files, each one with a size of 100 Kilobytes. How many of those can you upload successfully to the cluster, considering the storage restrictions (memory/disk) on the NameNode and the DataNodes? Explain your answer.
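Since parts (a) and (d) are essentially arithmetic, a quick back-of-the-envelope check may help before writing up the explanation. The following is a minimal Python sketch, not an official solution; it assumes binary units (1 GB = 2^30 bytes, 1 TB = 2^40 bytes) and that the NameNode's 100 bytes of metadata are counted once per block rather than once per replica. Verify those assumptions against your course's convention before relying on the numbers.

    # Hypothetical back-of-the-envelope calculator for parts (a) and (d).
    # Assumes binary units and metadata counted once per block (not per replica).
    import math

    BLOCK_SIZE     = 64 * 2**20   # 64 MB block size
    REPLICATION    = 3            # replication factor
    META_PER_BLOCK = 100          # bytes of NameNode metadata per block

    # (a) blocks occupied by the 600 MB wiki_dump.xml
    file_size = 600 * 2**20
    blocks    = math.ceil(file_size / BLOCK_SIZE)   # logical blocks (last one only partially filled)
    replicas  = blocks * REPLICATION                # block copies actually stored on DataNodes
    print(f"(a) {blocks} blocks, {replicas} stored block replicas")

    # (d) how many 100 KB files fit, given NameNode RAM and total DataNode disk
    namenode_ram  = 2 * 2**30                       # 2 GB of NameNode main memory
    datanode_disk = 19 * 2 * 2**40                  # 19 DataNodes x 2 TB each
    file_bytes    = 100 * 2**10                     # 100 KB -> each file occupies one block

    limit_by_ram  = namenode_ram // META_PER_BLOCK                 # blocks the NameNode can track
    limit_by_disk = datanode_disk // (file_bytes * REPLICATION)    # files the disks can hold
    print(f"(d) NameNode-memory limit: {limit_by_ram:,} files")
    print(f"    DataNode-disk limit:   {limit_by_disk:,} files")
    print(f"    uploadable files:      {min(limit_by_ram, limit_by_disk):,}")

Under these assumptions the NameNode's 2 GB of memory, not the roughly 38 TB of combined DataNode disk, is the binding constraint, so only on the order of 21 million of the 100 million small files could be uploaded.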
