Question: Please answer parts (c) and (d) only, as soon as possible. Thank you.

Consider a small cluster with 20 machines: 19 DataNodes and 1 NameNode. Each node in the cluster has a total of 2 Terabytes of hard disk space and 2 Gigabytes of main memory available. The cluster uses a block size of 64 Megabytes (MB) and a replication factor of 3. The master maintains 100 bytes of metadata for each 64 MB block.

(a) Let's upload the file wiki_dump.xml (with a size of 600 Megabytes) to HDFS. Explain what effect this upload has on the number of occupied HDFS blocks.

(b) Figure 1 shows an excerpt of wiki_dump.xml's structure. Explain the relationship between an HDFS block, an InputSplit and a record based on this excerpt.

[Figure 1: Excerpt of wiki_dump.xml. Each Wikipedia page is stored within its own element; the excerpt shows one element of 80.2 MB and one of 0.6 MB. The element with id EN3234 contains 80.2 Megabytes of textual content.]

(c) You are the only user of the cluster and write a Hadoop job to extract information from wiki_dump.xml. You want to speed up the job by testing different block size configurations: besides the existing 64 MB configuration, you also consider 32 MB and 128 MB block sizes. Which configuration do you think will lead to the fastest job execution? Explain why.

(d) Let us assume no files are currently stored on HDFS. You are given 100 million files, each one with a size of 100 Kilobytes. How many of those can you upload successfully to the cluster, considering the storage restrictions (memory/disk) on the NameNode and the DataNodes? Explain your answer.
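Since parts (a) and (d) are essentially arithmetic, a quick back-of-the-envelope check may help before writing up the explanation. The following is a minimal Python sketch, not an official solution; it assumes binary units (1 GB = 2^30 bytes, 1 TB = 2^40 bytes) and that the NameNode's 100 bytes of metadata are counted once per block rather than once per replica. Verify those assumptions against your course's convention before relying on the numbers.

    # Hypothetical back-of-the-envelope calculator for parts (a) and (d).
    # Assumes binary units and metadata counted once per block (not per replica).
    import math

    BLOCK_SIZE     = 64 * 2**20   # 64 MB block size
    REPLICATION    = 3            # replication factor
    META_PER_BLOCK = 100          # bytes of NameNode metadata per block

    # (a) blocks occupied by the 600 MB wiki_dump.xml
    file_size = 600 * 2**20
    blocks    = math.ceil(file_size / BLOCK_SIZE)   # logical blocks (last one only partially filled)
    replicas  = blocks * REPLICATION                # block copies actually stored on DataNodes
    print(f"(a) {blocks} blocks, {replicas} stored block replicas")

    # (d) how many 100 KB files fit, given NameNode RAM and total DataNode disk
    namenode_ram  = 2 * 2**30                       # 2 GB of NameNode main memory
    datanode_disk = 19 * 2 * 2**40                  # 19 DataNodes x 2 TB each
    file_bytes    = 100 * 2**10                     # 100 KB -> each file occupies one block

    limit_by_ram  = namenode_ram // META_PER_BLOCK                 # blocks the NameNode can track
    limit_by_disk = datanode_disk // (file_bytes * REPLICATION)    # files the disks can hold
    print(f"(d) NameNode-memory limit: {limit_by_ram:,} files")
    print(f"    DataNode-disk limit:   {limit_by_disk:,} files")
    print(f"    uploadable files:      {min(limit_by_ram, limit_by_disk):,}")

Under these assumptions the NameNode's 2 GB of memory, not the roughly 38 TB of combined DataNode disk, is the binding constraint, so only on the order of 21 million of the 100 million small files could be uploaded.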
