Question:

MapReduce is a widely used programming model for data processing in modern applications such as business analytics and deep learning. A powerful aspect of the MapReduce model is that it can distribute information across many computers, which is how it achieves scale. This is desirable because, to solve a bigger problem, all you have to do is add more computers. In this question, we will try to learn some of the basics of the MapReduce model, especially how it distributes data between different computers.

Consider the following graph, where nodes A...G represent different computers.

[Figure not reproduced. Edges as described in the text below: main -> A; A -> B, C, D; B -> E; C -> E, F; D -> F; E -> G; F -> G.]

For this question, assume each of the nodes is a process (created with our friend, the fork() call). Data is sent to A, which then distributes (maps, in the MapReduce vocabulary) the data to the nodes in the next stage, here (B, C, D). Each of these nodes then processes the incoming data and gathers (reduces) it to the next stage, here represented by nodes (E, F). In real MapReduce systems this reduction may include sorting the data and similar work, but you can assume it is a simple processing step for the purposes of this problem. Finally, the last node (G) combines and merges all of the data and outputs the results. This completes the processing cycle. Note that either of the stages (B, C, D) or (E, F) could be scaled by adding more nodes to it. Supporting such scaling is, however, not required for this question.

1. (12 pts) You will write a program mapreduce.c that implements the basic data flow represented by the above figure using Unix pipes. Your program, acting as the parent, will fork processes that represent nodes A...G. You will create pipes to enable communication between the processes as shown by the arrows, e.g., main process to A, A-B, A-C, A-D, etc. Once this is done, your main process will send to A, via IPC, the sequence of numbers 0, 1, 2, ..., 99.

A, upon receiving these numbers, will add its name (A) to the front (left) of each number and pass it to (B, C, D) in a round-robin fashion. For example, 0 will become A0 and be sent to B, 1 goes to C as A1, 2 goes to D as A2, and so on. The receiving nodes (B, C, D) will prepend their own names to the input and pass it on to the next node in the flow graph shown in the figure.

For simplicity, nodes with multiple input sources (E, F, G) process data sequentially from their sources, from left to right as shown in the diagram. E.g., E will read all of the data from B and then from C. Similarly, F will read data first from C and then from D. Consider the example of data item 0: it goes from 0 to A0 to BA0 to EBA0 to GEBA0. All downward flow will use round-robin distribution where needed. Finally, G will print all of the output data to stdout, one data item per line.
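The data flow described above could be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: the edge table is inferred from the problem text because the figure is not available here, and the names run_pipeline, run_node, edge_src, and edge_dst are hypothetical. Every process closes the pipe ends it does not use so that readers eventually see end-of-file. The sequential left-to-right reads rely on this small input fitting in the kernel pipe buffers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { NPIPES = 10, NNODES = 7, MAXFAN = 3 };

/* Assumed edge table (inferred from the text, not from the figure):
 * pipe i carries data from edge_src[i] to edge_dst[i]; 'M' is the main
 * process. Inputs of E, F, and G appear in left-to-right order. */
static const char edge_src[NPIPES + 1] = "MAAABCCDEF";
static const char edge_dst[NPIPES + 1] = "ABCDEEFFGG";

/* Body of one node process: read each input to EOF in order, prepend the
 * node name, and forward round-robin over the outputs. Never returns. */
static void run_node(char name, int pfd[NPIPES][2], int out_fd) {
    int in[MAXFAN], out[MAXFAN], nin = 0, nout = 0;
    for (int i = 0; i < NPIPES; i++) {
        /* Keep only the ends this node uses; close everything else. */
        if (edge_dst[i] == name) in[nin++] = pfd[i][0];
        else close(pfd[i][0]);
        if (edge_src[i] == name) out[nout++] = pfd[i][1];
        else close(pfd[i][1]);
    }
    if (nout == 0) out[nout++] = out_fd; /* sink node (G) writes the result */

    char line[64];
    int rr = 0; /* index of the next round-robin target */
    for (int k = 0; k < nin; k++) {
        FILE *fp = fdopen(in[k], "r");
        if (!fp) _exit(1);
        while (fgets(line, sizeof line, fp)) {
            /* line still ends in '\n', so one item stays on one line */
            dprintf(out[rr], "%c%s", name, line);
            rr = (rr + 1) % nout;
        }
        fclose(fp);
    }
    for (int j = 0; j < nout; j++) close(out[j]);
    _exit(0);
}

/* Forks nodes A..G, sends 0..n_items-1 to A, and waits for all nodes.
 * G writes the final items to out_fd. Returns 0 on success, -1 on error. */
int run_pipeline(int n_items, int out_fd) {
    int pfd[NPIPES][2];
    pid_t pids[NNODES];
    for (int i = 0; i < NPIPES; i++)
        if (pipe(pfd[i]) < 0) return -1;
    for (int n = 0; n < NNODES; n++) {
        pids[n] = fork();
        if (pids[n] < 0) return -1;
        if (pids[n] == 0) run_node("ABCDEFG"[n], pfd, out_fd);
    }
    /* The parent keeps only the write end of the main -> A pipe. */
    for (int i = 0; i < NPIPES; i++) {
        close(pfd[i][0]);
        if (i != 0) close(pfd[i][1]);
    }
    for (int v = 0; v < n_items; v++) dprintf(pfd[0][1], "%d\n", v);
    close(pfd[0][1]); /* EOF for A, which then propagates down the graph */

    int rc = 0;
    for (int n = 0; n < NNODES; n++) {
        int status;
        if (waitpid(pids[n], &status, 0) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            rc = -1;
    }
    return rc;
}
```

A main function that simply returns run_pipeline(100, STDOUT_FILENO) would turn this into a small mapreduce.c. Because E drains B before C and G drains E before F, the output begins with GEBA0 and is grouped by path rather than sorted by number.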

