Question:
Design MapReduce algorithms to take a very large file of integers and produce as output:
(a) The largest integer.
(b) The average of all the integers.
(c) The same set of integers, but with each integer appearing only once.
(d) The count of the number of distinct integers in the input.

In the form of relational algebra implemented in SQL, relations are not sets, but bags; that is, tuples are allowed to appear more than once. There are extended definitions of union, intersection, and difference for bags, which we shall define below. Write MapReduce algorithms for computing the following operations on bags R and S:
(a) Bag Union, defined to be the bag of tuples in which tuple t appears the sum of the numbers of times it appears in R and S.
(b) Bag Intersection, defined to be the bag of tuples in which tuple t appears the minimum of the numbers of times it appears in R and S.
(c) Bag Difference, defined to be the bag of tuples in which the number of times a tuple t appears is equal to the number of times it appears in R minus the number of times it appears in S. A tuple that appears more times in S than in R does not appear in the difference.
Step by Step Solution
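One possible way to approach both parts is sketched below in Python. This is a single-process simulation, not a real cluster job: `map_reduce` is a hypothetical helper that stands in for one map, shuffle, and reduce round, and every function name here is an assumption made for illustration. The integer file is assumed to arrive as non-empty chunks (one chunk per map task), and each bag tuple is assumed to be tagged with the name of its relation, "R" or "S", before it reaches the mapper.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy stand-in for one MapReduce round (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: group values by key
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

# Integers (a): each map task emits its chunk maximum under one shared key.
def largest(chunks):
    mapper = lambda chunk: [("max", max(chunk))]
    reducer = lambda _key, values: [max(values)]
    return map_reduce(chunks, mapper, reducer)[0]

# Integers (b): emit (sum, count) per chunk; averaging averages would be wrong
# when chunks differ in size, so the reducer divides the totals instead.
def average(chunks):
    mapper = lambda chunk: [("avg", (sum(chunk), len(chunk)))]
    def reducer(_key, pairs):
        total = sum(s for s, _ in pairs)
        count = sum(c for _, c in pairs)
        return [total / count]
    return map_reduce(chunks, mapper, reducer)[0]

# Integers (c): use the integer itself as the key; the reducer emits each key once.
def distinct(chunks):
    mapper = lambda chunk: [(n, None) for n in set(chunk)]  # local dedup, like a combiner
    reducer = lambda n, _values: [n]
    return map_reduce(chunks, mapper, reducer)

# Integers (d): two rounds. Round 1 removes duplicates, round 2 counts survivors.
def count_distinct(chunks):
    uniques = distinct(chunks)
    mapper = lambda chunk: [("count", len(chunk))]
    reducer = lambda _key, values: [sum(values)]
    return map_reduce([uniques], mapper, reducer)[0]

# Bags: the mapper emits (tuple, relation name); the reducer counts how often
# the tuple occurs in R and in S, then emits it copies(r, s) times.
def bag_op(R, S, copies):
    records = [("R", t) for t in R] + [("S", t) for t in S]
    mapper = lambda record: [(record[1], record[0])]
    def reducer(t, names):
        r, s = names.count("R"), names.count("S")
        return [t] * copies(r, s)
    return map_reduce(records, mapper, reducer)

def bag_union(R, S):
    return bag_op(R, S, lambda r, s: r + s)

def bag_intersection(R, S):
    return bag_op(R, S, lambda r, s: min(r, s))

def bag_difference(R, S):
    # A tuple with more copies in S than in R is emitted zero times.
    return bag_op(R, S, lambda r, s: max(r - s, 0))
```

As a usage example, with `chunks = [[3, 7, 7], [1, 3], [9, 1]]`, `largest(chunks)` gives 9 and `count_distinct(chunks)` gives 4. With `R = [("a",), ("a",), ("b",)]` and `S = [("a",), ("c",)]`, `bag_difference(R, S)` contains one copy of `("a",)` and one copy of `("b",)`. The three bag operations share a single mapper and differ only in how the reducer combines the two counts, which mirrors the three definitions in the question.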
