
  1. Describe how you would solve the following problem with MapReduce.

Problem: The input is a big file that contains information about many houses. Each house is represented by one line in the file: (address, city, state, zip, value). The final output should be the average house value in each zip code.

  1. You should explain how the input is mapped into (key, value) pairs in the map stage, i.e., specify what the key is and what the associated value is in each pair, and, if needed, how the key(s) and value(s) are obtained.
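One way to answer this part, consistent with the shuffle and reduce steps in this answer, is to use the zip code as the key and the house value as the associated value. The sketch below is only an illustration under stated assumptions: the function name `map_house` is hypothetical, each line is assumed to be comma-separated without surrounding parentheses, and malformed lines are simply skipped.

```python
import csv


def map_house(line):
    """Map one input line to a list of (zip, value) pairs.

    Assumed line layout: address, city, state, zip, value.
    Returns an empty list for lines that do not fit that layout.
    """
    # csv handles quoted fields, e.g. an address that itself contains a comma.
    fields = next(csv.reader([line], skipinitialspace=True), [])
    if len(fields) != 5:
        return []
    address, city, state, zip_code, value = (f.strip() for f in fields)
    try:
        # Keep the zip as a string so leading zeros (e.g. "02139") survive.
        return [(zip_code, float(value))]
    except ValueError:
        return []
```

The address, city, and state fields are parsed but not emitted, since only the zip code and the value are needed for the average.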

  1. You need to mention how the shuffle process is conducted.

Shuffle all pairs with the same zip code to the same reducer, so that each reducer receives every value for the zip codes assigned to it.
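The shuffle can be pictured as partitioning the map output by key and grouping values per key. The following is a minimal single-process sketch, not how any particular framework implements it: the names `partition` and `shuffle` are hypothetical, and hashing the key modulo the number of reducers is an assumed (though common) partitioning rule.

```python
import zlib
from collections import defaultdict


def partition(zip_code, num_reducers):
    """Pick a reducer index for a key using a stable hash (assumed rule)."""
    return zlib.crc32(zip_code.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Group (zip, value) pairs so each zip lands on exactly one reducer.

    Returns a list with one dict per reducer, mapping zip -> list of values.
    """
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for zip_code, value in pairs:
        reducers[partition(zip_code, num_reducers)][zip_code].append(value)
    return reducers
```

Because the reducer index depends only on the key, two pairs with the same zip code can never end up on different reducers.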

  1. You should also explain how the (key, value) pairs produced by the map stage are processed by the reduce stage to get the final answers.

For each zip code, the reducer sums the house values it receives and divides the sum by the number of values, then outputs the pair (zip, average value).
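The reduce step described above can be sketched as a function that is called once per zip code with all of that zip code's values. The name `reduce_average` is hypothetical, and the sketch assumes the values arrive as an iterable of numbers with at least one element, which holds because a key only reaches a reducer if some mapper emitted it.

```python
def reduce_average(zip_code, values):
    """Return (zip, average value) for one zip code.

    `values` is iterated once, so it may be a list or a stream.
    """
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    return (zip_code, total / count)
```

For example, `reduce_average("62704", [250000.0, 150000.0])` returns `("62704", 200000.0)`. If a combiner were added to cut shuffle traffic, it would need to pass along partial (sum, count) pairs rather than partial averages, since averaging averages gives the wrong result when the groups differ in size.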

You may draw a figure with some simple examples (as in the word count application in the slides).
