Question: The map-reduce framework is quite useful for creating inverted indices on a set of documents. An inverted index stores for each word a list of

The map-reduce framework is quite useful for creating inverted indices on a set of documents. An inverted index stores for each word a list of all document IDs that it appears in (offsets in the documents are also normally stored, but we shall ignore them in this question)

For example, if the input document IDs and contents are as follows:

1: data clean
2: database
3: clean base

then the inverted lists would

data: 1, 2
clean: 1, 3
base: 2, 3

Give pseudocode for map and reduce functions to create inverted indices on a given set of files (each file is a document). Assume the document ID is available using a function context.getDocumentID(), and the map function is invoked once per line of the document. The output inverted list for each word should be a list of document IDs separated by commas. The document IDs are normally sorted, but for the purpose of this question, you do not need to bother to sort them.

Step by Step Solution

3.58 Rating (159 Votes )

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock

Step 1 Word Count Pseudocode for word counting mapString key String value key document name value do... View full answer

blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Accounting Questions!