Question: What should be the key and value for the mapper's input? What should the mapper output? What should be the key and value for the reducer's input? What should the reducer output?

In this problem, you need to write a MapReduce program to find the top-N pairs of similar users in the MovieLens repository. The dataset provided is MovieLens 20M, which can be found at http://grouplens.org/datasets/movielens/20m/. The original rating scores lie in the range [0.5, 5.0]. For simplicity, we ignore missing values in the ratings. You need to calculate the top-N pairs of similar users based on the ratings.

To calculate the similarity, we redefine a rating score in [0.5, 2.5] as 'unlike' and a rating score in (2.5, 5.0] as 'like'. Thus you can preprocess the dataset as (user, movie, 'like') or (user, movie, 'unlike'). Given a user i, you can construct the set of movies marked 'like' (denoted by L_i) and the set of movies marked 'unlike' (denoted by U_i). Then, for a pair of users (i, j), you can calculate the similarity between them with the following metric:

$$\mathrm{Jaccard}(i, j) = \frac{|(L_i \cap L_j) \cup (U_i \cap U_j)|}{|L_i \cup L_j \cup U_i \cup U_j|}$$

You output the top-N pairs of similar users based on the values of Jaccard. (Hint: You can leave the final sorting step to the UNIX utility "sort". Your MapReduce program needs only to calculate the Jaccard metric.)

Please write a MapReduce program (one mapper and one reducer) to list the top-100 pairs of similar users with their similarity, ordered from most similar to least similar. Each line should contain a pair of user IDs and the similarity. A valid example is as follows:

"2" "3" 0.75
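One possible way to answer the four key/value questions can be sketched in Python. This is a minimal in-memory illustration under stated assumptions, not a definitive solution: it assumes the input lines follow the MovieLens `ratings.csv` layout (`userId,movieId,rating,timestamp`), that the metric is |(L_i ∩ L_j) ∪ (U_i ∩ U_j)| / |L_i ∪ L_j ∪ U_i ∪ U_j|, and that a single reducer receives every key so it can hold all per-user sets in memory. In this sketch the mapper's input is (line offset, text line), the mapper's output and reducer's input are (user, (movie, label)), and the reducer's output is ((user i, user j), similarity). The function names `map_line`, `jaccard`, `reduce_all`, and `top_n` are hypothetical.

```python
from itertools import combinations


def map_line(line):
    """Mapper: one CSV line -> [(user, (movie, label))], or [] if skipped."""
    parts = line.strip().split(",")
    # Skip the header row and malformed lines (assumed ratings.csv layout).
    if len(parts) < 3 or parts[0] == "userId":
        return []
    user, movie, rating = parts[0], parts[1], float(parts[2])
    # [0.5, 2.5] -> 'unlike', (2.5, 5.0] -> 'like'
    label = "like" if rating > 2.5 else "unlike"
    return [(user, (movie, label))]


def jaccard(likes_i, unlikes_i, likes_j, unlikes_j):
    """Movies both users agree on, divided by all movies either user rated."""
    agree = (likes_i & likes_j) | (unlikes_i & unlikes_j)
    seen = likes_i | likes_j | unlikes_i | unlikes_j
    return len(agree) / len(seen) if seen else 0.0


def reduce_all(pairs):
    """Reducer: assumes ONE reducer sees every (user, (movie, label)) record."""
    likes, unlikes = {}, {}
    for user, (movie, label) in pairs:
        likes.setdefault(user, set())
        unlikes.setdefault(user, set())
        (likes if label == "like" else unlikes)[user].add(movie)
    # Assumes user IDs are numeric strings, as in MovieLens.
    for i, j in combinations(sorted(likes, key=int), 2):
        yield i, j, jaccard(likes[i], unlikes[i], likes[j], unlikes[j])


def top_n(lines, n=100):
    """Local stand-in for the map, shuffle, reduce, and final sort steps."""
    mapped = [kv for line in lines for kv in map_line(line)]
    scored = sorted(reduce_all(mapped), key=lambda row: -row[2])
    return scored[:n]


if __name__ == "__main__":
    sample = [
        "userId,movieId,rating,timestamp",
        "1,10,4.0,0", "1,20,1.0,0", "1,30,5.0,0",
        "2,10,4.5,0", "2,20,2.0,0", "2,30,1.0,0", "2,40,3.0,0",
        "3,10,0.5,0",
    ]
    for i, j, sim in top_n(sample):
        print(f'"{i}" "{j}" {sim}')
```

On the sample data, users 1 and 2 agree on movies 10 and 20 out of four movies rated between them, so their similarity is 0.5. Comparing every pair of users in one reducer grows quadratically with the number of users, so on the full dataset this layout is a starting point rather than a tuned design. If the reducer writes lines in the `"i" "j" similarity` form, the final ranking could be left to something like `sort -k3,3nr | head -n 100`, as the hint suggests.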
