In Java, I need help with a MapReduce program called RatingDistribution that prints the distribution of the 5 different ratings for movies. The program should work on the MovieLens data set, which can be downloaded from http://www.grouplens.org/node/73. Download the 100k data set (ml-100k.zip) into your Downloads folder.

The program should process the u.data file, which contains a list of movie ratings by users. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1. The data is randomly ordered. The file is a tab-separated list of user id | item id | rating | timestamp.

The program should take 2 arguments. The first argument is the HDFS directory where the MovieLens data set file is placed; please place the file in /movie-ratings in HDFS. The second argument is the name of the HDFS directory where the results will be placed (called /movie-rating-distribution). The result of the program should be five records, one for each rating. Each output record should contain the rating number, a tab, and the count.

Before you begin, please split the input file into 5 files called u1.data, u2.data, u3.data, u4.data and u5.data. There is a Unix command called split that allows you to split a file easily. Reference - https://kb.iu.edu/d/afar

Step by Step Solution

Here's a MapReduce program in Java called RatingDistribution that achieves the described functionality: import java.io.IOException; import java.util.StringTo...
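To make the map and reduce logic from the question concrete, here is a minimal sketch in plain Java that runs locally without Hadoop. It is not the expert's answer, only an illustration of the idea: the "map" step pulls the rating out of each tab-separated line, and the "reduce" step sums a count of 1 per rating. The class name RatingDistributionSketch, the method names, and the sample rows are all hypothetical and chosen for this example. In a real Hadoop job the same logic would live in a Mapper and a Reducer, and the input would come from the HDFS directory given as the first argument.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local, Hadoop-free sketch of the rating-distribution logic (hypothetical names).
class RatingDistributionSketch {

    // "Map" step: one input line is "user id <TAB> item id <TAB> rating <TAB> timestamp".
    // The rating (third field) becomes the key; each line contributes a count of 1.
    static int extractRating(String line) {
        String[] fields = line.trim().split("\t");
        if (fields.length < 4) {
            throw new IllegalArgumentException("Expected 4 tab-separated fields: " + line);
        }
        return Integer.parseInt(fields[2]);
    }

    // "Reduce" step: sum the counts for each rating.
    // A TreeMap keeps the ratings in ascending order for output.
    static Map<Integer, Long> countRatings(List<String> lines) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            counts.merge(extractRating(line), 1L, Long::sum);
        }
        return counts;
    }

    // Output format requested in the question: rating, tab, count.
    static String formatRecord(int rating, long count) {
        return rating + "\t" + count;
    }

    public static void main(String[] args) {
        // Made-up sample rows in the u.data layout (assumption, for illustration only).
        List<String> sample = Arrays.asList(
                "196\t242\t3\t881250949",
                "186\t302\t3\t891717742",
                "22\t377\t1\t878887116",
                "244\t51\t2\t880606923",
                "166\t346\t1\t886397596");

        countRatings(sample).forEach(
                (rating, count) -> System.out.println(formatRecord(rating, count)));
    }
}
```

With these five sample rows the sketch prints three records (ratings 1, 2 and 3 with counts 2, 1 and 2), because ratings 4 and 5 do not occur in the sample. On the full u.data file all five ratings should appear, giving the five records the question asks for.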
