In Java, I need help with a MapReduce program called RatingDistribution that will print the 5
Question:
In Java, I need help with a MapReduce program called RatingDistribution that prints the distribution of the 5 different rating values for movies. The program should work on the MovieLens data set, which can be downloaded from http://www.grouplens.org/node/73. Download the 100k data set (ml-100k.zip) into your Downloads folder.

The program should process the u.data file, which contains a list of movie ratings by users. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1. The data is randomly ordered. The file is a tab-separated list of user id | item id | rating | timestamp.

The program should take 2 arguments. The first argument is the HDFS directory where the MovieLens data set file is placed; please go ahead and place the file in /movie-ratings in HDFS. The second argument is the name of the HDFS directory where the results will be placed (called /movie-rating-distribution). The result of the program should be five records, one for each rating. Each output record should have the format: rating number, a tab, then the count.

Before you begin, please split the input file into 5 files called u1.data, u2.data, u3.data, u4.data and u5.data. There is a Unix command called split that allows you to split a file easily. Reference - https://kb.iu.edu/d/afar
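The counting logic described above can be sketched in plain Java without any Hadoop dependency, which makes it easy to check locally before wiring it into a job. This is only a minimal sketch under stated assumptions: the class name RatingDistributionSketch, its methods, and the sample rows are hypothetical and are not part of the assignment. In an actual Hadoop job, the map step would presumably live in a Mapper subclass that emits (rating, 1) and the reduce step in a Reducer subclass that sums the values, with the two command-line arguments used as the input and output paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local sketch of the map and reduce logic; not the Hadoop job itself.
public class RatingDistributionSketch {

    // Map step: take one u.data line (user id, item id, rating, timestamp,
    // separated by tabs) and return the rating, which acts as the key.
    // Returns null for blank or malformed lines so the caller can skip them.
    public static Integer map(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 3) {
            return null;
        }
        try {
            return Integer.valueOf(fields[2].trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Reduce step: sum the 1s that were emitted for a single rating.
    public static long reduce(List<Long> ones) {
        long sum = 0;
        for (long one : ones) {
            sum += one;
        }
        return sum;
    }

    // Simulates map -> shuffle (group by rating) -> reduce over all lines.
    public static Map<Integer, Long> run(List<String> lines) {
        Map<Integer, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            Integer rating = map(line);
            if (rating != null) {
                grouped.computeIfAbsent(rating, k -> new ArrayList<>()).add(1L);
            }
        }
        Map<Integer, Long> counts = new TreeMap<>();
        for (Map.Entry<Integer, List<Long>> entry : grouped.entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    // Formats each record as: rating, tab, count, newline.
    public static String format(Map<Integer, Long> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Integer, Long> entry : counts.entrySet()) {
            out.append(entry.getKey()).append('\t').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up rows in the u.data layout, for illustration only.
        List<String> sample = Arrays.asList(
                "1\t10\t5\t880000000",
                "2\t11\t3\t880000001",
                "3\t10\t5\t880000002");
        System.out.print(format(run(sample)));
    }
}
```

Because the per-rating sum is associative, the same reduce logic could also serve as a combiner in a real job, which would cut down the data shuffled from the five input splits. Treat that as a design suggestion rather than a requirement of the assignment.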
Data Mining Concepts And Techniques
ISBN: 9780128117613
4th Edition
Authors: Jiawei Han, Jian Pei, Hanghang Tong