Write the code for WordCount.java to display the average rating and the number of users
Question:
Write the code for WordCount.java to display the average rating and the number of users who rated each movie in the u.data set.
Transcribed Image Text:
The following file is from the MovieLens dataset, which shows user ratings for movies: http://files.grouplens.org/datasets/movielens/ml-100k/u.data

You can find more about this dataset here: https://files.grouplens.org/datasets/movielens/ml-100k-README.txt

u.data is the full u data set, with 100000 ratings by 943 users on 1682 items. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1. The data is randomly ordered. This is a tab-separated list of user id | item id | rating | timestamp. The timestamps are Unix seconds since 1/1/1970 UTC.

For example, the following line of the file

95 546 2 879196566

is interpreted as follows: user 95 has rated movie 546 as 2/5 (ratings are in the range 1-5) at time 879196566 (Monday, November 10, 1997 9:16:06 PM, GMT).

Your task is to use MapReduce programming to find the following information for each movie: the average rating and the number of users who rated this movie. Here is an example of the output:

Movie ID    Average Rating    Number of Users Rated
340         3.78              298
499         4.02              532

You can choose the output format. However, the required information must be included in the output.

Hint: You can change the WordCount program such that it ignores all tokens in a line except the third one (the rating value is in the third column of the file).
Expert Answer:
Sure, here's a basic implementation of the WordCount program in Java for finding the average rating and the number of users who rated each movie in the ...
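As a rough illustration of the approach the hint describes, here is a minimal sketch in plain Java. It is not a Hadoop job and not the answer referenced above: it only mimics the map, shuffle, and reduce steps in memory so the logic can be checked without a cluster. The class name MovieRatingStats and the methods map, reduce, and run are hypothetical names chosen for this example, and every sample line except the first is made up. It assumes each user rates a given movie at most once, so the number of ratings for a movie equals the number of users who rated it. Note that the movie id (second column) is needed as the grouping key in addition to the rating (third column).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map -> shuffle -> reduce flow; no Hadoop classes are used.
// All names here are hypothetical and chosen only for illustration.
class MovieRatingStats {

    // Map step: turn one u.data line into a (movieId, rating) pair.
    // Returns null for lines that do not have the four expected fields.
    static Map.Entry<Integer, Integer> map(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 4) {
            return null;
        }
        int movieId = Integer.parseInt(fields[1]); // second column: item id
        int rating = Integer.parseInt(fields[2]);  // third column: rating
        return new SimpleEntry<>(movieId, rating);
    }

    // Reduce step: compute the average rating and the rating count for one movie.
    // Output format (an assumption): movieId <TAB> average <TAB> count.
    static String reduce(int movieId, List<Integer> ratings) {
        long sum = 0;
        for (int rating : ratings) {
            sum += rating;
        }
        double average = (double) sum / ratings.size();
        return String.format(Locale.ROOT, "%d\t%.2f\t%d", movieId, average, ratings.size());
    }

    // Driver: groups mapped pairs by movie id (the "shuffle"), then reduces each group.
    static List<String> run(List<String> lines) {
        Map<Integer, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<Integer, Integer> pair = map(line);
            if (pair != null) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : grouped.entrySet()) {
            output.add(reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "95\t546\t2\t879196566",  // example line from the assignment
                "12\t546\t5\t879196600",  // made-up line
                "7\t340\t4\t879196700");  // made-up line
        run(sample).forEach(System.out::println);
    }
}
```

In an actual Hadoop job, the body of map would typically live in a Mapper subclass that emits the movie id as the key and the rating as the value, and the body of reduce would live in a Reducer subclass; the grouping done here with a TreeMap is what the framework's shuffle phase provides.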