Question:

Problem 1 (20 points) Write Python code (normal Python, not PySpark) to answer the following question. "Social computing research at the University of Minnesota" has released movie rating data sets of different sizes at the "grouplens.org" web site. Load the MovieLens 10M dataset, which consists of 10 million movie ratings. You can download the data by going to grouplens.org and, under the "datasets" tab, downloading the "MovieLens 10M Dataset"; it is 63 MB.
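A minimal sketch of the loading step, assuming the ratings live in a file named ratings.dat whose lines look like UserID::MovieID::Rating::Timestamp (check the README shipped with the download to confirm). The names Rating and parse_ratings are illustrative, not part of the assignment.

```python
from typing import Iterable, Iterator, NamedTuple


class Rating(NamedTuple):
    """One rating record (field layout is an assumption about ratings.dat)."""
    user_id: int
    movie_id: int
    rating: float
    timestamp: int


def parse_ratings(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield one Rating per non-empty 'UserID::MovieID::Rating::Timestamp' line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, movie, rating, ts = line.split("::")
        yield Rating(int(user), int(movie), float(rating), int(ts))
```

Because parse_ratings accepts any iterable of strings, it can be fed an open file object directly, for example `with open("ratings.dat") as f: rows = list(parse_ratings(f))`, where the path is a placeholder for wherever the archive was extracted.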

a) Divide the data into 5 files of almost equal size and use the five files in the rest of the assignment (2 points)
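One possible way to do the split, sketched under the assumption that "almost equal size" means an almost equal number of lines: deal the lines out round-robin so the part sizes differ by at most one line. The function name split_file and the output names ratings_part1.dat to ratings_part5.dat are made up for this example.

```python
import os
from contextlib import ExitStack


def split_file(src_path, out_dir, n_parts=5):
    """Split src_path line by line into n_parts files.

    Returns (paths, line_counts). Line i goes to part i % n_parts,
    so the parts differ in size by at most one line.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"ratings_part{i + 1}.dat")
             for i in range(n_parts)]
    counts = [0] * n_parts
    with ExitStack() as stack:
        src = stack.enter_context(open(src_path, "r", encoding="utf-8"))
        outs = [stack.enter_context(open(p, "w", encoding="utf-8"))
                for p in paths]
        for i, line in enumerate(src):
            k = i % n_parts
            outs[k].write(line)
            counts[k] += 1
    return paths, counts
```

The round-robin split needs only a single pass and never holds the whole file in memory. If contiguous chunks are preferred instead, count the lines first and then write the file in five consecutive slices.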

b) Sort the data from the highest-rated movie to the lowest-rated one. Measure how much time sorting takes. (6 points) Don't use the sort function, and write the sort function yourself. Use sort function
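The wording of part b) is ambiguous about the built-in sort, so the sketch below writes a merge sort by hand and leaves the built-in `sorted` available as a cross-check. It assumes that "highest to lowest" means ordering individual rating records by their rating value; merge_sort_desc and timed are hypothetical helper names.

```python
import time


def merge_sort_desc(items, key=lambda x: x):
    """Return a new list ordered from highest key to lowest (stable merge sort)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort_desc(items[:mid], key)
    right = merge_sort_desc(items[mid:], key)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # ">=" takes from the left half on ties, which keeps the sort stable
        if key(left[i]) >= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `result, seconds = timed(merge_sort_desc, rows, key=lambda r: r[2])` sorts tuples whose third field is the rating. Timing `sorted(rows, key=..., reverse=True)` with the same helper gives a baseline to compare against.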

c) Create a histogram of the movie ratings. Measure how much time it takes to create the histogram. (2 points)
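A rough sketch of the rating histogram using only the standard library, assuming one bin per distinct rating value. The helper names rating_histogram and print_histogram are illustrative, and the text bars stand in for whatever plotting library the course expects.

```python
from collections import Counter


def rating_histogram(ratings):
    """Count occurrences of each rating value; keys are in ascending order."""
    counts = Counter(ratings)
    return dict(sorted(counts.items()))


def print_histogram(hist, width=40):
    """Print one text bar per bin, scaled so the largest bin spans `width`."""
    if not hist:
        return
    peak = max(hist.values())
    for value, count in hist.items():
        bar = "#" * max(1, round(width * count / peak))
        print(f"{value:>4} | {bar} {count}")
```

To time it, wrap the call with `time.perf_counter()` before and after, for example `start = time.perf_counter(); hist = rating_histogram(values); elapsed = time.perf_counter() - start`.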

d) The data contains more than 10M ratings of 10681 movies by 71567 users. Create a histogram of the number of times each movie got rated. Measure how much time it takes to create the histogram. (4 points)
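Part d) can be read as a two-step count: first how many ratings each movie received, then how many movies fall into each range of rating counts. The sketch below follows that reading; the bin width of 1000 is an arbitrary choice and both function names are made up for the example.

```python
from collections import Counter


def ratings_per_movie(movie_ids):
    """Map each movie id to the number of ratings it received."""
    return Counter(movie_ids)


def bin_counts(per_movie, bin_width=1000):
    """Group movies by how often they were rated.

    Returns {bin_start: number_of_movies}, where a movie with n ratings
    lands in the bin starting at (n // bin_width) * bin_width.
    """
    bins = Counter((n // bin_width) * bin_width for n in per_movie.values())
    return dict(sorted(bins.items()))
```

With roughly ten thousand movies, plotting one bar per movie is hard to read, which is why the second step groups the counts into ranges. The elapsed time can be measured around both calls together with `time.perf_counter()`.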

e) Choose the lowest three bins of the histogram in part c) and create a histogram of movie ratings for these three bins. Do the same thing for the top three bins of the histogram. (6 points)
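Part e) does not say whether "lowest" refers to the rating value or to the bin count. The sketch below assumes it means the three smallest and three largest rating values, and takes as input a dictionary of {rating: count} such as the one produced for part c). The names lowest_bins and highest_bins are hypothetical.

```python
def lowest_bins(hist, n=3):
    """Return the n bins with the smallest rating values as {rating: count}."""
    if n <= 0:
        return {}
    return {k: hist[k] for k in sorted(hist)[:n]}


def highest_bins(hist, n=3):
    """Return the n bins with the largest rating values as {rating: count}."""
    if n <= 0:
        return {}
    return {k: hist[k] for k in sorted(hist)[-n:]}
```

If the intended meaning is the three least populated and three most populated bins instead, sort by count with `sorted(hist, key=hist.get)` in place of `sorted(hist)`.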
