Question:

CS 325: Artificial Intelligence
Homework on Matrix Factorization
In this homework, you are going to find the optimum K value
for a matrix factorization based prediction model on a real-life
dataset. The K value can range from 1 to 942. You can start with 1
and increment it by as your next K value.
Dataset: You will use the MovieLens 100k dataset, which is a
stable benchmark dataset with 100,000 ratings given by 943
users for 1682 movies, with each user having rated at least 20
movies. This dataset consists of many files that contain
information about the movies, the users, and the ratings given by
users to the movies they have watched.
u.item: Information about the items (movies); this is a tab
separated list of
movie id | movie title | release date | video release date |
IMDb URL | unknown | Action | Adventure | Animation |
Children's | Comedy | Crime | Documentary | Drama | Fantasy |
Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
Thriller | War | Western |
The last 19 fields are the genres, a 1 indicates the movie is of
that genre, a 0 indicates it is not; movies can be in several genres
at once. The movie ids are the ones used in the u.data data set.
u.data: The full rating data set, 100000 ratings by 943 users on
1682 items. Each user has rated at least 20 movies. Users and
items are numbered consecutively from 1. The data is randomly
ordered. This is a tab separated list of user id | item id | rating |
timestamp. The time stamps are Unix seconds since 1/1/1970
UTC.
u.user: Demographic information about the users; this is a tab
separated list of user id | age | gender | occupation | zip code. The
user ids are the ones used in the u.data data set. Also, read the
'read.txt' for more information on the dataset.
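The u.data layout described above (user id, item id, rating, timestamp, separated by tabs) can be read into a user-by-movie matrix. The following is a minimal sketch, not a required implementation: the function name `build_rating_matrix`, the use of 0 to mean "unrated", and the file path in the final comment are all assumptions made for illustration.

```python
import io
import numpy as np

def build_rating_matrix(lines, n_users=943, n_items=1682):
    """Build a dense user x item matrix from u.data-style lines.

    Assumption: 0 marks an unrated entry (real ratings are 1 to 5).
    """
    R = np.zeros((n_users, n_items), dtype=np.float64)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, item_id, rating, _timestamp = line.split("\t")
        # Ids in the file start at 1; array indices start at 0.
        R[int(user_id) - 1, int(item_id) - 1] = float(rating)
    return R

# Tiny in-memory sample in the u.data layout (made-up rows, not real data).
sample = "1\t1\t5\t874965758\n1\t3\t4\t876893171\n2\t2\t3\t888550871\n"
R_small = build_rating_matrix(io.StringIO(sample), n_users=2, n_items=3)
print(R_small)

# For the real file (the path is an assumption):
# with open("ml-100k/u.data") as f:
#     R = build_rating_matrix(f)
```

A dense matrix is convenient here because 943 x 1682 entries fit easily in memory; a sparse format would also work.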
Go through the u.data file to build the rating matrix
(user-movie-rating data). Now, from the rating matrix, build the
training matrix. As each user has rated at least 20 movies,
select 5 movies randomly for each user which the user rated, and
put them in the test data file. The training matrix will contain all
the movie ratings except those that are in the test data file. The
movies in the test file must be marked as unrated in the training
matrix. For example: if user 1 rated movies 1 to 20 in the rating
matrix and the test data file contains user 1's ratings of movies
1 to 5, then the training matrix file will contain movies 6 to 20
as rated and movies 1 to 5 as unrated for user 1. In summary, the test data file will
contain 5 movie ratings for each user, and the training matrix
will contain all the movie rating information except those in the
test data for all users.
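The split described above can be sketched as follows. This is one possible approach under stated assumptions: the rating matrix uses 0 for "unrated", the function name `split_train_test` and the fixed seed are hypothetical, and the demo matrix is synthetic rather than MovieLens data.

```python
import numpy as np

def split_train_test(R, n_test=5, seed=0):
    """Hold out n_test rated movies per user.

    Returns the training matrix (held-out entries set to 0) and a list of
    (user_index, item_index, rating) test triples. Raises ValueError if a
    user has fewer than n_test ratings.
    """
    rng = np.random.default_rng(seed)
    train = R.copy()
    test = []
    for u in range(R.shape[0]):
        rated = np.flatnonzero(R[u])              # movies this user rated
        held_out = rng.choice(rated, size=n_test, replace=False)
        for i in held_out:
            test.append((u, int(i), float(R[u, i])))
            train[u, i] = 0.0                     # mark as unrated in training
    return train, test

# Synthetic demo: 4 users, 30 movies, 20 ratings per user.
demo_rng = np.random.default_rng(1)
R_demo = np.zeros((4, 30))
for u in range(4):
    items = demo_rng.choice(30, size=20, replace=False)
    R_demo[u, items] = demo_rng.integers(1, 6, size=20)

train_demo, test_demo = split_train_test(R_demo)
# 20 held-out ratings in total, 15 training ratings left per user.
print(len(test_demo), np.count_nonzero(train_demo, axis=1))
```

Fixing the seed keeps the same test set across all K values, so the errors are comparable from one run to the next.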
Now, use the matrix factorization algorithm to predict the
missing values in the training matrix. Then test different
K values and choose the one with the minimum error value.
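The assignment does not say which variant of matrix factorization or which error measure to use. The sketch below assumes stochastic gradient descent with L2 regularization and RMSE on the held-out ratings; the function names, the learning rate, the regularization weight, and the number of steps are all illustrative choices, and the demo data is synthetic rather than MovieLens.

```python
import numpy as np

def factorize(train, k, steps=50, lr=0.01, reg=0.02, seed=0):
    """Approximate train ~ P @ Q.T using only the observed (nonzero) entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = train.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    users, items = np.nonzero(train)
    for _ in range(steps):
        for idx in rng.permutation(len(users)):
            u, i = users[idx], items[idx]
            err = train[u, i] - P[u] @ Q[i]
            pu = P[u].copy()                      # keep old value for Q's update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

def rmse(P, Q, test):
    """Root mean squared error over (user, item, rating) test triples."""
    sq = [(r - P[u] @ Q[i]) ** 2 for u, i, r in test]
    return float(np.sqrt(np.mean(sq)))

# Synthetic demo: 30 users x 20 movies generated from a rank-2 model.
rng = np.random.default_rng(42)
R_full = rng.uniform(1.0, 1.6, (30, 2)) @ rng.uniform(1.0, 1.6, (20, 2)).T
train = R_full.copy()
test = []
for u in range(30):
    for i in range(20):
        if (u + i) % 4 == 0:                      # deterministic hold-out for the demo
            test.append((u, i, float(R_full[u, i])))
            train[u, i] = 0.0

results = []
for k in (1, 2, 4):                               # K values to compare
    P, Q = factorize(train, k)
    results.append((k, rmse(P, Q, test)))
best_k, best_err = min(results, key=lambda kr: kr[1])
print(results, "best K:", best_k)
```

On the full 100k dataset this pure-Python loop would likely be slow across many K values, so a vectorized or library-based solver may be a better fit there.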
What to submit: Submit a zip file of all your code that can
be executed and an Excel file with the different K values and the
corresponding error values.
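One simple way to produce the K versus error table is to write a CSV file, which Excel can open. This is only a sketch: the function name `write_k_errors` and the output file name are assumptions, and the numbers in the demo are placeholders, not real results. A native .xlsx file would need a third-party library instead.

```python
import csv
import io

def write_k_errors(results, fileobj):
    """Write (K, error) pairs as CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["K", "error"])
    for k, err in results:
        writer.writerow([k, f"{err:.4f}"])

# Placeholder values for illustration only; these are not real results.
placeholder = [(1, 1.25), (2, 1.5)]
buf = io.StringIO()
write_k_errors(placeholder, buf)
print(buf.getvalue())

# To write a real file (the file name is an assumption):
# with open("k_vs_error.csv", "w", newline="") as f:
#     write_k_errors(results, f)
```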