CSCI 479 -- Machine Learning
Spring 2026 - Assignment 2
Submission deadline: 10:00, Friday, 13 February 2026
This assignment uses the same data format as the Netflix Prize,
a competition that Netflix ran from 2006 to 2009
to find the best recommendation model it could get.
Together, the training data set
and the testing data set
contain 12,672 ratings that 1000 users (the first 990
users are in the training data set and the last 10 users are in the
testing data set) gave to 100 movies over twenty days.
Each data item is a quadruplet of the form
<user, movie, date of grade, grade>.
The user and movie fields are the integer IDs of the user and the
movie respectively, the date of grade takes the format
"yyyy-mm-dd", and grades are integers from 1 to 5 stars, inclusive.
Note that the counts given above should only be used as references.
You should explore the data sets to get a more accurate understanding
of the data.
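As a starting point for the exploration, one data item can be parsed
as sketched below. This is only an illustration: it assumes each line
of a data file holds one comma-separated quadruplet such as
"17,42,2005-09-06,4", which may not match the actual layout of the
provided files. The names Rating and parse_rating are made up for
this example.

```python
from datetime import date
from typing import NamedTuple


class Rating(NamedTuple):
    """One <user, movie, date of grade, grade> data item."""
    user: int
    movie: int
    day: date
    grade: int


def parse_rating(line: str) -> Rating:
    """Parse one 'user,movie,yyyy-mm-dd,grade' line (assumed layout)."""
    user, movie, day, grade = (field.strip() for field in line.split(","))
    rating = Rating(int(user), int(movie), date.fromisoformat(day), int(grade))
    # Grades are integers from 1 to 5 inclusive; reject anything else.
    if not 1 <= rating.grade <= 5:
        raise ValueError(f"grade out of range: {line!r}")
    return rating
```

Check the real files first and adjust the separator and the field
order if they differ from this assumption.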
Your tasks:
Overall, your task is to build a similarity-based model to
recommend movie(s) to the 10 users in the testing data set.
Specifically, you need to:
- Design and implement a program to transform the data set from its
current format to a traditional vector format where one user is one vector.
You don't have to use this recommended user vector format.
You can propose your own data format if 1) your own data format
can be used consistently in your similarity function
and recommendation algorithm; and 2) you can justify
that your recommendation system using your own data format
performs no worse, or only marginally worse, than one using
the traditional vector data format
and its associated recommendation algorithms.
- Design and implement a similarity (or distance) function
that calculates the distance between any two user objects.
(The first two parts are the same tasks as described in Lab 3.)
- Design and implement the K Nearest Neighbors algorithm
to make recommendations to a user, using
the 10 users in the testing data set as test cases.
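The three tasks above can be sketched together as follows. This is
a minimal illustration, not a required design: the sparse dictionary
vector, the cosine distance, the default k=3 and the "highest mean
grade among the neighbors" criterion are all assumptions chosen for
the example, and the function names are hypothetical. You are
expected to make and justify your own choices.

```python
import math
from collections import defaultdict


def to_user_vectors(ratings):
    """Map each user ID to a sparse vector {movie ID: grade}."""
    vectors = defaultdict(dict)
    for user, movie, grade in ratings:
        vectors[user][movie] = grade
    return dict(vectors)


def cosine_distance(a, b):
    """Return 1 - cosine similarity, treating ungraded movies as 0."""
    dot = sum(grade * b[movie] for movie, grade in a.items() if movie in b)
    norm_a = math.sqrt(sum(g * g for g in a.values()))
    norm_b = math.sqrt(sum(g * g for g in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty vector is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def recommend(target, train_vectors, k=3, top_n=1):
    """Recommend movies the target user has not graded yet.

    Candidates are ranked by their mean grade among the k training
    users nearest to the target; ties are broken by movie ID.
    """
    neighbors = sorted(train_vectors.values(),
                       key=lambda vec: cosine_distance(target, vec))[:k]
    grades = defaultdict(list)
    for vec in neighbors:
        for movie, grade in vec.items():
            if movie not in target:
                grades[movie].append(grade)
    ranked = sorted(grades,
                    key=lambda m: (-sum(grades[m]) / len(grades[m]), m))
    return ranked[:top_n]
```

With this layout, a test user is passed to recommend() as a
dictionary of the same form as the training vectors. Treating
ungraded movies as 0 is a simplification; comparing users only on
the movies both have graded is one alternative worth evaluating.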
What to Submit:
Submit a document that explains the whole process of building
and using your recommendation system. This document should
include at least the following sections:
- Provide a data exploration report that should include
at least the following information: the number of
unique users in the training data set and the testing data set
respectively; the number of unique movies in each data set;
whether there are any movies that appear in the training
data set but not in the testing data set, and vice versa;
the minimum, maximum and average number of movies graded
by an individual user.
- Describe the format of your transformed data and the reason
why you chose this format. Include the source code and
a sample of the transformed data (to illustrate the
data format after the transformation) as an appendix.
- Describe the function used in your recommendation system
to calculate the distance between two user objects
and the reason why you chose this measurement of similarity.
Include the source code or the math formula of the similarity
function as an appendix.
- Describe the K nearest neighbors (KNN) algorithm used
in your recommendation system, especially the value of K
chosen and your reason why this particular value of K is chosen.
(Hint: K can be chosen by domain knowledge or through
many experiments.)
- Describe the recommendation criteria used in your program
and the reason why you chose these particular criteria.
- Present the recommendation result (recommended movie ID(s))
for each of the 10 users in the test data set.
Include the source code for the recommendation as an appendix.
- Include, as an appendix, the Makefile (if there is one)
that can be used to automate the compilation and preferably
also the execution of your recommendation system.
- As a thought experiment, explain whether and how your recommendation
algorithm should be modified if some or all of the data in the testing
data set were also included in the training data set.
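The statistics requested for the data exploration report could be
computed along the following lines. This is a sketch under stated
assumptions: both data sets have already been loaded as lists of
(user, movie, grade) tuples, the per-user counts are taken over both
data sets combined, and the function name explore and the dictionary
keys are made up for this example.

```python
from collections import defaultdict


def explore(train, test):
    """Summarise two lists of (user, movie, grade) tuples."""
    train_movies = {movie for _, movie, _ in train}
    test_movies = {movie for _, movie, _ in test}
    # Collect the distinct movies each user graded, across both sets.
    movies_per_user = defaultdict(set)
    for user, movie, _ in train + test:
        movies_per_user[user].add(movie)
    counts = [len(movies) for movies in movies_per_user.values()]
    return {
        "train_users": len({user for user, _, _ in train}),
        "test_users": len({user for user, _, _ in test}),
        "train_movies": len(train_movies),
        "test_movies": len(test_movies),
        "only_in_train": sorted(train_movies - test_movies),
        "only_in_test": sorted(test_movies - train_movies),
        "min_movies": min(counts),
        "max_movies": max(counts),
        "avg_movies": sum(counts) / len(counts),
    }
```

If you prefer to report the per-user counts separately for the
training and testing data sets, call the counting part once per set.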
How to submit:
Choose one of the following two ways to submit your work:
- Log in to your VIU Learn account,
find the CSCI 479 course page, click on the "Assessment"
drop-down menu, click on the "Assignments" item, then click on
the folder named "A2". Then you can click on the "Add a File"
button to browse and upload your document and appendix files.
- On the csci server, in the directory that holds all of your assignment
solution files, enter the command
~liuh/bin/submit 479 A2 .
This submit script currently accepts files whose names match
*.pdf, *.txt, *.csv, *.h, *.cpp, *.py, or makefile.
If you need to submit a file with a different name or extension,
please contact your instructor before submitting.
Last updated: January 27, 2026