Machine Learning (Spring 2026)

CSCI 479 -- Machine Learning
Spring 2026 - Assignment 2
Submit deadline: 10:00, 13 February 2026, Friday

This assignment uses the same data format as the Netflix Prize, a competition that Netflix ran from 2006 to 2009 to find the best recommendation model it could get.

Together, the training data set and the testing data set contain 12,672 ratings that 1000 users gave to 100 movies over twenty days. The first 990 users are in the training data set and the last 10 users are in the testing data set. Each data item is a quadruplet of the form <user, movie, date of grade, grade>. The user and movie fields are the integer IDs of the user and the movie respectively, the date of grade has the format "yyyy-mm-dd", and grades are integers from 1 to 5 stars, inclusive.

Treat these numbers only as a rough guide. You should explore the data sets yourself to get an accurate understanding of the data.
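A minimal sketch of such an exploration, in Python, is shown below. It assumes one comma-separated quadruplet per line in the order user, movie, date, grade; the actual files may be laid out differently, so adjust the parsing accordingly. The function names load_ratings and explore and the three sample lines are made up for illustration and are not part of the assignment data.

```python
import csv
import io
from collections import defaultdict


def load_ratings(lines):
    """Parse 'user,movie,yyyy-mm-dd,grade' lines into tuples.

    The comma-separated layout is an assumption about the file format.
    """
    ratings = []
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        user, movie, date, grade = row
        ratings.append((int(user), int(movie), date.strip(), int(grade)))
    return ratings


def explore(ratings):
    """Return basic counts for one data set."""
    movies_per_user = defaultdict(set)
    for user, movie, _date, _grade in ratings:
        movies_per_user[user].add(movie)
    counts = [len(movies) for movies in movies_per_user.values()]
    return {
        "users": len(movies_per_user),
        "movies": len({movie for _u, movie, _d, _g in ratings}),
        "min_per_user": min(counts),
        "max_per_user": max(counts),
        "avg_per_user": sum(counts) / len(counts),
    }


# Made-up sample standing in for a real data file.
sample = io.StringIO(
    "1,10,2005-01-01,4\n"
    "1,11,2005-01-02,5\n"
    "2,10,2005-01-03,3\n"
)
stats = explore(load_ratings(sample))
print(stats)
```

To process a real file, pass an open file object instead of the io.StringIO sample. Comparing the movie sets of the two data sets is then a matter of set difference on the movie IDs.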

Your tasks:

Overall, your task is to build a similarity-based model to recommend movie(s) to the 10 users in the testing data set.

Specifically, you need to:

Design and implement a program to transform the data set from its current format to a traditional vector format where one user is one vector.
You don't have to use this recommended user vector format. You can propose your own data format if 1) it can be used consistently in your similarity function and recommendation algorithm; and 2) you can justify that a recommendation system using it does not perform much worse than one using the traditional vector data format and its associated recommendation algorithms.
Design and implement a similarity (or distance) function that calculates the distance between any two user objects.
(The first two parts are the same tasks as described in Lab 3.)
Design and implement the K Nearest Neighbors algorithm to make recommendations to a user, using the 10 users in the testing data set as test cases.
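
The three steps above can be sketched in Python as follows. This is one possible design under stated assumptions, not the required solution: a grade of 0 marks an ungraded movie, the distance is the root mean squared grade difference over the movies both users graded, and a movie's score is its mean grade among the K nearest neighbours. The function names, the default values of k and top_n, and the sample ratings are all hypothetical.

```python
import math


def to_user_vectors(ratings, movie_ids):
    """One vector per user; position i holds the grade for movie_ids[i], 0 if ungraded."""
    index = {movie: i for i, movie in enumerate(movie_ids)}
    vectors = {}
    for user, movie, _date, grade in ratings:
        vec = vectors.setdefault(user, [0] * len(movie_ids))
        vec[index[movie]] = grade
    return vectors


def rms_distance(a, b):
    """Root mean squared grade difference over movies graded by both users."""
    diffs = [(x - y) ** 2 for x, y in zip(a, b) if x != 0 and y != 0]
    if not diffs:
        return math.inf  # no shared movies: treat as maximally distant
    return math.sqrt(sum(diffs) / len(diffs))


def recommend(target, train_vectors, movie_ids, k=3, top_n=1):
    """Recommend up to top_n movies the target has not graded, using k nearest users."""
    by_distance = sorted(
        train_vectors.items(),
        key=lambda item: (rms_distance(target, item[1]), item[0]),
    )
    neighbours = [vec for _user, vec in by_distance[:k]]
    scores = {}
    for i, movie in enumerate(movie_ids):
        if target[i] != 0:
            continue  # already graded by the target user
        grades = [vec[i] for vec in neighbours if vec[i] != 0]
        if grades:
            scores[movie] = sum(grades) / len(grades)
    # Highest mean grade first; ties broken by the smaller movie ID.
    ranked = sorted(scores, key=lambda m: (-scores[m], m))
    return ranked[:top_n]


# Made-up ratings: three training users and one test user (ID 991).
movie_ids = [10, 11, 12, 13]
day = "2005-01-01"
train = to_user_vectors(
    [(1, 10, day, 5), (1, 11, day, 4), (1, 12, day, 5), (1, 13, day, 2),
     (2, 10, day, 1), (2, 11, day, 2), (2, 12, day, 1), (2, 13, day, 5),
     (3, 10, day, 5), (3, 11, day, 5), (3, 12, day, 4), (3, 13, day, 3)],
    movie_ids,
)
test = to_user_vectors([(991, 10, day, 5), (991, 11, day, 4)], movie_ids)
print(recommend(test[991], train, movie_ids, k=2, top_n=2))  # prints [12, 13]
```

Restricting the distance to shared movies keeps the 0 placeholders from being read as very low grades. Other choices, such as cosine or Pearson-based similarity, are equally legitimate as long as the report justifies them.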

What to Submit:

Submit a document that explains the whole process of building and using your recommendation system. This document should include at least the following sections:

Provide a data exploration report that includes at least the following information: the number of unique users in the training data set and in the testing data set respectively; the number of unique movies in each data set; whether any movies appear in the training data set but not in the testing data set, and vice versa; and the minimum, maximum and average number of movies graded by an individual user.
Describe the format of your transformed data and why you chose this format. Include the source code and a sample of the transformed data (to illustrate the data format after the transformation) as an appendix.
Describe the function used in your recommendation system to calculate the distance between two user objects and why you chose this measure of similarity. Include the source code or the mathematical formula of the similarity function as an appendix.
Describe the K nearest neighbors (KNN) algorithm used in your recommendation system, especially the value of K you chose and why you chose it. (Hint: K can be chosen by domain knowledge or through many experiments.)
Describe the recommendation criteria used in your program and why you chose these particular criteria.
Present the recommendation result (recommended movie ID(s)) for each of the 10 users in the testing data set. Include the source code for the recommendation as an appendix.
Include, as an appendix, the Makefile (if there is one) that can be used to automate the compilation and preferably also the execution of your recommendation system.
As a thought experiment, explain whether and how your recommendation algorithm should be modified if some or all of the data in the testing data set are also included in the training data set.
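
The hint about choosing K through experiments could be followed along these lines: hide one known grade per validation user, predict it from the K nearest training users, and compare the mean absolute error for several values of K. The Python sketch below is an illustration only. It uses a sparse user format ({movie: grade} dictionaries) rather than full vectors, and it selects the K nearest users among those who graded the hidden movie, which is one of several reasonable variants. All names and sample data are hypothetical.

```python
import math


def distance(a, b):
    """RMS grade difference over movies both users graded (a, b: {movie: grade})."""
    shared = a.keys() & b.keys()
    if not shared:
        return math.inf
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared) / len(shared))


def predict(target, movie, train, k):
    """Mean grade for `movie` among the k nearest training users who graded it."""
    raters = [profile for profile in train.values() if movie in profile]
    raters.sort(key=lambda profile: distance(target, profile))
    nearest = raters[:k]
    if not nearest:
        return None  # nobody in the training data graded this movie
    return sum(profile[movie] for profile in nearest) / len(nearest)


def mean_abs_error(train, held_out, k):
    """held_out: list of (profile_without_movie, movie, true_grade) triples."""
    errors = []
    for profile, movie, truth in held_out:
        guess = predict(profile, movie, train, k)
        if guess is not None:
            errors.append(abs(guess - truth))
    return sum(errors) / len(errors) if errors else math.inf


# Made-up data: three training users and one held-out grade to predict.
train = {
    1: {10: 5, 11: 4, 12: 5},
    2: {10: 1, 11: 2, 12: 1},
    3: {10: 5, 11: 5, 12: 4},
}
held_out = [({10: 5, 11: 4}, 12, 5)]

for k in (1, 2, 3):
    print(k, mean_abs_error(train, held_out, k))
best_k = min((1, 2, 3), key=lambda k: mean_abs_error(train, held_out, k))
print("best K:", best_k)
```

With real data, the held-out grades would come from a portion of the training users set aside for validation, so that the 10 test users are not used to tune K.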

How to submit:

Choose one of the following two ways to submit your work:

Log in to your VIU Learn account, find the CSCI 479 course page, click on the "Assessment" drop-down menu, click on the "Assignments" item, then click on the folder named "A2". Then click on the "Add a File" button to browse for and upload your document and appendix files.
On the csci server, in the directory that holds all of your assignment solution files, enter the command
~liuh/bin/submit 479 A2 .
The submit script currently accepts files named *.pdf, *.txt, *.csv, *.h, *.cpp, *.py and makefile.
If you need to submit a file with a different extension, please contact your instructor before submitting.

Last updated: January 27, 2026
