CSCI 479 -- Machine Learning
Spring 2026 - Assignment 5
Submission deadline: 10:00, Friday, 3 April 2026
Problem Description:
In the CSV file A5-Points.csv, each line
represents one data point in a three-dimensional feature space, where ID
is an artificial identifier of the data point, and A, B and C give the data
point's position in the three dimensions respectively. There are
75 data points in the file.
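As an illustration of the file layout described above, the following is a minimal Python sketch for loading the points. It assumes that the first line of A5-Points.csv is a header row naming the columns ID, A, B and C, which the handout does not state explicitly; the function name load_points is hypothetical.

```python
import csv

def load_points(path):
    """Read (id, (a, b, c)) pairs from a CSV file.

    Assumption: the file starts with a header row "ID,A,B,C".
    """
    points = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            coords = (float(row["A"]), float(row["B"]), float(row["C"]))
            points.append((row["ID"], coords))
    return points
```

Calling load_points("A5-Points.csv") should then return a list of 75 entries if the file matches the description above.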
These data points have no class labels. The goal is
to group them into three classes and
to assign a class label to each data point.
Your tasks:
Write a program that implements the K-Means algorithm to cluster
the data points given in the file into three clusters.
The basic K-Means algorithm is shown below:
    Select K points as the initial centroids;
    repeat
        Form K clusters by assigning all points to the closest centroid;
        Recompute the centroid of each cluster;
    until the centroids don't change
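The loop above can be sketched in Python roughly as follows. This is only one possible reading of the pseudocode, not a reference solution: the helper names are hypothetical, the sketch uses the standard library's math.dist (Euclidean distance, Python 3.8 or later), and keeping the old centroid when a cluster becomes empty is an assumption the handout does not prescribe.

```python
import math

K = 3  # number of clusters, fixed by the assignment

def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def k_means(points, centroids):
    """Repeat the assign/recompute steps until the centroids stop changing.

    points and centroids are lists of coordinate tuples.
    Returns the final centroids and one cluster index per point.
    """
    while True:
        # Form clusters: each point goes to its closest centroid.
        labels = [closest(p, centroids) for p in points]
        # Recompute the centroid of each cluster.
        new_centroids = []
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean(members) if members else centroids[i])
        if new_centroids == centroids:
            return centroids, labels
        centroids = new_centroids
```

The initial centroids are passed in by the caller, for example as a list of K tuples chosen from the data, so the selection strategy stays a separate decision.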
Your program should declare K as a constant
and set its value to 3 for this application.
Design and implement a distance function to be used for this application.
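One common choice, shown here only as an example, is the Euclidean distance. The sketch below assumes that A, B and C are on comparable scales; if they are not, a scaled or otherwise different distance may be more suitable, and justifying that choice is part of the assignment. The function name euclidean is hypothetical.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For example, euclidean((0, 0, 0), (1, 2, 2)) evaluates to 3.0, since 1 + 4 + 4 = 9.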
In case your program takes too long to reach stable centroids,
you may use one of the following alternative termination conditions:
- set a constant N to an appropriately large number (for example 20),
and terminate your program once N repetitions have been done;
- set a constant P to a number that is small relative to the total number
of data points (for example 6), and terminate your program
if P or fewer data points changed their cluster
association in the last repetition.
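The two alternative conditions above could be checked as in the following Python sketch. N and P use the example values from the handout; the function names are hypothetical, and combining both conditions in one check is a design choice of this sketch, since the handout presents them as alternatives.

```python
N = 20  # maximum number of repetitions (example value from the handout)
P = 6   # stop when at most P points changed cluster (example value from the handout)

def count_changes(old_labels, new_labels):
    """Number of points whose cluster label differs between two repetitions."""
    return sum(1 for a, b in zip(old_labels, new_labels) if a != b)

def should_stop(repetitions_done, old_labels, new_labels):
    """True if either alternative termination condition is met.

    old_labels is None after the first repetition, when there is
    nothing to compare against yet.
    """
    if repetitions_done >= N:
        return True
    if old_labels is None:
        return False
    return count_changes(old_labels, new_labels) <= P
```

In a K-Means loop, should_stop would be called once per repetition, right after the points have been reassigned.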
At the end of the execution, your program should output
the coordinates (values of A, B and C) of the three final centroids
and the updated data set (including
the original data points and their corresponding assigned class labels)
to a file called result.txt. The format of the data in result.txt
should be user-friendly, that is, easy to read and interpret.
You can choose your own class label representations,
or simply use C1, C2 and C3 as the three class labels.
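A possible way to produce such a file is sketched below in Python. The column layout, the three-decimal formatting and the function name write_results are assumptions made for illustration; any layout that is easy to read would meet the requirement.

```python
def write_results(path, points, labels, centroids):
    """Write the final centroids and the labelled data set to a text file.

    points:    list of (id, (a, b, c)) pairs
    labels:    cluster index (0-based) for each point, in the same order
    centroids: list of (a, b, c) tuples, one per cluster
    """
    with open(path, "w") as out:
        out.write("Final centroids\n")
        for i, (a, b, c) in enumerate(centroids):
            out.write(f"C{i + 1}: A={a:.3f}  B={b:.3f}  C={c:.3f}\n")
        out.write("\n")
        # One row per data point, with the assigned class label last.
        out.write(f"{'ID':>4} {'A':>10} {'B':>10} {'C':>10}  Class\n")
        for (pid, (a, b, c)), lab in zip(points, labels):
            out.write(f"{pid:>4} {a:>10.3f} {b:>10.3f} {c:>10.3f}  C{lab + 1}\n")
```

A call such as write_results("result.txt", points, labels, centroids) would be made once, after the clustering has terminated.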
Lastly, write a document that explains at least the following
things:
- How did you select the three initial centroids?
Why do you think your initial centroids selection works
the best for this application?
- What is your distance function? Why do you think that your
distance function is suitable for this application?
- What is the termination condition of your K-Means algorithm
implementation? Why do you think that your termination
condition is appropriate and sufficient?
- How can your K-Means program be executed on
the csci server?
- What are the coordinates of the three centroids found by
your program? How many data points
are assigned to each cluster?
- Attach the output file result.txt as an appendix to show
the learned class labels for all the original data points.
- Anything else you would like to bring to the instructor's attention regarding
your application.
What to Submit:
- The document;
- The source code file of the K-Means algorithm;
- The output file result.txt;
- A Makefile, if one is used to automate the process
of compiling your program(s).
How to submit:
Choose one of the following two ways to submit your work:
- Login to your VIU Learn account,
find the CSCI 479 course page, click on the "Assessment"
drop-down menu, click on the "Assignments" item, then click on
the folder named "A5". Then you can click on the "Add a File"
button to browse and upload your document and other files.
- On the csci server, in the directory that holds all of your assignment
solution files, enter the command
~liuh/bin/submit 479 A5 .
This submit script currently accepts files named
*.pdf, *.txt, *.csv, *.h, *.cpp, *.py or makefile.
If you need to submit a file with a different name or extension,
please contact your instructor before submitting.
Last updated: March 19, 2026