CSCI 479 -- Machine Learning
Spring 2026 - Assignment 5
Submission deadline: 10:00, Friday, 3 April 2026

Problem Description:

In the CSV file A5-Points.csv, each line represents one data point in a three-dimensional feature space, where ID is an artificial identifier of the data point, and A, B and C give the data point's position in the three dimensions respectively. The file contains 75 data points in total.
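As an illustration of the file layout described above, the following Python sketch reads such rows into memory. It assumes the file starts with a header row naming the columns ID, A, B and C; if the real file has no header, the column handling would need to change. The function name load_points and the sample values are hypothetical.

```python
import csv
import io

def load_points(stream):
    """Read rows with columns ID, A, B, C into (id, (a, b, c)) tuples.

    Assumption: the first row is a header containing ID, A, B and C.
    """
    reader = csv.DictReader(stream)
    points = []
    for row in reader:
        coords = (float(row["A"]), float(row["B"]), float(row["C"]))
        points.append((row["ID"], coords))
    return points

# Tiny in-memory stand-in for A5-Points.csv (the values are made up).
sample = io.StringIO("ID,A,B,C\n1,0.5,1.0,2.0\n2,3.0,4.5,6.0\n")
points = load_points(sample)
```

To read the actual file, the same function could be called on open("A5-Points.csv", newline="") inside a with statement.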

These data points do not have class labels, but they need to be grouped into three classes, with a target class label assigned to each data point.

Your tasks:

Write a program that implements the K-Means algorithm to cluster the data points given in the file into three clusters.

The basic K-Means algorithm is shown below:

Select K points as the initial centroids;
repeat
    Form K clusters by assigning all points to the closest centroid;
    Recompute the centroid of each cluster;
until the centroids don't change
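The pseudocode above can be sketched in Python roughly as follows. This is only one possible reading, not a reference solution: it uses Euclidean distance (math.dist), takes the initial centroids as an argument, and assumes that a cluster left with no points keeps its previous centroid. The name k_means and the sample points are hypothetical.

```python
import math

K = 3  # number of clusters, fixed for this application

def k_means(points, initial_centroids):
    """Basic K-Means loop; returns (centroids, labels) with labels in 0..K-1."""
    centroids = [tuple(map(float, c)) for c in initial_centroids]
    while True:
        # Form clusters: each point gets the index of its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                updated.append(old)
        # Stop once the centroids no longer change.
        if updated == centroids:
            return centroids, labels
        centroids = updated

# Made-up data: three well-separated pairs of points.
points = [(0, 0, 0), (0, 1, 0), (10, 10, 10), (10, 11, 10), (20, 20, 20), (20, 21, 20)]
centroids, labels = k_means(points, [points[0], points[2], points[4]])
```

In this sketch the initial centroids are picked by hand; random.sample(points, K) would be one simple way to select them from the data, at the cost of results that depend on the random draw.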

Your program should declare K as a constant and set its value to 3 for this application.

Design and implement a distance function to be used for this application.
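One common choice for points in a continuous three-dimensional space is the Euclidean distance. The sketch below shows it as an example only; the assignment leaves the choice of distance function open, and the name euclidean_distance is hypothetical.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Example: the distance between the origin and (1, 2, 2).
d = euclidean_distance((0, 0, 0), (1, 2, 2))
```

Because the square root preserves ordering, comparing squared distances would select the same closest centroid while skipping the sqrt call.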

Just in case your application runs too long to find stable centroids, you can consider the following alternative termination conditions:

At the end of execution, your program should write the coordinates (values of A, B and C) of the three final centroids and the updated data set (the original data points together with their assigned class labels) to a file called result.txt. The format of the data in result.txt should be user-friendly, that is, easy to read and interpret. You can choose your own class label representations, or simply use C1, C2 and C3 as the three class labels.
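As one possible layout for result.txt, the sketch below builds the text with the centroids first and then one aligned row per data point. The column widths, the four decimal places and the function name format_results are arbitrary choices made for illustration, and labels are assumed to be integers 0, 1, 2 that map to C1, C2, C3.

```python
def format_results(centroids, ids, points, labels):
    """Return a readable report: the centroids, then one labelled row per point."""
    lines = ["Final centroids (A, B, C):"]
    for j, c in enumerate(centroids):
        lines.append(f"C{j + 1}: {c[0]:.4f}, {c[1]:.4f}, {c[2]:.4f}")
    lines.append("")
    lines.append(f"{'ID':<6}{'A':>10}{'B':>10}{'C':>10}  Class")
    for pid, p, lab in zip(ids, points, labels):
        # Assumption: labels are 0-based cluster indices, shown as C1, C2, C3.
        lines.append(f"{pid:<6}{p[0]:>10.4f}{p[1]:>10.4f}{p[2]:>10.4f}  C{lab + 1}")
    return "\n".join(lines) + "\n"

# Made-up example with a single centroid and a single point.
report = format_results([(0.0, 0.5, 0.0)], ["1"], [(0.0, 1.0, 0.0)], [0])
```

The returned string could then be written out by opening result.txt in "w" mode and calling write(report) on the file object.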

Lastly, write a document that explains at least the following things:

What to Submit:

How to submit:

Choose one of the following two ways to submit your work:


Last updated: March 19, 2026