CSCI 479 -- Machine Learning
Spring 2026 - Assignment 4
Submission deadline: 10:00am, Friday, 20 March 2026

Problem Description:

The scenario for this assignment is adapted, with substantial modification, from our optional textbook, Fundamentals of Machine Learning for Predictive Data Analytics.

The European Space Agency wants to build a model to predict the amount of oxygen that an astronaut consumes when performing five minutes of intense physical work. The descriptive features for the model will be the health status of the astronaut, the age of the astronaut, and their average heart rate throughout the work.

The regression model is:

OXYCON = w[0] + w[1] * HEALTH + w[2] * AGE + w[3] * HEARTRATE

The table below shows a historical dataset that has been collected for this task:

ID  HEALTH     AGE  HEARTRATE  OXYCON
1   Very Good  41   138        37.99
2   Good       42   153        47.34
3   Good       37   151        44.38
4   Okay       34   133        28.17
5   Okay       48   126        27.07
6   Okay       44   145        37.85
7   Good       43   158        44.72
8   Very Good  46   143        36.42
9   Good       37   138        31.21
10  Good       38   158        54.85
11  Very Good  43   143        39.84
12  Okay       43   138        30.83

Here is a copy of the same data in CSV format: A4-data.csv.

Your tasks:

First, the attribute representing health status (HEALTH) is categorical and cannot be used directly as a variable in the regression model. You need to design a scheme that transforms this categorical attribute into a numeric type suitable for use in the model.
You can manually edit the input file, use the transformed value to replace the HEALTH value, and feed the modified input file to your program.
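
One possible scheme, sketched here in Python, is an ordinal encoding that replaces each HEALTH label with a rank. The codes 0, 1, 2 and the names HEALTH_CODES and encode_health are assumptions made for illustration; the assignment leaves the choice of scheme to you.

```python
# Hypothetical ordinal scheme: the ranks 0, 1, 2 are an assumption,
# not something the assignment prescribes.
HEALTH_CODES = {"Okay": 0, "Good": 1, "Very Good": 2}


def encode_health(label):
    """Map a HEALTH label to its numeric code; reject unknown labels."""
    key = label.strip()
    if key not in HEALTH_CODES:
        raise ValueError(f"unknown HEALTH value: {label!r}")
    return HEALTH_CODES[key]


# First three rows of the dataset: (ID, HEALTH, AGE, HEARTRATE, OXYCON).
rows = [
    (1, "Very Good", 41, 138, 37.99),
    (2, "Good", 42, 153, 47.34),
    (3, "Good", 37, 151, 44.38),
]

# Replace the HEALTH label with its code and keep every other column.
encoded = [(rid, encode_health(h), age, hr, oxy) for rid, h, age, hr, oxy in rows]
```

An ordinal code keeps HEALTH as a single variable, so it fits the four-weight model as given, but it also assumes the levels are evenly spaced. A one-hot encoding avoids that assumption at the cost of extra weights, which would change the model.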

Then, write an error-based learning program, in a programming language of your choice, to tune the weights of the multivariate linear regression model given above.

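Specifically, your program can set the following (adjustable) constants, each of which is referred to in the steps below: the learning rate, the maximum number of iterations, and the acceptable model error threshold.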

The steps your program should perform in each iteration are:

  1. make a prediction for each training instance using the given model with the current weights;
  2. calculate the sum of squared errors for the set of the predictions generated in the previous step as the model error;
  3. adjust the weights using the gradient descent algorithm, based on the model error calculated in the previous step and the given learning rate.
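
For reference, the weight adjustment in step 3 can be derived as follows. The notation is an assumption introduced here: $t_i$ is the OXYCON value of training instance $i$, $x_{i,1}, x_{i,2}, x_{i,3}$ are its (numeric) HEALTH, AGE and HEARTRATE values, $x_{i,0} = 1$ is a dummy input for the intercept, and $\alpha$ is the learning rate.

```latex
\begin{align*}
M_{\mathbf{w}}(\mathbf{x}_i) &= \sum_{j=0}^{3} w_j \, x_{i,j} \\
E(\mathbf{w}) &= \sum_{i=1}^{n} \bigl( t_i - M_{\mathbf{w}}(\mathbf{x}_i) \bigr)^2 \\
\frac{\partial E}{\partial w_j} &= -2 \sum_{i=1}^{n} \bigl( t_i - M_{\mathbf{w}}(\mathbf{x}_i) \bigr) \, x_{i,j} \\
w_j &\leftarrow w_j - \alpha \, \frac{\partial E}{\partial w_j}
     = w_j + 2\alpha \sum_{i=1}^{n} \bigl( t_i - M_{\mathbf{w}}(\mathbf{x}_i) \bigr) \, x_{i,j}
\end{align*}
```

Some texts define the error with a factor of 1/2 in front of the sum, which removes the 2 from the update; the difference is absorbed by the choice of learning rate.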

Repeat the above steps until either the designated number of iterations is reached or the calculated model error falls below the given acceptable threshold.
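
The loop described above could be sketched in Python roughly as follows. This is a minimal illustration rather than a reference solution: the HEALTH codes (Okay=0, Good=1, Very Good=2), the function names, and the sample settings (zero initial weights, learning rate 1e-6, 200 iterations) are all assumptions. The small learning rate reflects the unscaled AGE and HEARTRATE values; noticeably larger rates tend to diverge on this data.

```python
# Training data taken from the table: (HEALTH code, AGE, HEARTRATE).
# HEALTH is pre-encoded with an assumed scheme: Okay=0, Good=1, Very Good=2.
FEATURES = [
    (2, 41, 138), (1, 42, 153), (1, 37, 151), (0, 34, 133),
    (0, 48, 126), (0, 44, 145), (1, 43, 158), (2, 46, 143),
    (1, 37, 138), (1, 38, 158), (2, 43, 143), (0, 43, 138),
]
TARGETS = [37.99, 47.34, 44.38, 28.17, 27.07, 37.85,
           44.72, 36.42, 31.21, 54.85, 39.84, 30.83]


def predict(w, x):
    """OXYCON = w[0] + w[1]*HEALTH + w[2]*AGE + w[3]*HEARTRATE."""
    return w[0] + w[1] * x[0] + w[2] * x[1] + w[3] * x[2]


def sum_squared_error(w):
    """Model error: sum of squared differences between targets and predictions."""
    return sum((t - predict(w, x)) ** 2 for x, t in zip(FEATURES, TARGETS))


def train(initial_weights, learning_rate, max_iterations, error_threshold):
    """Batch gradient descent; returns the final weights and a per-iteration log."""
    w = list(initial_weights)
    inputs = [(1,) + x for x in FEATURES]  # prepend the dummy input for w[0]
    history = []  # one (model error, adjusted weights) pair per iteration
    for _ in range(max_iterations):
        # Step 1: predict every training instance with the current weights.
        errors = [t - predict(w, x) for x, t in zip(FEATURES, TARGETS)]
        # Step 2: sum of squared errors for those predictions.
        sse = sum(e * e for e in errors)
        if sse < error_threshold:
            history.append((sse, list(w)))
            break
        # Step 3: move each weight against the gradient of the SSE.
        for j in range(len(w)):
            gradient_j = -2.0 * sum(e * xi[j] for e, xi in zip(errors, inputs))
            w[j] -= learning_rate * gradient_j
        history.append((sse, list(w)))
    return w, history


if __name__ == "__main__":
    weights, log = train([0.0, 0.0, 0.0, 0.0], 1e-6, 200, 0.0)
    print("first error:", log[0][0], "last error:", log[-1][0])
```

Each history entry pairs the error measured in step 2 with the weights produced by step 3 of the same iteration, which matches the per-iteration output the assignment asks for.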

For each iteration, display the model error and the adjusted weights of the model in an easy-to-understand format. At the end of the iterations (the end of your program), display the original data with an added column that shows your model's prediction.
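
As one illustration of such a format, the Python sketch below builds one line per iteration and a final table with a prediction column. The function names, column widths, and number of decimal places are arbitrary assumptions; any layout that is easy to read would do.

```python
def format_iteration(iteration, sse, weights):
    """One line per iteration: iteration number, model error, adjusted weights."""
    w_text = ", ".join(f"w[{j}]={w:.6f}" for j, w in enumerate(weights))
    return f"iter {iteration:>5d}  SSE={sse:12.4f}  {w_text}"


def format_table(rows, predictions):
    """Original data plus an added PREDICTED column.

    rows: (ID, HEALTH, AGE, HEARTRATE, OXYCON) tuples from the dataset.
    predictions: one model prediction per row, in the same order.
    """
    lines = [
        f"{'ID':>2}  {'HEALTH':<9}  {'AGE':>3}  {'HEARTRATE':>9}  "
        f"{'OXYCON':>6}  {'PREDICTED':>9}"
    ]
    for (rid, health, age, hr, oxy), p in zip(rows, predictions):
        lines.append(
            f"{rid:>2}  {health:<9}  {age:>3}  {hr:>9}  {oxy:>6.2f}  {p:>9.2f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # The 0.0 values stand in for real predictions from a trained model.
    sample_rows = [(1, "Very Good", 41, 138, 37.99), (2, "Good", 42, 153, 47.34)]
    print(format_iteration(1, 0.0, [0.0, 0.0, 0.0, 0.0]))
    print(format_table(sample_rows, [0.0, 0.0]))
```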

Lastly, write a document that explains at least the following things:

What to Submit:

How to submit:

Choose one of the following two ways to submit your work:


Last updated: March 3, 2026