The problem scenario of this assignment is from our optional textbook, Fundamentals of Machine Learning for Predictive Data Analytics, with substantial modification.
The European Space Agency wants to build a model to predict the amount of oxygen that an astronaut consumes when performing five minutes of intense physical work. The descriptive features for the model will be the astronaut's health status, age, and average heart rate throughout the work.
The regression model is:
OXYCON = w[0] + w[1] * HEALTH + w[2] * AGE + w[3] * HEARTRATE
The table below shows a historical dataset that has been collected for this task:
| ID | HEALTH | AGE | HEARTRATE | OXYCON |
| 1 | Very Good | 41 | 138 | 37.99 |
| 2 | Good | 42 | 153 | 47.34 |
| 3 | Good | 37 | 151 | 44.38 |
| 4 | Okay | 34 | 133 | 28.17 |
| 5 | Okay | 48 | 126 | 27.07 |
| 6 | Okay | 44 | 145 | 37.85 |
| 7 | Good | 43 | 158 | 44.72 |
| 8 | Very Good | 46 | 143 | 36.42 |
| 9 | Good | 37 | 138 | 31.21 |
| 10 | Good | 38 | 158 | 54.85 |
| 11 | Very Good | 43 | 143 | 39.84 |
| 12 | Okay | 43 | 138 | 30.83 |
Here is a copy of the same data in CSV format:
A4-data.csv.
Your tasks:
First, the attribute representing health status
(HEALTH) is categorical and can't be used directly
as a variable in the regression model. You need to design
a scheme to transform this categorical attribute into
a numeric form that is suitable for use in the model.
You can manually edit the input file, replacing
each HEALTH value with its transformed value, and
feed the modified input file to your program.
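As a minimal sketch of one possible transformation, the Python snippet below maps each HEALTH label to an ordinal code. The codes 1, 2, and 3 and the names `HEALTH_CODES` and `encode_health` are assumptions made for illustration; the assignment does not prescribe a scheme, and you are free to design a different one.

```python
# Hypothetical ordinal mapping: these numeric codes are an assumption,
# not a scheme required by the assignment.
HEALTH_CODES = {"Okay": 1, "Good": 2, "Very Good": 3}


def encode_health(label):
    """Return the numeric code for a HEALTH label, failing loudly on unknown labels."""
    key = label.strip()
    if key not in HEALTH_CODES:
        raise ValueError(f"unknown HEALTH value: {label!r}")
    return HEALTH_CODES[key]


# Three (HEALTH, AGE, HEARTRATE) rows taken from the table (IDs 1, 2, and 4).
rows = [("Very Good", 41, 138), ("Good", 42, 153), ("Okay", 34, 133)]
encoded = [(encode_health(h), age, hr) for h, age, hr in rows]
print(encoded)  # [(3, 41, 138), (2, 42, 153), (1, 34, 133)]
```

An ordinal code keeps HEALTH as a single variable, so it fits the single weight w[1] in the given model. A one-hot encoding would need one weight per category and therefore a different model equation.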
Then, write an error-based learning program, in the programming language of your choice, to tune the weights of the multivariate linear regression model given above.
Specifically, your program can set the following (adjustable) constants:
w[0] = -59.5, w[1] = 5.5, w[2] = -0.15, and w[3] = 0.60;
The steps your program should perform in one iteration are:
Repeat the above steps until either the designated number of iterations is reached or the calculated model error falls below the given acceptable threshold.
For each iteration, display the model error and the adjusted weights of the model in an easy-to-understand format. At the end of the iterations (the end of your program), display the original data with an added column that shows your model's prediction.
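The loop and the output described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference solution: it assumes batch gradient descent, a model error defined as half the sum of squared errors, ordinal HEALTH codes of 1, 2, and 3, a learning rate of 0.000002, a cap of 20 iterations, and an error threshold of 1.0. Only the data and the initial weights come from the assignment; every function name is hypothetical.

```python
# Sketch only: the HEALTH codes, learning rate, iteration cap, and threshold
# are assumptions chosen for illustration, not values set by the assignment.
HEALTH_CODES = {"Okay": 1, "Good": 2, "Very Good": 3}

DATA = [  # (ID, HEALTH, AGE, HEARTRATE, OXYCON), copied from the table
    (1, "Very Good", 41, 138, 37.99), (2, "Good", 42, 153, 47.34),
    (3, "Good", 37, 151, 44.38), (4, "Okay", 34, 133, 28.17),
    (5, "Okay", 48, 126, 27.07), (6, "Okay", 44, 145, 37.85),
    (7, "Good", 43, 158, 44.72), (8, "Very Good", 46, 143, 36.42),
    (9, "Good", 37, 138, 31.21), (10, "Good", 38, 158, 54.85),
    (11, "Very Good", 43, 143, 39.84), (12, "Okay", 43, 138, 30.83),
]


def predict(w, x):
    """OXYCON = w[0] + w[1]*HEALTH + w[2]*AGE + w[3]*HEARTRATE."""
    return w[0] + w[1] * x[0] + w[2] * x[1] + w[3] * x[2]


def train(data, w, alpha=2e-6, max_iters=20, threshold=1.0):
    """Batch gradient descent; returns the final weights and the error history."""
    X = [(HEALTH_CODES[h], age, hr) for _, h, age, hr, _ in data]
    y = [row[4] for row in data]
    w = list(w)
    history = []
    for it in range(1, max_iters + 1):
        errors = [t - predict(w, x) for x, t in zip(X, y)]
        sse = 0.5 * sum(e * e for e in errors)  # assumed error measure
        history.append(sse)
        if sse < threshold:
            break
        # Error delta per weight: sum over rows of error * feature value
        # (the feature for w[0] is the constant 1).
        deltas = [sum(errors)] + [
            sum(e * x[j] for e, x in zip(errors, X)) for j in range(3)
        ]
        w = [wj + alpha * d for wj, d in zip(w, deltas)]
        shown = ", ".join(f"w[{j}]={wj:.5f}" for j, wj in enumerate(w))
        print(f"iter {it:3d}  error={sse:10.4f}  {shown}")
    return w, history


def report(data, w):
    """Print the original rows with an added PREDICTION column."""
    print("ID  HEALTH     AGE  HEARTRATE  OXYCON  PREDICTION")
    for i, h, age, hr, oxy in data:
        p = predict(w, (HEALTH_CODES[h], age, hr))
        print(f"{i:<3d} {h:<10s} {age:<4d} {hr:<10d} {oxy:<7.2f} {p:.2f}")


final_w, history = train(DATA, [-59.5, 5.5, -0.15, 0.60])
report(DATA, final_w)
```

The learning rate has to be very small here because AGE and HEARTRATE are unscaled. With larger values the weight updates can overshoot and the error grows instead of shrinking. Normalizing the features first would allow a larger rate, at the cost of weights that no longer apply to the raw values.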
Lastly, write a document that explains at least the following things:
Choose one of the following two ways to submit your work: