CSCI 479 -- Machine Learning
Spring 2026 - Assignment 1
Submission deadline: 10:00, Friday, 30 January 2026
A project team has collected a data set that will be used
to build a decision tree model that predicts "PREF_CHANNEL" from
a set of descriptive attributes. All attributes are defined below:
- AGE: the customer's age
- GENDER: the customer's gender (male or female)
- LOC: the customer's location (rural or urban)
- OCC: the customer's occupation
- CAR_INS: whether the customer holds a car insurance policy
with the company (yes or no)
- HEALTH_INS: whether the customer holds a health insurance policy
with the company (yes or no)
- HEALTH_TYPE: the type of the health insurance policy
(PlanA, PlanB, or PlanC)
- PREF_CHANNEL: the customer's preferred contact channel
(email, phone, or sms)
The full data set contains 5,200 instances and is available in
the CSV file A1-data.csv.
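As a starting point for inspecting the attributes listed above, the following minimal sketch reports the value range of a numeric attribute and the value counts of each categorical one. It assumes that the CSV file has a header row using the attribute names exactly as written above; that layout is not confirmed here. The sample rows are invented for illustration and are not taken from A1-data.csv, and the function name `summarize` is hypothetical. Python is used only for brevity.

```python
import csv
import io
from collections import Counter

# Toy rows invented for illustration only; they are NOT taken from A1-data.csv.
# Assumption: the real file has a header row with the attribute names above.
# Assumption: HEALTH_TYPE is left empty when HEALTH_INS is "no" (a guess).
SAMPLE_CSV = """AGE,GENDER,LOC,OCC,CAR_INS,HEALTH_INS,HEALTH_TYPE,PREF_CHANNEL
34,female,urban,teacher,yes,yes,PlanA,email
52,male,rural,farmer,yes,no,,phone
27,female,urban,nurse,no,yes,PlanB,sms
45,male,urban,teacher,no,yes,PlanA,email
"""


def summarize(csv_file, numeric=("AGE",)):
    """Map each attribute to (min, max) if numeric, else to a Counter of its values."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        if name in numeric:
            # Skip empty cells so that missing values do not break the conversion.
            numbers = [float(v) for v in values if v != ""]
            summary[name] = (min(numbers), max(numbers))
        else:
            summary[name] = Counter(values)
    return summary


summary = summarize(io.StringIO(SAMPLE_CSV))
for name, info in summary.items():
    print(name, info)
```

With the real file, the same function could be called as `summarize(open("A1-data.csv", newline=""))`, provided the header assumption holds.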
Your tasks:
- Pre-process the data and transform the "AGE" attribute into a
categorical one, using one of the algorithms/programs you developed
in Lab 1 and Lab 2.
- Perform a preliminary analysis of the given data set.
(Hint: to do this task, you can 1) use Excel and its functions,
or 2) load the data into a relational database table, using SQLite for example,
and use SQL queries to get the results, or 3) write your own program
to analyze the data.)
- Choose one of the following two ways to build a decision tree:
  - Design and implement an information-based learning algorithm to build
    a decision tree using the given data set. You can perform further
    data transformation based on your program's requirements.
  - Choose a machine learning platform, further pre-process the given
    data set so that it can be used by your chosen platform, then
    use the resulting data set to build a decision tree.
- Regardless of how your decision tree is built, translate it into
a set of equivalent rules.
- Write a document that explains the whole process of building
your predictive model. This document should include at least the
following sections:
  - Explain how the "AGE" attribute is transformed from a continuous type
    to a categorical one. Describe the discretization algorithm used
    and include its implementation as an appendix file.
    The value(s) found by your program to perform the discretization
    should be presented in the document.
  - Provide the results of the preliminary analysis of the given data set,
    including at least the value range/domain and distribution
    of each attribute.
  - If you implemented an algorithm to build the decision tree,
    include the source code file(s) and their makefile as appendix files.
    If you used a platform to build the tree, provide an introduction
    to the platform and describe the steps involved in building
    the tree.
  - Regardless of how the decision tree is built, justify why
    the attribute selected for the root of the tree is the "best" one.
  - Regardless of how the decision tree is built, describe or draw
    the resulting decision tree.
  - Present the set of rules translated from your decision tree.
  - Explain any deficiencies in your decision tree and what caused them.
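One common way to justify the choice of a root attribute in an information-based algorithm is to compare the information gain of each candidate attribute with respect to the target. The sketch below illustrates that calculation under stated assumptions: it uses entropy-based information gain (other criteria, such as gain ratio or the Gini index, may be equally acceptable), the rows are invented toy data rather than rows from the given data set, and the function names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target="PREF_CHANNEL"):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    labels = [row[target] for row in rows]
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attribute]].append(row[target])
    remainder = sum(len(part) / len(rows) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


# Toy rows invented for illustration only; NOT taken from the assignment data.
ROWS = [
    {"LOC": "urban", "CAR_INS": "yes", "PREF_CHANNEL": "email"},
    {"LOC": "urban", "CAR_INS": "no", "PREF_CHANNEL": "email"},
    {"LOC": "rural", "CAR_INS": "yes", "PREF_CHANNEL": "phone"},
    {"LOC": "rural", "CAR_INS": "no", "PREF_CHANNEL": "phone"},
]

gains = {name: information_gain(ROWS, name) for name in ("LOC", "CAR_INS")}
best = max(gains, key=gains.get)
print(gains, "-> root candidate:", best)
```

In this toy example LOC separates the two channels perfectly while CAR_INS does not separate them at all, so LOC has the higher gain. A table of the gains computed for every attribute of the real data set would serve the same purpose in the document.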
What to Submit:
Submit your document and any appendix file(s).
How to submit:
Choose one of the following two ways to submit your work:
- Log in to your VIU Learn account,
find the CSCI 479 course page, click on the "Assessment"
drop-down menu, click on the "Assignments" item, then click on
the folder named "A1". Then you can click on the "Add a File"
button to browse and upload your document and appendix files.
- On the csci server, in the directory that holds all of your assignment
solution files, enter the command
~liuh/bin/submit 479 A1 .
This submit script currently accepts files named
*.pdf, *.txt, *.csv, *.h, *.cpp, or makefile.
If you need to submit a file with a different extension,
please contact your instructor before submitting.
Last updated: January 14, 2026