Problem Description:
There are at least two ways to store and maintain a training dataset
for use with a Naive Bayes Classifier:
- Store the raw training data items. The advantage is that adding a new
training data item is extremely easy. The disadvantages are that the
conditional probabilities must be recomputed by counting (traversing
every data item) whenever a query is classified (time inefficient),
and that storing all the raw data takes a lot of space (space
inefficient).
- Store the conditional probability table. The advantage is that
query classification can be done very efficiently. The disadvantage
is that adding a new training data item becomes very difficult, since
a single new item changes the class priors and every conditional
probability for its class.
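To make the trade-off concrete, here is a small Python sketch of both approaches on a few hypothetical two-attribute rows (not taken from CarData.csv). `classify_raw` recounts over every stored row for each query, while `build_prob_table` and `classify_table` precompute plain floating-point probabilities once. After a row is appended, the sketch simply rebuilds the whole table, which illustrates the update cost the second approach pays. All function names are illustrative assumptions, and smoothing is omitted for brevity.

```python
from collections import Counter

# Toy rows (attribute tuple, class label); hypothetical, not taken from CarData.csv.
ROWS = [
    (("high", "low"), "unacc"),
    (("low", "high"), "acc"),
    (("low", "low"), "acc"),
    (("high", "high"), "unacc"),
]


def classify_raw(rows, query):
    """Approach 1: scan every stored row at query time (O(N * d) per class)."""
    class_counts = Counter(label for _, label in rows)
    best_label, best_score = None, -1.0
    for label, n_c in class_counts.items():
        score = n_c / len(rows)  # prior P(c)
        for i, value in enumerate(query):
            matches = sum(1 for x, y in rows if y == label and x[i] == value)
            score *= matches / n_c  # unsmoothed P(x_i = value | c)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def build_prob_table(rows):
    """Approach 2: precompute P(c) and P(x_i = v | c) once, stored as floats."""
    class_counts = Counter(label for _, label in rows)
    priors = {c: n / len(rows) for c, n in class_counts.items()}
    pair_counts = Counter((i, v, y) for x, y in rows for i, v in enumerate(x))
    cond = {(i, v, c): n / class_counts[c] for (i, v, c), n in pair_counts.items()}
    return priors, cond


def classify_table(table, query):
    """Table lookup: O(|C| * d) per query, independent of N."""
    priors, cond = table

    def posterior(c):
        p = priors[c]
        for i, v in enumerate(query):
            p *= cond.get((i, v, c), 0.0)
        return p

    return max(priors, key=posterior)


table = build_prob_table(ROWS)
# A new row changes the denominator of P(c) for every class and of every
# P(. | c) for its own class, so this sketch rebuilds the table from scratch.
ROWS.append((("high", "low"), "acc"))
table = build_prob_table(ROWS)  # full O(N * d) rebuild
```

Note that the rebuild needs the raw rows, which is exactly the storage the table-only approach was meant to avoid.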
Your task:
Using the data set CarData.csv
from assignment 3 as an example,
design a data structure and related algorithms
that allow a program both to maintain incrementally added
training data items and to perform classification
with a Naive Bayes Classifier in a time- and space-efficient way.
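One common way to get both properties is to store counts rather than raw rows or finished probabilities: the total N, the class counts N(c), and the joint counts N(a, v, c) of attribute a taking value v within class c. Adding a row then only increments one counter per attribute plus two, storage grows with the number of distinct attribute values and classes rather than with N, and P(c) = N(c)/N and P(a = v | c) = N(a, v, c)/N(c) are derived on demand at query time. The Python sketch below is one hedged illustration of that idea, not a prescribed solution. The class name `IncrementalNaiveBayes`, the attribute names (borrowed from the UCI Car Evaluation data, which CarData.csv is assumed but not confirmed to follow), the toy rows, the Laplace smoothing parameter `alpha`, and the use of log probabilities are all assumptions added here.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Hypothetical count-based Naive Bayes store (design sketch only)."""

    def __init__(self, attributes):
        self.attributes = list(attributes)
        self.total = 0                        # N: rows seen so far
        self.class_counts = defaultdict(int)  # N(c)
        self.value_counts = defaultdict(int)  # N(a, v, c)
        self.domains = {a: set() for a in self.attributes}  # distinct values seen per attribute

    def add(self, row, label):
        """Fold one training row (dict: attribute -> value) into the counts in O(d)."""
        self.total += 1
        self.class_counts[label] += 1
        for a in self.attributes:
            v = row[a]
            self.value_counts[(a, v, label)] += 1
            self.domains[a].add(v)

    def log_score(self, row, label, alpha=1.0):
        """log P(c) + sum over a of log P(a = row[a] | c), Laplace-smoothed (assumed)."""
        n_c = self.class_counts[label]
        score = math.log(n_c / self.total)
        for a in self.attributes:
            k = len(self.domains[a])  # number of distinct values of a seen so far
            n_avc = self.value_counts.get((a, row[a], label), 0)  # .get avoids inserting keys
            score += math.log((n_avc + alpha) / (n_c + alpha * k))
        return score

    def classify(self, row):
        """Return the class with the highest unnormalised log posterior."""
        if self.total == 0:
            raise ValueError("no training data has been added yet")
        return max(self.class_counts, key=lambda c: self.log_score(row, c))


# Usage with toy rows in the style of the Car Evaluation data (hypothetical values).
ATTRS = ["buying", "maint", "doors", "persons", "lug_boot", "safety"]
TOY_ROWS = [
    (("vhigh", "vhigh", "2", "2", "small", "low"), "unacc"),
    (("vhigh", "high", "4", "2", "med", "med"), "unacc"),
    (("low", "low", "4", "4", "big", "high"), "acc"),
    (("med", "low", "4", "more", "big", "high"), "acc"),
]

nb = IncrementalNaiveBayes(ATTRS)
for values, label in TOY_ROWS:
    nb.add(dict(zip(ATTRS, values)), label)

query = dict(zip(ATTRS, ("low", "low", "4", "4", "big", "high")))
print(nb.classify(query))  # acc
```

Each `add` call touches only the counters for that row, so new items can be folded in at any time without revisiting old data, and classification costs O(|C| · d) regardless of how many rows have been added. To load a CSV file, one could pass each `csv.DictReader` row to `add`, assuming the label sits in a column named `class`; that column name is an assumption about the file's layout.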