CSCI 479 (Machine Learning)

Problem Description:

There are at least two ways to store and maintain a training dataset for use with a Naive Bayes classifier:

Store the raw training data items. The advantage is that adding a new training data item is extremely easy. The disadvantages are that the conditional probabilities must be computed by counting (traversing every data item) each time a query is classified, which is time inefficient, and that storing all of the raw data takes a lot of space, which is space inefficient.
Store the conditional probability table. The advantage is that query classification can be done very efficiently. The disadvantage is that adding a new training data item becomes very difficult.
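
The trade-off between the two approaches can be illustrated with a minimal Python sketch. The rows, attribute values, and function names below are hypothetical and are not taken from CarData.csv; each row is assumed to be a tuple of categorical attribute values followed by a class label.

```python
from collections import Counter

# Hypothetical toy rows: (attribute_0, attribute_1, class_label).
# These values are made up for illustration, NOT taken from CarData.csv.
raw_rows = [
    ("low", "small", "acc"),
    ("low", "big", "acc"),
    ("high", "small", "unacc"),
    ("high", "big", "unacc"),
]

def cond_prob_from_raw(rows, attr_index, value, label):
    """Approach 1: estimate P(attribute = value | class = label) by scanning every row."""
    in_class = [r for r in rows if r[-1] == label]
    if not in_class:
        return 0.0
    matches = sum(1 for r in in_class if r[attr_index] == value)
    return matches / len(in_class)

def build_prob_table(rows):
    """Approach 2: precompute every P(attribute = value | class) in one pass."""
    class_counts = Counter(r[-1] for r in rows)
    pair_counts = Counter(
        (i, v, r[-1]) for r in rows for i, v in enumerate(r[:-1])
    )
    return {key: n / class_counts[key[2]] for key, n in pair_counts.items()}

# Approach 1: adding an item is a single append, but every lookup is a full scan.
raw_rows.append(("low", "small", "unacc"))
p_scan = cond_prob_from_raw(raw_rows, 0, "low", "unacc")

# Approach 2: a lookup is one dictionary access, but the table has to be
# rebuilt after the new row, because the per-class denominators changed.
table = build_prob_table(raw_rows)
p_table = table[(0, "low", "unacc")]
```

Both lookups give the same estimate; the difference lies in where the counting work is done and in what has to be kept in memory.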

Your tasks:

Using the data set CarData.csv from assignment 3 as an example, design a data structure and related algorithms that allow a program both to maintain incrementally added training data items and to perform classification with a Naive Bayes classifier in a time- and space-efficient way.
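
One possible direction, given only as a rough sketch and not as the expected solution, is to store counts rather than raw rows or probabilities. The class name `CountingNaiveBayes`, its methods, and the toy attribute values are all hypothetical; the actual columns of CarData.csv are not assumed here, and smoothing of zero counts is deliberately left out.

```python
from collections import defaultdict

class CountingNaiveBayes:
    """Keeps counts instead of raw rows or probabilities (hypothetical design)."""

    def __init__(self):
        self.total = 0
        self.class_counts = defaultdict(int)  # label -> number of items
        self.value_counts = defaultdict(int)  # (attr_index, value, label) -> count

    def add(self, attributes, label):
        """Add one training item; cost is proportional to the number of attributes."""
        self.total += 1
        self.class_counts[label] += 1
        for i, value in enumerate(attributes):
            self.value_counts[(i, value, label)] += 1

    def classify(self, attributes):
        """Return the label maximizing P(label) * prod_i P(attribute_i = value_i | label)."""
        best_label, best_score = None, -1.0
        for label, n_label in self.class_counts.items():
            score = n_label / self.total
            for i, value in enumerate(attributes):
                # .get avoids creating new keys for unseen combinations.
                score *= self.value_counts.get((i, value, label), 0) / n_label
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy usage with made-up values (not from CarData.csv).
model = CountingNaiveBayes()
model.add(("low", "small"), "acc")
model.add(("low", "big"), "acc")
model.add(("high", "small"), "unacc")
model.add(("high", "big"), "unacc")
prediction = model.classify(("low", "small"))

# A later item is absorbed by incrementing a few counters.
model.add(("low", "small"), "unacc")
prediction_after = model.classify(("high", "big"))
```

In this sketch the storage grows with the number of distinct (attribute, value, class) combinations rather than with the number of training items, and probabilities are derived from the counts only when a query arrives.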