CSCI 479 -- Exam Preparation
Midterm:
Time: 8:30 - 9:50, Tuesday, February 24th
Location: Building 200, Room 105
This is a closed-book exam. However, you may bring one letter-sized,
double-sided sheet of notes to the exam.
The midterm covers the following topics:
- Introduction
- Data
- Why is it important to understand the data collection?
- Attribute types (and later, how they affect the learning algorithms)
- Common issues and how they are handled: missing values, noisy data
- Data aggregation, reduction and transformation
- Information Based Learning
- product: decision tree
- how to build it: recursive algorithm, greedy algorithm
- information in "information based": entropy, Gini index,
misclassification rate
- how to use it
- issues (and how to handle them)
- bad split situation (use information gain ratio instead of
information gain)
- overfitting (pre-pruning, post-pruning)
- continuous descriptive attribute (discretization: equal
width, equal depth (equal frequency), information based)
- continuous target attribute (regression tree - using variance
instead of entropy)
- model ensembles (decision forest instead of tree)
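As a review aid for the impurity measures and the bad-split issue listed above, here is a minimal sketch in Python. The function names and the toy weather data are illustrative assumptions, not course material. It shows why an ID-like attribute gets a high information gain and how the gain ratio penalizes it.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gini(values):
    """Gini index of a list of discrete values."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attr):
    """Information gain divided by the entropy of the attribute's own values."""
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return information_gain(rows, labels, attr) / split_info

# Hypothetical toy data: "id" is unique per row, "outlook" is a real feature.
rows = [
    {"id": 1, "outlook": "sunny"},
    {"id": 2, "outlook": "sunny"},
    {"id": 3, "outlook": "rain"},
    {"id": 4, "outlook": "rain"},
]
labels = ["yes", "yes", "no", "no"]

for attr in ("id", "outlook"):
    print(attr, information_gain(rows, labels, attr), gain_ratio(rows, labels, attr))
```

On this toy data both attributes reach the same information gain, but the gain ratio of "id" is lower because splitting on it produces many tiny partitions.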
- Similarity Based Learning
- feature space (data format)
- similarity (distance) metrics
- how to use it: nearest neighbor algorithm
- issues (and how to handle them)
- noisy data (K nearest neighbor algorithm)
- feature difference (normalization)
- continuous target attribute (average of the neighbors' target values instead of a majority class label)
- too many descriptive attributes (dimension reduction)
- improve efficiency - indexing (such as K-D tree), pre-screening
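The similarity-based topics above can be tied together with a small sketch. It assumes min-max normalization, Euclidean distance, and a brute-force neighbor search (no K-D tree); the function names and the (age, income) data are made up for illustration.

```python
from collections import Counter
from math import sqrt

def make_scaler(points):
    """Build a min-max scaler that maps each feature to the range [0, 1]."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(point):
        return tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(point, lows, highs)
        )
    return scale

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest_targets(train, targets, query, k):
    """Targets of the k training points closest to the query (brute force)."""
    ranked = sorted(zip(train, targets), key=lambda pair: euclidean(pair[0], query))
    return [target for _, target in ranked[:k]]

def knn_classify(train, targets, query, k):
    """Majority vote among the k nearest neighbors."""
    return Counter(k_nearest_targets(train, targets, query, k)).most_common(1)[0][0]

def knn_regress(train, targets, query, k):
    """Average target value of the k nearest neighbors."""
    nearest = k_nearest_targets(train, targets, query, k)
    return sum(nearest) / len(nearest)

# Hypothetical data: (age, income). Income would dominate without normalization.
raw = [(25, 30000), (30, 35000), (45, 90000), (50, 95000)]
scale = make_scaler(raw)
train = [scale(p) for p in raw]
query = scale((28, 32000))

print(knn_classify(train, ["no", "no", "yes", "yes"], query, k=3))
print(knn_regress(train, [1.0, 2.0, 10.0, 11.0], query, k=2))
```

Using k > 1 smooths out the effect of a single noisy neighbor, and the same neighbor search serves both classification and regression; only the way the neighbors' targets are combined changes.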
- Probability Based Learning
- some statistics concepts
- how to get the probabilities
- Bayes' Theorem
- how to use probability based classifier in general?
- Naive Bayes classifier, and its BIG assumption (the descriptive attributes are conditionally independent given the target)
- Bayesian Belief Networks
- what does it look like?
- Markov blanket
- how to use it?
- how to build it?
- issues (and how to handle them)
- insufficient data (smoothing)
- continuous descriptive attribute (probability density function
instead of a fixed set of probabilities)
- missing parent attribute values - hidden variables
(consider all possible values and their probabilities)
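For the probability-based topics above, the following is a minimal sketch of a categorical Naive Bayes classifier with Laplace (add-alpha) smoothing, one common way to handle insufficient data. The function names, the `alpha` parameter, and the toy symptom data are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels, alpha=1.0):
    """Collect the counts needed for P(class) and P(attribute=value | class)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attr, label) -> counts of each value
    domains = defaultdict(set)           # attr -> set of values seen in training
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains, alpha

def posterior(model, query):
    """Normalized P(class | query) under the naive independence assumption."""
    class_counts, value_counts, domains, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for attr, value in query.items():
            # Smoothed conditional probability: never exactly zero.
            numerator = value_counts[(attr, label)][value] + alpha
            denominator = count + alpha * len(domains[attr])
            score *= numerator / denominator
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical toy data.
rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
labels = ["flu", "flu", "healthy", "healthy"]

model = train_naive_bayes(rows, labels)
print(posterior(model, {"fever": "yes", "cough": "yes"}))
```

Without smoothing, "healthy" would get probability zero for this query because no healthy training instance has a fever; with alpha = 1 it keeps a small nonzero probability.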
Past exams:
Midterm of Fall 2022
Final Exam of Fall 2022