CSCI 479 -- Exam Preparation
Final Exam: (Time: 9:00 - 12:00, 17 April 2026, Friday;
Location: Building 315/Room 216)
The final is based on:
Topics covered:
- Introduction
- Data
- Why is it important to understand the data collection?
- Attribute types (and later, how they affect the learning algorithms)
- Common issues and how they are handled: missing values, noisy data
- Data aggregation, reduction and transformation
- Information Based Learning
- product: decision tree
- how to build it: recursive algorithm, greedy algorithm
- information in "information based": entropy, Gini index, misclassification
rate
- how to use it
- issues (and how to handle them)
- bad split situation (use information gain ratio instead of
information gain)
- overfitting (pre-pruning, post-pruning)
- continuous descriptive attribute (discretization - equal
width, equal depth, information based)
- continuous target attribute (regression tree - using variance
instead of entropy)
- model ensembles (decision forest instead of tree)
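As a refresher on the "information" part, the entropy and information gain computations at the heart of tree building can be sketched in a few lines of Python (a toy illustration with made-up data, not the course's reference implementation):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction achieved by splitting on the given attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

# Toy dataset: attribute 0 perfectly separates the classes,
# attribute 1 carries no information.
rows = [("sunny", "hot"), ("sunny", "cold"), ("rainy", "hot"), ("rainy", "cold")]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels, 0))  # 1.0 bit
print(information_gain(rows, labels, 1))  # 0.0 bits
```

The greedy tree-building step is then just "pick the attribute with the highest gain, split, and recurse on each partition."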
- Similarity Based Learning
- feature space (data format)
- similarity (distance) metrics
- how to use it: nearest neighbor algorithm
- issues (and how to handle them)
- noisy data (K nearest neighbor algorithm)
- feature difference (normalization)
- continuous target attribute (average instead of class label)
- too many descriptive attributes (dimension reduction)
- improve efficiency - indexing (such as K-D tree), pre-screening
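The core of the k nearest neighbor algorithm is small enough to sketch directly. A minimal Python version using Euclidean distance and majority vote (toy clusters, no normalization or indexing):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Two toy clusters: class "a" near the origin, class "b" near (5, 5).
train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (0.5, 0.5), k=3))  # "a"
print(knn_predict(train, (5.5, 5.5), k=3))  # "b"
```

Using k > 1 is exactly the noisy-data fix noted above; for a continuous target, the majority vote would be replaced by an average of the k neighbors' values.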
- Probability Based Learning
- some statistics concepts
- how to get the probabilities
- Bayes' Theorem
- how to use probability based classifier in general?
- Naive Bayes classifier, and its BIG assumption
- Bayesian Belief Networks
- what does it look like?
- Markov blanket
- how to use it?
- how to build it?
- issues (and how to handle them)
- insufficient data (smoothing)
- continuous descriptive attribute (probability density function
instead of a fixed set of probabilities)
- missing parent attribute values - hidden variables
(consider all possible values and their probabilities)
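A minimal Naive Bayes sketch in Python, including add-one (Laplace) smoothing for the insufficient-data issue noted above; the weather-style data is made up for illustration:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate class priors and per-attribute conditional counts."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (attr_index, class) -> value counts
    values = defaultdict(set)      # attr_index -> observed attribute domain
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(i, y)][v] += 1
            values[i].add(v)
    return priors, counts, values, len(labels)

def predict_nb(model, row):
    """Pick the class maximizing P(class) * product of P(value | class),
    using the naive conditional-independence assumption."""
    priors, counts, values, n = model
    best, best_score = None, float("-inf")
    for y, ny in priors.items():
        score = ny / n
        for i, v in enumerate(row):
            # add-one (Laplace) smoothing over the attribute's domain
            score *= (counts[(i, y)][v] + 1) / (ny + len(values[i]))
        if score > best_score:
            best, best_score = y, score
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "cool")))  # "yes"
```

The product over attributes is exactly the BIG assumption: descriptive attributes are treated as conditionally independent given the class.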
- Error Based Learning
- multivariable linear regression
- gradient descent algorithm
- Artificial Neural networks (modelling non-linear functions)
- Neuron model
- Neural Network model
- Model applying (Forward propagation)
- Model learning (Backward propagation)
- issues (and how to handle them)
- categorical target attribute (find the decision boundary)
- non-linear relationships (use basis functions)
- multinomial (instead of binary) output (multiple one-vs-all models)
- computationally too expensive dot product operation
(use kernel trick)
- overfitting (stop training at an appropriate time)
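The gradient descent algorithm for the simplest (one-variable) linear regression case can be sketched as follows; the learning rate and epoch count are arbitrary illustration values:

```python
def gradient_descent(xs, ys, lr=0.01, epochs=2000):
    """Fit y = w*x + b by minimizing mean squared error with batch
    gradient descent."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradients of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Data generated from y = 2x + 1, so the fit should recover w ~ 2, b ~ 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = gradient_descent(xs, ys)
print(round(w, 2), round(b, 2))  # approximately 2.0 1.0
```

Backpropagation in a neural network is the same idea applied layer by layer via the chain rule, with the error computed by a forward pass first.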
- Classification Model Evaluation
- statistical measurements based on the confusion matrix
- Receiver Operating Characteristic Curve and ROC Index
- Model Evaluation after Deployment
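A quick sketch of the standard confusion-matrix measurements (precision, recall, accuracy) for a binary problem; the spam/ham labels are just an example:

```python
def confusion_metrics(actual, predicted, positive):
    """Precision, recall, and accuracy from a binary confusion matrix."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy

actual    = ["spam", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "ham", "spam", "spam"]
print(confusion_metrics(actual, predicted, positive="spam"))
# precision 2/3, recall 2/3, accuracy 0.6
```

The ROC curve plots the true positive rate (recall) against the false positive rate as the classifier's decision threshold varies; the ROC index is the area under that curve.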
- Cluster Analysis
- similarity based and density based
- Clustering algorithms
- K-means and its variants
- Hierarchical clustering
- Density-based clustering
- Cluster Validity Evaluation
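A bare-bones K-means (Lloyd's algorithm) sketch; it uses a naive deterministic initialization (the first k points) purely so the toy example is reproducible, whereas real implementations use random or k-means++ seeding:

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its cluster; repeat."""
    centroids = list(points[:k])  # naive deterministic initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # mean of each cluster; keep the old centroid if a cluster is empty
        centroids = [tuple(sum(col) / len(col) for col in zip(*c)) if c
                     else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

# Two well-separated toy clusters; centroids should land near
# (0.5, 0.5) and (5.5, 5.5).
points = [(0, 0), (5, 5), (1, 1), (6, 6), (0, 1), (5, 6), (1, 0), (6, 5)]
print(sorted(kmeans(points, 2)))
```

Hierarchical and density-based clustering replace this centroid loop with, respectively, repeated merging of the closest clusters and region-growing from dense neighborhoods.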
- Outlier Detection
- Definition of Outlier
- detection methods: graphical and statistical based; distance based;
model based
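One of the statistical detection methods, flagging points more than a chosen number of standard deviations from the mean (a z-score rule), can be sketched as follows; the threshold of 2 and the data are arbitrary illustration values:

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Flag values whose z-score (distance from the mean, measured in
    population standard deviations) exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

data = [10, 11, 9, 10, 12, 10, 11, 50]
print(zscore_outliers(data))  # [50]
```

Distance-based methods instead flag points with few neighbors within a radius, and model-based methods flag points the fitted model assigns low probability.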
- Association Analysis
- data format - market-basket model
- find frequent itemsets
- A-Priori algorithm, hash tree
- tree projection
- ECLAT, vertical database format
- FP growth algorithm, FP-tree, conditional FP-tree
- rule generation for A-Priori algorithm
- Multiple Minimum Support and its effect on A-Priori algorithm
- Association Rule Evaluation - objective statistical based measures
(usually use contingency table) and subjective measures
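A compact sketch of the A-Priori level-wise search (without the hash-tree optimization), run on a made-up market-basket example; it relies on the key pruning property that a (k+1)-itemset can only be frequent if all of its k-subsets are frequent:

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Return every frequent itemset with its support count, level by level."""
    freq = {}
    # level 1: frequent single items
    items = {item for basket in baskets for item in basket}
    current = [frozenset([i]) for i in items
               if sum(i in b for b in baskets) >= min_support]
    k = 1
    while current:
        for s in current:
            freq[s] = sum(s <= b for b in baskets)
        # candidate generation: join frequent k-itemsets...
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # ...then prune by subsets and count support against the baskets
        current = [c for c in candidates
                   if all(frozenset(sub) in freq for sub in combinations(c, k))
                   and sum(c <= b for b in baskets) >= min_support]
        k += 1
    return freq

baskets = [frozenset(b) for b in
           [{"bread", "milk"}, {"bread", "milk", "eggs"},
            {"bread", "eggs"}, {"milk", "eggs"}]]
print(apriori(baskets, min_support=2))
```

On this data every single item and every pair is frequent, but the triple {bread, milk, eggs} appears in only one basket and is pruned. Rule generation then splits each frequent itemset into antecedent and consequent and keeps rules with sufficient confidence.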
Past exams:
Midterm of Fall 2022
Final Exam of Fall 2022