CSCI 485: Data Mining, 2012

Instructor: Marina Barsky
Phone: 2321
Office: 315/212
Email: mgbarsky@gmail.com

Books:

  • Main textbook:
    Introduction to Data Mining (First edition)
    by Pang-Ning Tan, Michael Steinbach, Vipin Kumar.
    Addison Wesley, 2005.

    Sample chapters are available at the publisher's Web site.

  • Explanation and theory for Weka:
    Data Mining: Practical Machine Learning Tools and Techniques (Second edition)
    by Ian H. Witten, Eibe Frank.
    Morgan Kaufmann, 2005.
    Available at VIU library as an e-book.

  • Commercial applications:
    Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management (Second edition)
    by Michael Berry and Gordon Linoff.
    Wiley, 2004.

  • Cool real-life applications:
    Programming Collective Intelligence (First edition)
    by Toby Segaran.
    O'Reilly, 2007.

Course Description

Data mining is one of the most exciting and dynamic fields in computing science. The driving force behind data mining is the availability of large data collections that may contain valuable information hidden within them. Data mining is an invaluable tool for researchers in astronomy, biology and health informatics. Commercial enterprises recognize the value of data mining for customer profiling, customer retention and financial planning.

Data mining refers to a family of techniques for efficiently analysing data, drawing on database systems, artificial intelligence and algorithms.

The course requires some statistical background; however, all necessary concepts will be introduced in statistics primers. Programming experience is helpful but not essential.

The main software is Weka, a collection of data mining algorithms implemented in Java. As data sources, we will use a variety of datasets from areas such as banking, commerce and sociology.