CSCI 161 Spring 2005: ADTs

CSCI 161 Spring 2005: Abstract Data Types (ADTs)

Overview

Since many software projects are extremely complex, one of the major problems a software designeer faces is abstraction: the ability to consider a block of information in "just enough" detail to solve the immediate problem.

An abstraction may be defined as "a model of a complex system that includes only the details essential to the perspective of the viewer of the system" ^{(N Dale, C++ Data Structures, 1999)}.

There are two major forms of abstraction:

control abstraction - in which we encapsulate and give a logical name to a sequence of actions (e.g. taking a series of C++ statements and grouping them into a function)
data abstraction - in which we group a set of data items into a single logical entity (e.g. creating a "Student Record" data type, as a structure which includes a student's name, id number, grade point average, etc.)

This semester (and in many of your subsequent computing courses) we will consider logical entities which include both control and data abstractions: abstract data types, or ADTs.

Data abstraction and implementation hiding

An ADT includes all the data associated with a particular logical data entity, plus all the fundamental operations which manipulate that type of data entity.

For example, suppose we are maintaining an online dictionary.

The ADT representing the dictionary might represent the dictionary as a list of entries, with each entry composed of a spelling, a pronunciation, a definition, and an example of use.
The operations associated with the ADT might be:
- create a new (empty) dictionary
- look up the entry for a particular word
- update the entry associated with a particular word
- create and add an entirely new entry
- delete an outdated entry

We will also consider access to the ADT in two parts:

A public interface, and
A private implementation

The public interface gives the following:

The name of each of the ADT operations a programmer might need in order to use the ADT, along with the return type (if any), and the parameter type(s) and order.
A "black box" description of each ADT operation. This description precisely defines all the user-visible behaviour of the operation. I.e. it describes what the operation does for the user (though not how it does it), including error handling and any assumptions about the parameters or pre-requisites to using the operation.
For example, suppose we have a routine to generate random numbers. Here is a potential interface and black box description:
```
int GenRandom(int lower, int upper)
// The GenRandom routine generates and returns a random integer
//     in the range lower..upper (inclusive)
// It assumes that both supplied parameters are positive integers,
//     and will return 0 if either value is less than 1
// It also assumes that the random number generator has been
//     previously initialized by a call to routine InitializeRandGen
```

Obviously the designer(s) and implementor(s) of the ADT need to know both the interface and the implementation, but other programmers who merely wish to include the ADT as part of their programs only need to know how to use the public ADT interface, they do not need to know the implementation details.

This is important in two respects:

The job of the "using" programmer is greatly simplified, since they only need to study the public interface in order to make use of the other code
The designer/implementor can enforce correct use of their ADT by only allowing other programmers access to the public interface.
This is achieved in C++ (and Java) through the use of "public" and "private" keywords, and means that the designer/implementor is free to change their implementation at any time - as long as the original interface is still followed the users will not notice any functional change!
This vastly improves the maintainability of software.

Code maintenance and reuse

It is estimated that, worldwide, between 50% and 70% of all time and money spent on software is spent on maintenance and reuse:

IT IS FAR MORE COMMON TO WORK ON UPDATING OR FIXING EXISTING CODE THAN TO WORK ON DEVELOPING NEW CODE FROM SCRATCH

This is simply because once a substantial effort has been made to develop a software product, it is highly desirable to maintain that code rather than develop another new product.

Any time we can make use of existing code, rather than developing new code, we are hopefully saving both time and money.

It is also hoped that we will have fewer debugging problems, since a substantial effort will have already gone into testing and debugging the original code.

The use of ADTs and interfaces enhances both the maintainability and reusability of code, since we have clearly defined data entities and operations which can be reused with minimal effort (the using programmer just needs to understand the interface) and which can be modified without impacting other programs (as long as the interface is not modified).

One important note here: IF THE INTERFACE IS ALTERED THEN ALL OTHER PROGRAMS CAN BE AFFECTED: GET YOUR INTERFACE RIGHT AT THE BEGINNING, OR LIVE WITH THE CONSEQUENCES.

ADT descriptions

Again, the abstract data type must provide sufficient information for the ADT to be fully and correctly used by any other designer/implementor.

Some of the ADTs we look at in 161 include Queues, Stacks, Lists, and Trees.