Programming languages: lecture notes

Data abstraction: ADTs

To quote Sebesta, "Abstraction is a weapon against the complexity of programming, its purpose is to simplify the programming process."

There are two kinds of abstraction we are concerned with:

Process abstraction: in which multiple computation sequences are grouped into a single parameterized computation block.
This was considered in the previous sections on control structures and subprograms.
Data abstraction: in which different data entities are logically grouped by their set of common attributes, and distinguished by their differing attributes.
For example, we can group all stacks together as a data abstraction with common operations (pop, push, etc.), and only address their differences (such as the data types which can be stored on the stack) when and where appropriate.

Both process and data abstraction contribute to our ability to modularize code:

when programs become sufficiently large, it is difficult if not impossible for the average programmer to organize and orchestrate all the program details simultaneously
if simpler models can be found to describe the behavior of parts of the program, and implemented in a manner consistent with both the model and correct software behavior, then the level of detail the programmer needs to consider at any given point in time is reduced
abstractions can lead to greater maintainability, in that implementation details are encapsulated within the relevant abstraction, and can be altered without impacting the rest of the software
abstractions (both process and data) also facilitate code reuse - if a logical abstraction is found it may be suitable for use in more than just the original implementation instance

An abstract data type is a group of program units and data items that includes only the data representation for one specific type of data and the subprograms that provide the operations for manipulating that data type.

By this definition even a built-in type such as floating point can be viewed as an abstract data type:

the implementation of the actual floating point values is hidden
the floating point operators are visible for manipulation of floating point values, but the implementation of the operations is hidden

The distinction between this and the traditional view of ADTs is that the data type interface is provided in the high level language, while the language implementation and underlying hardware and software conceal the ADT implementation details.

One of the key design issues in programming languages is determining the level of support the language supplies for user-defined abstract data types.

Object-oriented languages are a natural extension of the ADT concept.

Formally, an abstract data type satisfies the following conditions:

The definition of both the data type and the operations on that type are contained in a single syntactic unit, and other program units can create variables (instances) of the defined type
The representation of objects of the data type is hidden from the program units which use the type, with the result that the program units can only directly manipulate the objects by using the defined operations.

The program units which use objects of the defined type are called clients of the type

ADTs provide all the benefits discussed earlier for abstractions:

decreased levels of detailed knowledge are required by programmers working on the system
code maintainability is enhanced
code reuse is facilitated

ADTs also provide increased reliability by enforcing the use of defined operations to manipulate the data types: programmers are not free to access the data types directly, they can only be used in accordance with the access processes put in place by the ADT designer

Note that in most languages the programmer has the ability to simulate the use of ADTs, but not the ability to enforce them.

In C, for instance, the programmer may create a data type and a set of access routines which - if used - give the desired behavior. However, it is very difficult to prevent programmers from circumventing the access routines if they so desire.

Thus, under our formal definition, C has poor support for ADTs.
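
The gap between simulating and enforcing an ADT can be sketched as follows. This is a minimal illustration written in C style (it also compiles as C++); the names IntStack, stack_init, stack_push, stack_size, careless_client and CAPACITY are hypothetical and chosen only for this example.

```cpp
#include <cassert>

// A C-style "ADT": a plain struct plus access routines (all names hypothetical).
const int CAPACITY = 4;

struct IntStack {
    int items[CAPACITY];
    int count;              // intended to be changed only by the routines below
};

void stack_init(IntStack *s) { s->count = 0; }

// Intended access routine: refuses to push onto a full stack.
// Returns 1 on success, 0 if the stack was full.
int stack_push(IntStack *s, int n) {
    if (s->count >= CAPACITY) return 0;
    s->items[s->count++] = n;
    return 1;
}

int stack_size(const IntStack *s) { return s->count; }

// A client that bypasses the access routines: nothing in the language stops it.
void careless_client(IntStack *s) {
    s->count = 100;         // compiles, but count <= CAPACITY no longer holds
}

// Legitimate use followed by circumvention.
void demo_circumvent() {
    IntStack s;
    stack_init(&s);
    assert(stack_push(&s, 7) == 1);
    assert(stack_size(&s) == 1);
    careless_client(&s);
    assert(stack_size(&s) == 100);   // reported size no longer matches the contents
}
```

The access routines give the desired behavior only as long as every client chooses to use them; the representation itself is fully visible.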

ADT Design issues:

Encapsulation: what facilities are provided for defining the ADT, for declaring variables of that type, and for calling the defined operations on that type?
Information hiding: how much of the underlying implementation is visible to the clients, and how much is hidden?
In particular this must address the visibility of both the implementation details for the ADT's operations and the underlying data types used to implement the current ADT.
What generic operations should be supported across all ADTs?
Operations to create and destroy instances are likely candidates (carrying out appropriate memory allocation and deallocation, as well as any appropriate initialization?)
Tests for equality/inequality are also reasonable possibilities.
Should there be restrictions on the ability to create ADTs?
Should there be access controls available to restrict which clients have access to the ADT, or parts thereof?

ADTs in C++

C++ supports abstract data types through its class construct.

A class is defined with associated operations, or member functions, and data fields, or data members.

Once a class is defined and a name assigned to the class type, variables can be declared to be instances of that type.

For example, if a stack class is defined, then variables firststack and secondstack might be declared as two separate stack instances.

In implementation terms:

the code for the member functions is stored in memory once, and accessed or shared by all instances of the class
(e.g. the code for function push is not duplicated for each individual stack)
the data members for the class, on the other hand, are created separately for each instance (e.g. each stack gets its own set of data fields)
when a local variable is declared to be of a class type (e.g. a stack variable) that variable is stack-dynamic
when a new operator is used to explicitly create a class instance during execution then the created instance is heap-dynamic, as with any other data type
(e.g. a variable has been declared as a pointer to a class instance, and during execution new is used to construct the class instance and assign its address to the pointer)
(Of course, the data members within the class could contain variables which are used for heap-dynamic allocation also)
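
The two kinds of instance lifetime described above can be observed with a small sketch. Tracker is a hypothetical class, not part of the stack example; it simply counts how many of its instances currently exist.

```cpp
#include <cassert>

// Hypothetical class that counts how many instances are currently alive.
class Tracker {
   public:
       Tracker()  { live++; }
       ~Tracker() { live--; }
       static int live;      // one counter shared by all instances
};
int Tracker::live = 0;

void demo_storage() {
    {
        Tracker local;               // stack-dynamic: created at its declaration
        assert(Tracker::live == 1);
    }                                // destroyed automatically at end of scope
    assert(Tracker::live == 0);

    Tracker *p = new Tracker;        // heap-dynamic: created explicitly with new
    assert(Tracker::live == 1);
    delete p;                        // and must be destroyed explicitly
    assert(Tracker::live == 0);
}
```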

Example: a declaration for an array-based C++ stack of integers:

class stack {
   public:
       stack();                  // constructor: initializes stack
       ~stack();                 // destructor: cleans terminated stack
       void push(int n);         // push n onto stack
       void pop();               // remove top item from stack
       int top();                // return value of top item on stack
       int isempty();            // return 1 if stack empty, 0 otherwise
       int isfull();             // return 1 if stack full, 0 otherwise
       int getsize();            // return current number of elements on stack
   private:
       int  stackarray[MAXSIZE]; // stack contents
       int  elements;            // number of items currently on stack
};

stack::stack() 
// body of the stack constructor
{
   elements = 0;
   for (int i = 0; i < MAXSIZE; i++)
       stackarray[i] = 0;
}

stack::~stack()
// body of the stack destructor
{
   // no real cleanup necessary
}

void stack::push(int n)
// body of the stack push routine
{
   if (isfull() == 0) {
      stackarray[elements++] = n;
   }
}

// etc with the rest of the stack member functions

// Declaring and using stacks
// (assumes #include <iostream> and using namespace std)
int main()
{
   stack s1, s2;              // s1 and s2 are instances of stacks
 
   s1.push(1);                // push value 1 onto stack 1
   cout << s1.top() << endl;  // print the top value of stack 1
   s2.push(s1.top());         // push (a copy of) the top value 
                              //    from stack 1 onto stack 2
   // etc ...
}

Note the following declaration and usage features for C++ classes:

the member functions and data can be partitioned into private and public sections: the public material is accessible to clients, the private material is only accessible to member functions (and friends) of the class
reference to the member functions is achieved by specifying the variable name, the . operator, and the member function name and parameters
the name of the class type is also the name of the constructor function, and the name of the class type preceded by the ~ symbol is the name of the destructor
(also note that neither has a return type, and neither returns a value)
only the member function headers need to be declared within the class definition; the function bodies can be defined separately using the class name and the :: symbols to specify the class association

inline functions

For the stack example, suppose we replaced all the member functions with inline functions:

class stack {
   public:
       stack() {
          elements = 0;
          for (int i = 0; i < MAXSIZE; i++) stackarray[i] = 0;
       }
       ~stack() { }
       void push(int n) {
          if (isfull() == 0) stackarray[elements++] = n;
       }
       void pop() {
          if (isempty() == 0) elements--;
       }
       int top() {
          if (isempty() == 0) return(stackarray[elements-1]);
          else return(0);        // empty stack: no top item, so return 0
       }
       int isempty() {
          if (elements == 0) return(1);
          else return(0);
       }
       int isfull() {
          if (elements == MAXSIZE) return(1);
          else return(0);
       }
       int getsize() {
          return(elements);
       }
   private:
       int  stackarray[MAXSIZE]; // stack contents
       int  elements;            // number of items currently on stack
};

Allowing clients access to private data is sometimes desirable for the sake of efficiency (although this does detract from the "purity" of an ADT)
In C++ this is achieved by the use of friend functions.
Suppose we have a client function that can be implemented much more effectively given access to a class' private data, e.g. a peek function for the stacks considered above.
Within the declaration of our stack class, we must explicitly list peek as a friend function:
```
class stack {
   friend int peek(const stack &s, int d);

   // ... and the rest of the usual stack declaration
};

// ...

int peek(const stack &s, int d)
// peek at the d'th element from the top of stack s
{
  // body of peek function (may read s.stackarray and s.elements directly)
}
```
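
For peek to reach the private data of a particular stack it has to be handed that stack, so the sketch below passes it by reference. This is one possible completion under stated assumptions: the stack class is cut down to what the example needs, MAXSIZE is assumed to be 10, and returning 0 for an out-of-range depth is an arbitrary choice.

```cpp
#include <cassert>

const int MAXSIZE = 10;   // assumed capacity, not given in the notes

class stack {
   // peek is not a member, but is granted access to the private members
   friend int peek(const stack &s, int d);
   public:
       stack() { elements = 0; }
       void push(int n) { if (elements < MAXSIZE) stackarray[elements++] = n; }
   private:
       int stackarray[MAXSIZE]; // stack contents
       int elements;            // number of items currently on stack
};

// peek at the d'th element from the top of stack s (d == 0 is the top item);
// returns 0 if d is out of range (an arbitrary choice for this sketch)
int peek(const stack &s, int d) {
   if (d < 0 || d >= s.elements) return 0;
   return s.stackarray[s.elements - 1 - d];
}

void demo_peek() {
   stack s;
   s.push(10); s.push(20); s.push(30);
   assert(peek(s, 0) == 30);    // top of stack
   assert(peek(s, 2) == 10);    // bottom of stack
}
```

Without the friend declaration the accesses to s.elements and s.stackarray inside peek would be rejected by the compiler.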

C++ vs Java:

While Java and C++ are closely related in terms of ADTs, there are a few differences:

ALL user-defined data types in Java are declared using classes
ALL functions in Java are declared as member functions within classes
All objects in Java are allocated from the heap and accessed through reference variables (remember the absence of pointers in Java)
And, since garbage collection is carried out implicitly in Java, there are no destructors
public and private are modifiers for individual variable and member function declarations in Java
Java supplies an extra level "above" classes for encapsulation: packages
A package can contain multiple related classes, and these classes can access variables and member functions from one another as long as they have public or protected modifiers, or no modifier at all.
Thus this has many of the same benefits of the friend functions of C++ (which are not supported in Java)

Parameterized ADTs in C++

Earlier we discussed generic functions in C++, using template functions to handle a variety of possible data types for parameters.

Templates are also useful in creating parameterized ADTs - for example stacks where a generic data type is used for the types of elements which can be pushed on the stack.

Here we modify the inline version of our stack as follows:

change our stack of ints to a stack of generic types,
dynamically allocate our stack array in the constructor,
deallocate the array in the destructor.

template <class Type>
class stack {
   public:
       stack() {
          elements = 0;
          stackarray_ptr = new Type [MAXSIZE];
          for (int i = 0; i < MAXSIZE; i++) stackarray_ptr[i] = Type();
       }
       ~stack() { 
          delete [] stackarray_ptr;   // array form of delete to match new []
       }
       void push(Type n) {
          if (isfull() == 0) stackarray_ptr[elements++] = n;
       }
       void pop() {
          if (isempty() == 0) elements--;
       }
       Type top() {
          if (isempty() == 0) return(stackarray_ptr[elements-1]);
          else return(Type());   // empty stack: return a default-constructed value
       }
       int isempty() {
          if (elements == 0) return(1);
          else return(0);
       }
       int isfull() {
          if (elements == MAXSIZE) return(1);
          else return(0);
       }
       int getsize() {
          return(elements);
       }
   private:
       Type *stackarray_ptr;     // stack contents
       int  elements;            // number of items currently on stack
};
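
A parameterized stack of this kind is instantiated by supplying the element type in angle brackets. The sketch below is a condensed restatement of the class rather than the notes' exact code: MAXSIZE is assumed to be 10, an empty top() returns a default-constructed value, and copying is disabled so that two stacks can never end up deleting the same array.

```cpp
#include <cassert>
#include <string>

const int MAXSIZE = 10;   // assumed capacity

template <class Type>
class stack {
   public:
       stack() {
          elements = 0;
          stackarray_ptr = new Type [MAXSIZE];
       }
       ~stack() { delete [] stackarray_ptr; }
       void push(Type n) { if (elements < MAXSIZE) stackarray_ptr[elements++] = n; }
       void pop() { if (elements > 0) elements--; }
       Type top() { return (elements > 0) ? stackarray_ptr[elements-1] : Type(); }
       int getsize() { return elements; }
   private:
       stack(const stack &);              // declared but not defined: no copying
       stack &operator=(const stack &);
       Type *stackarray_ptr;     // stack contents
       int  elements;            // number of items currently on stack
};

void demo_template_stack() {
   stack<int> si;                 // Type = int
   stack<std::string> ss;         // Type = std::string
   si.push(3); si.push(4);
   ss.push("abc");
   assert(si.top() == 4);
   assert(ss.top() == "abc");
   si.pop();
   assert(si.top() == 3 && si.getsize() == 1);
}
```

The compiler generates a separate class for each distinct Type used, so stack<int> and stack<std::string> are unrelated types.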

Object-oriented programming

The goal of object-oriented programming is to model all systems as collections of objects which communicate by message passing.

This is ideally suited to modelling, simulating, or controlling real-world systems which can be regarded as sets of communicating entities, each with its own internal processes that are invoked as a result of communications with the outside world.

The central point to object-oriented programming languages is the ability to take abstractions of abstract data types:

the commonality of similar ADTs is extracted and used as a base type, which the variants can build upon through the concept of inheritance

True object oriented languages are (in a formal definition) required to support three key features:

Abstract data types
Inheritance
Dynamic binding (of messages to methods)

ADTs were discussed in the previous section, so we now focus on the concepts of inheritance and dynamic binding.

Inheritance

Inheritance is the process by which one class can be created as a special category of another class - inheriting (although possibly redefining) the variables and methods of the original class.

One of the key practical goals of including inheritance in a language is to improve the rate of software reuse - if a "reasonable" set of underlying classes is defined then much of the work of creating specialized instances is reduced.

TERMINOLOGY:

As with ADTs, most languages supporting OO programming do so through the use of classes, with instances of a class referred to as objects.
When one class is based upon another through the use of inheritance, the original class is referred to as the parent class or superclass, while the new class is the derived class or subclass.
When a derived class modifies some of the methods inherited from the parent class, it is said to override the inherited version (and the method is referred to as an overridden method)
The member functions of the classes are called methods, while the actual calls to these functions are called messages, and the collection of methods is called the message interface, or message protocol.
When a class inherits from a single parent class the process is called single inheritance, while inheriting from multiple parents is referred to as multiple inheritance
Access to data and methods is usually restricted by categorizing them as public, private, or protected (the latter being used to provide access to only some classes, typically the derived classes)
most methods operate only on their associated instance of the class, and most variables belong to their specific instance, not all instances of the class: when the need for clarity arises these are referred to as instance methods and instance variables
However, some languages support the concepts of
- class variables, which are shared amongst the entire class of instances, rather than one specific instance, and
- class methods, which can perform operations on the class (rather than simply objects, or instances of the class)
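
In C++ the class variables and class methods described above correspond to static data members and static member functions. A minimal sketch, using a hypothetical Account class:

```cpp
#include <cassert>

class Account {
   public:
       Account(int initial) : balance(initial) { count++; }
       ~Account() { count--; }
       int getbalance() const { return balance; }   // instance method
       static int howmany() { return count; }       // class method
   private:
       int balance;          // instance variable: one per object
       static int count;     // class variable: one shared by all objects
};
int Account::count = 0;

void demo_class_members() {
   assert(Account::howmany() == 0);   // callable with no instance at all
   Account a(10), b(20);
   assert(a.getbalance() == 10 && b.getbalance() == 20);
   assert(Account::howmany() == 2);   // both objects share the one counter
}
```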

Polymorphism and dynamic binding

The goal here is to allow parent classes to use variables of (limited?) generic data types, and define methods which act on those variables.

These generic variables should be able to reference any of the subclasses, and the methods may be overridden or customized by those subclasses to handle the different data types appropriately.

When the (generic) variable calls the (overridden) method the call is dynamically bound to the proper method in the proper class.

This facilitates long term development and maintenance of software systems, where all the possible (specific) data types may not be known at the time of initial development.

OO Design issues

Some of the design issues that must be addressed include:

How exclusive or pervasive do you make the object characterization?
For instance, in a pure OO system, ALL data types would be treated as classes - from bits and Booleans through floats and strings and all user defined types.
In such a system there should be no distinction between predefined types and user defined types - they are all classes and all are handled through messages.
This would be the purest method, but loses some of the efficiency one could obtain through allowing hardware manipulation of the most common, simplest data types and operations
Unfortunately, treating the simple scalar data types differently than objects is likely to lead to problems when one begins mixing the use of objects and non-objects (we will return to this discussion when we consider wrapper classes in Java)
If all types are defined using classes, and subclasses can be derived through inheritance, are subclasses subtypes?
Suppose the parent class defines a LIST type, and the derived class defines a SORTEDLIST type.
When evaluating type compatibility, if x is a LIST and y is a SORTEDLIST, under what conditions should they be considered type-compatible?
One possibility is to rule that a derived class is only a subtype (i.e. type compatible) if it only adds variables and methods and overrides inherited methods in "compatible" ways.
I.e. the overriding method can only replace the overridden method in a manner which has no possibility of generating type errors
Since the implementations of classes are intended to be hidden from clients, if a subclass is derived from a parent class should it inherit only the interface to the parent methods, or inherit both the interface and the implementation?
These two versions of inheritance are referred to as (naturally) interface inheritance and implementation inheritance
Implementation inheritance makes the subclass somewhat dependent on the implementation of the parent (i.e. changes to the parent implementation directly impact the subclass).
On the other hand, interface inheritance can cause a loss in efficiency since the subclass cannot directly access the data variables in the same manner as the parent did - it must work through the publicly-available interface.
Does the language support single inheritance or multiple inheritance?
Single inheritance is much simpler to work with in terms of program maintenance and readability - when multiple inheritance is permitted it is possible to create much more complex (and confusing) dependencies between classes.
On the other hand, multiple inheritance allows for more flexible use (and re-use?) of existing classes and combinations thereof.
One of the practical issues with multiple inheritance is that of name collisions: suppose classes A and B each have a field named Initial and class C inherits from both A and B - how should the conflict between the two names be handled?
What allocation and deallocation techniques should be permitted for objects?
Should they only be allocated by the compiler (e.g. stack-dynamic), or only during execution using commands like new (e.g. heap-dynamic), or should both be allowed?
If objects can be heap-dynamic, then is garbage collection explicit or implicit?
How is type checking handled in the presence of dynamic binding?
We specified earlier that when a generic variable in a parent class references a method which is inherited (and possibly overridden) by the subclasses, the binding of the call to the correct subclass method takes place dynamically.
Given that, how do we carry out type checking?
If the language is intended to be strongly typed then type checking needs to be carried out statically, and this significantly restricts the ways in which polymorphic messages and methods can be used.
We need to check the actual vs the formal parameters for the method used, and the actual vs the expected return type.
The nature of the coercions carried out by the language determines our flexibility in the matching of protocols between the methods defined in the parent class and the overrides applied in the derived classes.
(The template classes of C++ provide considerable flexibility while still retaining reasonable type checking compared to pure OO languages such as Smalltalk)
Should the user be able to specify whether static binding can be used where appropriate rather than dynamic binding?
If so, efficiency is improved since the cost of dynamic binding is typically much higher.
(This is supported in C++, where the use of the virtual keyword distinguishes the possible need for dynamic binding - and in fact it has been demonstrated that even with dynamic binding in C++ only five more memory references are required than with static binding)

OO in C++

General characteristics:
- Because C++ inherited the underlying structure of C, it has both the imperative style of types found in C and the object-oriented concept of classes.
- Because of this, objects in C++ can be allocated anywhere that variables in C could, i.e. they can be static, stack-dynamic, or heap-dynamic.
- Heap-dynamic allocation is carried out through the new operator, and the lack of garbage collection necessitates the use of the delete operator for deallocation of heap-allocated objects.
- All classes include a constructor operator, called (implicitly or explicitly) when an object of that class is created. (Explicitly for heap-dynamic instances, implicitly for stack-dynamic instances.)
- Most classes also include a destructor, which is implicitly called when the object is deleted, e.g.:
```
class List {
public:
   List();
   ~List();
   ...
private:
   Link *head; // Link is class for list elements
   int  length;
};

List::List() {
    head = NULL;
    length = 0;
}

List::~List() {
   Link *p = head;
   while (p != NULL) {
      Link *pnext = p->next;
      delete p;
      p = pnext;
   }
}

// where destructor might get called
   ...
   List *l;
   ...
   l = new List;
   ...
   delete l;
   ...
```
Inheritance
- C++ classes can be stand-alone (no parent), or can be derived from a parent class through inheritance
- The syntax for a derived class can be either of the following:
```
class SubClassName: public ParentClassName {
    // derived class body
};

class SubClassName: private ParentClassName {
    // derived class body
};
```
  Private vs public derivations determine whether the public/protected methods of the parent class will be passed on to any classes which are subsequently derived from the new subclass.
  For instance, if C is a subclass of B which is a subclass of A, and we use
```
class B: private A {
};
```
  then although B has access to the public/protected methods of A, C will not have such access.
  (Note: methods that were private in A are inherited by B, but not visible in B!)
- The syntax for accessing visible members of ancestor classes is to use the ancestor class name, followed by the scope resolution operator ::, followed by the identifier to be accessed
```
class A {
   private:   int x;
   protected: int y;
   public:    int z;
};

class B: private A {
   // x is not visible
   // can access y using A::y
   // can access z using A::z
};

class C: private B {
   // cannot access A::x, A::y, or A::z
};

// Multiple inheritance is used by specifying more than one class in
// the declaration, e.g.

class X {
  ...
};

class Y {
  ...
};

class Z: public X, private Y {
};
```
  C++ has no explicit renaming feature.
  Name conflicts are resolved by (the programmer) specifying the name of the parent, e.g. if both X and Y contain a foo function, we refer to them via X::foo() and Y::foo()
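
A minimal sketch of resolving such a conflict by qualification. The bodies of foo and the helper sum are hypothetical, and both parents are inherited publicly here so that client code can also make the qualified calls:

```cpp
#include <cassert>

class X {
   public:
      int foo() { return 1; }
};

class Y {
   public:
      int foo() { return 2; }
};

class Z: public X, public Y {
   public:
      // an unqualified call to foo() would be ambiguous here,
      // so each call names the parent it means
      int sum() { return X::foo() + Y::foo(); }
};

void demo_name_conflict() {
   Z z;
   assert(z.X::foo() == 1);   // client code qualifies the call the same way
   assert(z.Y::foo() == 2);
   assert(z.sum() == 3);
}
```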
Dynamic binding
- Dynamic binding (or dynamic dispatch) in C++ is supported through the use of virtual functions:
  A pure virtual function is defined by the "= 0;" syntax shown below,
  such functions have no body and cannot be called - they must be redefined in the derived classes, as shown below.
  Any class that contains a pure virtual function (such as the shape class below) is said to be an abstract class, and no object of such a class can be created.
  (Note that in the example below a reference to a shape is declared, but the object it eventually references is created from one of the non-abstract classes)
```
class shape {
   public:
      virtual void draw() = 0;   // generic draw function
   ...
};

class circle: public shape {
   public:
      virtual void draw() { /* ...draw a circle... */ }
   ...
};

class rectangle: public shape {
   public:
      virtual void draw() { /* ...draw a rectangle... */ }
   ...
};

class square: public rectangle {
   public:
      virtual void draw() { /* ...draw a square... */ }
   ...
};

square s;              // a square shape
rectangle r;           // a rectangle shape
shape &ref_shape = s;  // a reference to shape s
ref_shape.draw();      // takes ref_shape, a reference to a "general" shape,
                       //    and dynamically binds to the draw method for
                       //    a square
r.draw();              // can statically bind this call, since r is known
                       //    to be a rectangle at compile time
```
  Note that this format allows us to easily extend our collection of shapes.
  Suppose we wish to add a triangle shape, and all the functions within the base shape class are declared as virtuals.
  Then we need only do the following:
  - derive a class, triangle, from shape
  - implement operations to override the virtual functions (draw, etc) and the code necessary to construct triangle objects
  - place the new code in a separate file, compile it, and relink it with the existing code
  NONE OF THE PREVIOUSLY EXISTING CODE NEEDS TO BE ALTERED
  Once a function is declared as virtual, it is treated as virtual in all the subsequently derived classes
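
The extension scenario can be sketched in runnable form. As an assumption made purely so the result can be checked, draw returns the name of the shape as a string instead of drawing anything, and render is a hypothetical client function standing in for the "previously existing code":

```cpp
#include <cassert>
#include <string>

class shape {
   public:
      virtual std::string draw() = 0;     // pure virtual: shape is abstract
      virtual ~shape() { }
};

class circle: public shape {
   public:
      virtual std::string draw() { return "circle"; }
};

// "Existing" client code, written before triangle existed.
// The call is bound at run time to the draw of the actual object.
std::string render(shape &s) { return s.draw(); }

// New code added later: nothing above this line is changed.
class triangle: public shape {
   public:
      virtual std::string draw() { return "triangle"; }
};

void demo_extension() {
   circle c;
   triangle t;
   assert(render(c) == "circle");
   assert(render(t) == "triangle");   // render handles the new class unmodified
}
```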
- Multiple inheritance of a common base: when the same base class is to be included along multiple inheritance paths, one solution is the use of virtual inheritance of the base (see below):
```
     Person
    /      \
Student  Employee
    \      /
    Teaching
    assistant

class Person {
   ...
};

class Student: virtual public Person {
   ...
};

class Employee: virtual public Person {
   ...
};

class TeachingAssistant: public Student, public Employee {
   ...
};
```
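
The effect of the virtual derivations can be checked with a small sketch. The id data member is hypothetical, added only so there is something to share between the two paths:

```cpp
#include <cassert>

class Person {
   public:
      Person() : id(0) { }
      int id;                  // hypothetical data member
};

class Student: virtual public Person { };
class Employee: virtual public Person { };
class TeachingAssistant: public Student, public Employee { };

void demo_diamond() {
   TeachingAssistant ta;
   ta.id = 42;                 // unambiguous: there is only one Person part
   Student  &s = ta;
   Employee &e = ta;
   assert(s.id == 42 && e.id == 42);
   assert(&s.id == &e.id);     // both paths reach the same data member
}
```

If the two derivations were not virtual, a TeachingAssistant would contain two separate Person parts and the reference ta.id would be rejected as ambiguous.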
Implementation: C++ classes are in fact extensions of C structs
The only difference between a C++ struct and a C++ class is the default access: the members (and base classes) of a struct are public by default, while those of a class are private by default.
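
A short sketch of the default-access difference, using hypothetical types PointS and PointC:

```cpp
#include <cassert>

struct PointS {
   int x;          // public by default
};

class PointC {
   int x;          // private by default
   public:
      PointC() : x(0) { }
      void setx(int v) { x = v; }
      int getx() const { return x; }
};

void demo_defaults() {
   PointS s;
   s.x = 5;                 // allowed: struct members default to public
   PointC c;
   // c.x = 5;              // would not compile: class members default to private
   c.setx(5);
   assert(s.x == c.getx());
}
```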

C++ vs Java OO

Again, Java and C++ are very similar in terms of support for OO, but there are some differences:

in Java only the primitive scalar types (Boolean, character, and numeric types) are not treated as objects
(As a note, since some classes will only operate on objects, it is occasionally necessary to create a wrapper class for the simple types!)
The root class in Java is Object, and ALL other classes must be a derivative of Object or one of its subclasses (i.e. no stand-alone classes other than Object)
All Java objects are explicit heap-dynamic, usually allocated with new, and all deallocation and garbage collection is implicit (no destructors, no delete)
Java supports only single inheritance of classes, not multiple (although a class may implement several interfaces)
Java allows interface definitions: specifying the named constants and method declarations for a class, but nothing else.
This allows for a sort of virtual class
In Java all methods are treated as dynamically bound unless they are explicitly defined to be final, in which case they cannot be overridden and are statically bound.
(Essentially the reverse of the C++ case, where methods are statically bound unless they are virtual functions.)
Java provides packages, which can encapsulate a number of related classes and allow access to protected data between these classes.
This achieves many of the goals of the C++ friend functions, while providing a clearer ADT-based relationship for the sharing of such data.

C++ Generic Stacks Class

template <class Type>
class Stack {
   public:
      Stack(int MaxSize = 100); // create stack with size limit
      ~Stack() { delete [] stackptr; } // delete stack space
      void push(Type& data);    // push data element on stack
      void pop(){ if (size > 0) size--; } // delete top element  
      Type top();               // return copy of top stack element
      bool isempty(){ return(size == 0); } // is stack empty?
      bool isfull(){ return(size == maxsize); } // is stack full?
      int getsize(){ return(size); } // get current stack size
      int getmaxsize(){ return(maxsize); } // get maximum stack size
      Type peek(int depth); // peek into stack at depth from top
   private:
      int size;                 // current stack size
      int maxsize;              // maximum stack size
      Type *stackptr;           // ptr for array of elements
};

template <class Type>
Stack<Type>::Stack(int MaxSize) {
   maxsize = MaxSize;
   stackptr = new Type[MaxSize];
   size = 0;
}

template <class Type>
void Stack<Type>::push(Type& data) {
   if (size < maxsize)
       stackptr[size++] = data;
}

template <class Type>
Type Stack<Type>::peek(int depth) {
   if ((size > depth) && (depth >= 0))
      return(stackptr[size-(depth +1)]);
   else
      throw OutOfBounds(); // throw exception
}

template <class Type>
Type Stack<Type>::top() {
   if (size > 0)
      return(stackptr[size-1]);
   else
      throw OutOfBounds(); // throw exception
}
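
The class above throws OutOfBounds without defining it. The sketch below shows one way a client might use the generic stack and catch that exception, under several assumptions: OutOfBounds is taken to be an empty class, the Stack is cut down to the members the example needs, push takes a const reference so that literal values can be pushed, and copying is disabled.

```cpp
#include <cassert>

class OutOfBounds { };   // hypothetical empty exception type

template <class Type>
class Stack {
   public:
      Stack(int MaxSize = 100);
      ~Stack() { delete [] stackptr; }
      void push(const Type& data);
      Type top();
   private:
      Stack(const Stack&);             // copying disabled in this sketch
      Stack& operator=(const Stack&);
      int size;                 // current stack size
      int maxsize;              // maximum stack size
      Type *stackptr;           // ptr for array of elements
};

template <class Type>
Stack<Type>::Stack(int MaxSize) {
   maxsize = MaxSize;
   stackptr = new Type[MaxSize];
   size = 0;
}

template <class Type>
void Stack<Type>::push(const Type& data) {
   if (size < maxsize)
       stackptr[size++] = data;
}

template <class Type>
Type Stack<Type>::top() {
   if (size > 0)
      return(stackptr[size-1]);
   else
      throw OutOfBounds(); // throw exception
}

// returns true if top() on an empty stack raises OutOfBounds
bool top_of_empty_throws() {
   Stack<double> s(5);
   try {
      s.top();
   } catch (OutOfBounds&) {
      return true;
   }
   return false;
}

void demo_generic_stack() {
   Stack<double> s(5);
   s.push(2.5);
   assert(s.top() == 2.5);
   assert(top_of_empty_throws());
}
```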

Cleanup: overloading operators in C++

I believe this was overlooked in the notes covered earlier, but the mechanism for actually performing operator overloading in C++ is to declare the operator as you would any other function, but using the operator symbol in place of the function name and preceding the symbol with the operator keyword.

<return type> operator <symbol> (<parameter_list>);

The C++ restrictions on operator overloading are as follows:

You cannot create new operators, just overload existing ones (using the same number of operands as the existing ones).
Furthermore, the following operators (among others) cannot be overloaded
- the dot operator .
- the pointer-to-member operator .*
- the scope resolution operator ::
- the ternary operator ?:
- the sizeof operator
Some operators can only be overloaded as member functions, not as non-member functions. These operators are = [] -> ()
The precedence of overloaded operators is the same as the original precedence.
At least one of the parameters for the overloaded operator must be of a class (or enumeration) type
The operator can (optionally) be a friend of a class
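
These rules can be illustrated with a small sketch using a hypothetical Vec2 class: + is overloaded as a member function, [] as a member function (as it must be), and == as a non-member friend.

```cpp
#include <cassert>

class Vec2 {
   // non-member operator, declared a friend so it can read x and y
   friend bool operator==(const Vec2& a, const Vec2& b);
   public:
      Vec2(int x0, int y0) : x(x0), y(y0) { }
      // member form: the left operand is the object the operator is called on
      Vec2 operator+(const Vec2& rhs) const { return Vec2(x + rhs.x, y + rhs.y); }
      // [] may only be overloaded as a member function;
      // index 0 selects x, any other index selects y
      int operator[](int i) const { return (i == 0) ? x : y; }
   private:
      int x, y;
};

bool operator==(const Vec2& a, const Vec2& b) {
   return a.x == b.x && a.y == b.y;
}

void demo_operators() {
   Vec2 a(1, 2), b(3, 4);
   Vec2 c = a + b;               // calls a.operator+(b)
   assert(c == Vec2(4, 6));      // calls operator==(c, Vec2(4, 6))
   assert(c[0] == 4 && c[1] == 6);
}
```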