Programming languages: lecture notes

Subprograms

Subprograms are an example of process abstraction:

A computation sequence is somehow packaged and named, and thereafter the name is used whenever that computation sequence is required.

This allows the programmer to hide details of the computation sequence, and simply refer to it by some logical identifier - used properly this can substantially readability and maintainability.

Basic definitions and characteristics

Initially, we make the following assumptions and definitions:

There is a single entry point to each subprogram
The routine which invokes, or calls the subprogram is the calling routine or caller, and the invoked subprogram is the called routine or callee
The caller is suspended during the execution of the callee
There is only one subprogram in execution at an given time
Control always returns to the caller when the callee terminates
Routines which have been called but which have not yet completed their execution are termed active
The components of a subprogram are:
- definition: the interface and description of the subprogram actions
  A subprogram may be declared as well as defined - the declaration typically being used to define names and types, but not to provide the subprogram bodies
- header: declares the name, type, and (if necessary) parameters of the subprogram
  The paramters supplied in the header are referred to as the formal parameters
- call: the explicit request to execute the subprogram
  The parameters supplied in the call are referred to as the actual parameters
- (parameter) profile: the number, order, and types of the formal parameters
- protocol: the profile plus the return type (if any) of the subprogram

Functions and procedures

The distinction is sometimes made between subprograms which are functions, and those which are procedures.

In the strictest sense, functions should compute and return a value and should not have any observable side effects, while procedures may produce side effects by acting on either non-local variables or on parameters which allow the transfer of data to the caller (e.g. reference parameters).

Parameters

Information passing between caller and callee can be handled in two ways:

through variables visible to both caller and callee
through parameters passed from caller to callee

Using variables to communicate between routines has several problems:

it can limit the ability to perform nested or recursive calls (e.g. if global variables are used to communicate, all the calls are accessing the same set of variables)
it detracts from the readability of the source code: the reader must be aware of where the variables are used and declared, and must be aware of all potential side effects acting on the variables
it prevents the user from "protecting" the communication variables: they must be visible (and open to unintended side effects) at all points in the called routines - this is even more significant in dynamic scoping environments, where variables are visible to an entire "chain" of called routines
in dynamic scoping environments type checking of non-local variable references is not possible - further detracting from program reliability

As a result, the more accepted method for inter-routine communication is by passing parameters.

There are two methods for binding the actual parameters in a call to the formal parameters of a subroutine: keyword and positional.

Positional parameters match the actual parameters to the formal parameters in the order they are listed: i.e. the first actual parameter is matched to the first formal parameter, the second to the second, etc.

Keyword parameters require the programmer to provide, in the function call, both the actual parameter and the name of the formal parameter it is to be matched to.

The advantage of this system is that the programmer doesn't have to remember the order of parameters, the disadvantage is that the programmer does have to know the names of the formal parameters in the called routine.

Parameter numbers and types: as a language designer you must chose whether the types and number of actual parameters will be checked against the number and types of the formal parameters.

Pascal, Java, Fortran 90, Ada etc require type checking of actual vs formal parameters, while Fortran 77 and the original version of C do not.

In ANSI C the programmer can vary the definition format of a subprogram to enable/disable type checking of parameters:

// foo without type checking       // foo with type checking
int foo(x, y)                      int foo(double x, double y)
double x, y;                       {
{                                     ...
  ...                              }
}

In C++ type checking is carried out, with the following exception:

...

While this improves flexibility it clearly detracts from error checking.

Note also that coercion will be applied in C and C++ to make actual types match expected types.

Default values: the designer must also choose whether or not default values can be supplied to parameters.

For example, a valid C++ function with default values is

float calculate_taxes(float income = 0.0, float taxrate = 0.25)
{
   return(income * taxrate);
}

The function would have the following results:

cout << calculate_taxes();  // prints out 0
cout << calculate_taxes(100);  // prints out 25
cout << calculate_taxes(100,0.5);  // prints out 50

Return types: deciding which types of values can be returned by a function is another design issue.

Fortran and Pascal only allow unstructured types to be returned,
C allows all types except functions and arrays,
C++ follows the C conventions, except that it also allows the return of classes
Ada allows all types

Parameter passing methods: there are five main passing conventions:

Pass-by-value:
- The value of the actual parameter is used to initialize the formal parameter, which in turn acts as a local variable in the subprogram.
- Typically this is achieved by copying the parameter value to the formal parameter, resulting in long initialization times if the parameter is a large data structure.
- This also requires evaluation of any expressions used in the actual parameter, which can also increase execution time.
- Pass-by-value is used by default in C, C++, Java, and Pascal
Pass-by-result:
- This is used to communicate data from the called subprogram to the caller.
- The caller supplies a variable to be used for returning data, and at the end of the called routine the value of the formal parameter is copied into the designated variable.
- Again, if the result is a large data structure then substantial execution time may be required to carry out the data transfer.
Pass-by-value-result:
- This is essentially a combination of pass-by-value and pass-by-result.
- This is used for two-way communication: the value from a caller-supplied variable is copied to the formal parameter, and at the end of the routine the (possibly altered) value of the formal parameter is copied back to the variable.
Pass-by-reference:
- This is also used for two-way communication of data, but rather than copying data back and forth it simply communicates an access path for the supplied variable.
- Often this is simply the address of the variable - giving the subprogram the ability to read/write the variable's contents directly.
- This avoids the necessity to copy data back and forth, and hence can reduce execution time, but it requires indirect access to the variable contents (via whatever form the access path takes) and hence can increase execution time.
- Pass-by-reference is available in C++ (reference variables), Java (objects), and Pascal (var parameters)
  Pass-by-reference is simulated in C by using pointers to variables for indirect addressing
Pass-by-name:
- Pass-by-name achieves greater programming flexibility by allowing the user to pass a name as an actual parameter, and that name is substituted for the formal parameter throughout the body of the subprogram for that invokation.
- The form of the actual parameter can thus cause completely different behaviors during execution, for instance:
  - if the actual parameter is an expression then effectively that expression is substituted for the formal parameter throughout the subprogram body
  - if the actual parameter is an identifier then a like-named entity can be referenced within the subprogram body, as long as the identifier can be matched to something within the referencing environment of the subprogram
  Consider the following examples assuming pass-by-name:
```
int MyArray[10];

int foo(NamedVar) {
    int x = 3;

    NamedVar = 7;            
    return (NamedVar * 17);  
}

void main()
{
   int x = 0;
   cout << foo(MyArray[x]);
           // assigns 7 to global MyArray[3] 
           // prints 119

   cout << foo(x);
           // assigns 7 to local var x in foo
           // prints 119
}
```
- This form of parameter passing effectively enables some aspects of polymorphism (changing the functionality of a subprogram based on its operand types)
- Pass-by-name was introducedin Algol 60, and supported by Simula 67

Subprograms as parameters: it is sometimes useful to pass functions or procedures as parameters to other subprograms, which may then invoke them.

This introduces a number of design and implementation issues:

is the routine passed by name or by reference, and in either case how is it invoked?
is type checking performed for the passed routine's parameters, and if so how?
what is the reference environment for the passed routine if/when invoked?
The common choices upon which to base the environment are:
- The environment of the definition of the passed routine (deep binding)
  This is typically used by statically-scoped languages.
- The environment of the point where the passed routine is actually invoked (shallow binding).
  This is typcially used in dynamically-scoped languages such as SNOBOL.

C and C++ allow passing of pointers to functions, along with the function protocol to permit type checking, for example:

#include <iostream.h>

char foo(int x, double y)
{
   cout << x << " " << y << endl;
   return('!');
}

void callsfoo(char (*fptr)(int, double))
// takes as a parameter a pointer to a function,
//     which in turn takes an int and a double as parameters
{
   // call the passed function with values 3 and 4.5
   cout << (*fptr)(3,4.5) << endl;
}

void main()
{
   // pass, as a parameter, 
   // a pointer to the desired function foo
   callsfoo(foo);
}

In the example below we first use pass-by-reference to have a function assign a function-pointer to a parameter, and we use void* (together with some type casting in the caller) to have a function return a pointer to another function:

#include <iostream>
using namespace std;

// ===========================================================
// ==== The two toy functions whose pointers we'll pass around

// returns ascii for c
int char2int(char c)
{
   return c;
}

// if c is alphabetic returns value in range 1..26
// otherwise returns 0
int char2alphaint(char c)
{
    if (('a' <= c) && (c <= 'z')) return (c+1 - 'a');
    if (('A' <= c) && (c <= 'Z')) return (c+1 - 'A');
    return 0;
}

// ===========================================================

// function that picks either char2int or char2alphaint
//    based on the second parameter,
// then assigns that to the function pointer passed
//    (by reference) to the first parameter
void pickfunction(int (*&fptr)(char), int choice)
{
   if (choice == 1) fptr = char2int;
   else if (choice == 2) fptr = char2alphaint;
   else fptr = NULL;
}

// function that picks (as above) but returns a void pointer
//    to the selected other function
void *pickfunction(int choice)
{
   if (choice == 1) return (void*)(char2int);
   else if (choice == 2) return (void*)(char2alphaint);
   else return NULL;
}

int main()
{
   // declare a variable that can hold an appropriate function pointer
   int (*f)(char);

   // call the pickfunction routine, giving it our variable
   //    to store the function pointer in,
   // then call the chosen function
   pickfunction(f, 1); 
   cout << (*f)('B') << endl;

   // call the pickfunction routine,
   //    cast the result to an appropriate function pointer type,
   // then call the chosen function
   f = (int (*)(char))(pickfunction(1)); 
   cout << (*f)('B') << endl;
}

Local variables

Variables local to a subprogram are either static or stack dynamic.

Static local variables are shared across all invokations of the subprogram, while stack dynamic local variables are only accessible within the current invokation.

(Although they may be accessible as non-local variables if dynamic scoping is being used.)

The main disadvantage with stack dynamic locals is a loss in run time efficiency, due to two factors:

the stack space must be allocated and deallocated when the subprogram is invoked
indirect addressing (e.g. offset from top-of-stack pointer) must be used to access the variables during execution

A secondary disadvantage is that the locals cannot be used to communicate information across subprogram invokations.

However, in general stack dynamic locals are preferred over static locals because of the flexibility they provide: permitting nested and recursive function calls.

Polymorphic subprograms

Polymorphic subprograms can take parameters of different types on different activations.

This is typically supported through either overloaded subprograms or generic subprograms.

Overloaded subprograms:

In some languages (including Ada, Java, and C++) it is permissible to declare multiple routines with the same name as long as their protocols differ (i.e. they require different parameter/return types).

The correct subroutine body is identified based on the types of the passed parameters, and an appropriate call is invoked.

This is referred to as overloading (much as with operator overloading).

Note that readability generally suffers if radically different functionality is provided by the different functions associated with overloaded name.

Generic subprograms:

Generic subprograms are another method of specifying multiple versions of a program unit to handle parameters of different data types.

In C++ these are referred to as template functions, for example:

// declare a template function
template <class Type>
int generic_compare(Type element1, Type element2)
{
   if (element1 < element2)
      return(-1);
   if (element1 == element2)
      return(0);
   if (element1 > element2) 
      return(1);
}

// using the function with different types
int a, b;
char m, n;
double x, y;

result = generic_compare(a, b);
result = generic_compare(m, n);
result = generic_compare(x, y);

Coroutines

Often it is desirable to have two routines exchange control at key execution points, rather than having one routine call the other and wait for it to complete.

For instance we might wish execution to proceed as follows:

routine A executes a preprocessing stage, then transfers control to routine B
routine B executes a preprocessing stage then returns control to routine A
routine A carries out more processing, and transfers control to routine B again
routine B completes and returns control to routine A
routine A completes

In such an arrangement, the routines are referred to as coroutines.

Simula 67 and Modula 2 are two languages that support the coroutine concept.

Subprograms and stacks

There are a number of implementation details we will consider with respect to the use of subprograms in statically-scoped languages:

how is storage dynamically allocated for (non-static) local variables, and how are those variables accessed
how are parameter values and variable references communicated
how are the call and return operations implemented: specifically,
- how do we save all the relevent information for the calling routine before we suspend it,
- how do we restore that information once the called routine completes
- How do we grant access to non-local variables, particularly in languages with nested scopes (e.g. Pascal, with routines declared within other routines...)
- how do we identify the instruction to be carried out after the called routine completes

Quick review: von Neumann architectures

The underlying hardware can have a substantial impact on the efficiency of some of the programming language features we use.

Here we will consider some key issues with respect to von Neumann architectures, and their relevance in data manipulation and program control structures.

Core components:
The core components of a computer system in a von Neuman architecture are:

One or more communication buses, for transfering data between the components listed below
The CPU, or central processing unit, which includes:
- An ALU, or arithmetic and logic unit, which carries out all computations performed by the computer
- A control logic block, which contains the circuitry necessary to decipher individual machine code instructions and generate appropriate control signals for all the other components in the system
- A collection of one or more registers - small, high-speed storage devices to hold data being used by the ALU. These may include:
  - A program counter, recording the address of the next program instruction to be executed
  - An instruction register, recording the current (machine code) instruction
  - A status register, recording information about the current program priority, results of the last instruction, etc
  - Data registers, recording the values of variables or expressions currently being used/evaluated
  - Address registers, recording the memory address of instructions or data
One or more I/O devices, used for the system to communicate with the outside world (terminal, keyboard, mouse, light pen, microphones, printers, sensors, etc etc)
Memory to hold the active programs and data currently in use by the system
One or more secondary storage devices, used to permanently record data (disk drives, tape drives, CDROM, etc)

Program execution cycles:
The core steps in the execution of a program are as follows:

The program is loaded into memory:
- the machine code instructions are copied into a sequence of memory addresses
- space for all statically-allocated constants and variables is also allocated in memory
- the address of the first instruction is loaded into the program counter
The program executes one instruction at a time until completion, following the fetch/decode/execute cycle:
- The address of the next instruction is determined from the program counter
- The CPU broadcasts this value (via the bus) to memory, with appropriate control signals so the data is read/fetched from the correct memory address and returned (via the bus) to the CPU
  The CPU also updates the program counter so that it will point to the next instruction
- The CPU stores the fetched instruction in the instruction register
- The control logic effectively decodes the instruction
  (For large instructions it might be necessary to run through several of these fetch cycles)
- The control logic executes the decoded instruction, completing the cycle

Note that there is always a program running on the computer: typically the operating system software includes a program which runs the entire time the computer is active

It is responsible for things like

capturing user instructions to start/terminate other programs,
time-sharing between multiple active programs, and
a host of resource control and allocation activities

Common memory addressing modes:
All the executable instructions and data for a program are stored within the computer memory while the program executes

(Ignoring virtual memory and a variety of layers of memory cache for now, that can better be addressed in the operating systems course.)

The machine code instructions supported by the computer control logic typically allow only a limited number of different methods to access data in memory (aka addressing modes).

Since the data accesses and control structures of higher level languages must eventually be carried out by sequences of such machine code instructions, it is worthwhile briefly reviewing the common data access methods:

refering to a hard-coded data literal, or refering to a data item already stored in a register: in terms of execution these are the fastest accesses available
Generally the compiler determines which data items (variables) are allocated to the available registers, but some languages (such as C) allow the programmer to encourage the compiler to allocate specific variables to registers
refering to a data item by the memory address where it is stored: this is much slower than register references, since it requires the data to be fetched by memory
this is the common access method when reading/assigning static variable values that didn't get allocated to a register
referring to a data item by an offset from a memory address: even slower, this requires some computation in addition to the memory fetch
this is a common access method when dealing with records (or structs or classes etc) where fields are located as an offset from the starting address of the structure
it is also a common access method to stack-dynamic data, i.e. variables and parameters whose addresses are recorded as offsets from the top of the stack
referring to a data item whose address is stored in another memory address
the slowest method yet, here we must make one fetch to get the address for the second fetch (obtaining the desired data)
this commonly occurs when pointers (and heap dynamic variables) are used, assuming the pointer variable isn't currently copied in a register

Being aware of the implementation issues associated with the language constructs you use can make you a more effective programmer, but also be aware of the relative importance of readability and efficiency for your project.

On a related topic, it is also useful to be aware of the relative execution times required for different kinds of operation.

The table below might give ball park figures for the relative speeds of different kinds of operation (this is highly dependent on the hardware and operating system):

Simple arithmetic or logic operations: 1 (clock cycle) - this doesn't include the time to fetch any operands from memory
Fetch from memory (reads, writes): 3-8
Branching operations (loop tests, selection tests, gotos): 4-16
Floating point operations: 12+
Subroutine calls: 16+
this doesn't include the time to copy any parameters to/from the stack (see section below)

Subprogram calls using stacks

Here we'll consider a stack-based approach to maintaining the necessary information. The run-time stack is simply a stack, stored in a large block of memory, with associated operations to push data onto the stack, pop data off of the stack, and use pointers (or a similar mechanism) to access specific locations within the stack.

When source code from high level languages such as C, C++, Pascal, Ada, etc are compiled into machine code, the subroutine calls and returns are translated into sequences which include instructions to push data onto the stack and retrieve data from the stack.

Similarly, references to variables, constants, and parameters within the source code are translated into machine code sequences with appropriate accesses into the stack.

As we shall see, some of the actions which must be carried out by the machine code sequences are quite simple, while others are significantly more advanced.

We will start with the basic call and return features, then add parameter passing, return values, local variable allocation, and references to non-local variables.

Simple calls and returns
When the compiler translates a subroutine call, the resulting machine code will carry out a set of actions similar to:

Make a copy of the current pointer to the top of the stack
Get the address of the instruction which should be executed immediately after the called routine completes, and push this address (the return address) onto the stack
(NOTE: pushes and pops automatically adjust the top-of-stack pointer)
Push the copied (old) top-of-stack pointer onto the stack.
This will be used for cleaning up the stack once the subroutine completes, and for accessing ancestors when dealing with scoping issues.
In your text this is referred to as a dynamic link.
Get the address for the called routine and place it in the program counter, so that execution will proceed in the new routine (the compiler may record the routine's address as an offset from the start of the program, an offset from the current program counter value, an absolute memory location, or ...?)

So, with execution now begun in the called routine, we can logically view the stack as follows (assuming the stack grows "upward"):

|                             |
+-----------------------------+<--- Top of stack pointer
| old top-of-stack pointer    |
+-----------------------------+
| return address (old PC)     |
+-----------------------------+
|  run time stack contents    |
|    from already-active      |
|        routines             |

Eventually the called routine will complete, and the actions to be carried out at that point (again, compiled as a sequence of machine code instructions), include:

Copy the old top-of-stack pointer and pop it off the stack
Copy the return address and pop it off the stack
Put the values of the old top-of-stack pointer into the register tracking the top of the stack
Put the return address into the program counter

The stack, program counter, and top-of-stack pointer now look exactly as they should to continue with execution.

Adding parameter passing and return values
The actions above did not consider how to pass values between the calling and called routines, so we now add some additional steps to the process.

When the subroutine call is made the action sequence may look like:

get a copy of the current top-of-stack pointer
push the return address onto the stack
push the old top-of-stack pointer onto the stack
push space for the return value (if any) onto the stack
push each of the parameters onto the stack (pushing the parameters in right-to-left order or left-to-right order, depending on the language)
The value pushed onto the stack depends on the reference type: pass-by-value may be a copy of the actual data, pass-by-reference may be a memory address, etc.
(Where a pass-by-result parameter is used, a default (garbage) value may be pushed.)
put the address for the called routine into the program counter

If the function call was something like

x = foo(MyArray, MiddleInitial);

the stack might look something like:

|                             |
+-----------------------------+<--- Top of stack pointer
|  copy of MiddleInitial      |
+-----------------------------+
|                             |
|  copy of all the contents   |
|    of the array MyArray     |
|                             |
+-----------------------------+
| space for return value      |
+-----------------------------+
| old top-of-stack pointer    |
+-----------------------------+
| return address (old PC)     |
+-----------------------------+
|  run time stack contents    |
|    from already-active      |
|        routines             |

When the subroutine completes the same cleanup process is invoked, but now any necessary values must also be copied back:

For any pass-by-result parameters, pass-by-value-result parameters, and any explicit return values, those values must now be copied back to the calling routine.
Copy the old top-of-stack pointer
Copy the return address
Put the values of the old top-of-stack pointer into the top-of-stack pointer.
As far as our stack manipulation is concerned, this has the effect of popping all the parameters off at once
(the data is still in the same memory locations, but as far as the stack is concerned it's just garbage that will be overwritten with the next push operations)
Put the return address into the program counter

(Dynamic) local variables
This still doesn't allow for dynamic local variables, which must somehow be allocated with the information for the current function call.

To do so, we again add more steps when the subroutine is called:

get a copy of the current top-of-stack pointer
push the return address onto the stack
push the old top-of-stack pointer onto the stack
push space for the return value (if any) onto the stack
push each of the parameters onto the stack
push space for each of the (non-static) local variables onto the stack
if there are default values or initialization values then those can be pushed, otherwise the stack pointer can be adjusted to create the needed space
Note that if the stack pointer is adjusted to create space then the value of the uninitialized variable is whatever happened to be sitting in that memory location previously - hence the danger in using uninitialized variables.
put the address for the called routine into the program counter

If the called subroutine has local variables y, z then the stack after the call to foo(MyArray,MiddleInitial) might look like:

|                             |
+-----------------------------+<--- Top of stack pointer
|   space for variable z      |
+-----------------------------+
|   space for variable y      |
+-----------------------------+
|  copy of MiddleInitial      |
+-----------------------------+
|                             |
|  copy of all the contents   |
|    of the array MyArray     |
|                             |
+-----------------------------+
| space for return value      |
+-----------------------------+
| old top-of-stack pointer    |
+-----------------------------+
| return address (old PC)     |
+-----------------------------+
|  run time stack contents    |
|    from already-active      |
|        routines             |

Upon completion, the sequence looks the same as in our previous version:

Copy back and pass-by-result and pass-by-value-result parameters, and any explicit return values.
Copy the old top-of-stack pointer
Copy the return address
Put the values of the old top-of-stack pointer into the top-of-stack pointer
Put the return address into the program counter

Recursive calls
Note that the mechanism described above completely supports nested function calls and recursive function calls.

Suppose we have the following recursive factorial function:

int factorial(int N)                 // line 1
{                                    // line 2
   int result;                       // line 3
   if (N < 3) result = N;         // line 4
   else {                            // line 5
     result = Factorial(N-1);        // line 6
     result = N * result;            // line 7
   }                                 // line 8
   return(result);                   // line 9
}                                    // line 10

If we call factorial(5), which in turn calls factorial(4) which calls factorial(3), then the stack might look something like:

|                             |
+-----------------------------+<--- Top of stack pointer
|   variable result           |
+-----------------------------+
|  copy of value N == 3       |              Activation
+-----------------------------+                record
| return value (will be 3)    |                 for
+-----------------------------+              factorial(3)
| old top-of-stack pointer    |--+
+-----------------------------+  | points 
| return (address of line 8)  |  |   to
+-----------------------------+<-+  here ----------------
|   variable result           |
+-----------------------------+
|  copy of value N == 4       |              Activation
+-----------------------------+                record
| return value (will be 12)   |                 for
+-----------------------------+              factorial(4)
| old top-of-stack pointer    |--+
+-----------------------------+  | points 
| return (address of line 8)  |  |   to
+-----------------------------+<-+  here ----------------
|   variable result           |             
+-----------------------------+            
|  copy of value N == 5       |           
+-----------------------------+              Activation
| return value (will be 60)   |                record
+-----------------------------+                 for
| old top-of-stack pointer    |--+           factorial(5)
+-----------------------------+  | points 
| return (address of line 8)  |  |   to
+-----------------------------+<-+  here ----------------
|  run time stack contents    |
|    from already-active      |
|        routines             |

During each execution of the factorial routine, accesses to N and result are done via offsets from the current stack pointer, so even though the execution instruction sequence is similar the data values being accessed are different.

Referencing non-local variables
One last issue to address is how we access non-local variables.

This can take two forms: blocks within subroutines, and nested subroutine declarations (where access to non-local (but possibly non-global) variables is determined by the static program structure.

The problem in the latter case is that, while the structure is known statically, we need to ensure that the specific instance referred to is the correct one.

For instance, suppose we have a language (such as Pascal) that allows nested declarations of functions, and that function blah is declared within function foo.

Now, suppose foo calls itself recursively, and then the more recent call to foo calls blah.

Within blah we should have access to the (non-local) variables of foo, but specifically to the most recent call of foo.

This means that from a called subroutine we must be able to identify not only which routines are their static ancestors, but also to identify the most recent stack activation records for each of those ancestors.

Consider the following skeleton for a program with nested declarations:

program main
   procedure A-inside-main
       procedure B-inside-A
           // B statements
       end-B
       procedure C-inside-A
           procedure D-inside-C
               // D statements
           end-D
           // C statements, 
           // including calls to D
       end-C
       // A statements,
       // including calls to B and C
   end-A
   // main statements,
   // including calls to A
end-main

We will use an additional stack value with each subroutine to point to the static ancestor of the routine.

This might be pushed immediately after the copy of the old stack pointer.

Suppose procedure D is called from procedure C above, then the stack might look like:

|                             |
+-----------------------------+<--- Top of stack pointer
| space for D's locals        |
+-----------------------------+
| space for D's parameters    |
+-----------------------------+
| space for D's return value  |
+-----------------------------+ static link points to the most
| ptr to D's static ancestor  | recent activation for C, in this
+-----------------------------+ case would be same as old t-o-s ptr
| old top-of-stack pointer    |    
+-----------------------------+
| return address (old PC)     |
+-----------------------------+
|  run time stack contents    |
|    from already-active      |
|        routines             |

When checking for non-local references:

first follow the static link to the immediate static ancestor, and check their declared variables
if not found, then follow their static link to their immediate static ancestor, and check their declared variables
if not found, then follow their static link, etc etc.

In fact, the distance along the chain can be determined at compile time, which considerably simplifies the implementation

one need not actually search for matching identifiers at run time,
simply follow the desired number of static links
then access the value at the correct offset from the resulting pointer
(and the offset can also be determined at compile time)

Creating scopes for blocks

One way to create a new scope for a block is simply to treat it as a subroutine call with no parameters and no return values - though this adds much of the overhead of subroutine calls without making that apparent to the programmer.

An alternative is, for each subroutine, identify how much space can be required for block variables at any one time and allocate the block space similarly to the rest of the local variable space.

The compiler can then determine appropriate offsets for block variables, knowing that conflicts cannot arise between requests in separate blocks.

For example, consider the code fragment

int foo(int b) {
   int a;
   for (int x = 1; ...) {
       for (int y = 0; ...) {
       }
   }
   for (int z = 0; ...) {
   }
}

In this, we only need space for two block variables, since z is never active at the same time as x and y

As a result, the activation record might look something like:

|                             |
+-----------------------------+<--- Top of stack pointer
| space for block var y       |
+-----------------------------+
| space for block vars x, z   |
+-----------------------------+
| space for local var a       |
+-----------------------------+
| space for parameter b       |
+-----------------------------+
| space for return value      |
+-----------------------------+
| ptr to foo's static ancestor|
+-----------------------------+
| old top-of-stack pointer    |    
+-----------------------------+
| return address (old PC)     |
+-----------------------------+
|  run time stack contents    |
|    from already-active      |
|        routines             |

Dynamic scoping

In dynamic scoping we require a different form of access to non-local references.

Rather than following the static chain of links to search for the desired value, we can follow the chain of dynamic links (i.e. the "old" stack pointers)

Unfortunately, this technique requires that the activation records actually store the name of the corresponding local variables and a search at each level to find a match.

This results in substantially slower access times to non-local variables in dynamically-scoped languages.

An alternative implementation, that allows faster variable access, is as follows:

do not allocate local variables on the run time stack
instead, have a separate stack associated with each possible local variable
whenever a routine is invoked, space for the relevant local variables is created on the appropriate stacks
E.g. if foo has local variables x, y, then each time foo is called a new variable space is pushed onto the stack for x and another onto the stack for y
The most recent activation values are always at the top of the stack, and can be popped off when the current subroutine completes.

This gives faster access to non-local references, but adds considerable overhead at the start and end of subroutine calls as all the local variable stacks are adjusted.