Programming languages: lecture notes

Statement-level control

To control computation in imperative languages, we need some method for specifying repetition of certain commands or command sequences, and some method for selecting between alternative command sequences.

Control statements give us these capabilities, and we will consider control statements in four categories:

command sequences, or compound statements
controls for choosing between alternative computation sequences, or selection statements
controls for specifying repetitive computation, or iterative statements
controls for transferring control to specified points in a computation sequence, or unconditional branching

Compound statements

A compound statement is simply a collection of statements, executed sequentially from beginning to end.

The compound statement can logically be thought of as a single statement with more sophisticated results.

Many languages provide delimiters for specifying the scope of a compound statement (such as begin/end pairs or brackets), while others implicitly build the compound statements into their other control statements.

Two issues of design interest with compound statements are:

is there a unique first statement in the compound statement, or is it possible to start at any one of a number of different points (e.g. does the statement contain labels that may be "jumped to")
is there a unique last statement in the compound statement, or is it possible to exit the statement "early" (e.g. in C++ the break and return statements are often used to exit a block early)

There is widespread acceptance of multiple exit points, but there are many concerns (primarily with respect to readability and reliability) about the use of multiple entry points.
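
As a small illustration of multiple exit points, the following C++ sketch has one entry point but two exits. The function name find_first_negative is hypothetical, chosen only for this example.

```cpp
#include <vector>

// Returns the index of the first negative element, or -1 if there is none.
// The function has a single entry point but two exit points.
int find_first_negative(const std::vector<int>& values) {
    for (int i = 0; i < static_cast<int>(values.size()); i++) {
        if (values[i] < 0) {
            return i;   // early exit: leaves both the loop and the function
        }
    }
    return -1;          // normal exit: reached only if the loop runs to completion
}
```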

Selection statements

Selection statements allow controlled choice between multiple possible execution paths.

The simplest selections allow choice between two paths (two-way selection) or the option to include/not include a particular sequence (single-way selection), while more general forms allow choice between any number of paths the programmer is willing to specify (multiple selection).

The language design decisions faced include:

What form does the controlling expression take for the selection?
Do the selectable control sequences involve single or multiple statements?
How is nested selection supported?
For multiple selection, how is the case of an unrepresented selection handled?

Single-way selection

The simplest case, single-way selection, gives the programmer the option to either execute or bypass a sequence of one or more statements.

This is typically applied by evaluating a Boolean expression and, based on the result, choosing to execute the statements or branch around them.

For example:

...
if (x != 0) {
   cout << (y / x ) << endl;
}
...

Two-way selection

A natural expansion on the single-way selection is to allow execution of one set of statements if the Boolean expression evaluates to true, and an alternate set of statements if the expression evaluates to false.

if (x != 0) {
   cout << (y / x) << endl;
} else {
   cout << "Division by zero error" << endl;
}

Nesting two-way selections

Understanding the meaning of nested two-way selections can be more complex, e.g.

if (sum == 0) then
if (count == 0) then
result := 0
else result := 1

The language must clearly define whether the "else" statement refers to the first or last if statement.

Ideally, the programmer would also provide suitable layout or statement demarcation to make this clear, but if this style of code is permitted by the language syntax then the programmer may well misunderstand how the code segment will be interpreted.

Common solutions lie in altering the syntax to force clarity, either through the use of enclosing brackets, begin/end pairs, or even the addition of keywords such as endif, or elseif.
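
In C and C++ the rule is that an else pairs with the nearest preceding unmatched if. The sketch below recasts the sum/count example in C++ to show the difference brackets make. The function names and the use of -1 to mean "no value assigned" are assumptions made for this illustration.

```cpp
// Without braces: the else pairs with the nearest if (count == 0),
// whatever the layout might suggest.
int result_nearest(int sum, int count) {
    int result = -1;            // -1 marks "no branch assigned a value"
    if (sum == 0)
        if (count == 0)
            result = 0;
        else
            result = 1;
    return result;
}

// With braces: the else is forced to pair with the outer if (sum == 0).
int result_outer(int sum, int count) {
    int result = -1;
    if (sum == 0) {
        if (count == 0)
            result = 0;
    } else {
        result = 1;
    }
    return result;
}
```

With sum == 0 and count == 5 the first version assigns 1 and the second assigns nothing. With sum == 3 the first version assigns nothing and the second assigns 1.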

Multiple selection

A more general form of selection allows the programmer to provide an expression and a set of possible statement-sequences, one of which is chosen based on the result of the expression.

Note that the same effects could be achieved through collections of two-way (or even single-way) selections, but with a probable loss in readability.

The most common form of multiple selection is to provide an expression which evaluates to an ordinal type (integer, character, enumerated type, etc.) and then provide a set of anticipated results, associating a statement list or compound statement with each listed result. E.g. the C++ switch statement:

	
// given some integer variable x
switch (x) {
  case 1:  // if x == 1 execute this block
           ...
           break;
  case 7:  // if x == 7 execute this block
           ...
           break;
  default: // if no other cases match, execute this block
           break;
}

Note the inclusion of the "default" case - such a mechanism is optional in some languages (e.g. C, C++), required by some, and unsupported in others (e.g. Pascal).

The C, C++ switch statement is also somewhat unusual in that the switch expression is used to determine an entry point into the execution statements, but the programmer must explicitly include break statements to identify the exit point (otherwise control flows from the last statement of the current case to the first statement of the next case, rather than exiting the switch).
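
The fall-through behaviour can be seen in a short sketch. The function trace_switch is hypothetical: it records which case blocks execute for a given value of x.

```cpp
#include <string>

// Records which case blocks are executed for a given x.
std::string trace_switch(int x) {
    std::string trace;
    switch (x) {
        case 1:
            trace += "one ";
            // no break: control falls through into case 2
        case 2:
            trace += "two ";
            break;              // exit the switch here
        case 3:
            trace += "three ";
            break;
        default:
            trace += "other ";
            break;
    }
    return trace;
}
```

Calling trace_switch(1) yields "one two ", because entry is at case 1 and the first break encountered is in case 2.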

A more general construct would be to allow the use of non-ordinal expressions, but in most languages this must be simulated with else-if sequences, e.g.

if (x < y) {
   ...
} else if (x > z) {
   ...
} else if (x == 0) {
   ...
} else {
   ...
}

Note that the ordering of clauses becomes significant in such a structure: more than one condition may be satisfied, but only the block belonging to the first satisfied condition is executed.
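
A minimal sketch of this ordering effect, using the same three conditions. The function name classify and its string results are invented for the example.

```cpp
#include <string>

// The conditions overlap, so clause order decides the result.
std::string classify(int x, int y, int z) {
    if (x < y) {
        return "x < y";
    } else if (x > z) {
        return "x > z";
    } else if (x == 0) {
        return "x == 0";
    } else {
        return "none";
    }
}
```

For x = 0, y = 5, z = -1 all three conditions hold, yet only the first clause is selected.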

Iterative statements

Iterative statements encompass the looping structures of common programming languages, whether they be repeats, whiles, do-whiles, for loops, or other forms.

The iterative statements are generally divided into the two categories of counter-controlled loops and logically-controlled loops.

For either form of loop there is also the design question of where the control mechanism appears in the loop, typically either top-testing (pretest) or bottom-testing (posttest).

Ideally the loop structures mirror common architectural features for testing and branching (or vice versa), but this is far from general.

Counter-controlled loops

Counter-controlled loops use the following features to control iteration:

a loop variable, to maintain the count
a stepsize, to determine the amount by which the loop variable is altered on each pass through the loop
an initial value for the loop variable
a terminal value for the loop variable

Design issues include:

what data types can the loop variables have? (e.g. are there good reasons, and efficient implementations, to support floating point control variables, character variables, etc?)
are there special conditions surrounding the loop variable scope?
For example, in C++ for loops, the loop variable can be declared within the for initialization step, and has scope limited to the loop itself
what value does the loop variable have after termination of the loop?
Consider the C loop
```
for (x = 1; x < 13; x++) {
   ....
}
```
After the loop completes, is the value of x 12, 13, or 14?
is the user allowed to explicitly alter the loop variable value within the loop body?
Allowing this increases flexibility, but also detracts from readability.
Consider the C loop
```
for (x = 1; x < y; x+=2) {
    ...
    if (x > y) x--;
    ...
} 
```
if an expression is used to define loop parameters (e.g. the terminal value), should the expression be re-evaluated during each iteration of the loop?
This must be clearly specified, and the programmer who puts expressions in a control statement must ensure they understand the mechanism which will be used:
```
y = 13;
for (x = 1; x < y; x++) {
    ...
    y = foo(x, z);
    ...
}
```
Should the loop terminate when x >= 13, or should it terminate when x >= foo(x, z) ?
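
For C and C++ specifically, two of these questions can be checked directly: the loop variable is left holding the first value that fails the test, and the test expression is re-evaluated on every pass. The sketch below uses hypothetical helper names, and the assignment y = 5 stands in for the call to foo, whose behaviour is not specified in these notes.

```cpp
// The loop variable keeps the first value that fails the test.
int final_counter_value() {
    int x;
    for (x = 1; x < 13; x++) {
        // body executes for x = 1 .. 12
    }
    return x;   // 13: the value for which x < 13 was first false
}

// The test expression is re-evaluated on every pass, so changing y
// inside the body changes when the loop stops.
int passes_with_moving_bound() {
    int passes = 0;
    int y = 13;
    for (int x = 1; x < y; x++) {
        passes++;
        y = 5;  // stand-in for y = foo(x, z); takes effect at the next test
    }
    return passes;  // 4 passes (x = 1 .. 4), not 12
}
```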

The C++ for loop has the following syntax

for (<expression1>; <expression2>; <expression3>)
    <loop body>

Evaluation of the three expressions is as follows:

expression1 is evaluated once, at the start of for loop execution
expression2 is evaluated on each pass, prior to execution of the loop body
expression3 is evaluated on each pass, following execution of the loop body

Each of the expressions may be omitted. An omitted second expression is treated as logical true, thus

omitting the first expression causes no initialization to take place,
omitting the second expression causes an infinite loop,
omitting the third expression causes the loop counter to not be updated (unless explicitly done by the user in the loop body)
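
A minimal sketch with all three expressions omitted, so that initialization, testing, and updating are all done explicitly by the programmer. The function name count_up_to is hypothetical.

```cpp
// All three expressions are omitted, so the loop would run forever
// unless the body exits explicitly.
int count_up_to(int limit) {
    int i = 0;              // initialization done before the loop instead
    for (;;) {              // missing test is treated as true
        if (i >= limit) {
            break;          // explicit exit replaces the second expression
        }
        i++;                // explicit update replaces the third expression
    }
    return i;
}
```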

Logic-controlled loops

Here the repetition control is based on a Boolean expression, rather than the counting features discussed in the preceding section.

Design of such loops is much simpler than with counter-controlled looping, because initial values and all alterations are left to the programmer rather than supplied as a loop feature.

The major decision is whether the loop should be pretest or posttest (i.e. check for termination prior to each execution of the loop body or following each execution).

In C++ the pretest version is the "while" loop, and the posttest version is the "do-while" loop.
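
The practical difference shows up when the condition is false on entry: the pretest body is skipped, while the posttest body still runs once. A small sketch, with helper names invented for the example:

```cpp
// Counts how many times the body of a pretest loop runs.
int pretest_passes(int n) {
    int passes = 0;
    while (n > 0) {     // condition tested before each pass
        passes++;
        n--;
    }
    return passes;
}

// Counts how many times the body of a posttest loop runs.
int posttest_passes(int n) {
    int passes = 0;
    do {                // body runs once before the first test
        passes++;
        n--;
    } while (n > 0);    // condition tested after each pass
    return passes;
}
```

With n = 0 the while version makes no passes and the do-while version makes one. With n = 3 both make three.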

A more flexible option is provided in Ada through the "exit when" option:

the loop itself is by default an infinite loop
the programmer can place the exit test anywhere within the loop, along with the Boolean exit condition
in fact, this feature even allows the programmer to specify how far to exit when dealing with nested loops (by providing a label identifying each loop, and specifying the label as part of the exit instruction)
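
A rough C++ analogue of a loop with a mid-body exit test is an infinite loop containing a conditional break. This is a sketch of the idea rather than Ada itself: C++ has no labeled exit, so break leaves only the innermost loop. The function name and the negative-value sentinel are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Sums values up to, but not including, the first negative value.
int sum_until_negative(const std::vector<int>& values) {
    int sum = 0;
    std::size_t i = 0;
    while (true) {                       // infinite by default
        if (i == values.size()) break;   // exit when the input is exhausted
        int v = values[i++];             // work done before the mid-loop test
        if (v < 0) break;                // mid-loop exit, in the spirit of "exit when"
        sum += v;                        // work done after the mid-loop test
    }
    return sum;
}
```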

A mixed-control loop

One interesting loop structure is the for loop of Algol 60, in which the programmer can specify either an iterative control or a logic control.

The following are valid Algol 60 for loops running through loop variable values 1 through 10:

for count := 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 do
    list[count] := 0

for count := 1 step 1 until 10 do
    list[count] := 0

for count := 1, count + 1 while (count <= 10) do
    list[count] := 0

(loop variables can be either integers or real numbers)

The forms can even be combined, to create effective though difficult to read constructs such as

for index := 1, 4, 13, 41 step 2 until 47, 
             3 * index while index < 1000, 34, 2, -24 do
    sum := sum + index

Here, everything between := and do is a list element to use in iterating index, i.e.

1, 4, 13, 41, 43, 45, 47, 3*49==147, 3*147==441, 34, 2, -24

On completion, sum holds its original value plus the total of all these values.
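
The sequence can be reproduced by simulating each list element in turn. The C++ sketch below is an approximation of the Algol 60 semantics, not a translation tool, and the function name is hypothetical. It relies on index being 49 when the step-until element is exhausted, which is why the while element starts at 3*49.

```cpp
#include <vector>

// Returns the values index takes in the combined Algol 60 loop above.
std::vector<int> algol_index_values() {
    std::vector<int> values;
    int index = 0;

    // Simple list elements: 1, 4, 13
    for (int v : {1, 4, 13}) {
        index = v;
        values.push_back(index);
    }
    // Step-until element: 41 step 2 until 47
    for (index = 41; index <= 47; index += 2) {
        values.push_back(index);
    }
    // While element: index := 3 * index is assigned first, then tested.
    // index is 49 here, left over from the step-until element.
    for (index = 3 * index; index < 1000; index = 3 * index) {
        values.push_back(index);
    }
    // Remaining simple list elements: 34, 2, -24
    for (int v : {34, 2, -24}) {
        index = v;
        values.push_back(index);
    }
    return values;
}
```

The while element stops after 441 because the next candidate, 3*441 = 1323, fails the test index < 1000. The twelve values total 794, so sum ends up 794 larger than it started.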

Iteration on data elements

In each case so far, looping has been based on a set of control conditions.

Another possibility is to have a loop address each of a set of data elements.

This is supported in Perl with the "foreach" loop, which operates once on each of the elements of a list or array.
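
C++ (from C++11 onward) offers a comparable construct in the range-based for loop. The sketch below shows that form rather than Perl's foreach, and the function name sum_elements is made up for the example.

```cpp
#include <vector>

// Visits each element exactly once, with no explicit counter,
// initial value, or terminal value.
int sum_elements(const std::vector<int>& items) {
    int total = 0;
    for (int item : items) {
        total += item;
    }
    return total;
}
```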

Unconditional branching

An unconditional branch, or goto, transfers control to a specified point in a program (e.g. a label).

This is the most flexible single command for controlling execution, but also has the greatest potential for inappropriate use.

The major problem with the goto statement is that its use can easily obscure program structure, making software difficult to understand and maintain.

Many popular languages support the goto statement (a notable exception being Java), but its use is widely discouraged.
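
For concreteness, here is a small C++ sketch of a goto transferring control to a label, in this case to leave two nested loops at once. The function name contains and the label done are chosen for the illustration. The same effect could be had without goto by returning directly from inside the inner loop.

```cpp
#include <vector>

// Returns true if target occurs anywhere in grid.
bool contains(const std::vector<std::vector<int>>& grid, int target) {
    bool found = false;
    for (const auto& row : grid) {
        for (int cell : row) {
            if (cell == target) {
                found = true;
                goto done;      // unconditional jump out of both loops
            }
        }
    }
done:
    return found;
}
```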