Programming languages: lecture notes

Expressions and assignments

Expressions are the means for specifying computation in a programming language.

The characteristics of those expressions are heavily based upon mathematical conventions for expression notation and evaluation.

From a programming language design point of view, we need to address:

the common operators permitted in programming languages, and the nature of the operands they work on;
how the order of evaluation is determined: must the programmer fully specify the order through the use of parentheses, or is there some implicit order of evaluation?
how type checking is performed, and what forms of type conversion take place, implicitly or explicitly.

For imperative languages the assignment statement plays a central role: the explicit setting of variable values during execution.

In some languages the assignment statement may itself be used as an expression, and the value it assigns is very often the result of evaluating an expression.

Operands and operators

Generally, a programming language supplies operators that work on a single operand (unary operators) or two operands (binary operators).

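Some languages also provide a ternary operator, working on three operands; for example, the conditional operator in C follows the form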

<expression> ? <expression> : <expression>

The most common order in which operators and operands appear in programming languages is the same as the conventional mathematical notation: infix.

Under this notation, the order in which the elements of an expression appear is:

unary operators: <operator> <operand>
binary operators: <operand> <operator> <operand>

As we will discuss in later sections, the problem with infix ordering is that it can lead to an ambiguous order of evaluation.

The other two common orders are prefix and postfix.

In prefix notation the operator comes first, followed by the operand(s), e.g. + 3 7, while in postfix notation the operand(s) come first, followed by the operator, e.g. 3 7 +.

An advantage of a notation such as postfix is that the order of evaluation is precisely specified without the use of operator precedence levels or parentheses.

As an example, suppose we wish to write (3 + 7) * (2 + 1) in postfix notation:

The multiplication operation takes place last, 
   working on the results of (3 + 7) and (2 + 1)
The expression (3 + 7) is written 3 7 +
The expression (2 + 1) is written 2 1 +
And the overall expression in postfix form is:
    3  7  +  2  1  +  *

Order of evaluation

Given an expression written in infix notation, we need a method of determining the order of evaluation.

E.g. should 6 / 3 * 2 evaluate to 4, or to 1?

One option is to simply require the programmer to fully parenthesize expressions, but most language designers regard this as needlessly restrictive, and a hindrance to readability.

Therefore, rules for order of evaluation are required to determine:

what is the order of precedence for operator types, e.g. should multiplication operations be carried out before addition operations?
given operators of equal precedence, what order should they be evaluated in, e.g. left-to-right or right-to-left?
if an operation (often a function call) has a side effect on an operand, what order does evaluation take place in, e.g. apply the side effect before or after the operation?
can short circuit evaluation be applied to speed up evaluation? E.g. in 0 * (y + x / 2), since we are multiplying by 0 the values of x and y are irrelevant, and (aside from possible side effects) the parenthesized subexpression need not be evaluated.

Operator precedence rules rank the different types of operator, e.g. given an otherwise ambiguous order, all multiplication and division operations are carried out before any addition or subtraction operations.

A language specification generally contains a complete ranking of the supported operators.

For example, the precedence levels for arithmetic operators in several languages are shown below, from highest (top row) to lowest:

FORTRAN  Pascal       C                         Ada
**       * / div mod  postfix ++ --             ** abs
* /      + -          prefix ++ --, unary + -   * / mod
+ -                   * / %                     unary + -
                      binary + -                binary + -

Operator associativity

Given operators of equal precedence, the associativity rules determine their relative order of evaluation.

Usually these are left-to-right or right-to-left.

For example, in C the binary arithmetic operators associate left-to-right, so 6 / 3 * 2 is grouped as (6 / 3) * 2 and evaluates to 4, while the unary operators (unary +, unary -, prefix ++ and prefix --) associate right-to-left.

Note that, because of the limitations of fixed-size representations of numeric values, the order of evaluation can have an impact on behaviour, even if mathematically it should not. E.g.:

Suppose we represent integers using an 8-bit two's complement representation.
This allows integers in the range -128 through +127
Suppose we have the expression 127 + 1 - 1
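If we evaluate left-to-right, overflow occurs twice: (127 + 1) => -128, then (-128 - 1) => 127.
If we evaluate right-to-left, no overflow occurs: (1 - 1) => 0, then (127 + 0) => 127.
With silent wraparound the two overflows happen to cancel, but on a system that traps on overflow the left-to-right evaluation would fail.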
Take the largest float we can represent, FLT_MAX from the cfloat header in C++, and suppose we attempted the following:
x = FLT_MAX * FLT_MAX / FLT_MAX;
Using right-to-left evaluation this will come out correctly, as FLT_MAX / FLT_MAX gives 1, then FLT_MAX * 1 gives FLT_MAX.
Going the other direction, however, the multiplication overflows: on typical IEEE 754 hardware FLT_MAX * FLT_MAX becomes infinity, and dividing infinity by FLT_MAX still gives infinity.
(Try compiling and running the following C++ program to check this.)
```
#include <cfloat>     // FLT_MAX: the largest finite float
#include <iostream>

using namespace std;

int main()
{
   float y = FLT_MAX;

   // Divide first: y / y == 1, then y * 1 == y, so no overflow occurs
   float x = y * (y / y);
   cout << "y == " << y << endl;
   cout << "y * (y / y) == " << x << endl;

   // Multiply first: y * y overflows (to inf on typical IEEE 754 hardware)
   x = (y * y) / y;
   cout << "(y * y) / y == " << x << endl;
   return 0;
}
```

Operands and side effects

Variables and constants in expressions are often evaluated by retrieving their values from memory.

Suppose an expression has a side effect, changing the value of a variable within the expression: should the side effect be applied before or after we evaluate the variable?

This is particularly important when function calls are permitted within expressions, where the function may have side effects on its parameters.

E.g. suppose in (foo(x) + blah(x)) the functions foo and blah have side effects, changing the value of x. Should foo change x before blah uses it, or vice versa, or should both use the original value of x?

Pascal and Ada leave the order of evaluation up to the implementor, thus making the effects difficult for the programmer to anticipate.

The definition for Java requires that evaluation appear to be carried out left-to-right - allowing side effects to change variable values but clearly specifying the order of effect.

Short circuit evaluation

In some circumstances, we can identify the final value of an expression without evaluating the entire expression.

For example, in logic expressions (X AND Y), if we know X is false then the value of Y is irrelevant.

As another example, in the expression X * Y, if we know X is 0 then the value of Y is irrelevant.

Short circuit evaluation refers to the process of terminating evaluation as soon as we know what the expression must evaluate to.

While this reduces execution time, it can cause problems if the programmer had embedded functions with side effects into the "ignored" part of the expression.

On the other hand, if the programmer assumed short circuit evaluation was being used, they might include operations in an expression that would otherwise be illegal, e.g.

if (x != 0) AND ((y / x) > 3) then ...

In this expression the programmer is assuming the condition will short circuit when x == 0; otherwise a divide-by-zero error would occur when evaluating y / x.

Overloading operators

Overloading operators means using the same operators to achieve different effects, with the type of effect determined by the data types of the operands.

For example, x+y might mean

scalar addition if x and y are integers,
pairwise addition if x and y are arrays of integers,
union if x and y are sets,
concatenation if x and y are strings, etc.

Operator overloading is generally considered acceptable to an extent; however, "nonstandard" overloads can lead to readability problems.

In languages which permit user-defined overloading, the misuse of overloading can make a program very difficult to read and maintain.

C++ permits user-defined overloading of almost all operators, whereas Java does not support such overloading.

Type conversions in expressions

Each operator has certain types of operands it expects, and produces results of an appropriate data type.

Under some circumstances, a value may be supplied in a type which is not an exact match for the intended type, but which may be reasonably converted into the correct type.

For instance, consider the following code fragment:

```
int   x = 3;
float a;
float b = 1.7;

a = b * x;
```

The floating point multiplication operator expects two floating point operands, and produces a floating point result.

However, one of the supplied operands is an integer.

Since the integers can be regarded as a subset of the reals, it seems reasonable to assume that the integer value, 3, can safely be converted to a real value, 3.0, for use in the expression.

This would be called a mixed-mode expression.

Type conversions are described as widening or narrowing, depending on whether they convert a value from a smaller set to a larger set, or vice versa.

Widening conversions (such as the int-to-real case) are generally regarded as safe, whereas narrowing conversions (such as converting a real to an int) can frequently result in information loss and incorrect calculations.

Implicit conversions: coercion

Coercion takes place when operands are automatically converted into values of the correct data type for an expression.

Different languages, and different implementations of some languages, support coercion to different degrees.

As stated in the earlier section, widening conversions are generally regarded as safe, and are supported in many languages.

In fact, due to precision limitations, it is possible to have information loss even in some widening conversions. (For example, a 32-bit integer has greater precision than a 32-bit floating point number.)

While implicit conversions improve the flexibility of a language, they weaken type checking: operands that are apparently of the wrong type are silently converted rather than reported as errors.

Explicit conversions: casting

Some languages also allow the programmer to explicitly convert values of one data type into values of another data type.

This is referred to in C as casting.

The format of a C cast is to precede the data value with the new type in parentheses.

For example, suppose we want to perform floating point division on two integer values.

```
int x = 3;
int y = 6;
float z;

z = x / y;                    // integer division
z = (float) x / (float) y;    // floating point division
```

The first case sets the value of z to 0.0, since integer division is carried out on 3/6 => 0, which is then coerced to a floating point value 0.0

The second case converts both operands to float before dividing, so floating point division is carried out and z is set to 0.5.

Type casting improves the flexibility of the language, but detracts from type checking and readability.

What do you think the following C++ code segment outputs?

```
char str[2] = " ";
z = (float) ((int) str);
cout << z << endl;
```

Relational expressions

A relational expression has an operator which is used to compare two operands, and return a Boolean result based on the comparison.

Common operators include less than, greater than, equal to, not equal to, etc.

The most common operands in relational expressions are numeric values or expressions, ordinal types, and strings. While the equal/not-equal comparisons are obvious in most cases, the other relational operators need to be defined for operands such as ordinals and strings.

Because relational expressions often have expressions as operands, the relational operators must be allocated a precedence level, typically lower than the precedence levels of the arithmetic operators.

For example, in the expression x / 3 > 2 + y, a typical order of evaluation would be (x / 3) > (2 + y).

Boolean expressions

Boolean expressions consist of Boolean variables, Boolean constants, relational expressions, and Boolean operators.

Again, the Boolean operators must fit appropriately into the precedence table for operators, and must have associativity rules.

In the case of C and C++, numeric values are treated as logical false if they have value 0, and logical true otherwise. Thus any numeric expression is valid within a Boolean expression.

Conversely, any Boolean expression which logically evaluates to true is treated as numeric value 1, while any which logically evaluates to false is treated as numeric value 0, and thus Boolean expressions are valid within numeric expressions.

When does the following C++ function return 1, and when does it return 0?

```
int compare(int x, int y, int z)
{
   return(x > y > z);
}
```

The relational operators associate left-to-right, so this is equivalent to ((x > y) > z).
The expression (x > y) evaluates to 1 if x > y, and 0 otherwise.
Thus we now have either (0 > z) or (1 > z).

The function returns 1 if z is negative, or if z is 0 and x > y. In any other case it returns 0.

This is quite different from an intuitive reading, which might assume it was checking whether or not y is "between" x and z.

Assignment statements

Assignment statements are an integral part of imperative languages, in which variable values are explicitly set and altered during the course of program execution.

The most common format is

<variable> <assignment operator> <expression>

Because of this, we often talk of the left hand side, LHS, and right hand side, RHS, of assignment statements as the variable and the expression respectively.

There are several variants on the basic concept of assignment, discussed briefly below.

Simple assignments have a single target variable on the LHS, whereas multiple target assignments have two or more target variables, each of which is assigned the same value.

Compound assignment refers to a shorthand for assignments in which the target variable also appears as an operand in the expression.

It is common to have arithmetic expressions of the form

<target> <assignment> <target> <operator> <expression>

The compound assignment operators merely permit the programmer to avoid typing the target variable name twice, e.g.:

```
x += y;  // equivalent to x = x + y
x *= y;  // equivalent to x = x * y
         // and similarly for -=, /=, %=, etc.
```

Compound operators are used in C, C++, and Java.

Unary assignment refers to a shorthand method to perform increment and decrement operations on a target variable, e.g.:

```
x++;  // equivalent to x = x + 1
++x;  // equivalent to x = x + 1
x--;  // equivalent to x = x - 1
--x;  // equivalent to x = x - 1
```

Note that the difference between the prefix and postfix versions is the order of evaluation, i.e. whether the variable's value is used before or after it is changed:

```
sum = x++;  // equivalent to sum = x; then x = x + 1;
sum = ++x;  // equivalent to x = x + 1; then sum = x;
```

Note that since postfix ++ binds more tightly than unary minus in C++ (and unary operators associate right-to-left), the expression -count++ is equivalent to -(count++).

Assignments as expressions

In some languages an assignment can itself be used as an expression; for example, x = y = z; may be a valid statement.

As an expression, the value of an assignment is the value assigned, and the assignment operator associates right-to-left.

In the case above, the value of z is assigned to y, and the value of the expression y = z is thus the value of z. That value is in turn assigned to x.

One problem this leads to in C and C++ is that mistaken if statements such as the following are perfectly legal, so the compiler cannot reject them (at most, some compilers issue a warning):

Suppose a programmer wants to check if x has the value 0, and run an error routine if it does.

However, the programmer leaves out an = sign while writing the code:

```
if (x = 0) printerr();
```