The characteristics of those expressions are heavily based upon mathematical conventions for expression notation and evaluation.
From a programming language design point of view, we need to address:
For imperative languages the assignment statement plays a central role: the explicit setting of variable values during execution.
The assignment statement may be used as an expression itself, and is highly likely to assign values based upon expression evaluation.
The operator follows the form
    <expression> ? <expression> : <expression>
If the first expression evaluates to non-zero then the overall result is determined by evaluating the second expression; otherwise the overall result is determined by evaluating the third expression.
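A minimal C++ sketch of the conditional operator in use. The function names larger and describe are hypothetical, chosen only for illustration.

```cpp
#include <string>

// Returns the larger of two values: if (a > b) is non-zero (true),
// the second operand a is evaluated, otherwise the third operand b.
int larger(int a, int b)
{
    return (a > b) ? a : b;
}

// Any non-zero first operand selects the second operand.
std::string describe(int count)
{
    return count ? "non-empty" : "empty";
}
```

With these definitions, larger(3, 7) yields 7, and describe(0) yields "empty".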
The most common order in which operators and operands appear in programming languages is the same as the conventional mathematical notation: infix.
Under this notation, the order in which elements of an expression appear is:
As we will discuss in later sections, the problem with infix ordering is that it can lead to ambiguous orders of evaluation.
The other two common orders are prefix and postfix.
In prefix notation, the operator comes first, followed by the operand(s), while in postfix notation, the operand(s) come first, followed by the operator.
An advantage of a notation such as postfix is that the order of evaluation is precisely specified without the use of operator precedence levels and parentheses.
As an example, suppose we wish to write (3 + 7) * (2 + 1) in postfix notation:
The multiplication operation takes place last, working on the results of (3 + 7) and (2 + 1).
The expression (3 + 7) is written 3 7 +
The expression (2 + 1) is written 2 1 +
And the overall expression in postfix form is: 3 7 + 2 1 + *
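One way to see why postfix needs no precedence rules or parentheses is to evaluate it with a stack. The following is a minimal sketch, assuming space-separated tokens, integer operands, and only the four binary operators + - * /. The function name eval_postfix is hypothetical.

```cpp
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Evaluate a space-separated postfix expression over integers.
// Operands are pushed; each operator pops its two operands and
// pushes the result, so the token order alone fixes the evaluation order.
int eval_postfix(const std::string& expr)
{
    std::stack<int> operands;
    std::istringstream tokens(expr);
    std::string tok;
    while (tokens >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            if (operands.size() < 2)
                throw std::invalid_argument("too few operands");
            int rhs = operands.top(); operands.pop();
            int lhs = operands.top(); operands.pop();
            if (tok == "+")      operands.push(lhs + rhs);
            else if (tok == "-") operands.push(lhs - rhs);
            else if (tok == "*") operands.push(lhs * rhs);
            else                 operands.push(lhs / rhs);
        } else {
            operands.push(std::stoi(tok));   // operand: push its value
        }
    }
    if (operands.size() != 1)
        throw std::invalid_argument("malformed expression");
    return operands.top();
}
```

Under these assumptions, eval_postfix("3 7 + 2 1 + *") yields 30, the same value as (3 + 7) * (2 + 1).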
E.g. should 6 / 3 * 2 evaluate to 4, or to 1?
One option is to simply require the programmer to fully parenthesize expressions, but most language designers regard this as needlessly restrictive, and a hindrance to readability.
Therefore, rules for order of evaluation are required to determine:
Operator precedence rules rank the different types of operator, e.g. given an otherwise ambiguous order, all multiplication and division operations are carried out before any addition or subtraction operations.
A language specification generally contains a complete ranking of the supported operators.
For example, the precedence levels for arithmetic operators in several languages are shown below (highest precedence first):
    FORTRAN    Pascal           C                Ada
    **         * / div mod      postfix ++ --    ** abs
    * /        + -              prefix ++ --     * / mod
    + -                         unary + -        unary + -
                                * / %            binary + -
                                binary + -
Operator associativity
Given operators of equal precedence, the associativity rules determine their relative order of evaluation.
Usually these are left-to-right or right-to-left.
For example, in C the arithmetic operators are evaluated left-to-right except for unary +, unary -, prefix ++ and prefix --, which are evaluated right-to-left.
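The effect of left-to-right associativity can be checked directly in C++. This is a small sketch with hypothetical function names; it only contrasts the default grouping with the grouping forced by parentheses.

```cpp
// / and * have equal precedence and associate left-to-right,
// so 6 / 3 * 2 is grouped as (6 / 3) * 2.
int default_grouping()
{
    return 6 / 3 * 2;
}

// Parentheses force the multiplication to happen first.
int forced_grouping()
{
    return 6 / (3 * 2);
}

// Binary - is also left-to-right: 10 - 4 - 3 is (10 - 4) - 3.
int chained_subtraction()
{
    return 10 - 4 - 3;
}
```

Here default_grouping() yields 4, forced_grouping() yields 1, and chained_subtraction() yields 3 rather than 9.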
Note that, because of the limitations of fixed-size representations of numeric values, the order of evaluation can have an impact on behaviour, even if mathematically it should not. E.g.:
    #include <cfloat>      // FLT_MAX (MAXFLOAT is not defined in <climits>)
    #include <iostream>
    using namespace std;
    int main()
    {
        float y = FLT_MAX;        // largest finite float value
        float x = y * (y / y);    // y / y is 1, so the product stays in range
        cout << "y == " << y << endl;
        cout << "y * (y / y) == " << x << endl;
        x = (y * y) / y;          // y * y typically overflows before the division
        cout << "(y * y) / y == " << x << endl;
        return 0;
    }
Operands and side effects
Variables and constants in expressions are often evaluated by retrieving their values from memory.
Suppose an expression has a side effect, changing the value of a variable within an expression: should the side effect be applied before or after we evaluate the variable?
This is particularly important when function calls are permitted within expressions, where the function may have side effects on its parameters.
E.g. suppose in (foo(x) + blah(x)) the functions foo and blah have side effects, changing the value of x. Should foo change x before blah uses it, or vice versa, or should both use the original value of x?
Pascal and Ada leave the order of evaluation up to the implementor, thus making the effects difficult for the programmer to anticipate.
The definition for Java requires that evaluation appear to be carried out left-to-right - allowing side effects to change variable values but clearly specifying the order of effect.
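The foo and blah question can be made concrete in C++, which (like Pascal and Ada) does not fix the order in which the two operands of + are evaluated. In the sketch below the bodies of foo and blah are hypothetical, chosen so that the two possible orders give different sums. Splitting the expression into separate statements is one way to force a particular order.

```cpp
// Hypothetical functions with side effects on their parameter.
int foo(int& x)  { x += 1; return x; }   // increments x, returns the new value
int blah(int& x) { x *= 2; return x; }   // doubles x, returns the new value

// Order of the two calls is left to the compiler.
int unspecified_order(int x)
{
    return foo(x) + blah(x);
}

// Separate statements force foo to run first.
int foo_first(int x)
{
    int a = foo(x);
    int b = blah(x);
    return a + b;
}

// Separate statements force blah to run first.
int blah_first(int x)
{
    int b = blah(x);
    int a = foo(x);
    return a + b;
}
```

Starting from x = 3, foo_first gives 4 + 8 = 12 and blah_first gives 6 + 7 = 13; unspecified_order may give either, depending on the compiler.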
Short circuit evaluation
In some circumstances, we can identify the final value of an expression without evaluating the entire expression.
For example, in logic expressions (X AND Y), if we know X is false then the value of Y is irrelevant.
As another example, in the expression X * Y, if we know X is 0 then the value of Y is irrelevant.
Short circuit evaluation refers to the process of terminating evaluation as soon as we know what the expression must evaluate to.
While this speeds execution time, it can cause problems if the programmer had embedded functions with side effects into the "ignored" part of the expression.
On the other hand, if the programmer assumed short circuit evaluation was being used then they might include otherwise illegal statements in an expression, e.g.
In this expression the programmer is assuming the condition will short circuit when x == 0, otherwise a divide-by-zero exception would occur when evaluating y/x
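A minimal C++ sketch of both situations, relying on the fact that && in C++ evaluates its right operand only when the left operand is true. The names ratio_exceeds, noisy_true, and calls are hypothetical and are not taken from the notes.

```cpp
// Hypothetical guard: the division is protected by the test on x.
bool ratio_exceeds(int x, int y, int limit)
{
    // When x == 0 the left operand is false, so y / x is never evaluated.
    return (x != 0) && (y / x > limit);
}

// Hypothetical function with a side effect: it counts its own calls.
int calls = 0;
bool noisy_true()
{
    ++calls;
    return true;
}

// The side effect in the "ignored" operand never happens.
bool skipped_side_effect()
{
    return false && noisy_true();
}
```

Calling ratio_exceeds(0, 10, 1) returns false without dividing by zero, and after skipped_side_effect() the counter calls is still 0.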
For example, x+y might mean
Operator overloading is generally considered acceptable to an extent; however, many "nonstandard" overloads can lead to readability problems.
In languages which permit user-defined overloading, the misuse of overloading can make a program very difficult to read and maintain.
C++ permits user-defined overloading of almost all operators, whereas Java does not support such overloading.
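As a sketch of user-defined overloading in C++, the following gives + a meaning for a hypothetical two-component vector type Vec2, alongside the built-in meanings of + for integers and for std::string.

```cpp
#include <string>

// Hypothetical user-defined type.
struct Vec2 {
    double x;
    double y;
};

// User-defined overload: + on two Vec2 values adds component-wise.
Vec2 operator+(const Vec2& a, const Vec2& b)
{
    return Vec2{a.x + b.x, a.y + b.y};
}

// The same symbol already has different meanings for built-in
// and library types.
int add_ints(int a, int b) { return a + b; }                   // integer addition
std::string join(const std::string& a, const std::string& b)   // concatenation
{
    return a + b;
}
```

An overload like this one follows the mathematical convention for vectors; an overload of + that, say, removed elements would be legal but much harder to read.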
Under some circumstances, a value may be supplied in a type which is not an exact match for the intended type, but which may be reasonably converted into the correct type.
For instance, consider the following code fragment:
    int x = 3;
    float a;
    float b = 1.7;
    a = b * x;
The floating point multiplication operator expects two floating point operands, and produces a floating point result.
However, one of the supplied operands is an integer.
Since the integers can be regarded as a subset of the reals, it seems reasonable to assume that the integer value, 3, can safely be converted to a real value, 3.0, for use in the expression.
This would be called a mixed-mode expression.
Type conversions are described as widening or narrowing, depending on whether they convert an element from a smaller set to a larger set, or vice versa.
Widening conversions (such as the int-to-real case) are generally regarded as safe, whereas narrowing conversions (such as converting a real to an int) can frequently result in information loss and incorrect calculations.
Implicit conversions: coercion
Coercion takes place when operands are automatically converted into values of the correct data type for an expression.
Different languages, and different implementations of some languages, support coercion to different degrees.
As stated in the earlier section, widening conversions are generally regarded as safe, and are supported in many languages.
In fact, due to precision limitations, it is possible to have information loss even in some widening conversions. (For example, a 32-bit integer can hold values that a 32-bit IEEE floating point number, with its 24-bit significand, cannot represent exactly.)
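The loss can be observed with a round trip through float. This sketch assumes float is a 32-bit IEEE 754 type, which is the case on most current platforms but is not guaranteed by the C++ standard. The function names are hypothetical.

```cpp
#include <cstdint>

// "Widening" conversion from a 32-bit integer to float.
float widen(std::int32_t n)
{
    float f = static_cast<float>(n);
    return f;
}

// True when converting to float and back recovers the original value.
bool survives_round_trip(std::int32_t n)
{
    return static_cast<std::int64_t>(widen(n)) == n;
}
```

Under the stated assumption, 16777216 (2^24) survives the round trip, but 16777217 (2^24 + 1) does not, because it needs 25 significant bits.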
While implicit conversions improve the flexibility of a language, they limit the type-checking by automatically converting between operands which are apparently not of the correct type.
Explicit conversions: casting
Some languages also allow the programmer to explicitly convert values of one data type into values of another data type.
This is referred to in C as casting.
The format of a C cast is to precede the data value with the new type in parentheses.
For example, suppose we want to perform floating point division on two integer values.
    int x = 3;
    int y = 6;
    float z;
    z = x / y;                    // first case
    z = (float) x / (float) y;    // second case
The first case sets the value of z to 0.0, since integer division is carried out on 3/6, giving 0, which is then coerced to the floating point value 0.0.
The second case correctly carries out the division.
Type casting improves the flexibility of the language, but detracts from type checking and readability.
What do you think the following C++ code segment outputs?
    char str[2] = " ";
    z = (float) ((int) str);
    cout << z << endl;
Common operators include less than, greater than, equal to, not equal to, etc.
The most common operands in relational expressions are numeric values or expressions, ordinal types, and strings. While the equal/not-equal comparisons are obvious in most cases, the other relational operators need to be defined for operands such as ordinals and strings.
Because relational expressions often have expressions as operands, the relational operators must be allocated a precedence level: typically one lower than the precedence levels of the numeric operators.
For example, in the expression x / 3 > 2 + y, a typical order of evaluation would be (x / 3) > (2 + y).
Again, the Boolean operators must fit appropriately into the precedence table for operators, and must have associativity rules.
In the case of C and C++, numeric values are treated as logical false if they have value 0, and logical true otherwise. Thus any numeric expression is valid within a Boolean expression.
Conversely, any Boolean expression which logically evaluates to true is treated as numeric value 1, while any which logically evaluates to false is treated as numeric value 0, and thus Boolean expressions are valid within numeric expressions.
When does the following C++ function return 1, and when does it return 0?
int compare(int x, int y, int z) { return(x > y > z); }
Because > is evaluated left-to-right, the expression is treated as (x > y) > z, where (x > y) yields 0 or 1. The function therefore returns 1 if z is negative, or if z is 0 and x > y. In any other case it returns 0.
This is quite different from an intuitive evaluation, which might assume it was checking whether or not y is "between" x and z.
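The difference between the two readings can be sketched as follows. The first function writes out the grouping that C++ applies to x > y > z; the second is a hypothetical version of the "between" test the programmer may have intended.

```cpp
// What x > y > z actually computes: the 0-or-1 result of (x > y)
// is compared against z.
int compare_as_parsed(int x, int y, int z)
{
    return (x > y) > z;
}

// Hypothetical intended check: y lies strictly between x and z.
bool strictly_descending(int x, int y, int z)
{
    return x > y && y > z;
}
```

For the arguments (5, 3, 1) the intended check is true, but compare_as_parsed returns 0, since (5 > 3) is 1 and 1 > 1 is false.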
The most common format is
    <variable> <assignment operator> <expression>
Because of this, we often talk of the left hand side, LHS, and right hand side, RHS, of assignment statements as the variable and the expression respectively.
There are several variants on the basic concept of assignment, discussed briefly below.
Simple assignments have a single target variable on the LHS, whereas multiple target assignments have two or more target variables, each of which is assigned the same value.
Compound assignment refers to a shorthand method used to also include the target variable as an operand in the expression.
It is common to have arithmetic expressions of the form
    <target> <assignment> <target> <operator> <expression>
The compound assignment operators merely permit the programmer to avoid typing the target variable name twice, e.g.:
    x += y;    // equivalent to x = x + y
    x *= y;    // equivalent to x = x * y
    etc.
Compound operators are used in C, C++, and Java.
Unary assignment refers to a shorthand method to perform increment and decrement operations on a target variable, e.g.:
    x++;    // equivalent to x = x + 1
    ++x;    // equivalent to x = x + 1
    x--;    // equivalent to x = x - 1
    --x;    // equivalent to x = x - 1
Note that the difference between the prefix and postfix versions is the order of evaluation.
    sum = x++;    // equivalent to sum = x; then x = x + 1;
    sum = ++x;    // equivalent to x = x + 1; then sum = x;
Note that since the order of evaluation for unary operators is right-to-left in C++, the expression -count++ is equivalent to -(count++).
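The prefix and postfix behaviour can be checked with a small C++ sketch. The struct Result and the function names are hypothetical, introduced only to return both values at once.

```cpp
// Holds the value assigned to sum and the final value of x.
struct Result {
    int sum;
    int x;
};

// Postfix: sum receives the old value, then x is incremented.
Result postfix_case(int x)
{
    int sum = x++;
    return {sum, x};
}

// Prefix: x is incremented first, then sum receives the new value.
Result prefix_case(int x)
{
    int sum = ++x;
    return {sum, x};
}

// -count++ is -(count++): negates the old value, then increments count.
int negate_and_increment(int& count)
{
    return -count++;
}
```

Starting from x = 5, postfix_case gives sum = 5 and prefix_case gives sum = 6, with x equal to 6 afterwards in both cases.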
Assignments as expressions
In some languages an assignment can actually be used as an expression; for example, x = y = z; may be a valid statement.
As an expression, the value of an assignment is the value assigned, and chained assignments are evaluated right-to-left.
In the case above, the value z is assigned to y, and the value of y = z is thus z. That value is in turn assigned to x.
One problem this leads to in C and C++ is that the compiler cannot reject mistaken if statements such as the following, since they are valid code (at best it can issue a warning):
Suppose a programmer wants to check if x has the value 0, and run an error routine if it does.
However, the programmer leaves out an = sign while writing the code:
    if (x = 0) printerr();
Regardless of the value of x, this assigns x the value 0 and does not run the printerr() function: virtually the opposite of the intended behavior!
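The two versions can be compared side by side in a short C++ sketch. Here printerr is a hypothetical stand-in that only counts its calls, and the doubled parentheses in the mistaken version are there solely to keep compilers from warning about it.

```cpp
// Hypothetical error routine: records how often it was called.
int error_calls = 0;
void printerr()
{
    ++error_calls;
}

// Intended check: run the error routine when x is 0.
void intended(int x)
{
    if (x == 0) printerr();
}

// Mistaken check: assigns 0 to x; the value of the assignment is 0,
// which is treated as false, so printerr() never runs.
void mistaken(int& x)
{
    if ((x = 0)) printerr();
}
```

After int x = 5; mistaken(x); the variable x holds 0 and error_calls is still 0, whereas intended(0) does call printerr().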