Special words, the words used to name actions or control forms within a language, may be keywords, reserved words, or predefined words:
With keywords, a word is special only in certain contexts. In FORTRAN, for example, REAL can be used as a (user-defined) variable name, but if it appears at the beginning of a statement and is followed by another identifier then it is considered to represent a type within a declaration. REAL MYVAR is an example of a declaration, whereas REAL = 3.4 is an example of using REAL as a variable name.
Reserved words reduce the flexibility a user has in naming, but improve readability since the word has a single clear meaning.
User-defined names: may be applied to:
In addition to the use to which a user-defined name is put, different languages give users different degrees of freedom in determining valid names (or identifiers).
Among the choices a language designer must make are:
Are names case sensitive, or does SQUAREROOT refer to the same entity as SquareRoot and squareroot?
Case sensitive names allow for greater flexibility, but can cause confusion (imagine having several versions of many functions, with names distinguished only by various capitalization characteristics).
The choice of alphabet may have a dramatic effect on the readability of the language and the ease with which a language is compiled.
(E.g. if brackets are used to distinguish blocks of code, can they also appear as characters within an identifier, and if so how does the compiler identify which use is valid for any particular bracket in the source code?)
Later in the semester we will consider the use of identifiers with labels and subroutines; first we will consider variables in some detail.
Note that the variable address and value are sometimes referred to as its l-value and r-value, respectively.
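The distinction can be seen in a small C++ function. This is only an illustrative sketch, and the names copyThroughAddress, target, and where are hypothetical:

```cpp
// Sketch: a variable's address (l-value) is used when we write to it,
// and its stored contents (r-value) are used when we read from it.
int copyThroughAddress(int source)
{
    int target = 0;
    int* where = &target;   // &target yields target's address (its l-value)
    *where = source;        // store source's value (its r-value) at that address
    return target;          // reading target yields its current r-value
}
```

Calling copyThroughAddress(7) returns 7: the value was written through the address of target and then read back as its contents.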
Binding of attributes to variables: depending on the language, the attributes mentioned might exist for the entire life of the variable, or might change over time.
To clarify when the attributes of a variable take effect, we use the concept of binding.
A binding is static if the variable attribute is fixed before run time, and is unchanged throughout program execution.
A binding is dynamic, on the other hand, if the attribute can change at some point during execution.
Consider the different variable attributes with respect to binding:
This is (generally) highly hardware dependent, and is an aspect we will not focus closely on.
Bindings may be explicitly declared by the user, or may be implicitly declared through the rules and conventions of the language, applied to the usage of the variable in the program itself.
Consider the following C++ code segment:
    #include <iostream>
    using namespace std;

    int mysquare(int x);

    int y;

    int main()
    {
        cout << "Please enter an integer" << endl;
        cin >> y;
        cout << "The square of " << y << " is ";
        y = mysquare(y);
        cout << y << endl;
        return 0;
    }

    int mysquare(int x)
    {
        int result;
        result = x * x;
        return(result);
    }

Variable x has type int (defined before run time) and scope mysquare, while variable y has type int (defined before run time).
Most programming languages require explicit declaration of variables - supplying at least the name, usually the type, and occasionally an initial value for the variable.
Some languages, such as PERL, FORTRAN, and BASIC, allow implicit declarations: when a variable is first used it is automatically or implicitly declared, and language rules are applied to attempt to derive the other attributes (value, type, etc).
(In FORTRAN, for example, the variable is an integer if the identifier begins with a letter from I through N, and is a real otherwise.)
Explicit declarations guarantee the compiler has complete information with which to apply type and error checking, but place extra restrictions on the programmer.
Languages which use dynamic type binding do not assign a type to a variable until a value is assigned to the variable: the type that is bound is one appropriate to the value assigned.
In some cases, the type can also be dynamically changed - e.g. you assign a variable an integer value at one point, and a string value at some later point.
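C++ is statically typed, so dynamic type binding can only be imitated there. The sketch below is one possible emulation, not how any particular dynamically typed language is implemented; DynamicValue, assign, and currentTypeName are hypothetical names. It shows that a run-time tag must record which type is currently bound.

```cpp
#include <string>

// Hypothetical dynamically typed slot: the tag records the type currently bound.
struct DynamicValue {
    enum class Kind { Integer, Text };
    Kind kind = Kind::Integer;
    int integer = 0;
    std::string text;

    // Assigning a value rebinds the type to match that value.
    void assign(int value) { kind = Kind::Integer; integer = value; }
    void assign(const std::string& value) { kind = Kind::Text; text = value; }
};

// Report the type bound to the slot at this moment of execution.
std::string currentTypeName(const DynamicValue& v)
{
    return v.kind == DynamicValue::Kind::Integer ? "int" : "string";
}
```

After v.assign(42) the slot reports "int"; after a later v.assign(std::string("hello")) the same slot reports "string". The tag has to be stored and checked at run time, which hints at where the costs of dynamic typing come from.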
Dynamic typing makes a language much more flexible, but has several disadvantages:
In C++ variables are statically typed; however, some implicit type conversion takes place at run time when the type of an evaluated value (e.g. the right hand side of an assignment statement) does not match the expected type (i.e. the left hand side of the statement). This causes some of the same complications as dynamic typing.
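One such complication is silent loss of information. A minimal sketch, in which the function name truncatingAssign is hypothetical:

```cpp
// Sketch: the double on the right hand side is implicitly converted to int
// for the assignment, and the fractional part is discarded without an error.
int truncatingAssign(double measured)
{
    int stored = 0;
    stored = measured;   // implicit double -> int conversion
    return stored;
}
```

Here truncatingAssign(3.9) returns 3, and many compilers accept the assignment without complaint unless extra warnings are enabled.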
Variable lifetimes: the lifetime of a variable is typically referred to as the period during which it has storage space allocated to it.
We will consider three classifications of variables, based on the way in which storage locations are bound to the variables:
In C++ functions, a local variable preceded by the word static is a static variable, and uses the same storage location throughout program execution. Such a variable could be used to track information useful from one call of a subroutine to another. For example, in the function below the static variable invocations tracks how many times the function has been executed.

    void DoSomeStuff(int data)
    {
        static int invocations = 0;
        int x, y, z;
        // do whatever the function is supposed to do
        invocations++;
    }
In C++ functions, local variables not preceded by the static keyword are stack-dynamic: specific storage is allocated on each call of the function, and deallocated when the function completes. In the example above, x, y, z are stack-dynamic variables, and their values are lost between invocations of DoSomeStuff.
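The difference between the two kinds of local variable can be observed directly. The following is a small sketch with hypothetical names (countCalls, staticCount, localCount), not part of DoSomeStuff itself:

```cpp
#include <utility>

// Returns {value of the static counter, value of the stack-dynamic counter}.
std::pair<int, int> countCalls()
{
    static int staticCount = 0;   // one storage location for the whole run
    int localCount = 0;           // fresh storage, re-initialized on every call
    ++staticCount;
    ++localCount;
    return {staticCount, localCount};
}
```

On the third call the static counter has reached 3, while the stack-dynamic counter is 1 again, because its storage was released at the end of each earlier call.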
In C++, the new operator, when applied to a type name, calls for the allocation of memory space for the appropriate data type.
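A brief sketch of heap allocation follows; sumFirstN and values are hypothetical names. The storage is requested at run time and remains allocated until the program explicitly releases it:

```cpp
// Sketch: the array's storage is allocated by new[] while the program runs,
// with a size that is not known until then, and is released by delete[].
int sumFirstN(int n)
{
    int* values = new int[n];      // heap storage for n integers
    for (int i = 0; i < n; ++i) {
        values[i] = i + 1;         // fill with 1, 2, ..., n
    }
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += values[i];
    }
    delete[] values;               // explicit deallocation ends the lifetime
    return total;
}
```

For example, sumFirstN(4) returns 10. Forgetting the delete[] would leave the storage allocated after the function returns.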
Type checking: the process of ensuring that the operands of an operator are of the correct type.
A compatible type is one that is legal for the operator, or one which may be implicitly converted (or coerced) into a legal type.
Types which are not compatible provoke type errors.
Type checking is most efficiently carried out prior to execution, but is not possible when dynamic type binding is allowed, or in cases (such as C++ unions) where the same memory location is permitted to store values of different data types at different times during execution.
A programming language is strongly typed if type errors are always detected.
Fortran, C, and C++ all use implicit and explicit coercion of data types frequently, and as such are not strongly typed languages.
In C++, for instance, the statements below result in an implicit conversion of the integer value from x into a floating point value for y:
    int x = 1;
    float y = 3.0;
    y = x;

(Coercion will be addressed again in a couple of weeks.)
We consider two types of type compatibility: name type compatibility and structure type compatibility.
This means that only the variables' type names need be compared to determine if they are compatible: fast and easy, but not as flexible as structure type compatibility.
C++ uses name type compatibility.
This is more flexible than name type compatibility, but requires more checking and prevents us from distinguishing between two data types if they do happen to have the same underlying structure.
C uses structure type compatibility.
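Name type compatibility in C++ can be observed with two structures that have identical members. This is an illustrative sketch; the type names Celsius, Fahrenheit, and Temperature are hypothetical:

```cpp
#include <type_traits>

struct Celsius    { double degrees; };
struct Fahrenheit { double degrees; };   // same structure, different name
typedef Celsius Temperature;             // an alias: another name for Celsius

// Evaluated by the compiler: are the two names the same type?
constexpr bool structsAreSameType = std::is_same<Celsius, Fahrenheit>::value;
constexpr bool aliasIsSameType    = std::is_same<Celsius, Temperature>::value;
```

Here structsAreSameType is false even though the two structures are laid out identically, so a Fahrenheit value cannot be assigned to a Celsius variable. aliasIsSameType is true, since a typedef does not create a new type.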
In fact, object-oriented languages also face the issue of object compatibility, but this will be addressed later in the semester.
Variable scopes: the scope of a variable is the range of program statements in which the variable is "visible".
For example, the scope of a variable declared within a C++ function is local to that function - it cannot be referenced from outside the function.
The local variables of a program unit or block are the variables which are visible within the block and which are also declared within that block.
The nonlocal variables of a program unit or block are the variables which are visible within the block but which are not declared within it.
Static scoping means the scopes of variables are identified prior to run time, whereas dynamic scoping means variable scopes are identified during execution.
In languages like Pascal, subprograms can be nested, creating a hierarchy of scopes.
If a language uses static scoping, then it is possible prior to execution time (e.g. by the compiler) to determine which variable is referenced by the use of an identifier at any point in the code.
In the case of C++, if an identifier matches a local variable (or parameter) then it is assumed that variable is the one being referenced; otherwise the match is to any global variable using the identifier. (You can force a match to the global variable, bypassing the local, by preceding the identifier name with "::".)
For example:
    int result;   // global variable

    int AddThree(int y)
    {
        int result;        // local variable
        result = y + 3;    // assign y+3 to local variable
        // now add contents of global variable to local variable
        result = result + ::result;
        return(result);
    }

In addition to allowing variables to be local to a subroutine, some languages allow variables to be local to a block.
In C++, for example, a variable can be declared within a compound statement such as a for loop, while loop, or if statement:
    // assorted code
    while (x < 3) {
        int i;   // i is visible only in the while loop
        // assorted code
    }
    // and more assorted code

(Actually, you'll find some non-standard C++ compilers fail to support these scoping rules.)
Dynamic scoping is supported in some versions of APL, LISP, and SNOBOL.
In dynamic scoping, the scope rules are based on the calling sequence of subroutines, not on the way in which they are "structurally" nested.
If an identifier does not match any variables in the current function, we search the function which called it to find any matching variables, then the function which called that, and so on, until a match is found.
Thus, an identifier in a particular function statement can refer to completely different variables in different executions of the same program!
Obviously this can raise significant concerns with respect to readability and reliability, and can require significantly more run time checking.
However, dynamic scoping can eliminate a great deal of parameter passing, since the relevant variables (and hence values) are implicitly visible to the called routine!
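C++ itself is statically scoped, so the dynamic lookup rule can only be imitated. The sketch below is one way to simulate it, under the assumption of a hypothetical run-time callStack holding one frame of name-to-value bindings per active call. All names here (Frame, callStack, lookup, reportX, callerA, callerB) are invented for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// One frame of name -> value bindings per active subroutine call,
// with the most recent call at the back.
using Frame = std::map<std::string, int>;
std::vector<Frame> callStack;

// Dynamic-scope lookup: search the current frame, then its caller, and so on.
int lookup(const std::string& name)
{
    for (auto frame = callStack.rbegin(); frame != callStack.rend(); ++frame) {
        auto found = frame->find(name);
        if (found != frame->end()) {
            return found->second;
        }
    }
    return 0;   // assumption of this sketch: unbound names read as 0
}

// reportX binds no x of its own, so it sees whichever x its caller bound.
int reportX()
{
    callStack.push_back(Frame{});           // reportX's own (empty) frame
    int seen = lookup("x");
    callStack.pop_back();
    return seen;
}

int callerA()
{
    callStack.push_back(Frame{{"x", 1}});   // callerA's x
    int seen = reportX();
    callStack.pop_back();
    return seen;
}

int callerB()
{
    callStack.push_back(Frame{{"x", 2}});   // callerB's x
    int seen = reportX();
    callStack.pop_back();
    return seen;
}
```

callerA() yields 1 and callerB() yields 2, although both run the same lookup inside reportX. No parameter is passed in either case, and the search through the frames is the extra run-time work that dynamic scoping requires.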