This allowed improved readability and more refined type checking
If any of the attributes are dynamic (i.e. they can change during program execution), then extra storage space must be associated with the variable to track the current value of each such attribute
We also need to address how data types are supported in terms of operations on the data
These commonly include the numeric data types (integer, floating point, decimal), Boolean data types, and character data types.
Integers may be supported in a variety of sizes (e.g. short, long, double-length)
as well as in signed and unsigned variants. The most common form is to have a fixed number of bits for the integer (hence there are bounds on the size of the integer value which can be represented) and a two's complement representation (although sign-magnitude is also occasionally used)
Two's complement uses a leading zero followed by the standard binary representation for positive integers, and a negative value is represented by taking the bitwise complement of the positive representation then adding one.
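To make the complement-then-add-one rule concrete, here is a minimal sketch for 8-bit values. The function name `twos_negate` is hypothetical; it works on the raw bit pattern rather than using the built-in `-` operator, so the two steps are visible.

```c
#include <stdint.h>

/* Negate an 8-bit two's complement value by hand:
   complement every bit, then add one. */
int8_t twos_negate(int8_t x) {
    uint8_t bits = (uint8_t)x;             /* raw bit pattern, e.g. 5 -> 0x05 */
    uint8_t neg = (uint8_t)(~bits + 1u);   /* complement (0xFA), add one (0xFB) */
    return (int8_t)neg;                    /* reinterpret the pattern as signed */
}
```

Negating 0 gives back 0 (there is only one zero pattern), while negating -128 gives -128 again, because an 8-bit two's complement range (-128..127) has no positive counterpart for its most negative value.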
The primary advantages of two's complement over sign-magnitude representations are more efficient implementations of common operations in digital logic and avoiding the confusion of having both positive and negative representations of zero
The most common integer operations are almost always implemented in hardware for efficient operation
For floating point numbers, a fixed number of bits is typically assigned (again, possibly in multiple sizes), thus limiting both the magnitude (or range) and the precision of the floating point values which can be represented
Common representation formats include a sign bit (to indicate positive or negative), a set of bits to represent the exponent, and a set of bits to represent the fractional component.
E.g. for the floating point number 145.6, written as 1.456 * 10^2, the fractional component represents the 1.456 and the exponent represents the 2 (a default base is assumed, in this case 10)
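The sketch below pulls the three fields out of a C `float`, assuming the common IEEE 754 single-precision layout (1 sign bit, 8 exponent bits stored with a bias of 127, 23 fraction bits). Note that IEEE 754 uses base 2 rather than the base 10 of the example above. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single-precision (32-bit) float.
   Assumes float and uint32_t are both 32 bits wide. */
struct float_fields {
    unsigned sign;      /* 1 bit: 0 = positive, 1 = negative */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits after an implied leading 1 */
};

struct float_fields split_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the raw bits safely */
    struct float_fields out;
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.fraction = bits & 0x7FFFFFu;
    return out;
}
```

For example, -6.5 is -1.101 (binary) * 2^2, so its fields are sign 1, stored exponent 2 + 127 = 129, and fraction bits 101 followed by zeros (0x500000). Likewise 145.6 lies between 2^7 and 2^8, so its stored exponent is 134.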
Depending on the system, floating point operations may be implemented in hardware or in software (having a dramatic effect on the efficiency of floating point operations)
Many systems, especially those with a business orientation, also provide support for representing decimal values
Most existing systems achieve this by using a 4-bit field to represent a decimal digit. (This involves some wasted storage capacity, since only 10 of the 16 value combinations which can be manipulated in the 4-bit field are actually used.)
Some systems will support arithmetic operations directly on the decimal digits in hardware; other systems will provide the support through software (again, at some loss in efficiency)
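As an illustration of the 4-bit-per-digit scheme and of decimal arithmetic done in software, here is a minimal sketch. It assumes up to 8 non-negative digits packed into a 32-bit word, least significant digit in the low nibble; the names `to_bcd` and `bcd_add` are hypothetical.

```c
#include <stdint.h>

/* Pack a non-negative integer into 4-bit decimal digits
   (e.g. 1234 -> 0x1234). At most 8 digits fit in 32 bits. */
uint32_t to_bcd(uint32_t n) {
    uint32_t bcd = 0;
    for (int shift = 0; n > 0 && shift < 32; shift += 4) {
        bcd |= (n % 10) << shift;   /* lowest remaining decimal digit */
        n /= 10;
    }
    return bcd;
}

/* Add two packed decimal numbers one digit at a time in software,
   carrying whenever a digit sum exceeds 9. */
uint32_t bcd_add(uint32_t a, uint32_t b) {
    uint32_t result = 0, carry = 0;
    for (int shift = 0; shift < 32; shift += 4) {
        uint32_t d = ((a >> shift) & 0xFu) + ((b >> shift) & 0xFu) + carry;
        carry = d > 9;
        if (carry) d -= 10;
        result |= d << shift;
    }
    return result;   /* a carry out of the top digit is discarded */
}
```

For instance, `bcd_add(0x0958, 0x0067)` yields 0x1025, i.e. 958 + 67 = 1025. The per-digit loop is the kind of work that hardware decimal support avoids.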
Most modern programming languages directly support Boolean types, with the notable exception of C before C99 (which added _Bool and <stdbool.h>). C simulates the use of Booleans by treating the numeric value 0 as logical false and any nonzero value as logical true
(For backwards compatibility, C++, although it has a built-in bool type, still converts numeric values to bool: 0 becomes false and any nonzero value becomes true.)
The typical implementation is to use a single byte (or the smallest addressable memory unit supported by the hardware) to store a Boolean value, although in fact only a single bit would be necessary.
The most common operations based on Booleans are assignment, comparison (equal/not-equal) and the logical operations (such as AND, OR, NOT, XOR)
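Since only a single bit is logically needed per Boolean, several flags can share one byte, with the logical operations above doing the work. This is a minimal sketch; the `flag_set` type and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store up to 8 Boolean flags in one byte, one bit per flag,
   instead of one byte per flag. */
typedef uint8_t flag_set;

flag_set set_flag(flag_set s, unsigned i, bool value) {
    return value ? (flag_set)(s | (1u << i))     /* OR turns bit i on */
                 : (flag_set)(s & ~(1u << i));   /* AND with NOT turns it off */
}

bool get_flag(flag_set s, unsigned i) {
    return (s >> i) & 1u;   /* isolate bit i */
}
```

The trade-off is that reading or writing one flag now costs a shift and a mask, which is why a whole byte per Boolean remains the typical default.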
Some common implementations include ASCII (which uses the binary representations of values 0..127 to represent 128 different characters), EBCDIC (0..255) and Unicode (originally a 16-bit code, and hence able to handle 65536 different characters; it has since grown to over a million code points, allowing for representation of alphabets from most of the world's natural languages)
While most of the common programming languages support ASCII, Java supports Unicode directly: its char type holds a 16-bit (UTF-16) code unit
The most common character operations are assignment and comparison, the latter of which may be purely equal/not-equal, or may allow for some relative ranking based on the underlying implementation
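The relative ranking of characters follows their underlying codes, which can surprise users: in ASCII every upper-case letter sorts before every lower-case one ('Z' is 90, 'a' is 97). This sketch assumes an ASCII execution character set; the helper names are hypothetical.

```c
#include <stdbool.h>

/* Map an ASCII letter to upper case, relying on 'a'..'z' and 'A'..'Z'
   being contiguous ranges exactly 32 codes apart. */
char ascii_upper(char c) {
    return (c >= 'a' && c <= 'z') ? (char)(c - ('a' - 'A')) : c;
}

/* Compare after upper-casing, so the case gap no longer affects order. */
bool ascii_less_ignore_case(char x, char y) {
    return ascii_upper(x) < ascii_upper(y);
}
```

A plain `'Z' < 'a'` is true under ASCII, while `ascii_less_ignore_case('Z', 'a')` is false, since 'A' comes before 'Z'.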
Typically a library of string manipulation functions or operations is also supported by a language
If a string has dynamic length, then storage needs to be allocated and deallocated during execution.
Some method is also required to identify where the end of the string currently resides - usually either by having an attribute which records the current length of the string or by having a special termination character which marks the end of the string.
Pascal, Fortran, Cobol, and Ada have fixed string lengths, Perl has unlimited dynamic string lengths, while C and C++ have dynamic string lengths but with a fixed upper bound on length.
An interesting issue is how much information needs to be maintained about the string during execution:
When a dynamic-length string grows, this is typically handled either by some form of linkage (a new section of storage is allocated and linked to the end of the existing string) or by allocating a completely new section of storage that is sufficiently large to hold the entire string (then deallocating the old string space)
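The second growth strategy, combined with an explicit length attribute, can be sketched in C as follows. The `dyn_string` type and `dyn_append` function are hypothetical; `realloc` either extends the block in place or allocates a larger one, copies the contents, and frees the old one.

```c
#include <stdlib.h>
#include <string.h>

/* A dynamic-length string that records its current length explicitly,
   while also keeping a terminating '\0' for convenience. */
typedef struct {
    size_t length;    /* current number of characters */
    size_t capacity;  /* characters the buffer can hold, excluding '\0' */
    char *data;       /* heap storage, or NULL before the first append */
} dyn_string;

/* Append text, growing the storage when it is too small.
   Returns 0 on success, -1 if allocation fails. */
int dyn_append(dyn_string *s, const char *text) {
    size_t extra = strlen(text);
    if (s->data == NULL || s->length + extra > s->capacity) {
        size_t new_cap = (s->length + extra) * 2;
        char *bigger = realloc(s->data, new_cap + 1);   /* +1 for the '\0' */
        if (bigger == NULL) return -1;                  /* old block is untouched */
        s->data = bigger;
        s->capacity = new_cap;
    }
    memcpy(s->data + s->length, text, extra + 1);       /* text and its '\0' */
    s->length += extra;
    return 0;
}
```

Doubling the capacity on each growth is a design choice that keeps the number of reallocations low when many small appends are made.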
E.g. an ordinal type might be created for the seven days of the week, with the integer value 1 associated with Monday, 2 with Tuesday, 3 with Wednesday, etc.
(Some implementations will begin enumeration at 0, others at 1.)
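In C, enumerators start at 0 by default, so the days-of-the-week type from the example needs an explicit starting value to make Monday 1. This is a minimal sketch; the helper functions are hypothetical.

```c
/* Each enumerator after MONDAY takes the next integer value. */
enum day { MONDAY = 1, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY };

/* Because the values are ordinal, they can be ordered and stepped through. */
int is_weekend(enum day d) {
    return d >= SATURDAY;
}

enum day next_day(enum day d) {
    return d == SUNDAY ? MONDAY : (enum day)(d + 1);
}
```

Note that C still lets any integer be converted to `enum day`, so the restrictive type checking mentioned in the text is weaker in C than in languages such as Ada.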
This improves both the readability of the language, by allowing the use of domain-relevant literal constants, and the reliability of the language, by allowing more restrictive type checking
The days-of-the-week example falls into this category: we need to list Monday, Tuesday, etc. explicitly for the system to understand they are part of the set
An important issue is whether or not the same literal constant (e.g. the identifier Monday) is allowed to appear in two different enumerated types (e.g. DaysofWeek and WorkDays). If so, how should any given occurrence of the literal be treated?
In Pascal, C, and C++ this is not allowed - the same literal constant cannot appear in more than one enumeration type definition in any given referencing environment.
In Ada it is allowed (overloading of literals), provided the intended type of each occurrence of the literal constant can be determined from its context.
These are less universally applicable (obviously) but allow more compact definitions (also obviously)
Typically delimiters such as brackets are used to separate the subscript from the array, e.g. myarray[3]
Readability is improved if the delimiting characters are different from those used to delimit parameter lists or program blocks.
(In Fortran, both parameter lists and array subscripts are delimited using parentheses, making it difficult to determine whether foo(3) refers to the third element of array foo or to a call to function foo with the single parameter 3.)
In other languages the programmer must explicitly set both the upper and lower bounds on valid array subscripts.
C and C++ provide dynamic arrays in which the user is responsible for allocation and deallocation, while Perl's dynamic arrays are implicitly extended whenever the user makes subscript references beyond the current existing array bounds
myarray(3, 7, 9)
or
myarray[3][7][9]
Some languages limit the number of subscripts, and hence the dimensionality of the arrays (e.g. 3 dimensions in early Fortran versions)
It might also be used to take a subsequence of the elements of an array
For instance, attributes of an array might include its base memory address and the (standard) size of individual elements. If subscripting begins at 0 and array elements are stored contiguously in memory, then the location of the i'th element might be computed as:
start address + i * (size of individual element)
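The start address + i * (size of individual element) computation can be written out directly in C, where arithmetic on a `char *` counts bytes. The function name `element_address` is hypothetical; for a real array it gives the same address as `&a[i]`.

```c
#include <stddef.h>

/* Address of element i, given the base address and the element size. */
void *element_address(void *start, size_t i, size_t element_size) {
    return (char *)start + i * element_size;
}
```

For a `double a[5]`, `element_address(a, 3, sizeof(double))` points at `a[3]`.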
If several subscripts are given in a multidimensional array then this formula must be adjusted to consider the number of elements preceding the desired element. E.g.:
Suppose elements of a two-dimensional array are stored in row-major order (all of row 0, followed by all of row 1, etc) and the reference is for the element in row r and column c.
Then the address might be computed as:
start + (r * element-size * number-of-columns) + (c * element-size)
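C itself stores two-dimensional arrays in row-major order, so the row-major formula can be checked against the compiler's own layout. The function name `row_major_offset` is hypothetical; it returns the byte offset from the start of the array.

```c
#include <stddef.h>

/* Byte offset of element [r][c] in a row-major array:
   skip r full rows, then c elements within row r. */
size_t row_major_offset(size_t r, size_t c, size_t columns, size_t element_size) {
    return r * element_size * columns + c * element_size;
}
```

For an `int m[3][4]` with 4-byte ints, element `m[2][1]` is at offset 2 * 4 * 4 + 1 * 4 = 36 bytes.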
Each element of an associative array is actually a (key, value) pair
Associative arrays are supported in Perl (referred to as hashes) and in Java (through library classes such as HashMap)
Below we give some Perl examples:
# initializing an associative array
my %salaries = ("Cedric" => 75000, "Perry" => 57000,
                "Mary" => 55750, "Gary" => 47850);
# assigning a new value
$salaries{"Mary"} = 76000;
# removing a value
delete $salaries{"Gary"};
# looking for a value with a given key
if (exists $salaries{"Perry"}) {
    print "Perry earns $salaries{Perry}\n";
}
For instance, an employee record might be composed of the employee's name, salary, employee number, etc.
Most modern programming languages support records in some way:
Cobol and Pascal support records, C (and C++) support the struct for record structures, C++ and Java support records through classes.
Record (or struct) declarations usually involve naming the overall type of record, then specifying names and data types for each of the record fields.
Referencing a record element then involves identifying both the record and the field of interest, e.g. employee.name = "Bob"
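Here is the employee record written as a C struct. The field names and the fixed 32-character name buffer are illustrative assumptions. Because C arrays cannot be assigned, `employee.name = "Bob"` is written as a string copy instead.

```c
#include <string.h>

/* Name the record type, then give each field a name and a data type. */
struct employee {
    char name[32];
    double salary;
    int employee_number;
};

void init_employee(struct employee *e, const char *name, double salary, int number) {
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';   /* strncpy may not terminate */
    e->salary = salary;
    e->employee_number = number;
}

/* e->salary is shorthand for (*e).salary: record, then field. */
void give_raise(struct employee *e, double amount) {
    e->salary += amount;
}
```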
The data stored in the union is interpreted based on the data type the union is believed to currently represent.
Unions can substantially improve program flexibility, but their implementations often prevent a language from being strongly typed by failing to allow type checking (since any type checking on a union must take place dynamically)
Fortran, C, and C++ provide unions with no type checking.
Pascal's record variants store a tag for the current type as part of unions within a record, but the user can change the tag without changing the variant, rendering type checking invalid.
Ada has an enforced version of tags for safe type checking.
Java does not support unions at all.
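Since C performs no checking on unions, a common idiom is to pair the union with a tag that the programmer maintains by hand. Unlike Ada's enforced tags, nothing stops code from reading the wrong member. The `shape` types below are hypothetical.

```c
/* The tag records which member of u currently holds valid data. */
enum shape_kind { CIRCLE, RECTANGLE };

struct shape {
    enum shape_kind kind;              /* tag */
    union {
        double radius;                 /* valid when kind == CIRCLE */
        struct { double w, h; } rect;  /* valid when kind == RECTANGLE */
    } u;
};

/* Checking the tag before reading a member is the programmer's job;
   the compiler would equally accept s->u.radius for a rectangle. */
double shape_area(const struct shape *s) {
    switch (s->kind) {
    case CIRCLE:    return 3.141592653589793 * s->u.radius * s->u.radius;
    case RECTANGLE: return s->u.rect.w * s->u.rect.h;
    }
    return 0.0;
}
```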
Most current languages do not explicitly support sets, one exception being Pascal.
(Ada does provide the set membership operator, which may be applied to enumerated types.)
nil, which indicates the pointer currently does not reference any memory location
A pointer used for indirect addressing is a pointer variable, whereas variables which are dynamically allocated from the heap are referred to as heap-dynamic variables
Languages that support heap-dynamic variables must also provide an operator for allocating memory and returning its reference address
Other pointer problems include:
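With tombstones, each heap-dynamic variable is reached through an extra cell (the tombstone) that points to it. When the space is deallocated, the tombstone is set to reflect this fact, so any dangling pointers really only reference the tombstone and can be detected during execution.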
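A minimal sketch of the tombstone idea in C, using `malloc`/`free` as the underlying allocator. The `tombstone` type and the `ts_*` functions are hypothetical: user code holds pointers to the tombstone rather than to the heap-dynamic variable itself.

```c
#include <stdlib.h>

/* The tombstone points at the heap-dynamic variable,
   or holds NULL once that variable has been deallocated. */
typedef struct {
    void *target;
} tombstone;

tombstone *ts_alloc(size_t size) {
    tombstone *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->target = malloc(size);
    if (t->target == NULL) { free(t); return NULL; }
    return t;
}

/* Free the variable but keep the tombstone, marked as dead. */
void ts_free(tombstone *t) {
    free(t->target);
    t->target = NULL;
}

/* Dereference through the tombstone; NULL signals a dangling reference. */
void *ts_deref(const tombstone *t) {
    return t->target;
}
```

The cost of the scheme is visible here: every access takes an extra indirection, and tombstones themselves are never reclaimed in this sketch.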
These pointers can point at almost any variable or function in memory, and in fact array names are treated like constant pointers to the first cell in the array
Pointers can be indexed as though they are array names
Generic pointers (of type void *) can point at values of any type, and they are often used as parameters for functions operating on memory.
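The three C pointer features just listed are shown together below: an array name passed where a pointer is expected, a pointer indexed like an array, and a `void *` routine operating on raw memory. Both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* p[i] is defined as *(p + i), so a pointer can be indexed like an array. */
int sum_via_pointer(const int *p, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* void * parameters accept pointers to any object type;
   the size says how many bytes to exchange. */
void swap_bytes(void *a, void *b, size_t size) {
    unsigned char tmp[64];   /* swap in chunks through a small buffer */
    while (size > 0) {
        size_t chunk = size < sizeof tmp ? size : sizeof tmp;
        memcpy(tmp, a, chunk);
        memcpy(a, b, chunk);
        memcpy(b, tmp, chunk);
        a = (unsigned char *)a + chunk;
        b = (unsigned char *)b + chunk;
        size -= chunk;
    }
}
```

Calling `sum_via_pointer(arr, 4)` on an `int arr[4]` passes the array name as a pointer to its first cell, and `swap_bytes(&x, &y, sizeof x)` works equally well for ints, doubles, or structs.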
Unless pass-by-reference is used, parameters in C++ are always pass-by-value
Java references actually refer to class instances, rather than memory addresses - making them a much safer (though less flexible) feature
One implementation (common to LISP approaches) is to treat the heap as a linked list of identical available cells - allocation and deallocation simply takes/returns lists of cells from/to the list.
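The following sketch shows a heap of identical cells threaded onto a free list, in the spirit of the LISP approach described above. The cell layout, the heap size, and all names are illustrative assumptions; allocation and deallocation each take constant time.

```c
#include <stddef.h>

#define HEAP_CELLS 1024

typedef struct cell {
    struct cell *next;   /* link to the next free cell (or next list cell) */
    int value;           /* payload */
} cell;

static cell heap[HEAP_CELLS];
static cell *free_list = NULL;

/* Thread every cell onto the free list. */
void heap_init(void) {
    free_list = NULL;
    for (size_t i = 0; i < HEAP_CELLS; i++) {
        heap[i].next = free_list;
        free_list = &heap[i];
    }
}

/* Allocation takes the first cell off the free list. */
cell *cell_alloc(void) {
    cell *c = free_list;
    if (c != NULL) free_list = c->next;
    return c;   /* NULL when the heap is exhausted */
}

/* Deallocation pushes the cell back on; deciding when a cell is
   no longer needed is left entirely to the caller. */
void cell_free(cell *c) {
    c->next = free_list;
    free_list = c;
}
```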
The problem with this approach is identifying when a cell is no longer needed and can be returned to the heap (similar to the deallocation issues discussed in the Pascal pointer section)
Two approaches are taken towards deallocating space: