Supplementary notes on textbook figures and algorithms

    Chapter 1

    Figure 1.2.1
    The three separate subsystems are the CPU, memory, and I/O.
    Three communication channels (buses) connect all three subsystems: the data bus, the address bus, and the control bus.


    Chapter 2

    Algorithm 2.2.1
    result = 0
    for ( i = 0; i < n; i++) result = result + di*2^i
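
    A minimal C sketch of Algorithm 2.2.1, assuming the binary digits are held in an array d with d[0] as the least significant digit. The function name bits_to_uint is hypothetical. Note that ^ is exclusive-or in C, so the power 2^i is tracked in a running weight instead.

```c
#include <stddef.h>

/* Algorithm 2.2.1: result = sum of d[i] * 2^i for i = 0..n-1.
   Assumes d[0] is the least significant bit and n <= 32. */
unsigned int bits_to_uint(const int d[], size_t n)
{
    unsigned int result = 0;
    unsigned int weight = 1;            /* holds 2^i, starting at 2^0 */
    for (size_t i = 0; i < n; i++) {
        result = result + (unsigned int)d[i] * weight;
        weight = weight * 2;            /* move on to the next power of 2 */
    }
    return result;
}
```

    For example, the digits {1, 0, 1, 1} (least significant first, i.e. binary 1101) give 13.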

    Algorithm 2.3.1 re-written as a do-while
    quotient = N
    i = 0
    do {
       di = quotient % 2
       quotient = quotient / 2
       i++
    } while (quotient != 0)
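
    The same do-while can be written as compilable C. This is only a sketch: the function name uint_to_bits and the choice to return the digit count are assumptions, and the caller is assumed to supply an array with room for 32 digits.

```c
#include <stddef.h>

/* Algorithm 2.3.1: repeatedly divide by 2; each remainder is the next bit.
   Bits are stored least significant first in d[]; returns how many were stored. */
size_t uint_to_bits(unsigned int N, int d[])
{
    unsigned int quotient = N;
    size_t i = 0;
    do {
        d[i] = (int)(quotient % 2);     /* remainder is bit i */
        quotient = quotient / 2;
        i++;
    } while (quotient != 0);
    return i;
}
```

    Because the test is at the bottom, the value 0 still produces one digit (a single 0).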
    


    Chapter 3

    Algorithm 3.1.1
    carry0 = 0
    for (i = 0; i < N; i++) {
       sumi = (xi + yi + carryi) % 10
       carryi+1 = (xi + yi + carryi) / 10
    }
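
    As a sketch of Algorithm 3.1.1 in C, assuming each number is stored as an array of N decimal digits with the least significant digit first (add_digits is a hypothetical name):

```c
#include <stddef.h>

/* Column-by-column decimal addition.
   x, y: N digits each, least significant first; sum receives N digits.
   Returns the carry out of the final column (0 or 1). */
int add_digits(const int x[], const int y[], int sum[], size_t N)
{
    int carry = 0;                       /* carry0 = 0 */
    for (size_t i = 0; i < N; i++) {
        int total = x[i] + y[i] + carry;
        sum[i] = total % 10;             /* digit kept in this column */
        carry = total / 10;              /* carry into column i+1 */
    }
    return carry;
}
```

    Adding 957 and 068 this way yields the digits 025 with a final carry of 1, i.e. 1025.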

    Algorithm 3.1.2
    borrow = 0
    for (i = 0; i < N; i++) {
        if (yi <= xi) {
           differencei = xi - yi
        } else {
           // find the column to borrow from
           j = i + 1
           while ((j < N) and (xj == 0)) {
              j++
           }
           // do the borrowing
           if (j == N) {
              borrow = 1
              j = j - 1
              xj = xj + 10
           }
           while (j > i) {
             xj = xj - 1
             j--
             xj = xj + 10
           }
           differencei = xi - yi
        }
    }
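
    A C sketch of Algorithm 3.1.2 under the same digit-array assumption (least significant digit first). The name subtract_digits is hypothetical. The bounds check j < N is done before x[j] is read so the array is never indexed past its end, and x is modified in place by the borrowing.

```c
#include <stddef.h>

/* Column-by-column decimal subtraction x - y with borrowing.
   Returns 1 if a borrow was needed from beyond the leftmost column. */
int subtract_digits(int x[], const int y[], int diff[], size_t N)
{
    int borrow = 0;
    for (size_t i = 0; i < N; i++) {
        if (y[i] > x[i]) {
            /* find the nearest non-zero column to borrow from */
            size_t j = i + 1;
            while (j < N && x[j] == 0) {
                j++;
            }
            if (j == N) {                /* nothing left to borrow from */
                borrow = 1;
                j = j - 1;
                x[j] = x[j] + 10;
            }
            /* pass the borrow down, one column at a time */
            while (j > i) {
                x[j] = x[j] - 1;
                j--;
                x[j] = x[j] + 10;
            }
        }
        diff[i] = x[i] - y[i];
    }
    return borrow;
}
```

    For 1000 - 0001 the borrow ripples through the three zero columns and the result is 0999.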
    

    Figure 3.5.1
    Figure 3.5.1 shows integer values 0-7 as 3 nested rings: one ring shows the bit pattern, one ring shows what the bit pattern represents as an unsigned integer, and one ring shows what the bit pattern represents as a signed (two's complement) integer.

    The table below shows the same information, but in tabular form
    Bit pattern   Unsigned int   Signed int
    000            0              0
    001           +1              1
    010           +2              2
    011           +3              3
    100           +4             -4
    101           +5             -3
    110           +6             -2
    111           +7             -1
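
    The two interpretations in the table can be reproduced with a short C sketch. The function names are hypothetical, and only the low 3 bits of the argument are used.

```c
/* Interpret a 3-bit pattern as an unsigned integer (0..7). */
unsigned int as_unsigned3(unsigned int pattern)
{
    return pattern & 0x7u;
}

/* Interpret a 3-bit pattern as a two's complement integer (-4..3):
   if the top bit (bit 2) is set, the value is the unsigned value minus 8. */
int as_signed3(unsigned int pattern)
{
    int value = (int)(pattern & 0x7u);
    return (value & 0x4) ? value - 8 : value;
}
```

    For instance, the pattern 100 is 4 when read as unsigned but -4 when read as two's complement.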


    Chapter 4

    Figure 4.1.3
    This figure shows four layers: the application, the C I/O libraries, the OS, and the screen/keyboard.
    Interaction between the four layers is depicted using arrows as follows:

    Algorithm 4.2.1
    value = 0
    while (there are more characters in the buffer) {
       shift value left 4 bits
       temp = new char converted to int
       value = value + temp
    }
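
    A possible C rendering of Algorithm 4.2.1, assuming the buffer is a null-terminated string containing only valid hexadecimal digits (hex_to_uint is a hypothetical name, and no error checking is attempted):

```c
#include <ctype.h>

/* Convert a string of hex digits to an unsigned int, 4 bits per character. */
unsigned int hex_to_uint(const char *buffer)
{
    unsigned int value = 0;
    while (*buffer != '\0') {            /* more characters in the buffer */
        unsigned char c = (unsigned char)*buffer;
        unsigned int temp;
        if (isdigit(c)) {
            temp = (unsigned int)(c - '0');
        } else {
            temp = (unsigned int)(toupper(c) - 'A') + 10u;  /* 'a'-'f' or 'A'-'F' */
        }
        value = value << 4;              /* make room for the next hex digit */
        value = value + temp;
        buffer++;
    }
    return value;
}
```

    Each hex digit contributes exactly 4 bits, which is why the running value is shifted left by 4 before the new digit is added.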
    


    Chapter 8

    Figure 8.1.1
    This figure depicts the components of the CPU, with all of them connected by a shared internal bus.
    The components are: The bus interface component also connects to the external buses - i.e. the address bus, data bus, and control bus.

    Figure 8.2.3
    This figure depicts the format of the program status register, labelling the 32 bits.
    The 32 bits, from most to least significant (left to right) are labelled as follows:

    Figure 8.4.1
    This figure shows a flow chart for the execution cycle, which we can express as
    repeat:
    
       fetch the instruction pointed to by the program counter
    
       add the number of bytes in the instruction to the program counter
    
       execute the instruction
    
    until it is the wfi instruction
    
    (then idle the CPU)
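
    The loop structure of the execution cycle can be illustrated with a toy simulator in C. Everything here is an assumption made for illustration: the opcodes OP_INC and OP_WFI, the 4-byte instruction size, and the single accumulator are hypothetical and are not the real ARM encoding.

```c
#include <stddef.h>

enum { OP_INC = 1, OP_WFI = 2 };   /* hypothetical opcodes */
enum { INSTR_BYTES = 4 };          /* assumed fixed instruction size */

/* Runs until the wfi opcode is executed; returns the accumulator
   and reports the final program counter through pc_out. */
int run_program(const unsigned char *memory, size_t *pc_out)
{
    size_t pc = 0;
    int accumulator = 0;
    unsigned char opcode;
    do {
        opcode = memory[pc];         /* fetch the instruction at the pc */
        pc += INSTR_BYTES;           /* add the instruction size to the pc */
        if (opcode == OP_INC) {      /* execute the instruction */
            accumulator++;
        }
    } while (opcode != OP_WFI);      /* until it is the wfi instruction */
    *pc_out = pc;
    return accumulator;
}
```

    Note that the program counter is advanced before the instruction is executed, so during execution it already refers to the next instruction.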
    


    Chapter 9

    In chapter 9 quite a number of different kinds of assembly language instructions are introduced, but many of these we won't use for some time yet. The instructions generally fall into one of several categories:

    9.2.2 Condition Codes:
    The condition codes look at what happened in the previous instruction and allow you to do something based on the result. For instance, if the previous instruction compared two values, we might want to do something now if those two values were equal.
    These condition codes are two-letter extensions to other instructions. For instance, instead of a simple ADD instruction we can add any of the condition codes to create ADDEQ, ADDNE, ADDCS, etc.

    9.2.3 Shift Options:
    These options allow us to shift a value by a certain number of bits before we use it, so the shift options are added to the end of some other instruction and specify the nature of the shift (logical, arithmetic, or rotate) and whether we're shifting left or right.
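
    The effect of each kind of shift on a 32-bit value can be sketched in C. The function names below follow the ARM shift mnemonics (LSL, LSR, ASR, ROR), but the C code itself is only an illustration, and shift counts are assumed to be in the range 0 to 31.

```c
#include <stdint.h>

/* Logical shift left: zeros enter from the right. */
uint32_t lsl(uint32_t v, unsigned int n) { return v << n; }

/* Logical shift right: zeros enter from the left. */
uint32_t lsr(uint32_t v, unsigned int n) { return v >> n; }

/* Arithmetic shift right: copies of the sign bit enter from the left. */
uint32_t asr(uint32_t v, unsigned int n)
{
    uint32_t shifted = v >> n;
    if ((v & 0x80000000u) && n > 0) {
        shifted |= ~(0xFFFFFFFFu >> n);   /* fill the vacated top bits with 1s */
    }
    return shifted;
}

/* Rotate right: bits leaving on the right re-enter on the left. */
uint32_t ror(uint32_t v, unsigned int n)
{
    n &= 31u;
    return (n == 0) ? v : (v >> n) | (v << (32u - n));
}
```

    The difference between logical and arithmetic right shifts only shows up when the top bit is set, i.e. when the value would be negative as a signed integer.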

    9.2.4 First Instructions:
    In this section a lot of variants are introduced for some basic instructions. Many of the variations we won't use for some time yet, so here is a simplified breakdown of the instructions:


    Chapter 10

    This is just meant as an alternative/additional explanation of the way run-time stacks work in most programming languages.

    Each call to a function needs someplace in memory to store its parameters and its local variables.

    While we can use registers for short term storage, there are only a small number of them so it isn't practical to rely on them alone as an all-purpose tool for parameters and locals. For instance, if functions f and g each use the same registers to store their local variables, and f calls g, then g might be wiping out the data f was using.

    As a result, we need to come up with some kind of general scheme we can use to safely and easily access local variables. Ideally, we would also like to remember what was in each register when a function call was made, so those values can be restored when the function call completes.

    A common practice is to use a special stack, called the run-time stack, as a place to store a function's local variables, the parameters passed to it, the value it will return to the caller, and also to save copies of what the various register values were right before the call.

    Whenever a function call is made, a spot for the return value is pushed on the stack, the return address is saved on the stack (i.e. where to resume execution after the function call ends), the old register values get saved (pushed on the stack), the function parameters are pushed on the stack, space for the local variables is pushed on the stack, and the program counter is loaded with the address of the first instruction of the function.

    When the function call ends, we undo these steps in the reverse order: pop off the space for the local variables, pop the parameters off the stack, restore the register values (popping them off the stack), and pop the return address off the stack into the program counter. The caller can then get the return value off the top of the stack and resume running.

    In high level languages the compiler inserts instructions to do all this for us, but in assembly language we must do it ourselves. In doing so, we make use of most of the available registers:

    One thing to note when working with our run-time stack: as the stack grows the memory addresses go down, not up. E.g. the bottom of the stack might be memory address 6000 (hex); if we push 8 bytes onto the stack the new top of stack address will be 5FF8, and if we push 4 more bytes the new top will be 5FF4, etc.

    The reason for this is to try and be as flexible as possible in the division of memory between the run-time stack and the heap (the section of memory used for dynamic memory allocation). If we set aside memory addresses 0000-6000 for the stack and heap, we'll usually have the heap start at address 0000 and grow up, while the stack starts at address 6000 and grows down. Thus they can each grow/shrink freely as long as there is still some free space in between them.
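
    The address arithmetic for a downward-growing stack is easy to get wrong in hex, so here is a tiny C sketch of it. The helper names push_bytes and pop_bytes are hypothetical; they only compute the new stack pointer value and do not touch memory.

```c
#include <stdint.h>

/* Pushing moves the stack pointer to a lower address. */
uint32_t push_bytes(uint32_t sp, uint32_t nbytes)
{
    return sp - nbytes;
}

/* Popping moves the stack pointer back to a higher address. */
uint32_t pop_bytes(uint32_t sp, uint32_t nbytes)
{
    return sp + nbytes;
}
```

    Starting from 0x6000, pushing 8 bytes gives 0x5FF8, and pushing 4 more gives 0x5FF4.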

    Function calls and returns: the details

    Expanding on the concept above, the assembly language for a function call looks like this:
    @ in this example we're passing value 3, char 'x', and the contents of register r8
    @    to the function we're calling, foo
    
    @ put the parameters in registers r0, r1, r2
            mov             r0, 3
            mov             r1, 'x'
            mov             r2, r8
    @ call the function
            bl              foo
    @ once the function completes we'll be back here,
    @    and the return value will be in register r0
    

    Then, in the function called, we would see the following:

    @ declare the globally accessible function foo
            .global         foo
            .type           foo, %function
    foo:
            stmfd           sp!, {fp, lr}  @ saves the return address and old frame pointer
            add             fp, sp, 4      @ set our frame pointer to point at the return address
            @ at this point the three parameters are in r0, r1, r2,
            @    we can use them however we want
            @ once we're done, put the desired return value in r0, e.g.
            mov             r0, 3
            @ restore the caller's fp and sp
            ldmfd           sp!, {fp, lr}
            @ return to the caller, using the address in the link register
            bx              lr
    

    The layout above works fine, but doesn't account for local variables. Once the frame pointer has been set, fp points at the saved return address, and the caller's (old) frame pointer sits on top of that (at fp-4), on the top of the stack, which is where the stack pointer points.

    If we need N bytes set aside on the stack for local variables, we must calculate where they will be relative to the frame pointer, set aside sufficient space on the stack, and clean that space up before we restore the caller's fp and sp.

    If we were to use two ints for local variables, we would need 4 bytes of stack space for each, i.e. we could subtract 8 from the sp, and use offsets of -8 and -12 from the frame pointer to access them (fp-4 already holds the caller's saved frame pointer, so the locals start at fp-8).

    There is a catch when calculating the amount of stack space to set aside for locals, however. Due to memory alignment requirements in the processor, the value in the stack pointer must always be divisible by 8. This means that, however much space we need for local variables, we round that amount up to a value divisible by 8. E.g. if all we needed was a single char we would still round up to 8, and if we needed a char and an int (5 bytes) we would again round up to 8.
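
    The rounding rule can be written as a one-line C function. This is a sketch; the name round_up_to_8 is hypothetical.

```c
/* Round a byte count up to the next multiple of 8 (already-aligned values are unchanged). */
unsigned int round_up_to_8(unsigned int nbytes)
{
    return (nbytes + 7u) & ~7u;   /* add 7, then clear the low 3 bits */
}
```

    So 1 byte and 5 bytes both become 8, while 9 bytes becomes 16.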

    Once we have allocated the N bytes of local variable space we can make up our own minds about which specific chunks of that we will use for our local variables.

    The example below has us set aside space for a local int variable and a local char variable, hence adjusting the stack by 8 bytes once we round up, and using offsets -8 and -12 from the frame pointer to access them (fp-4 holds the caller's saved frame pointer, so it must not be overwritten).

    @ declare the globally accessible function foo
            .global         foo
            .type           foo, %function
    foo:
            stmfd           sp!, {fp, lr}  @ saves the return address and old frame pointer
            add             fp, sp, 4      @ set our frame pointer to point at the return address
    
            sub             sp, sp, 8      @ set aside 8 bytes of local variable space
            @ at this point the three parameters are in r0, r1, r2,
            @    fp-8 is the location of our local int,
            @    fp-12 is the location of our local char,
            @    and the three bytes after the char (fp-11 to fp-9) are unused padding
    
            @ below we try a few sample instructions accessing the parameters and local variables
            str             r0, [fp, -8]   @ copy the first parameter into our local int
            strb            r1, [fp, -12]  @ copy the second parameter into our local char
            ldr             r3, [fp, -8]   @ load the local int into a scratch register
            add             r2, r2, r3     @ add the local variable to register 2 (the third parameter)
    
            @ once we're done, put the desired return value in r0, e.g.
            mov             r0, r2
    
            @ clean the local variable space off the stack
            add             sp, sp, 8
    
            @ restore the caller's fp and sp
            ldmfd           sp!, {fp, lr}
    
            @ return to the caller, using the address in the link register
            bx              lr
    

    Finally, since the various offsets and byte counts can get confusing after a bit, it helps to define names for the offsets we use. This is usually done with the .equ statement, e.g. to use the word locals instead of the value 8 in our code, we could add the statement
    .equ locals,8

    In the code below we rewrite the foo function above, but using some equ definitions for the amount of local variable space and the offsets to our local variables, intI and charC:

    @ declare the globally accessible function foo
            .global         foo
            .type           foo, %function
            .equ            intI, -8
            .equ            charC, -12
            .equ            locals, 8
    foo:
            stmfd           sp!, {fp, lr}  @ saves the return address and old frame pointer
            add             fp, sp, 4      @ set our frame pointer to point at the return address
    
            sub             sp, sp, locals @ set aside 8 bytes of local variable space
    
            @ at this point the three parameters are in r0, r1, r2,
            @    fp-8 is the location of our local int,
            @    fp-12 is the location of our local char,
            @    and the three bytes after the char (fp-11 to fp-9) are unused padding
            str             r0, [fp, intI]   @ copy the first parameter into our local int
            strb            r1, [fp, charC]  @ copy the second parameter into our local char
            ldr             r3, [fp, intI]   @ load the local int into a scratch register
            add             r2, r2, r3       @ add the local variable to register 2
    
            @ once we're done, put the desired return value in r0, e.g.
            mov             r0, r2
    
            @ clean the local variable space off the stack
            add             sp, sp, locals
    
            @ restore the caller's fp and sp
            ldmfd           sp!, {fp, lr}
    
            @ return to the caller, using the address in the link register
            bx              lr
    

    Here is one final note with respect to saving and restoring the caller's registers:

    It is assumed registers r0-r3 are used to pass parameters, and can freely be overwritten by the called function. However, any other registers (r4-r9) that the called function chooses to use should also be saved and restored by the called function, and for each extra register saved we must adjust the fp by an additional 4 bytes to get it to point at the return address.

    For example, in foo above we saved/restored fp and lr and tweaked the fp by 4 bytes:

            stmfd           sp!, {fp, lr}
            add             fp, sp, 4
    
            ldmfd           sp!, {fp, lr}
            bx              lr
    
    If foo had also chosen to use registers 4 and 5 internally, then it should include them in this process, i.e. store and load them, and add an extra 8 bytes to the fp:
            stmfd           sp!, {r4, r5, fp, lr}
            add             fp, sp, 12
    
            ldmfd           sp!, {r4, r5, fp, lr}
            bx              lr
    

    Text strings in assembler

    If we want to use text strings in our assembly language programs, e.g. passing them to write or printf or similar functions, then we need to jump through some extra hoops.

    First, we need to define the string and determine how long it is. This can be done by creating a read-only data section (since our string literal is a constant data element), placing a label before the start of our string, adding a directive to declare the string content, then computing the length afterwards and labelling the value with an equ statement.

    There are a number of directives for defining text strings; here we'll use .asciz, which specifies an ASCII string and inserts a null terminator for us, e.g.

            .section        .rodata
            .align          2
    
    myText:
            .asciz          "Blah blah blah"
            .equ            myTextLen,.-myText   @ compute # bytes from label to here
    
    However, we now need to be able to store the address at which myText is located, which involves setting up a word of storage and loading it with the address of myText, e.g.
    myTextAddr:
            .word           myText
    
    With the length, myTextLen, and the address, myTextAddr, we can manage to pass the appropriate data as parameters, e.g. for a call to write:
            mov             r0, 1                 @ first parameter specifies write to stdout
            ldr             r1, myTextAddr        @ second parameter is the string address
            mov             r2, myTextLen         @ third parameter is the number of bytes
            bl              write                 @ then call the function
    


    Chapter 11

    An overview of most of the assembly language instructions used here is given in the chapter 9 notes above, though it is worth noting that there are quite a number of additional data instructions (e.g. AND, ORR, EOR for logical and, or, exclusive-or).

    Here we discuss the translation of assembler to machine code a bit more. Again, the concepts are important, memorization of the translation patterns is not.

    For every possible assembly language instruction, including the values of its operands, there is one unique 32-bit pattern to represent that instruction.

    If we consider the 32 bits labeled left-to-right as bit 31, bit 30, bit 29, ... bit 0, we find the following:

    Just for interest's sake, there is an online ARM to Hex converter available at armconverter.com, in which you type in an assembly language instruction and it provides the resulting machine code (in hex).
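
    As one concrete instance of an instruction's bit pattern, the C sketch below assembles the 32-bit word for a MOV with a small immediate operand. It assumes the standard 32-bit ARM (A32) data-processing immediate layout with the "always" condition, and it only handles immediates that fit in 8 bits with no rotation. The function name encode_mov_imm is hypothetical.

```c
#include <stdint.h>

/* Build the machine code for: mov rd, imm8   (always executed, flags not updated). */
uint32_t encode_mov_imm(unsigned int rd, unsigned int imm8)
{
    uint32_t word = 0;
    word |= 0xEu << 28;                    /* bits 31-28: condition 1110 (always) */
    word |= 0x1u << 25;                    /* bit 25: operand 2 is an immediate */
    word |= 0xDu << 21;                    /* bits 24-21: opcode 1101 (MOV) */
    word |= (uint32_t)(rd & 0xFu) << 12;   /* bits 15-12: destination register */
    word |= (uint32_t)(imm8 & 0xFFu);      /* bits 7-0: immediate value */
    return word;
}
```

    With these assumptions, mov r0, 3 becomes E3A00003 in hex, and mov r1, 'x' (ASCII 0x78) becomes E3A01078.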


    Chapter 12

    In this chapter we focus on if/else statements and on loops.

    For both of these, the key steps are to use either CMP or TST to compare two values, followed by one of the branch instructions (BEQ, BNE, BLT, BGT, etc) that jumps to a different code location if the condition tested was true.

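    CMP performs an arithmetic comparison of two values: it subtracts the second from the first and sets the condition flags from the result, without storing the difference. TST performs a logical comparison: it computes the bitwise AND of the two values and sets the flags from that result, which makes it useful for checking whether particular bits are set.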
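
    To make the distinction concrete, the C sketch below models how the zero (Z) and negative (N) flags would be set by each instruction. The struct and function names are hypothetical, and the carry and overflow flags are left out for brevity.

```c
#include <stdint.h>
#include <stdbool.h>

struct flags {
    bool n;   /* result had its top bit set */
    bool z;   /* result was zero */
};

/* CMP a, b: flags come from the subtraction a - b. */
struct flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;
    struct flags f = { (result & 0x80000000u) != 0, result == 0 };
    return f;
}

/* TST a, b: flags come from the bitwise AND a & b. */
struct flags tst_flags(uint32_t a, uint32_t b)
{
    uint32_t result = a & b;
    struct flags f = { (result & 0x80000000u) != 0, result == 0 };
    return f;
}
```

    After CMP, Z is set exactly when the two values are equal (so BEQ branches); after TST, Z is set exactly when the two values have no 1 bits in common.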

    For example, to compare the contents of registers 1 and 2, and jump to code label FOO if they are equal, we could use the following:

            CMP     r1, r2
            BEQ     FOO
    
    Similarly, to compare r3 and r4 and jump if r3 > r4, we could use
            CMP     r3, r4
            BGT     FOO
    
    The unconditional branch instruction (jump no matter what) is simply
            B       FOO
    
    The branch instruction when we actually want to call a function (e.g. PRT) is
            BL      PRT
    
    and the branch instruction when we want to return from a function is
            BX      LR
    
    (LR is the link register, which contains the return address in the caller.)

    Branching to emulate if/else statements

    In assembly language, suppose we want to do something like:

    if (x < y) {
       // body
    }
    
    We can do this by comparing x and y, and putting in a branch statement that jumps past the actions in the body if x >= y. Here I'm going to assume the values of x and y are actually in registers 3 and 4:
            CMP     r3, r4  @ compare x and y
            BGE     skipped @ jump if x >= y
            @ the actions in the body would be here
    skipped:
            @ the rest of the code would be here
    
    Thus if r3 >= r4 we jump past the body of the if statement, otherwise we go ahead and do it.

    Suppose we now wanted to try an if/else configuration:

    if (x < y) {
       // part 1
    } else {
       // part 2
    }
    
    We can follow the same idea as before, but if we do part 1 then we must remember to skip over part 2 when we get there, e.g.
            CMP     r3, r4  @ compare x and y
            BGE     else    @ jump to the else if x >= y
            @ the actions in part 1 would be here
            B       rest    @ skip over the else section
    else:
            @ the actions in part 2 would be here
    rest:
            @ the rest of the code would be here
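
    The same control flow can be mimicked in C using goto, which makes the correspondence with the branches explicit. This is only an illustration (normal C code would use if/else directly); the function name and the values 1 and 2 standing in for "part 1" and "part 2" are hypothetical.

```c
/* Returns 1 if part 1 ran (x < y), or 2 if part 2 ran (x >= y). */
int if_else_demo(int x, int y)
{
    int result;
    if (x >= y) goto else_part;   /* CMP r3, r4 then BGE else */
    result = 1;                   /* the actions in part 1 */
    goto rest;                    /* B rest: skip over the else section */
else_part:
    result = 2;                   /* the actions in part 2 */
rest:
    return result;                /* the rest of the code */
}
```

    Note that the test is inverted relative to the C condition: we branch away when x < y is false.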
    

    Branching to emulate loops

    We can follow a similar approach with loops. For instance, if we want a top-tested loop like a while loop:
    (i) have a label at the top of the loop and another after the end of the loop
    (ii) do a compare at the top of the loop
    (iii) if it's time to leave the loop then branch to the label after the loop body
    (iv) otherwise just continue on, doing the body of the loop
    (v) at the bottom of the loop branch back to the comparison

    For instance, if our loop is

    while (x <= y) {
       x++;
    }
    
    Then we could set up corresponding assembler as follows (again assuming here we have the two values in registers 3 and 4):
    while:
            @ loop termination test
            cmp     r3, r4         @ compare x and y
            bgt     done           @ quit if x > y
    
            @ loop body
            add     r3, 1          @ increment x
    
            b       while          @ go back to the top for the next pass
    
    done:
    
    If we were to use a bottom tested loop, like a do-while loop, we would simply move the continuation test to the bottom - if the test passes then branch to the top, otherwise don't. Consider this C code:
    do {
       x++;
    } while (x <= y);
    
    A corresponding assembly segment might look like this:
    dowhile:
    
            @ loop body
            add     r3, 1          @ increment x
    
            @ loop continuation test
            cmp     r3, r4         @ compare x and y
            ble     dowhile        @ repeat if x <= y
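
    The practical difference between the top-tested and bottom-tested versions can be seen by writing both in C with goto, mirroring the branches. The function names are hypothetical; each returns the final value of x.

```c
/* Top-tested: while (x <= y) x++; */
int while_result(int x, int y)
{
top:
    if (x > y) goto done;   /* cmp r3, r4 then bgt done */
    x++;                    /* loop body */
    goto top;               /* b while */
done:
    return x;
}

/* Bottom-tested: do { x++; } while (x <= y); */
int dowhile_result(int x, int y)
{
top:
    x++;                    /* loop body always runs at least once */
    if (x <= y) goto top;   /* cmp r3, r4 then ble dowhile */
    return x;
}
```

    When x starts out greater than y, the while version leaves x unchanged but the do-while version still increments it once.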
    
    If we wanted to have a for loop there is an initialization segment, a termination test, a loop body, and an update statement, each of which needs its own chunk of assembler.
    for (x = 0; x < y; x++) {
        // body of the loop
    }
    
    The assembler might look like:
            @ initialization
            mov     r3, 0           @ x = 0
    for:
            @ loop termination test
            cmp     r3, r4          @ compare x and y
            bge     done            @ quit if x >= y
    
            @ the loop body would go here
    
            @ update statement
            add     r3, 1           @ x++
    
            @ repeat for next pass
            b       for
    
    done:
            @ rest of the program
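
    A C sketch with goto shows the four pieces of the for loop in the same order as the assembler. The function name count_for_passes is hypothetical, and counting the passes stands in for the loop body.

```c
/* Returns how many times the body of  for (x = 0; x < y; x++)  executes. */
int count_for_passes(int y)
{
    int passes = 0;
    int x = 0;                 /* initialization: mov r3, 0 */
for_top:
    if (x >= y) goto done;     /* termination test: cmp r3, r4 then bge done */
    passes++;                  /* the loop body */
    x++;                       /* update statement: add r3, 1 */
    goto for_top;              /* repeat for next pass: b for */
done:
    return passes;
}
```

    Because the test comes before the body, the body never runs when y is zero or negative.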
    


    Chapter 13

    In C++, we tend to pass parameters to functions in one of several ways: by value, by reference, or by passing a pointer to the value.

    Most assembly languages follow a similar sort of pattern, though with a variety of ways of accomplishing them:

    In ARM assembler, when passing parameters it is assumed the first four parameters are placed in registers r0-r3, respectively, and any remaining parameters are placed on the stack.

    In the textbook in listing 13.2.3, an example is shown that passes a total of nine parameters (4 in the registers, 5 on the stack), to illustrate this mixed mode.

    Note that the calling function puts the parameters on the stack before the call, and then after the call returns it cleans that space off the stack (by adding the total size of the parameters to the stack pointer).

    The called function accesses the stack parameters as offsets from its frame pointer, noting that the parameters and local variables are on opposite "sides" of the frame pointer (the locals at lower addresses, the parameters at higher addresses).

    This isn't something we need to do often, as four parameters are usually enough, but it's important to know it can be achieved if needed. (Especially since in some other assembly languages it is assumed all parameters are passed on the stack.)

    The three figures in chapter 13 can be summarized as follows:

    Figure 13.2.4

    This figure just depicts the locations of parameters e through i on the stack at the moment the function call is made.

    Parameter e is at the top of the stack (i.e. the address in sp), while the remaining parameters are each 4 bytes deeper - i.e. at addresses sp+4, sp+8, sp+12, and sp+16.

    Figure 13.2.5

    This figure just depicts the locations of parameters e through i on the stack after the function has saved the caller's info (the stmfd) and adjusted the frame pointer.

    Now, sp points at the caller's fp on the top of the stack, fp points at the return address (at sp+4), and the parameters are each in 4 byte increments below that. Hence e is at fp+4, f is at fp+8, g is at fp+12, etc.

    Figure 13.2.6

    This figure summarizes the memory layout for a function after any local variables have been allocated. In essence, fp points at the return address, the caller's fp is stored on top of that (i.e. at fp-4) and any local variables are on top of that (e.g. at fp-8, fp-12, etc), while the parameters are below the return address (e.g. at fp+4, fp+8, etc).
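
    The offsets in this layout follow a simple pattern, sketched below in C. The function names are hypothetical, and the sketch assumes that only fp and lr were saved on entry and that every stack parameter and local occupies 4 bytes.

```c
/* Offset from fp of the k-th stack parameter (k = 0 for the fifth parameter, e). */
int stack_param_offset(int k)
{
    return 4 + 4 * k;       /* fp+4, fp+8, fp+12, ... */
}

/* Offset from fp of the k-th 4-byte local variable (k = 0 for the first local). */
int local_offset(int k)
{
    return -8 - 4 * k;      /* fp-8, fp-12, ... (fp-4 holds the caller's fp) */
}
```

    With nine parameters a through i, the last one (i) is therefore found at fp+20.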


    Chapter 14

    Chapter 14 introduces a number of other logical and arithmetic instructions that are available, such as logical and/or/exclusive-or, bit shifts, multiplication and division.

    Most of these operate like the ADD/SUB instructions discussed earlier, but it is worth noting that the different multiplication operations can be used to create different sized results:

    Division has signed and unsigned versions, but not the 64-bit version:


    Chapter 15

    In this chapter we briefly explore how C structs and arrays are stored as local variables in memory, how they are passed as parameters, and how we can work with them in assembler.

    The intent is to focus on just sections 15.1 and 15.2.


    Chapter 16

    In this chapter we are primarily interested in the storage of floating point values in C. This differs significantly from the two's complement representation of signed integers, so needs some discussion. Sections 16.1-16.3 cover general ideas, and 16.4 covers the specific bit representation used (IEEE 754).

    For a code-based look at how floating point values are stored in memory (following the IEEE 754 format), see the comments and code in the program below (which we'll use in lab 10).

    #include <stdio.h>
    
    // a short program to show the way floating point
    //   values are stored in memory
    
    // floating point numbers are stored in a form of binary exponential notation,
    //    e.g.  9.5 in decimal would be 1001.1 in fixed point binary
    //          but 1.0011 x 2^3 in binary exponential notation
    //    in fact, this is stored in a somewhat modified form,
    //          as described below
    //
    // bit 0: sign 0=+, 1=-
    // bits 1-8: exponent, e=0-255
    //    to determine the number of bits to shift left/right use e-127
    //       if the result is negative then shift that many bits left,
    //       if the result is positive then shift that many bits right,
    //       if the result is zero then don't shift
    // bits 9-31: mantissa, m
    //    the bit representation of the intended value as a 24-bit unsigned int,
    //       except the leading 1 in the representation is omitted, the remaining
    //       23 bits are stored
    //    e.g.  110011001100110011001100 would be stored as 10011001100110011001100
    //       thus effectively allowing us to precisely store values up to (2^24)-1
    //          instead of (2^23)-1
    //
    // Special cases:
    //    the value 0 is represented as all 0's
    //    the values +/- infinity are represented by setting
    //        all the exponent bits to 1 and the mantissa bits to 0
    //    the "Not a Number" value (NaN) is represented by setting
    //        all the exponent bits to 1 and the mantissa to anything except 0
    //
    
    int main()
    {
       // use a union to store the value as a float but examine the stored bits as an unsigned int
       union showFlt {
          unsigned int i;
          float f;
       } val;
    
       // local vars for the components of the float
       unsigned int sign, exponent, mantissa;
    
       // local vars for extracting individual bits
       unsigned int i, mask;
    
       // obtain and echo the desired float
       printf("\nPlease enter a floating point value: ");
       scanf("%g", &(val.f));
       printf("\nThe stored (hex) pattern for %g is %x\n\n", val.f, val.i);
    
       // extract and display the sign bit and its meaning
       sign = val.i & 0x80000000;
       printf("   stored sign (in bit 0): %x, i.e. ", sign);
       sign = sign >> 31;
       printf("%c\n\n", (sign==0)?'+':'-');
    
       // extract and display the mantissa bits,
       //    including the implied 1 at the beginning
       mantissa = val.i & 0x007fffff;
       printf("   mantissa (in binary, with implied leading 1 inserted)\n");
       printf("      1.");
       mask = 0x00400000;
       for (i = 0; i < 23; i++) {
           if (mantissa & mask) printf("1");
           else printf("0");
           mask = mask >> 1;
       }
       printf("\n\n");
    
       // extract and display the exponent bits,
       //    and the meaning in terms of shifting
       exponent = val.i & 0x7f800000;
       mask = 0x40000000;
       printf("   stored exponent (in bits 1-8): ");
       for (i = 0; i < 8; i++) {
           if (exponent & mask) printf("1");
           else printf("0");
           mask = mask >> 1;
       }
       printf(", i.e. ");
       int exp = (int)(exponent >> 23) - 127;  // convert to int first so the subtraction is signed
       if (exp == 0) {
          printf(" no shift\n");
       } else if (exp < 0) {
          printf(" shift decimal left %i bits\n", -exp);
       } else {
          printf(" shift decimal right %i bits\n", exp);
       }
       printf("\n");
    
       return 0;
    }
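
    For experimenting without typing input, the same three fields can be pulled out by a small function. This is a sketch that assumes float is a 32-bit IEEE 754 value on the machine in use; the struct and function names are hypothetical, and memcpy is used in place of the union to copy the bits.

```c
#include <stdint.h>
#include <string.h>

struct float_fields {
    unsigned int sign;      /* 0 for +, 1 for - */
    int exponent;           /* stored exponent minus the bias of 127 */
    uint32_t mantissa;      /* the 23 stored bits, without the implied leading 1 */
};

struct float_fields decode_float(float f)
{
    uint32_t bits;
    struct float_fields out;
    memcpy(&bits, &f, sizeof bits);              /* view the float's bit pattern */
    out.sign = bits >> 31;
    out.exponent = (int)((bits >> 23) & 0xFFu) - 127;
    out.mantissa = bits & 0x007FFFFFu;
    return out;
}
```

    For 9.5 (1.0011 x 2^3 in binary) this gives sign 0, exponent 3, and mantissa 0x180000, matching the stored pattern 41180000.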
    


    Chapter 17

    Chapter 17 is largely informative, providing background on the nature of supervisor calls and interrupts.


    Chapter 18

    As with chapter 17, this chapter is largely background/informative, introducing the device timing and interface concepts programmers need to be aware of when working at the device level.

    Note that figure 18.3.1 simply shows a memory controller at the centre of the diagram, with connections from it to each of: the CPU, memory, the graphics processor, and the I/O controller.