CSCI 265 notes:
- Verification and validation are concerned with assuring that a software system meets a user's
needs.
- It takes place throughout the software lifecycle with two main
objectives:
- The discovery of defects in a system.
- The assessment of whether or not the system is usable in an
operational situation.
- The two V's are not the same thing:
- Validation: are we building the right product?
- Verification: are we building the product right?
I.e. validation is ensuring we've got the right specifications,
then verification checks that we're meeting those specifications
- Dynamic vs static V&V
- dynamic validation and verification: concerned with exercising
and observing the product's behaviour (statistical testing for system
performance,
defect testing for debugging)
- static verification: concerned with analysis of the static system
representation (inspection of documentation and source code)
GET USED TO THE IDEA OF CODE INSPECTIONS, for format, ideas,
and correctness. We'll consider formal inspection processes in about a week.
- Program testing
- Testing can reveal the presence of errors NOT their absence.
- A "good" test is one that has a high probability of revealing
an otherwise undetected error.
- A successful test is a test which discovers one or more errors.
- Tests are the only validation technique for non-functional requirements -
you cannot determine whether code runs quickly enough, reliably enough,
etc without running it.
- Program testing should be used in conjunction with static verification.
- FINDING ERRORS DURING TESTING IS A GOOD THING - or, at least,
is much better than letting the user find them later!
- Testing should be traceable to the customer's requirements -
there should be a reason behind every test you apply
- The software should be designed to be easily testable:
- the software design and the inter-relationship of system components
should be well understood
- the technical documentation should be well organized, accessible,
detailed, and accurate
- the decomposition of the software into modules or objects
should facilitate compartmentalized testing - we should be able to
easily and effectively test individual parts as well as the system
as a whole
- the software should be highly controllable, in that we can
easily generate any possible output, and easily force the program into
any desired state
- the software should be highly observable, in that we can
easily distinguish between program states, easily observe all factors
influencing the system output, and easily distinguish correct from incorrect
behavior/output
- changes to the software should be infrequent, well-controlled,
and should not invalidate existing tests
- the tests themselves should be easy to specify, easy to automate,
and easy to reproduce
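The controllability and observability points above can be made concrete with a small sketch (hypothetical code, not from the notes): a retry helper whose clock/sleep behaviour is injected, so a test can force any timing scenario without waiting, and whose attempt history is recorded, so a test can observe exactly what happened.

```python
# Hypothetical example of a controllable, observable design: the sleep
# function is injectable (controllability) and every attempt is logged
# (observability).

def fetch_with_retry(fetch, retries=3, sleep=None, log=None):
    """Call fetch() up to `retries` times; record each attempt in `log`."""
    sleep = sleep or (lambda seconds: None)   # injectable: tests need not wait
    log = log if log is not None else []
    for attempt in range(1, retries + 1):
        try:
            result = fetch()
            log.append((attempt, "ok"))
            return result
        except IOError as exc:
            log.append((attempt, "failed: %s" % exc))
            sleep(2 ** attempt)               # exponential backoff
    raise IOError("all %d attempts failed" % retries)
```

A test can then inject a fetch stub that fails twice and succeeds on the third call, and check both the result and the recorded attempt log.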
- Testing and debugging
- The two are distinct processes.
- Defect testing: confirming the presence of errors.
- Debugging: locating and repairing those errors.
- Effective debugging involves formulating a hypothesis about program
behaviour then testing the hypothesis to find the system error:
- Locate error.
- Design error repair.
- Repair error.
- Retest program.
- Stages of testing through implementation
- Interface testing: testing the passing of parameters and
correct communication between modules and units within modules.
In "top-down" testing this is begun before the bodies of
the units are actually designed.
- Unit testing: testing of individual components. This involves
the use of drivers to substitute for routines invoking the unit under test,
and stubs to substitute for routines called by the unit under test.
- Module testing: testing collections of dependent components.
This involves the use of drivers to substitute for routines invoking
the module under test, and stubs to substitute for any called routines
which are external to the module under test.
- Sub-system testing: testing collections of modules integrated
into sub-systems, again using stubs and drivers to substitute for
any routines external to the subsystem.
- System testing: testing the complete system prior to delivery.
- Acceptance testing: testing by users to check that the system
satisfies requirements (sometimes called alpha testing).
- Sometimes these terms are combined to give component testing
(unit and module testing) and integration testing (sub-system and
system testing).
- Beta-testing is the shipment of the tested product to customers
for trials in a real working environment
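The driver/stub vocabulary used above can be sketched in a few lines (all names here are made up for illustration): the unit under test normally relies on a pricing service, so a stub stands in for that service, while a driver plays the role of the code that would invoke the unit.

```python
# Hypothetical unit-testing sketch with a driver and a stub.

def get_base_price_stub(item_id):
    """Stub: substitutes for the real pricing service the unit would call."""
    return {"widget": 10.0, "gadget": 25.0}[item_id]

def apply_discount(item_id, percent, get_base_price):
    """Unit under test: price an item with a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("discount must be 0-100")
    return get_base_price(item_id) * (1 - percent / 100.0)

def driver():
    """Driver: substitutes for the routines that would invoke the unit."""
    assert apply_discount("widget", 10, get_base_price_stub) == 9.0
    assert apply_discount("gadget", 0, get_base_price_stub) == 25.0
    print("unit tests passed")
```

At module-test level the same pattern scales up: the stub replaces whatever is external to the module, and the driver replaces whatever would call into it.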
- Test planning and scheduling - for complex systems, testing can
easily absorb half the development budget, so must be well planned
- Describe major phases of the testing process.
- Describe traceability of tests to requirements.
- Estimate overall schedule and resource allocation.
- Describe relationship with other project plans.
- Describe recording method for test results.
- Testing strategies
- Top-down or bottom-up.
- Stress testing.
- Back-to-back testing.
- Specialised testing strategies: e.g., thread testing for
real-time systems.
- Top-down vs bottom-up testing
- Incremental testing
- Integration testing should always be incremental.
- The idea is to begin with a very simple implementation
and proceed with the following cycle:
- add a limited amount of functionality to the existing code
- carry out testing, debugging, and correction until the
existing functionality is correct
- Errors detected at a given stage are generally either
in the new component (indicating an error missed during component testing)
or in the interaction between that component and existing components
(possibly a component error or an interface error).
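The incremental cycle can be illustrated with a toy pipeline (the components here are invented for the example): each increment adds one piece of functionality plus its tests, and then the entire accumulated suite is re-run, so any failure points at the newest component or its interfaces with existing code.

```python
# Minimal sketch of incremental integration with made-up components.

def parse(line):                      # increment 1
    return [f.strip() for f in line.split(",")]

def validate(fields):                 # increment 2 (builds on parse)
    return all(f != "" for f in fields)

tests = []                            # the suite grows with each increment
tests.append(lambda: parse("a, b") == ["a", "b"])
# ... test/debug until green, then add increment 2 and its tests ...
tests.append(lambda: validate(parse("a, b")) is True)
tests.append(lambda: validate(parse("a,,c")) is False)

assert all(t() for t in tests)        # every cycle re-runs the full suite
```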
- Stress testing
- In stress testing we exercise the system beyond its maximum
required level of performance - effectively
determining what its capabilities are.
- This may involve testing with high network loads, testing with
large data sets, testing with heavily flawed data, etc.
- Stress testing causes the failure behaviour of the software
to be invoked and hence tested - the system should never
fail catastrophically, so stress testing checks for unacceptable loss of
service or data.
- Stress testing is
particularly relevant for distributed systems, which can exhibit
severe degradation as a network becomes overloaded.
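A stress test in this spirit can be sketched against a hypothetical bounded job queue (invented for the example): drive it well past its rated capacity and check that overload produces a controlled refusal, never a crash or silent data loss.

```python
# Hedged sketch: stress-testing a made-up bounded queue at 10x capacity.

class JobQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.jobs = []

    def submit(self, job):
        if len(self.jobs) >= self.capacity:
            raise OverflowError("queue full, job rejected")  # graceful refusal
        self.jobs.append(job)

def stress(queue, load):
    """Push `load` jobs; count accepted vs rejected."""
    accepted = rejected = 0
    for i in range(load):
        try:
            queue.submit(i)
            accepted += 1
        except OverflowError:
            rejected += 1
    return accepted, rejected

q = JobQueue(capacity=100)
accepted, rejected = stress(q, load=1000)
assert accepted == 100 and rejected == 900   # no job silently lost
assert q.jobs == list(range(100))            # accepted data intact
```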
- Back-to-back testing
- This involves running multiple versions of the software
on the same data and seeing if they all produce the same result - a fast
way of confirming correct or incorrect behavior in one or more of
the versions being compared
- Present the same tests to different versions of the system and
compare outputs. Different outputs imply potential problems.
- Reduces costs of examining test results; permits automatic
comparison of outputs in many cases.
- Back to back testing requires access to multiple working
versions of the software, possibly using a prototype, comparing
against older versions, comparing against versions developed for other
platforms, etc
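The comparison harness itself is simple; a sketch (with two invented versions of a mean function, one deliberately buggy) shows how disagreements surface automatically, and also why identical outputs do not prove correctness - inputs of all zeros hide the seeded bug.

```python
# Back-to-back testing sketch: run two versions on identical inputs and
# collect every input on which they disagree.

def mean_v1(xs):                 # reference version: simple and trusted
    return sum(xs) / len(xs)

def mean_v2(xs):                 # "optimized" rewrite with a seeded bug
    total = 0.0
    for x in xs[1:]:             # BUG: skips the first element
        total += x
    return total / len(xs)

def back_to_back(f, g, test_inputs):
    """Return the inputs on which the two versions disagree."""
    return [xs for xs in test_inputs if f(xs) != g(xs)]

suite = [[1, 2, 3], [5.0], [0, 0, 0], [2, 2]]
mismatches = back_to_back(mean_v1, mean_v2, suite)
```

Note that `[0, 0, 0]` produces agreement despite the bug - back-to-back testing flags differences, it does not certify either version correct.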
Defect Testing
Test plans
The test plan, produced as early as possible in the software development
life cycle, identifies the mechanisms to be used to ensure
the customer receives a quality, tested product.
A possible test specification might be laid out something like
(see Pressman p. 504)
I. Scope of Testing
(summarizing functional, performance, and
internal design characteristics to be tested,
plus schedule constraints, completion criteria, etc)
II. Test plan
(provides the overall strategy for test integration)
A. Test phases and builds
B. Schedule
C. Overhead software
D. Environment and resources
III. Test procedures
(described separately for each phase/build)
A. Order of integration
- purpose, and modules to be tested
B. Unit tests
- description of tests and expected results
- software overhead (e.g. stubs/drivers)
C. Test environment
- special tools, techniques
- software overhead (e.g. stubs/drivers)
D. Test case data
E. Expected results
IV. Actual test results
(again supplied for each phase/build)
V. Supporting material: references, appendices etc
Static Verification
- Static verification attempts to check a program against its
specifications without executing the program
- We address several techniques in static verification
- Program inspections
- Mathematical verification
- Static analysis tools
- Program inspections
- Formalized inspections have been part of large-scale software development
projects since the 1970s
- A small team systematically analyzes the code in an attempt to identify
possible defects
- Most inspection teams have individuals taking on the following roles:
- Author/owner - responsible for fixing defects identified during
inspection
- Inspector(s) - Finds errors/omissions/inconsistencies
- Reader - paraphrases the code or document at the inspection meeting
- Scribe - records the results of the meeting
- Moderator - manages the process and subsequently follows up on any
actions
decided by the inspection team
- Recommended preparation for the inspection process includes:
- ensuring that there is a precise specification of the code to be
inspected
- ensure that the inspection team is trained in carrying out inspections
and
is familiar with all applicable company standards
- ensure that the code under inspection is the most up-to-date,
syntactically correct
version of the code available
- preparing and maintaining checklists of likely errors/problems
- accept that static verification will increase costs during early project
stages (in hopes of reducing debugging/testing costs later)
- clarify that the inspection process is geared towards improving the
process/product quality; it is NOT an appraisal of the personnel
involved, and code inspections should NOT be part of career reviews
- anticipate that each team member will probably spend as much time
preparing for the inspection as they will participating in the inspection
- while rates of inspection vary significantly, inspecting roughly
100 code statements per hour is not unusual
- Checklists for inspections
- the specific faults which the team needs to search for depend partially
upon what is automatically tested by the development environment:
e.g. Ada compilers will test for the correct number
of parameters, while traditional C compilers will not
- Typical checks include:
- are all identifiers appropriately named
- are all variables initialized before being used
- is each variable actually used
- are all output values set before being output
- are array bounds tested
- is each conditional statement correct
- is each loop guaranteed to terminate
- is each possible case accounted for in case/switch/if-else statements
- is bracketing or begin/ends correct
- are the number, type, and order of parameters correct in each function
call
- are all pointers/links correctly managed
- is all dynamic storage allocation correctly allocated, linked, and
deallocated
- have all possible error conditions been accounted for
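Several of these checklist items can be seen in one small fragment (hypothetical code, written for illustration): a maximum-finding routine with the kinds of faults an inspector would flag, followed by a version that passes the checklist.

```python
# Illustrative fragment with inspection-checklist faults flagged as comments.

def find_max_bad(values):
    for v in values:
        if v > best:          # CHECK "initialized before use": best is read
            best = v          # before it has ever been assigned
    return best               # CHECK "all error conditions": empty input unhandled

def find_max_fixed(values):
    if not values:
        raise ValueError("find_max of empty sequence")  # error case handled
    best = values[0]          # initialized before the loop
    for v in values[1:]:
        if v > best:
            best = v
    return best
```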
- Mathematical verification
- Formal verification involves proving, using mathematical arguments,
that a program meets its specifications.
- Formal verification requires that the semantics of the programming
language
are formally defined, and that the program must be formally specified in a
notation
consistent with the verification technique to be applied
- Unfortunately, most programming languages do not have formally defined
semantics, so it is usually not possible to formally prove a program correct
- Formal proofs are therefore usually applied to parts of a program,
where the program part uses only a subset of the programming language,
and where that subset of the language does have formally defined
semantics
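As a sketch of the style of argument (an example constructed here, not from the notes): a Hoare-logic proof that a small loop computes the sum of an array, built around a loop invariant.

```latex
% Precondition:  { n >= 0 }
%   s := 0; i := 0;
%   while i < n do  s := s + a[i]; i := i + 1  od
% Postcondition: { s = a[0] + ... + a[n-1] }
%
% Loop invariant I:
%   s = \sum_{k=0}^{i-1} a[k]  \land  0 \le i \le n
%
% The body preserves the invariant:
\[
\{\, I \land i < n \,\}\;\; s := s + a[i];\; i := i + 1 \;\;\{\, I \,\}
\]
% On loop exit the invariant plus the negated guard give the postcondition:
\[
I \land i \ge n \;\Rightarrow\; i = n \;\Rightarrow\;
s = \sum_{k=0}^{n-1} a[k]
\]
```

Termination is argued separately, e.g. by noting that n - i decreases on every iteration and is bounded below by 0.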
- Static analysis tools
- there are a number of static analysis software tools (such as lint)
which scan the source
code of a program and detect possible faults and anomalies
- some of the analysis these tools carry out includes:
- control flow analysis: identifies unreachable code
- data flow analysis: identifies variables which are assigned values
that are not subsequently used, and variables which are used before being
assigned a value
- interface analysis: identifies inconsistencies in parameter
declarations and use,
including return values
- information flow analysis: identifies the set of input variables on
which each
output variable depends (not an anomaly by itself, but an aid to
identifying the possible
sources of other detected anomalies)
- path analysis: identifies all possible execution paths (again allowing
easier analysis
in other contexts)
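A toy data-flow check in this spirit (a sketch, not how lint itself works) can be built on Python's ast module: parse the source text without running it, and report variables that are assigned but never subsequently read.

```python
# Toy static analyzer: data-flow analysis for "assigned but never used".

import ast

def unused_assignments(source):
    """Return names that are assigned somewhere but never read."""
    tree = ast.parse(source)
    assigned, loaded = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)      # name is a target of an assignment
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)        # name is read somewhere
    return sorted(assigned - loaded)

code = """
x = 1
y = 2
z = x + 1
print(z)
"""
```

Here `unused_assignments(code)` reports `y`: it is assigned a value that is never subsequently used, exactly the anomaly data-flow analysis targets.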
- Clean-room approaches
- The goal of a clean-room approach is to develop zero-defect software
- The process involves three teams:
- The specification team
- The development team
- The testing team
- The approach is characterized by five aspects:
- formal specification, using a stimulus-response model
- incremental development
- structured programming, carried out through stepwise refinement of the
specifications
- static verification, using mathematically-based correctness arguments
- statistical testing, based on a predetermined operational profile