CSCI 265: Fall Final Exam Sample Solutions
Sections F16N01-F16N02 (Dr. Wessels)

Question                                          Mark
1. Inspections and version control                [12 marks]
2. Perl Test                                      [12 marks]
3. Perl scripting                                 [12 marks]
4. Bash scripting                                 [12 marks]
5. Makefiles, compilation, and version control    [12 marks]
6. Profiling: gprof                               [12 marks]
7. Specifications, testing, and life cycles       [12 marks]
8. Regular expressions                            [12 marks]
9. Design choices                                 [12 marks]
Exam Total                                        [96 marks]

 

Question 1: Inspections and version control [12]

One of the code inspection issues we discussed dealt with keeping track of discussion points during asynchronous code inspections. If we allow members of the inspection team to discuss the item under inspection through email, messaging, voice mail, etc., then ensuring all members of the team have the same information about those discussions can be a challenge.

One solution is to include all inspection discussion content in our version control process. Your task is to detail a process a team could follow that permits asynchronous inspections, but keeps the "discussion" content under version control. Be sure to discuss the strengths and weaknesses of your proposed process.
Sample solution
Process:
   Team members are expected to post any general observations or
      discussion points to the shared document
          INSERT DOCUMENT LOCATION/ID HERE
   All team members are expected to check that document before
      contacting any other inspection team members with questions.
   If members of the inspection team discuss the material with
      each other (by phone, email, chat, etc) the person who
      initiated the discussion is responsible for posting the
      salient points (especially any action items) to the shared
      document before the end of that working day.
   Each morning, the lead of the inspection team will review the
      previous day's changes to the shared document, and ensure the
      latest copy is placed in the version control repository with
      an appropriate commit message.

Strengths:
   - ensures the communication is under version control
   - assigns direct responsibility for ensuring the content
     is updated on the shared document and for checking said
     document before initiating new discussions
   - ensures the document is reviewed daily by the lead

Weaknesses:
   - relies on team members to remember to update the shared
     document, and to record (take notes on) all the key points
     from any conversations with other inspection team members
   - relies on team members to check the shared document before
     initiating new discussions
   - requires the lead to spend the beginning of each day on
     inspection administration

Question 2: The Perl Test module [12]

Using Perl's Test module (either Simple or More), write a Perl script, complete with test cases, to test the Perl function whose prototype is shown below.

# inRange(lo, hi, val)
# --------------------
# inRange returns true (1) if lo, hi, and val are all numeric values
#    where lo <= val <= hi
# returns false (0) otherwise
sub inRange($$$);
Assume the function is provided in module mathstuff.pm.
Sample solution
#! /usr/bin/perl

use strict;
use warnings;
use mathstuff;

# the collection of test cases, each of form
#    [ lo, hi, val, expected, casename ]
my @Tests;
BEGIN {
   @Tests = (
      # cases that should pass
      [ 1, 3, 2, 1, "valid: 1<=2<=3" ],             # basic test, integers
      [ 0.1, 0.3, 0.2, 1, "valid: 0.1<=0.2<=0.3" ], # basic test, floats
      [ 1, 1, 1, 1, "valid: 1<=1<=1" ],             # test ok for equality
      [ -3, -1, -2, 1, "valid: -3<=-2<=-1" ],       # test negatives
      # cases that should fail
      [ 2, 3, 1, 0, "errcheck: too low, ints" ],          # too low, integers
      [ 1, 2, 3, 0, "errcheck: too high, ints" ],         # too high, integers
      [ 0.2, 0.3, 0.1, 0, "errcheck: too low, floats" ],  # too low, floats
      [ 0.1, 0.2, 0.3, 0, "errcheck: too high, floats" ], # too high, floats
      [ -2, -1, -3, 0, "errcheck: too low, negatives" ],  # test negatives
      [ -3, -2, -1, 0, "errcheck: too high, negatives" ], # test negatives
      [ "x", 3, 2, 0, "errcheck: lo not a number" ],      # lo is non-numeric
      [ 1, "x", 2, 0, "errcheck: lo not a number" ],      # hi is non-numeric
      [ 1, 3, "x", 0, "errcheck: lo not a number" ],      # val is non-numeric
   );
}

use Test::More tests => scalar @Tests;

foreach my $testCase (@Tests) {
   # unpack the test case fields in order
   my ($lo, $hi, $val, $exp, $tname) = @$testCase;
   ok( inRange($lo, $hi, $val) == $exp, $tname );
}
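
A minimal sketch of what mathstuff.pm might contain, so the test script
above can actually be run (the real module was simply assumed to be
provided, so treat this as a hypothetical stand-in):

package mathstuff;

use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
use Exporter qw(import);
our @EXPORT = qw(inRange);

# returns true (1) if lo, hi, and val are all numeric and lo <= val <= hi,
#    false (0) otherwise
sub inRange($$$)
{
   my ($lo, $hi, $val) = @_;
   return 0 unless looks_like_number($lo) && looks_like_number($hi)
                && looks_like_number($val);
   return ( ($lo <= $val) && ($val <= $hi) ) ? 1 : 0;
}

1;

The test script can then be run directly (e.g. perl testInRange.pl, where
the script file name is just an assumption) or through prove.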

Question 3: Perl scripting [12]

Below you are given a function that takes a file extension and directory as arguments. It then returns the number of files with that extension that are in the directory. For example, fileCount("pl", "/bin"); would return the total number of .pl files in the /bin directory.

# fileCount(extension, directory)
# -------------------------------
# returns a count of the number of files in the specified directory
#    that have the specified file extension
sub fileCount($$)
{
  my $extension = $_[0];
  my $directory = $_[1];
  # if the directory cannot be opened then terminate early
  opendir (my $handle, $directory)
     or die "Error: could not open directory $directory\n";
  # read the directory contents, checking each file against the pattern
  my $count = 0;
  my $pattern = "\\.$extension\$";
  while ( readdir $handle ) {
     my $nextfile = $_;
     if ($nextfile =~ $pattern) {
        $count++;
     }
  }
  closedir $handle;
  return $count;
}

Your task is to revise the function so that, instead of taking a single file extension as its first argument, it takes a hash reference whose keys are the file extensions to be processed.

The revised function returns a hash reference where the values associated with the keys are the number of files with that extension.

For example, if the keys in the hash were 'pl' and 'cpp' then the value associated with key 'pl' would be a count of all the .pl files, and the value associated with key 'cpp' would be a count of all the .cpp files.

Sample solution
# fileCount(extensionsHashRef, directory)
# ---------------------------------------
# given a reference to a hash whose keys are a set of file extensions,
#   and a directory to examine,
# count the number of files with each extension,
#   setting the associated hash value
# returns a reference to a hash whose values are the counts of files
#    in the specified directory with each of the specified extensions
sub fileCount($$)
{
  my $extHash = $_[0];
  my %extensions = %$extHash;
  my $directory = $_[1];

  # start each extension's count at zero,
  #    ignoring whatever values were in the caller's hash
  $extensions{$_} = 0 foreach (keys %extensions);

  # if the directory cannot be opened then terminate early
  opendir (my $handle, $directory)
     or die "Error: could not open directory $directory\n";

  # read the directory contents, checking each file against each extension
  while ( readdir $handle ) {
     my $nextfile = $_;
     # compare the current file to each of the extensions
     foreach my $extension (keys %extensions) {
        my $pattern = "\\.$extension\$";
        # if the file has the desired extension,
        #    increment the count for that extension
        if ($nextfile =~ $pattern) {
           $extensions{$extension}++;
        }
     }
  }
  closedir $handle;
  return \%extensions;
}
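
A hypothetical usage example (the extension set and directory below are
made up purely for illustration):

my %wanted = ( "pl" => 0, "cpp" => 0 );
my $counts = fileCount(\%wanted, ".");
foreach my $ext (keys %$counts) {
   print "found $counts->{$ext} .$ext files\n";
}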

Question 4: Bash scripting [12]

Write a bash script that takes a directory as a parameter and recursively searches all subdirectories to locate and display the locations of all git repositories anywhere within that directory tree. Notes:
(1) You do not have to search hidden directories (names beginning with .)
(2) Testing for the presence of .git will be adequate for deciding if a directory is a repository.
(3) Remember that in bash you can use the following to process the files in a directory:
for file in DIRECTORYNAME/*

Sample solution
#!/bin/bash
# bash script to recursively search a directory for git repositories,
#    which will be recognized by the presence of .git directories

# recursive function to search for repositories
function repoSearch()
{
   # check for null directory names
   #    (quoting guards against empty values and names with spaces)
   local dir=$1
   if [ -z "$dir" ] ; then
      return 0

   # see if it is actually a directory
   elif [ -d "$dir" ] ; then
      # if it contains a git repository then display it
      local gitname="$dir/.git"
      if [[ -d $gitname ]] ; then
         echo "git repository: $dir"

      # otherwise see if it contains any subdirectories to search
      else
         for file in "$dir"/*; do
            if [[ -d $file ]] ; then
               repoSearch "$file"
            fi
         done
      fi
   fi
}

# start directory should be specified as a lone command line argument,
# check the number of arguments then call the recursive search function
if [ $# -ne 1 ] ; then
   echo "Expected use to search for git repositories is:"
   echo "   repoSearch dirname"
else
   repoSearch "$1"
fi
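
Assuming the script is saved as repoSearch.sh and made executable, a
typical invocation would be:
   ./repoSearch.sh ~/projects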

Question 5: Makefiles, compilation, and version control [12]

(1) Modify the makefile shown below to include a 'submit' target, whose associated actions perform a git add/commit/push sequence for the current directory.

The makefile must include a Message variable, which holds the message to be used in the commit. This allows the user to override the default commit message, which should be "Committing via make submit".

progX:	prog.o
	g++ -Wall prog.o -o progX

prog.o:	prog.h prog.cpp
	g++ -Wall -c prog.cpp
(2) Discuss any problems/risks you can think of associated with the use of your "make submit".

Sample solution
Message="Committing via make submit"

Sources = progX prog.cpp prog.h

progX:	prog.o
	g++ -Wall prog.o -o progX

prog.o: prog.h prog.cpp
	g++ -Wall -c prog.cpp

.PHONY: submit
submit:
	for file in ${Sources} ; do \
		git add $$file  ; \
	done
	git add -u
	git commit -m "${Message}"
	git push
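
Because the quotes live in the recipe rather than in the variable's value,
the default message works as-is, and the user can override it from the
command line, e.g.:
   make submit Message="fixed the parser bug"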


Discussion:
   - assumes the makefile is in a git repository, fails otherwise
   - assumes there should be a push with every commit
   - assumes there are no unresolved conflicts
   - assumes there is an upstream to push to
   - assumes all three files are up to date
   - overuse of the default message means lots of commits with stock
     messages, which detracts from ability to search commit history
   ** if "git add ." was used instead of Sources + for loop then an
      important additional discussion point would be that it assumes
      EVERYTHING in the current directory should be added to repo tracking

Question 6: Profiling: gprof [12]

Study the gprof output shown below, and explain what the results tell you about the memory and cpu utilization in the program, particularly with regard to the behaviour of the c function (since it takes up most of the cpu utilization) and its relationships with the functions that call it.

gprof -b results
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
100.76      8.83     8.83       26     0.34     0.34  c(long)
  0.00      8.83     0.00        2     0.00     0.00  p(long)
  0.00      8.83     0.00        1     0.00     6.79  f(long)
  0.00      8.83     0.00        1     0.00     1.70  g(long)
  0.00      8.83     0.00        1     0.00     0.34  h(long)

Call graph


granularity: each sample hit covers 2 byte(s) for 0.11% of 8.83 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00    8.83                 main [1]
                0.00    6.79       1/1           f(long) [3]
                0.00    1.70       1/1           g(long) [4]
                0.00    0.34       1/1           h(long) [5]
-----------------------------------------------
                0.34    0.00       1/26          h(long) [5]
                1.70    0.00       5/26          g(long) [4]
                6.79    0.00      20/26          f(long) [3]
[2]    100.0    8.83    0.00      26         c(long) [2]
-----------------------------------------------
                0.00    6.79       1/1           main [1]
[3]     76.9    0.00    6.79       1         f(long) [3]
                6.79    0.00      20/26          c(long) [2]
-----------------------------------------------
                0.00    1.70       1/1           main [1]
[4]     19.2    0.00    1.70       1         g(long) [4]
                1.70    0.00       5/26          c(long) [2]
                0.00    0.00       1/2           p(long) [12]
-----------------------------------------------
                0.00    0.34       1/1           main [1]
[5]      3.8    0.00    0.34       1         h(long) [5]
                0.34    0.00       1/26          c(long) [2]
                0.00    0.00       1/2           p(long) [12]
-----------------------------------------------
                0.00    0.00       1/2           g(long) [4]
                0.00    0.00       1/2           h(long) [5]
[12]     0.0    0.00    0.00       2         p(long) [12]
-----------------------------------------------

Index by function name

   [2] c(long)                 [4] g(long)                [12] p(long)
   [3] f(long)                 [5] h(long)

Sample solution

Memory use:
   gprof does not reveal information about memory use, for that
       we would need to use a different profiler (e.g. valgrind)

CPU use:
   as noted in the question, the overwhelming majority of cpu use
      takes place in function c

   of the 26 calls to c, 20 come from function f, 5 from function g,
      and one from function h

   the cpu use in c for calls coming from f, g, and h is shown below,
      along with the number of calls, and from those we calculate the
      cpu use per call
          caller  calls  cpu use    calculated use per call
            f       20    6.79           0.34
            g        5    1.70           0.34
            h        1    0.34           0.34
   from this we can conclude that the per-call running time of c is
      independent of which function is calling it
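
For reference, a profile like the one above is typically gathered by
compiling and linking with -pg, running the program (which writes
gmon.out), and then running gprof; the file names below are assumptions:
   g++ -pg -o progX prog.cpp
   ./progX
   gprof -b progX gmon.out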

Question 7: Specifications, testing, and life cycles [12]

In lectures, the approaches we discussed mostly followed the idea that test cases should be based on the specifications - seeing if the product actually does what we said it would.

Test-based development, on the other hand, is a process in which the specifications are initially very vague, and over time the users give us more and more test cases they want the program to pass. Effectively we keep re-designing the product to handle the old cases plus the new ones, and they keep giving us more detailed and specialized test cases.

For instance, if they want us to build a function that checks if phone numbers are valid, they might start by giving us a local 250 phone number. Then, when we give them a version that succeeds on that, they give us a long distance number for Canada/US (1-xxx-xxx-xxxx). When we give them a version that handles that, they give us a long distance number for Australia. When it handles that, they give us a local number with an extension. Etc. etc.

Give specific examples of programs and scenarios for which test-based development would be particularly useful and explain why, then give examples of programs and scenarios for which test-based development would be a poor choice and explain why.
Sample solution

Suitable for test-based development

 - small, flexible systems with a co-operative user base
   happy with a gradual/incremental development approach, e.g.
     - a small home business website
     - a text editor (the increments may be the new desired
       editing features)

 - systems where the users can recognize and give examples
   of what they need, but have a hard time (or lack the time)
   documenting/articulating their needs, e.g.
     - a small company transitioning from a paper-based business model

 - systems with straightforward core functionality, but a
   great many special cases to be dealt with, e.g.
     - a system for calculating sale prices/discounts
       where the company has a great many negotiated deals
     - an advising system for determining what courses a
       student has left to complete ;-)

NOT suitable for test-based development

 - complex systems with heavily inter-related data elements
   or process components, where a poor early design decision
   will likely have massive effects later in the project, e.g.
     - complex multiplayer games
     - operating systems

 - systems that have low tolerance for risk (safety-critical,
   real-time, high security, etc), e.g.
     - financial systems
     - flight control systems

Question 8: Regular expressions [12]

Pick one of the two scripting languages we focused on this term (Perl or Bash), identify which one you chose, and provide a suitable regular expression for each of the patterns described below:

  1. A non-empty string that does not contain a newline.

  2. An integer value with no leading zeroes (e.g. 23, +23, -23 are valid, while 023, -023, +023 are not).

  3. A string that begins and ends with a double-quote, and may contain any character except a double-quote (e.g. "" would be valid, "abc def" would be valid, "x"y" would not be valid).
Sample solution
# perl versions

# Part I
#    any non-empty string that does not contain a newline
my $noNewline = '^[^\n]+$';

# Part II
#    an optional +/-
#    followed by a first digit in the range 1-9
#    followed by zero or more 0-9's
my $noLeadZero = '^[+-]?[1-9][0-9]*$';

# Part III
#      ^\" so it begins with a "
#      \"$ so it ends with a "\
#      [^\"]* so inside can be anything except a "
my $quotedStr = '^\"[^\"]*\"$';
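
A small hypothetical harness to try the three Perl patterns against a few
sample strings (the strings are made up purely for illustration):

foreach my $s ("abc", "", "+23", "023", "\"abc def\"", "\"x\"y\"") {
   print "testing '$s':";
   print " non-empty-no-newline" if $s =~ /^[^\n]+$/;
   print " no-leading-zero-int"  if $s =~ /^[+-]?[1-9][0-9]*$/;
   print " quoted-string"        if $s =~ /^\"[^\"]*\"$/;
   print "\n";
}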


# bash versions

# Part I
#    since bash doesn't recognize \n per se,
#    we can instead represent the newline with the following
Newline=$'\n'
# a valid string is non-empty and does not match (contain) Newline
echo "Testing for a non-empty single-line string"
if [[ -n $userStr && ! $userStr =~ $Newline ]] ; then
   echo "Matched!"
   echo ""
else
   echo "No match"
   echo ""
fi

# Part II:
#    an optional +/-
#    followed by one or more 1-9's
#    followed by zero or more 0-9's
noLeadZero='^[+-]?[1-9]+[0-9]*$'

# Part III:
#    ^\" so it begins with a "
#    \"$ so it ends with a "
#    [^\"]* so inside can be anything except a "
quotedStr='^\"[^\"]*\"$'

Question 9: Design choices [12]

Explain the meaning of the claim below, then make arguments both for and against the claim.

   "Increasing a component's controllability or observability (for
    example, to make it easier to test) lowers the component's cohesion
    and increases its coupling."

Sample solution

Argument for:

 - Cohesion measures how tightly integrated the logic is within
   individual components.

   If we introduce controllability (which is the ability to set
   or control the value of internal program data values from
   "outside" the component) or observability (which is the ability
   to observe internal component data values from outside) then
   we are introducing elements that are less tightly integrated
   with the rest of the component. This links its workings with
   the outside world and thus lowers its cohesion.

 - Coupling measures the amount of interaction between components.

   To increase controllability or observability from outside a
   component, we are necessarily increasing its coupling with
   the external system, since we are adding a new data channel
   into or out of the component.

Argument against:

 - Sweeping generalizations are rarely entirely correct (note the
   sweeping generalization there ;-).

   It may be the case that well-integrated mechanisms to get/set data
   values already exist within a program, but did not need to be fully
   exploited to achieve the basic program functionality.  In that case,
   for the purposes of testing, we could introduce greater controllability
   and/or observability by adding new components, themselves nicely
   cohesive and loosely coupled, that make use of those get/set mechanisms
   without significantly impacting either the cohesion or the coupling of
   the existing components.