Question | Mark
1. Inspections and version control [12 marks] |
2. Perl Test [12 marks] |
3. Perl scripting [12 marks] |
4. Bash scripting [12 marks] |
5. Makefiles, compilation, and version control [12 marks] |
6. Profiling: gprof [12 marks] |
7. Specifications, testing, and life cycles [12 marks] |
8. Regular expressions [12 marks] |
9. Design choices [12 marks] |
Exam Total [96 marks] |
Question 1: Inspections and version control [12]
One of the code inspection issues we discussed dealt with keeping track of discussion points during asynchronous code inspections. If we allow members of the inspection team to discuss the item under inspection through email, messaging, voice mail, etc., then ensuring all members of the team have the same information about the discussions can be a challenge.
One solution is to include all inspection discussion content in our version control process. Your task is to detail a process a team could follow that permits asynchronous inspections, but keeps the "discussion" content under version control. Be sure to discuss the strengths and weaknesses of your proposed process.
Sample solution

Process:
- When posting general observations/discussion points, team members are expected to post them to the shared document INSERT DOCUMENT LOCATION/ID HERE.
- All team members are expected to check that document before contacting any other inspection team members with questions.
- If members of the inspection team discuss the material with each other (by phone, email, chat, etc.), the person who initiated the discussion is responsible for posting the salient points (especially any action items) to the shared document before the end of that working day.
- Each morning, the lead of the inspection team will review the previous day's changes to the shared document, and ensure the latest copy is placed in the version control repository with an appropriate commit message.

Strengths:
- ensures the communication is under version control
- assigns direct responsibility for ensuring the content is updated on the shared document and for checking said document before initiating new discussions
- ensures the document is reviewed daily by the lead

Weaknesses:
- relies on team members to remember to update the shared document, and to remember (take notes on) all the key points from any conversations with other inspection team members
- relies on team members to check the shared document before initiating new discussions
- requires the lead to spend the beginning of each day on inspection administration
Question 2: The Perl Test module [12]
Using Perl's Test module (either Test::Simple or Test::More), write a Perl script, complete with test cases, to test the Perl function whose prototype is shown below.
# inRange(lo, hi, val)
# --------------------
# inRange returns true (1) if lo, hi, and val are all numeric values
# where lo <= val <= hi
# returns false (0) otherwise
sub inRange($$$);

Assume the function is provided in module mathstuff.pm.
Sample solution

#! /usr/bin/perl
use strict;
use warnings;
use mathstuff;

# the collection of test cases, each of form
#    [ lo, hi, val, expected, casename ]
my @Tests;
BEGIN {
    @Tests = (
        # cases that should pass
        [ 1,    3,   2,   1, "valid: 1<=2<=3" ],          # basic test, integers
        [ 0.1, 0.3, 0.2,  1, "valid: 0.1<=0.2<=0.3" ],    # basic test, floats
        [ 1,    1,   1,   1, "valid: 1<=1<=1" ],          # test ok for equality
        [ -3,  -1,  -2,   1, "valid: -3<=-2<=-1" ],       # test negatives

        # cases that should fail
        [ 2,    3,   1,   0, "errcheck: too low, ints" ],        # too low, integers
        [ 1,    2,   3,   0, "errcheck: too high, ints" ],       # too high, integers
        [ 0.2, 0.3, 0.1,  0, "errcheck: too low, floats" ],      # too low, floats
        [ 0.1, 0.2, 0.3,  0, "errcheck: too high, floats" ],     # too high, floats
        [ -2,  -1,  -3,   0, "errcheck: too low, negatives" ],   # test negatives
        [ -3,  -2,  -1,   0, "errcheck: too high, negatives" ],  # test negatives
        [ "x",  3,   2,   0, "errcheck: lo not a number" ],      # lo is non-numeric
        [ 1,   "x",  2,   0, "errcheck: hi not a number" ],      # hi is non-numeric
        [ 1,    3,  "x",  0, "errcheck: val not a number" ],     # val is non-numeric
    );
}

use Test::More tests => scalar @Tests;

foreach my $testCase (@Tests) {
    my $tname = pop @$testCase;
    my $exp   = pop @$testCase;
    my $val   = pop @$testCase;
    my $hi    = pop @$testCase;
    my $lo    = pop @$testCase;
    ok( inRange($lo, $hi, $val) == $exp, $tname );
}
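The exam states that inRange is provided by mathstuff.pm, but that module is not shown. Purely so the test script above can be exercised, a minimal sketch of such a module might look like the following; the use of Scalar::Util's looks_like_number for the numeric check is an assumption, not something the question mandates.

# mathstuff.pm -- minimal sketch so the test script can be run;
# this is NOT the official module, any implementation matching the
# prototype and description would do
package mathstuff;

use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
use Exporter 'import';
our @EXPORT = qw(inRange);

# inRange(lo, hi, val)
# returns 1 if lo, hi, and val are all numeric and lo <= val <= hi,
# returns 0 otherwise
sub inRange($$$) {
    my ($lo, $hi, $val) = @_;
    foreach my $arg ($lo, $hi, $val) {
        return 0 unless defined($arg) && looks_like_number($arg);
    }
    return ($lo <= $val && $val <= $hi) ? 1 : 0;
}

1;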
Question 3: Perl scripting [12]
Below you are given a function that takes a file extension and a directory as arguments. It returns the number of files with that extension that are in the directory. For example, fileCount("pl", "/bin"); would return the total number of .pl files in the /bin directory.
# fileCount(extension, directory)
# -------------------------------
# returns a count of the number of files in the specified directory
# that have the specified file extension
sub fileCount($$) {
    my $extension = $_[0];
    my $directory = $_[1];

    # if the directory cannot be opened then terminate early
    opendir (my $handle, $directory) or
        die "Error: could not open directory $directory\n";

    # read the directory contents, checking each file against the pattern
    my $count = 0;
    my $pattern = "\\.$extension\$";
    while ( readdir $handle ) {
        my $nextfile = $_;
        if ($nextfile =~ $pattern) {
            $count++;
        }
    }
    closedir $handle;

    return $count;
}
Your task is to revise the function so that, instead of taking a single file extension as the first argument, it takes a hash reference whose keys are the file extensions to be processed.
The revised function returns a hash reference in which the value associated with each key is the number of files with that extension.
For example, if the keys in the hash were 'pl' and 'cpp' then the value associated with key 'pl' would be a count of all the .pl files, and the value associated with key 'cpp' would be a count of all the .cpp files.
Sample solution

# fileCount(extensionsHashRef, directory)
# ---------------------------------------
# given a reference to a hash whose keys are a set of file extensions,
# and a directory to examine,
# count the number of files with each extension,
# setting the associated hash value
# returns a reference to a hash whose values are the counts of files
# in the specified directory with each extension
sub fileCount($$) {
    my $extHash = $_[0];
    my %extensions = %$extHash;
    my $directory = $_[1];

    # start every extension's count at zero
    foreach my $extension (keys %extensions) {
        $extensions{$extension} = 0;
    }

    # if the directory cannot be opened then terminate early
    opendir (my $handle, $directory) or
        die "Error: could not open directory $directory\n";

    # read the directory contents, checking each file against each extension
    while ( readdir $handle ) {
        my $nextfile = $_;

        # compare the current file to each of the extensions
        foreach my $extension (keys %extensions) {
            my $pattern = "\\.$extension\$";

            # if the file has the desired extension,
            # increment the count for that extension
            if ($nextfile =~ $pattern) {
                $extensions{$extension}++;
            }
        }
    }
    closedir $handle;

    return \%extensions;
}
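If the revised function were saved in a script, a short driver like the one below could exercise it; the extension set and the directory path here are illustrations only, not part of the question.

# example driver, appended after the fileCount definition above;
# the extensions and the directory path are only illustrations
my %wanted  = ( pl => 0, cpp => 0 );            # extensions to count
my $results = fileCount( \%wanted, "/tmp" );

foreach my $ext (sort keys %$results) {
    print "$ext: $results->{$ext} file(s)\n";
}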
Question 4: Bash scripting [12]
Write a bash script that takes a directory as a parameter
and recursively searches all subdirectories to locate and display the locations
of all git repositories anywhere within that directory tree.
Notes:
(1) You do not have to search hidden directories (names beginning with .)
(2) Testing for the presence of .git will be adequate for deciding if
a directory is a repository.
(3) Remember that in bash you can use the following to process the files
in a directory
for file in DIRECTORYNAME/*
Sample solution

# bash script to recursively search a directory for git repositories,
# which will be recognized by the presence of .git directories

# recursive function to search for repositories
function repoSearch() {
    # check for null directory names
    local dir=$1
    if [ -z "$dir" ] ; then
        return 0

    # see if it actually is a directory
    elif [ -d "$dir" ] ; then

        # if it contains a git repository then display it
        local gitname="$dir/.git"
        if [[ -d $gitname ]] ; then
            echo "git repository: $dir"

        # otherwise see if it contains any subdirectories to search
        else
            for file in "$dir"/*; do
                if [[ -d $file ]] ; then
                    repoSearch "$file"
                fi
            done
        fi
    fi
}

# start directory should be specified as a lone command line argument,
# check the number of arguments then call the recursive search function
if [ $# -ne 1 ] ; then
    echo "Expected use to search for git repositories is:"
    echo "    repoSearch dirname"
else
    repoSearch "$1"
fi
Question 5: Makefiles, compilation, and version control [12]
(1) Modify the makefile shown below to include a 'submit' target. Submit's associated actions are to perform a git add/commit/push sequence for the current directory.
The makefile must include a Message variable, which holds the message to be used in the commit. This allows the user to override the default commit message, which should be "Committing via make submit".
progX: prog.o
	g++ -Wall prog.o -o progX

prog.o: prog.h prog.cpp
	g++ -Wall -c prog.cpp

(2) Discuss any problems/risks you can think of associated with the use of your "make submit".
Sample solution

Message = "Committing via make submit"
Sources = progX prog.cpp prog.h

progX: prog.o
	g++ -Wall prog.o -o progX

prog.o: prog.h prog.cpp
	g++ -Wall -c prog.cpp

submit:
	for file in ${Sources} ; do \
	    git add $$file ; \
	done
	git add -u
	git commit -m ${Message}
	git push
Question 6: Profiling: gprof [12]
Study the gprof output shown below, and explain what the results tell you about the memory and cpu utilization in the program, particularly with regard to the behaviour of the c function (since it takes up most of the cpu utilization) and its relationships with the functions that call it.
gprof -b results

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
100.76      8.83     8.83       26     0.34     0.34   c(long)
  0.00      8.83     0.00        2     0.00     0.00   p(long)
  0.00      8.83     0.00        1     0.00     6.79   f(long)
  0.00      8.83     0.00        1     0.00     1.70   g(long)
  0.00      8.83     0.00        1     0.00     0.34   h(long)

Call graph

granularity: each sample hit covers 2 byte(s) for 0.11% of 8.83 seconds

index % time    self  children    called     name
Sample solution

Memory use:
- gprof does not reveal information about memory use; for that we would need to use a different profiler (e.g. valgrind)

CPU use:
- as noted in the question, the overwhelming majority of cpu use takes place in function c
- of the 26 calls to c, 20 come from function f, 5 from function g, and one from function h
- the cpu use in c attributable to calls coming from f, g, and h is shown below, along with the number of calls, from which we calculate the cpu use per call:

    caller   calls   cpu use   calculated use per call
    f          20      6.79      0.34
    g           5      1.70      0.34
    h           1      0.34      0.34

- from this we can conclude that the per-call running time of c is independent of which function is calling it
Question 7: Specifications, testing, and life cycles [12]
In lectures, the approaches we discussed mostly followed the idea that test cases should be based on the specifications - seeing if the product actually does what we said it would.
Test-based development, on the other hand, is a process in which the specifications are very vague initially, and over time the users give us more and more test cases they want the program to pass. Effectively, we keep re-designing the product to handle the old cases plus the new ones, and they keep giving us more detailed and specialized test cases.
For instance, if they want us to build a function that checks if phone numbers are valid, they might start by giving us a local 250 phone number. Then, when we give them a version that succeeds on that, they give us a long distance number for Canada/US (1-xxx-xxx-xxxx). When we give them a version that handles that, they give us a long distance number for Australia. When it handles that, they give us a local number with an extension. Etc. etc.
Give specific examples of programs and scenarios for which test-based development would be particularly useful and explain why, then give examples of programs and scenarios for which test-based development would be a poor choice and explain why.
Sample solution

Suitable for test-based development:
- small, flexible systems with a co-operative user base happy with a gradual/incremental development approach, e.g.
    - a small home business website
    - a text editor (the increments may be the new desired editing features)
- systems where the users can recognize and give examples of what they need, but have a hard time (or lack the time for) documenting/articulating their needs, e.g.
    - a small company transitioning from a paper-based business model
- systems with straightforward core functionality, but a great many special cases to be dealt with, e.g.
    - a system for calculating sale prices/discounts where the company has a great many negotiated deals
    - an advising system for determining what courses a student has left to complete ;-)

NOT suitable for test-based development:
- complex systems with heavily inter-related data elements or process components, where a poor early design decision will likely have massive effects later in the project, e.g.
    - complex multiplayer games
    - operating systems
- systems that have low tolerance for risk (safety-critical, real-time, high security, etc.), e.g.
    - financial systems
    - flight control systems
Question 8: Regular expressions [12]
Pick one of the two scripting languages we focused on this term (Perl or Bash), identify which one you chose, and provide a suitable regular expression for each of the patterns described below:
(I) any non-empty string that does not contain a newline
(II) an integer with an optional leading +/- sign and no leading zeros
(III) a double-quoted string: a string that begins and ends with a double quote, with no double quotes in between
Sample solution

# perl versions

# Part I
# any non-empty string that does not contain a newline
my $noNewline = '^[^\n]+$';

# Part II
# an optional +/-
# followed by one or more 1-9's
# followed by zero or more 0-9's
my $noLeadZero = '^[+-]?[1-9][0-9]*$';

# Part III
# ^\"    so it begins with a "
# \"$    so it ends with a "
# [^\"]* so inside can be anything except a "
my $quotedStr = '^\"[^\"]*\"$';
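As a quick, informal check (not required by the question), the three patterns could be exercised against a few sample strings; the strings below are illustrations only.

#! /usr/bin/perl
use strict;
use warnings;

# the three patterns from the sample solution above
my $noNewline  = '^[^\n]+$';
my $noLeadZero = '^[+-]?[1-9][0-9]*$';
my $quotedStr  = '^\"[^\"]*\"$';

# informal spot checks; the sample strings are illustrations only
print "'abc' matches noNewline\n"      if "abc"   =~ /$noNewline/;
print "'-42' matches noLeadZero\n"     if "-42"   =~ /$noLeadZero/;
print "'042' rejected by noLeadZero\n" unless "042" =~ /$noLeadZero/;
print "'\"hi\"' matches quotedStr\n"   if '"hi"'  =~ /$quotedStr/;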
Question 9: Design choices [12]
Explain the meaning of the claim below, then make arguments both for and against the claim.

Claim: increasing a component's controllability and observability (to make it more testable) reduces its cohesion and increases its coupling.
Sample solution

Argument for:
- Cohesion measures how tightly integrated the logic is within individual components. If we introduce controllability (which is the ability to set or control the value of internal program data values from "outside" the component) or observability (which is the ability to observe internal component data values from outside), then we are introducing elements that are less tightly integrated with the rest of the component. This links its workings with the outside world and thus lowers its cohesion.
- Coupling measures the amount of interaction between components. To increase controllability or observability from outside a component, we are necessarily increasing its coupling with the external system, since we are adding a new data channel into or out of the component.

Argument against:
- Sweeping generalizations are rarely entirely correct (note the sweeping generalization there ;-). It may be the case that well integrated mechanisms to get/set data values already exist within a program, but did not need to be fully exploited to achieve the basic program functionality. Now, for the purposes of testing, we could introduce greater controllability and/or observability by adding new components, which are themselves nicely cohesive and loosely coupled, and which make use of those get/set mechanisms without significantly impacting either the cohesiveness or coupling of the existing components.