CSCI 161 Lab 3: sections S26N01/2
See the main lab page for submission deadlines and
late penalties.
It is assumed you are keeping up to date with the lectures and labs.
Some additional git information is available through the
git quick notes,
git project/lab submission, and
lecture resources pages.
This lab builds on the ideas from lab2, but now stores the data in a linked list,
rather than an array, and uses mergesort on the linked list (instead of
quicksort on an array).
The marking for this lab will be stricter in terms of the code standards,
so be sure you have reviewed them carefully.
Marks will be deducted for failing to follow standards.
lab3 product requirements/specification/design
As mentioned at the top of this page, the desired functionality for lab3 is the
same as lab2, but with two key differences:
- Rather than using arrays to store the data we will now use a linked list,
adding a new element to the back of the existing list each time you read a new valid
pair from the file.
- The sorting algorithm you are using must be mergesort this time (not quicksort),
and it must operate on your linked list (not an array).
Hand-built linked list development is a core part
of this lab. As such, you must use the linked-list-of-structs approach for this lab, i.e.
do not use C++ library lists, queues, or similar data structures to store your collection of rankings.
Similarly, the implementation, testing, and debugging of a recursive mergesort algorithm
(with a separate merge function) is another core part, and you are required
to implement both the sort and the merge yourself - i.e. do not use the various C++ algorithm/sort libraries and functions.
Design ideas
As with labs 1 and 2, you are encouraged to come up with your own design, but are free to use
the sample design below.
The design shows a similar set of functions as those in lab2, but using two new structs,
one for individual items and one for a list of rankings as a whole.
Underneath the sample design code you'll find discussion of the more challenging
aspects of switching to work with a linked list and switching to work with mergesort.
// sample main from lab3.cpp
int main(int argc, char* argv[])
{
    // quit if the user didn't pass valid arguments
    if (!checkUsage(argc, argv)) {
        cerr << "Program terminated early, unable to proceed\n";
        return 0;
    }
    string filename = argv[1]; // filename is more intuitive later
    welcome(); // program intro for user

    // load the rankings from the named file
    ItemList rankings;
    initList(rankings);
    if (!readRankings(filename, rankings)) {
        cerr << "Error: could not load file, ending program" << endl;
        deallocate(rankings);
        return 0;
    }

    // run head-to-head matchups, skipping unsuccessful ones
    int matches = getNumMatches();
    for (int m = 1; m <= matches; m++) {
        if (!runMatchup(rankings)) {
            cerr << "Error: match " << m << " unsuccessful, rankings not updated\n";
        }
    }

    // sort the rankings (lowest to highest)
    msortRanks(rankings);

    // write the updated rankings back to the file
    if (!writeRankings(filename, rankings)) {
        cerr << "Unable to update rankings file " << filename << endl;
    }

    deallocate(rankings);
    return 0;
}
//sample ranking.h
#pragma once
#include <iostream>
#include <string>
using std::cout;
using std::cin;
using std::cerr;
using std::endl;
using std::string;
const bool DEBUGMODE = false; // used to turn debugging output on/off
// representation of a single item within a linked list
struct ItemRank {
    int rank;            // positive integer rank
    string description;  // text description of item
    ItemRank* next;      // pointer to the item after/behind this item (closer to the back)
    ItemRank* prev;      // pointer to the item ahead/before this item (closer to the front)
};

// representation of a list of items
struct ItemList {
    ItemRank* front;     // the first item in line
    ItemRank* back;      // the last item in line
    int size;            // number of items in the list
};
// delete all the allocated items in a list then re-initialize it
void deallocate(ItemList &rankings);
// load item rankings from the named file, store in rankings,
// return true if successful, false otherwise
bool readRankings(string filename, ItemList &rankings);
// initialize a list as empty
void initList(ItemList &rankings);
// insert the item at the back of the rankings list
// return true if successful, false otherwise
bool insertBack(ItemList &rankings, ItemRank* item);
// remove the front item from the rankings list
// return the pointer to it (null if list was empty)
ItemRank* removeFront(ItemList &rankings);
// return a pointer to the nth item in the list, null if N not valid
ItemRank* getNthItem(ItemList rankings, int N);
// return true iff descript is a valid item description
// (i.e. contains at least one non-whitespace character)
bool validDescription(string descript);
// run a head-to-head matchup
// - pick distinct pair of items
// - present user with choice, get valid response
// - update rankings based on response
// return true iff the matchup is successful
bool runMatchup(ItemList &rankings);
// write rankings to a file
// return true iff successful
bool writeRankings(string filename, ItemList rankings);
// calculate and return new rank for X based on current ranks for X and Y
// and score from X's perspective: 1=X win, 0.5=draw, 0=Y win
// return unchanged RankX in the event of an invalid parameter
int calcNewRank(int RankX, int RankY, float score);
// present the user with a choice between items X and Y
// return 1 if they choose X
// 0 if they choose Y
// 0.5 if it's a draw
// (repeat until they make a valid choice)
float getChoice(ItemRank* X, ItemRank* Y);
// remove rankings from the two sorted source lists, merging them into the destination list
bool merge(ItemList &source1, ItemList &source2, ItemList &destination);
// sort the rankings from lowest rank to highest
void msortRanks(ItemList &rankings);
How our read routine changes
In our lab2 version of reading from a file we could simply put the next item into the
next array position. Now we need to create a new item (like the allocate function
from the lecture on Feb. 25) and then insert it into the list of items (like the
insertAtBack function from that same lecture).
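The "allocate then insert at the back" pattern for each valid pair read from the file might look like the sketch below. It uses the ItemRank/ItemList structs from the sample design; the helper name allocateItem and the use of std::nothrow are illustrative choices here, not part of the required design, and the actual file parsing in readRankings is left out.

```cpp
#include <new>      // std::nothrow
#include <string>
using std::string;

struct ItemRank {
    int rank;
    string description;
    ItemRank* next;
    ItemRank* prev;
};

struct ItemList {
    ItemRank* front;
    ItemRank* back;
    int size;
};

// initialize a list as empty
void initList(ItemList &rankings)
{
    rankings.front = nullptr;
    rankings.back = nullptr;
    rankings.size = 0;
}

// allocate and initialize a new item, returning nullptr on failure
ItemRank* allocateItem(int rank, string description)
{
    ItemRank* item = new(std::nothrow) ItemRank;
    if (item != nullptr) {
        item->rank = rank;
        item->description = description;
        item->next = nullptr;
        item->prev = nullptr;
    }
    return item;
}

// insert the item at the back of the rankings list
// return true if successful, false otherwise
bool insertBack(ItemList &rankings, ItemRank* item)
{
    if (item == nullptr) {
        return false;
    }
    item->next = nullptr;
    item->prev = rankings.back;
    if (rankings.back == nullptr) {
        rankings.front = item;         // list was empty
    } else {
        rankings.back->next = item;
    }
    rankings.back = item;
    rankings.size++;
    return true;
}
```

Inside readRankings, each valid rank/description pair then becomes one allocateItem call followed by one insertBack call (checking both for failure).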
How picking two random items for a match changes
In our lab2 version for an array of N elements we could simply pick two different
random numbers in the range 0..N-1 and grab those two items for our matchup.
In lab3, however, the list doesn't inherently have a size associated with it
and we have no way to jump directly to the i'th item in the list.
One possible solution is to add a size field to the list struct: initialize it to
0 and increment it every time you do an insert (so the size always matches the number
of elements currently in the list).
Then we can still pick two random numbers in the range 0..N-1 (or 1..N), but we need
to scan the list from the front to find the i'th item. This is essentially just a for loop
that counts how many items to skip across, then returns a pointer to the item you
land on. If you look at the example design above, you'll see a getNthItem function
that does exactly that.
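One possible sketch of that scan loop, matching the pass-by-value signature in the sample design and assuming a 0-based position (the front item is item 0) - adjust if your design counts from 1:

```cpp
#include <string>
using std::string;

struct ItemRank {
    int rank;
    string description;
    ItemRank* next;
    ItemRank* prev;
};

struct ItemList {
    ItemRank* front;
    ItemRank* back;
    int size;
};

// return a pointer to the Nth item in the list (0-based),
// nullptr if N is not a valid position
ItemRank* getNthItem(ItemList rankings, int N)
{
    if (N < 0 || N >= rankings.size) {
        return nullptr;   // relies on size matching the actual element count
    }
    ItemRank* current = rankings.front;
    for (int i = 0; i < N; i++) {
        current = current->next;   // skip across one item
    }
    return current;
}
```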
Mergesort on linked lists
If you're looking at the slides/sample code online, you'll see the mergesort and merge
in that example ( mergesort.cpp ) is based on
sorting an array, whereas here in lab3 you'll be using mergesort on a linked list
(like the msortRanks and merge functions in the design above).
This does involve some extra juggling to translate an array-based solution into
a linked-list style. The core steps are still the same for the recursive mergesort
function itself (msortRanks above):
- split the original linked list into two roughly-equal halves
- call mergesort (msortRanks) on each half
- merge the two sorted half-lists back into one sorted whole
The big changes are in steps 1 and 3.
- In step one, to split the list in half,
we can't simply point at the middle and break the list in two, since we only have pointers
to the two ends (front and back). What we can do is create two new empty lists
and go through the 'big' list one item at a time, removing each item from the big list and
alternating between putting it in little list one and little list two.
- In step three, to merge the two lists together into one big one,
we keep looking at the current front item in each of the two little lists,
picking the smaller of the two, removing it from its little list (e.g. removeFront)
and putting it into the big list (e.g. insertBack). Once one little list runs out,
the remaining items in the other are simply moved across in order.
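Put together, the split/recurse/merge steps might look like the sketch below. It assumes the structs from the sample design plus initList/insertBack/removeFront helpers (repeated here so the sketch stands alone), and sorts by rank from lowest to highest. This is one possible shape, not the required implementation.

```cpp
#include <string>
using std::string;

struct ItemRank {
    int rank;
    string description;
    ItemRank* next;
    ItemRank* prev;
};

struct ItemList {
    ItemRank* front;
    ItemRank* back;
    int size;
};

void initList(ItemList &rankings)
{
    rankings.front = nullptr;
    rankings.back = nullptr;
    rankings.size = 0;
}

bool insertBack(ItemList &rankings, ItemRank* item)
{
    if (item == nullptr) return false;
    item->next = nullptr;
    item->prev = rankings.back;
    if (rankings.back == nullptr) rankings.front = item;
    else rankings.back->next = item;
    rankings.back = item;
    rankings.size++;
    return true;
}

// remove and return the front item, nullptr if the list was empty
ItemRank* removeFront(ItemList &rankings)
{
    ItemRank* item = rankings.front;
    if (item != nullptr) {
        rankings.front = item->next;
        if (rankings.front == nullptr) rankings.back = nullptr;
        else rankings.front->prev = nullptr;
        item->next = nullptr;
        item->prev = nullptr;
        rankings.size--;
    }
    return item;
}

// move items from the two sorted source lists into the destination,
// always taking the smaller of the two front ranks
bool merge(ItemList &source1, ItemList &source2, ItemList &destination)
{
    while (source1.size > 0 && source2.size > 0) {
        if (source1.front->rank <= source2.front->rank) {
            insertBack(destination, removeFront(source1));
        } else {
            insertBack(destination, removeFront(source2));
        }
    }
    // one source is now empty: drain whatever remains in the other
    while (source1.size > 0) insertBack(destination, removeFront(source1));
    while (source2.size > 0) insertBack(destination, removeFront(source2));
    return true;
}

// recursive mergesort: split, sort each half, merge back together
void msortRanks(ItemList &rankings)
{
    if (rankings.size < 2) {
        return;   // empty or single-item lists are already sorted
    }
    ItemList half1, half2;
    initList(half1);
    initList(half2);
    // split by alternating items between the two halves
    while (rankings.size > 0) {
        insertBack(half1, removeFront(rankings));
        if (rankings.size > 0) {
            insertBack(half2, removeFront(rankings));
        }
    }
    msortRanks(half1);
    msortRanks(half2);
    merge(half1, half2, rankings);   // rankings is now empty, refill in sorted order
}
```

Note that merge here always reports success; your version might return false on invalid input, depending on how you define the function's contract.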
Second week discussion: planning test cases
Just to reiterate points discussed earlier in the term, it's crucial for developers
to create good habits in thoroughly testing their own code: both the program as a whole
and the individual functions.
A substantial part of this comes down to recognizing what constitutes good test cases:
how do we ensure that the code we've written actually does behave as intended, both on
'good/normal' user data and also in its handling of user mistakes.
Going through the specifications/requirements for a product, nearly every sentence should
trigger testing ideas in the mind of the developer, covering all the different ways in which
valid/invalid data may be provided by the user.
It is important not to make the assumption that our code correctly handles
anything, valid or invalid, without actually verifying that with a corresponding test case.
Considering our current program as a case study, there are four key areas where the user
provides data:
- the filenames provided as command line arguments
- the contents of those files
- the user's selection of how many head-to-head matchups to run
- the user's responses to the individual head-to-head matchups
In each of these areas, we should come up with a list of typical valid data options with
an extra focus on edge cases (valid but barely), and typical erroneous data with (again)
a focus on edge cases (invalid but barely).
For each of the test scenarios we come up with, we should come up with the precise
data to be used for the test case and the precise results that we should then see
from the program (all output messages and all changes to the file content).
In our case study, the kinds of test cases we would need (at least) would include:
- for the filenames provided as command line arguments:
- a simple valid filename, e.g. "test17", where the file has valid content
- a filename and path, e.g. "tests/whatever"
- no filename (error message expected)
- the name of a file that does not exist (error message expected)
- multiple filenames (error message expected)
- for the contents of those files:
- a file with just two simple valid entries
- a file with multiple simple valid entries
- a file with blank lines and whitespace lines in between valid entries, should skip these
- a file where the descriptions are multiple words long with a variety of embedded space
- a file with minimum valid ranks (would also do some with maximum valid ranks if we had defined an upper limit)
- a file with descriptions that are just punctuation (should be ok)
- a file with descriptions that are just single characters (should be ok)
- a file entirely filled with whitespace, error message expected
- a completely empty file (no content at all), error message expected
- a file with lines missing the ranks, error message expected, lines skipped
- a file with lines with negative ranks, error message expected, lines skipped
- a file with lines with non-integer ranks, error message expected, lines skipped
- a file with lines missing the descriptions, error message expected, lines skipped
- for the user's selection of how many head-to-head matchups to run:
- just one matchup
- two matchups
- some larger number of matchups
- an attempt to use 0 matchups (should error and prompt again)
- an attempt to use negative matchups (should error and prompt again)
- an attempt to use nonnumeric matchups (should error and prompt again)
- an attempt with each of the above errors multiple times (to check the repeat until valid)
- for the user's responses to the individual head-to-head matchups:
- a response of F
- a response of f
- a response of S
- a response of s
- a response of D
- a response of d
- a response with any other character (should error and prompt again)
- multiple consecutive invalid responses (to check the repeat until valid)
Before submitting a solution, look at the list above and ask yourself if you have
tested your final submission of the lab against each of those scenarios.
Second week discussion: debuggers/gdb
Another key developer skill is that of debugging: being able to track down the root cause
of errors in our code. We may learn of those errors through testing (as above) or through
feedback from users or other developers, but we must then have a plan of attack for
detecting the source of the issue.
The first crucial step is to come up with one or more hypotheses as to the possible sources,
and then develop a plan for either confirming or rejecting each hypothesis. Such plans
might include refining our test cases to narrow down specific possibilities, and might also
include examining the code's behaviour as it executes during a test case. This latter
approach is made significantly easier through the use of debuggers: tools which allow
us to pause the program as it runs and examine (or even modify) the values of variables
and memory to get greater insights as to its actual run-time behaviour (as opposed to
what we thought it *should* be doing).
We'll look at the use of one basic debugger, gdb, but most available debuggers
and IDEs (integrated development environments, such as VSCode) provide a very similar
suite of capabilities.
I strongly recommend you try these steps on your lab3x, and get used to routinely using debuggers
as part of your programming toolbox.
- Preparing for gdb use
- gdb requires that we compile with the -g option (already done in the makefiles
if you're using gdb on your lab code)
- Starting gdb and setting up break points
- To start gdb we give it the name of the executable we want to debug, e.g. for lab3x
gdb lab3x
- To set a breakpoint (a spot where gdb will pause the running program) we use break
with a function name or a file:line location at the (gdb) prompt, e.g.
break main
break lab3.cpp:42
- Running/controlling the program through gdb
- To start the program we use run and whatever command line arguments
we would ordinarily pass to the program we're testing, e.g. to run lab3x on file tests/posted
we would enter the following command at the (gdb) prompt
run tests/posted
- The program will run normally (well, maybe a bit slower than normal), including displaying
output and waiting for user input at the usual spots.
- If/when the program hits one of the breakpoints we set earlier then it will pause and
give us another (gdb) prompt and wait for us to tell it what to do next.
The most common options are as follows, but there are many, many others:
- tell it to run just one more line of code then pause again using
step
- tell it to run just one more line then pause again, but treat function
calls as a single action (rather than going 'inside' them to step there)
next
- get it to display the value of a variable or constant (that is accessible in
the current function), e.g. for variable x
print x
- set a new value for a(n accessible) variable, e.g. set x to 200
set variable x = 200
- show the sequence of function calls that led to the current spot in the program
(especially useful right after a crash)
backtrace
- show the lines of code immediately before/after the current location in the code
list
- tell it to resume running normally (until the next breakpoint) using
continue
- tell it to quit/close (it may ask for confirmation to kill the running program)
quit
Note that most of these commands can be abbreviated to just a single letter, e.g.
c for continue, n for next, etc.
- Recommended experimentation
Whether in gdb or VSCode, practice using debuggers to examine your code's runtime behaviour.
I recommend trying it out today with your lab3x: set some breakpoints and examine the
variable values and code sequence with p, n, s, c, etc. It's far handier than needing
to embed cout statements to print variable values when you're trying to figure out what/where
the code is going wrong as it executes!