Frequently when examining computational problems we wish to compare different
problem solutions to each other in terms of efficiency,
as well as considering theoretical limits on attainable efficiency.
To do so, we require a few basic tools and techniques, as outlined below.
Time and space efficiency of algorithms
- An algorithm is a process or series of steps by which we determine a
solution
to a particular instance of a problem.
For our purposes `algorithms'
are often just pseudo-code.
- Example: the problem is to sort an array of N integers into increasing order:
An algorithm might be
   SlowSort:
      for i = 1 to N do
         for j = 0 to N-2 do
            if arr[j] > arr[j+1]
            then swap arr[j] with arr[j+1]
         endfor
      endfor
While an instance of the problem might be:
N = 9, and arr[] contains 7 3 -19 27 0 1 4 5 23
- There can be "easy" instances, e.g.
1 2 3 4
,
and "harder" instances, e.g. 9 -12 137 1
(though what is easy and what is hard sometimes depends on the
way your solution works)
- Each problem has many
different algorithms which produce correct solutions.
As programmers,
we must select which algorithm is `best' to implement.
- When choosing an algorithm there are
many influencing factors:
- the perceived `difficulty' of the algorithm,
- our familiarity with it,
- the adaptability of the algorithm,
- the existence of pre-packaged software,
- the resource requirements of the algorithm,
- the kinds of instances we expect to encounter, and which algorithms
perform particularly well or particularly poorly on these.
- In this course we are primarily concerned with analyzing the last
two points.
- Resources can include many different things: CPU time,
real time, memory requirements, disk space, I/O requirements, etc.
For the sake of simplicity we usually focus on
- the space requirements
(i.e. how much memory does the algorithm force us to use)
- the time requirements
(i.e. how much CPU time will be used in calculating a solution)
- Ideally, we want to
calculate an algorithm's resource requirements without having
to implement it first
- Example: suppose we are given the following algorithm
and a problem instance
   x = 6
      y = 0;
      sum = 0;
      while (y < x) {
         sum = sum + y;
         y = y + 1;
      }
In a ``typical computing environment'' (we'll discuss the weaknesses
of this statement later) how many operations are performed?
- 7 comparisons (6 that succeed, plus the final one that fails)
- 12 additions (6 updating sum, 6 updating y)
- 14 assignments (2 initializations, plus 12 inside the loop)
- 6 branches (from bottom of while to top)
The total time taken will depend on the time required for each of these
operations.
The total space (i.e. memory) requirements will include the
storage locations for x, y, sum, and possibly one or more
registers.
- The process of exactly calculating the resource
requirements of an algorithm is extremely difficult, especially if we want
a general description rather than one which requires a separate analysis
for each different instance.
It requires
extensive knowledge about the algorithm, the data set, and
the system on which the algorithm is run.
- To get around this problem, usually we will settle for
an approximation of the resource requirements -
how to develop the approximations is a major topic for this course.
Why time trials are inadequate
- If calculating the resource requirements is so difficult,
then why not just
- code the algorithm,
- run the program, timing it and measuring the amount of memory?
- If we're
trying to choose which algorithm to implement, we would have to
implement both, time them, and then throw one away.
This is not practical
given the expense associated with implementing large software projects.
- Suppose we're asked to compare two algorithms which
have
already been implemented:
For a `fair' test, we would need
- both programs implemented
and run on the same system (often not available),
- by programmers with comparable skill levels (tough to ensure),
- running on a good representative data set (which can be extremely
difficult to develop),
- under normal working conditions (difficult to even define, never mind
obtain).
- Our goal is to develop a set of analysis techniques
for evaluating the general behaviour of
algorithms.
These should be expressed as a function of the size of the problem
instance
(some algorithms might be good for small problems but poor for large ones)
e.g. Given any set of N numbers, perhaps algorithm 1 will require
at most f(N) steps to sort the numbers, while algorithm 2
will require at most g(N) steps.
If we know roughly how big N is for the problems we'll
be working with, we can compare f(N) and g(N)
to decide which we should use.
Of course, figuring out f(N) and g(N) is the tricky
part.
Why worry about algorithm efficiency?
The basics of asymptotic notation
- Asymptotic notation is the standard framework which
allows us to:
- Define how we will represent the size of a problem
(e.g. for sorting, the size might be the number of things to be sorted)
- Define what an elementary operation is
(e.g. for sorting, we might consider the number of comparisons we make
while sorting)
- Express an approximation of the total number of elementary operations
performed
as a function of the problem size (e.g. we may determine that
sorting algorithm X requires at most N^2
comparisons to sort a list of N numbers)
- Express an approximation of the total amount of data storage needed
as a function of the problem size
- Ideally, we will determine three bounds on the time (and/or
space) requirements:
- the best case: e.g. the smallest number of operations the
algorithm could use for a problem of size N
- the worst case: e.g. the largest number of operations the
algorithm could use for a problem of size N
- the expected case: e.g. over all possible problems of size N,
what is the average number of operations the algorithm uses
- Note that these all describe how well a specific algorithm
performs.
It would also be useful to determine (if possible) what the best possible
performance
of any algorithm could be.
This is a much more difficult problem, since
obviously we
cannot list and evaluate every possible algorithm, but in some cases very
useful bounds
can be calculated.
- When using asymptotic notation, we will calculate
values as approximations: very often we will drop constants
(e.g. 3.5N + 20000 will be rounded off to N,
and similarly 0.1N + 100000000 will be rounded off to N)
The rationale for using asymptotic notation is, firstly,
as the value of N gets large the contribution of the
constants becomes relatively unimportant when comparing the behaviour of two
or more algorithms, and secondly, some functions show very erratic behaviour
in some data ranges so an overall view gives a clearer picture than
we would get from using a specific set of values for N.
With respect to the first point, some functions grow
much more quickly than others once the value of N becomes large.
For instance, N^2 grows much more quickly than lg(N).
Given sufficiently large values of N, even the addition of extra
terms would have relatively little impact: e.g. N^2 - N
would still grow much more quickly than 100*lg(N), even for large values
of N.
For this reason, we will discuss the order of functions -
look at only the largest relevant non-constant terms when
comparing functions.
Thus we will group large sets of functions into orders,
which have approximately the same relative growth.
To be precise, we will say function f(N) is in order O(g(N)) if
there exist constants C and n0 such that, for all N ≥ n0,
f(N) ≤ Cg(N).
Here are a few examples of functions in different orders:
O(1)     | O(lg(N))          | O(N)                 | O(N^2)                       | O(2^N)
---------+-------------------+----------------------+------------------------------+------------------------------------
0.00001  | lg(N) / 17        | N + 1/N              | 3N^2 + 10000N + lg(N) + 900  | 2^N + N^2 + √N + N + lg(N) + 1
1        | 100000 lg(N)      | 10000N + 99999 lg(N) | 1000 N lg(N)                 | 2^N + 1.5^N + N^100
1000000  | 100 lg(N) + 99999 | N/100000             | 30000000 N^2 + 10000         | 0.00001
         | 1                 | 99999999             | N^1.9                        |
Observe that the "smaller" orders are completely
contained within the larger ones.
When we determine the running time of two algorithms as some functions,
f(N) and g(N), we will usually want to show that one algorithm is
better (or worse, or no better, etc) than another.
We use asymptotic complexity as the broad categories to group functions
in terms of "equivalent efficiency".
Examples:
- Suppose the running times we have identified for two algorithms are
f(N) = 6N^2 and g(N) = N^3, and we want to show that the
f(N) algorithm is no slower (asymptotically) than the g(N) algorithm.
This is equivalent to showing that function f(N) is in the order of g(N),
i.e. that there exist constants C and n0 such that for all N ≥ n0,
f(N) ≤ Cg(N).
In this case, that means showing there exist constant values n0
and C such that for all N ≥ n0, 6N^2 ≤
CN^3.
If we use n0 = 6 and C = 1 we can see this is a true statement,
i.e. 6N^2 ≤ N^3 for all N ≥ 6.
- Suppose f(N) = 2N^2 + 4 and g(N) = N^2.
Let C = 4 and n0 = 2 and we will again see this is a true statement,
i.e. 2N^2 + 4 ≤ 4N^2 for all N ≥ 2.
- Suppose f(N) = (N+1)! and g(N) = N!
We can show that the f(N) algorithm is slower, i.e. that (N+1)! grows
asymptotically faster than N!, using proof by contradiction:
- assume (N+1)! is in O(N!)
- this would mean there exist C, n0 such that for all N ≥
n0, (N+1)! ≤ CN!
- simplify by dividing both sides by N!, giving
(N+1) ≤ C for some constant C and for all values of N ≥ n0
- Clearly this is false, since regardless of the choice of constant C there
exists some N ≥ n0 with N + 1 > C
- Since a logical impossibility has arisen, our assumption must be incorrect,
and hence (N+1)! is NOT in O(N!)
A few basic orders of complexity, in increasing order: (assuming integer values for k)
O(1), O(lg(lg(N))), O(lg(N)), O((lg(N))^k),
O(N), O(N lg(N)), O(N^k),
O(2^N), O(N 2^N), O(N!), O((N+k)!),
O(2^(2^N))
What are the elementary operations?
To actually compute the efficiency of a specific algorithm, we need
to define what we use as indicators of running time - i.e. what are the
fundamental operations performed, and how many of them are performed on
a problem of size N?
How do you measure problem size?
- Our goal in analyzing an algorithm
is to produce a function which describes the algorithm's
use of resources as a function of the size of the problem
- We must formulate some way of
describing the `size' of a problem: this is something of
an art, but experience and examples help:
- When working with a database, the size of the problem
might be the number of items in the database
- When sorting data, the size of the problem might be
the number of items in the list to be sorted
- When trying to determine if a large number is prime,
the size of the problem might be the number of digits in the
input number
- When trying to multiply two matrices together, the size
of the problem might be expressed as the dimensions of the matrices,
or might be expressed as the number of data items in the matrices
Example: binary search
Suppose we are given an array of N integers, sorted in increasing order,
and a search routine which determines if a specific key is found
anywhere in the array between two specific points (lower and upper).
Below we give one version which uses a linear search through the data,
and two different versions which use a binary search, one iterative and one
recursive:
int lin_search(int arr[], int key, int lower, int upper)
// return the position of the key if found
// otherwise return -1
{
int pos;
pos = lower;
while ((pos <= upper) && (arr[pos] != key)) pos++;
if (pos <= upper) return(pos);
else return(-1);
}
int it_bsearch(int arr[], int key, int lower, int upper)
// return the position of the key if found
// otherwise return -1
{
int low, upp, mid, found;
found = 0; low = lower; upp = upper;
while ((low <= upp) && (found == 0)) {
mid = (low + upp) / 2;
if (arr[mid] == key) found = 1;
else {
if (arr[mid] < key) low = mid + 1;
else upp = mid - 1;
}
}
if (found == 1) return(mid);
else return(-1);
}
int rec_bsearch(int arr[], int key, int lower, int upper)
// return the position of the key if found
// otherwise return -1
{
int mid;
if (lower > upper) return(-1);
mid = (lower + upper) / 2;
if (arr[mid] == key) return(mid);
if (arr[mid] < key) return(rec_bsearch(arr, key, mid+1, upper));
else return(rec_bsearch(arr, key, lower, mid-1));
}
- Size of the problem: we could describe the size, N, of the
problem
as the number of data items to be searched - i.e. N = upper + 1 -
lower
- Elementary operations: selecting the elementary data operations
is less obvious. Suppose, since function calls are very expensive, we just
decided to count function calls as the basic elementary operation.
The first two algorithms only have one function call, regardless of
the data set, whereas the recursive algorithm can have up to
lg(N) + 2 calls:
- worst case behaviour occurs when the key is not in the array
- the first call has N elements to search
- the second call has (N/2) - 1 elements to search
- the third call has (N/4) - 1 elements to search
- the fourth call has (N/8) - 1 elements to search
- ... etc ...
- the k-th call has (N/2^(k-1)) - 1 elements to search
- this process continues until we run out of elements to search,
i.e. eventually lower > upper, which occurs once 2^k > N,
i.e. after roughly lg(N) calls (at most lg(N) + 2, counting the
final call that fails the bounds test)
What are the problems with this choice of elementary operation?
- It doesn't allow us to distinguish between the first two algorithms
(they both always just perform one elementary operation)
- It doesn't allow for the possibility that the first two algorithms
perform so many other operations (i.e. other than function calls) that
they may actually be worse than the recursive binary search.
As an alternative choice of elementary operation, let us consider how many
times each algorithm has to look at elements inside the array. That is, every
time
we see something of the form arr[...] we will consider that to be one
elementary operation.
- Calculating worst case behaviour:
- Linear search: the array references are all within the while loop,
one per pass, and the most passes we can make are from lower through
upper inclusive, i.e. N total array references
- Recursive binary search: for each call to rec_bsearch we make at most
two array references (in the two if statements), and we calculated earlier
that there are at most lg(N) + 2 function calls, therefore the recursive
version requires at most 2(lg(N) + 2) total array references
- Iterative binary search: each pass through the while loop, the
iterative
binary search routine makes at most 2 array references.
You should be able to
show
that at most lg(N) + 2 passes through the while loop are required
(by a very similar logic to showing that at most lg(N) + 2 function
calls
were required for the recursive version).
Thus the iterative version also
requires
at worst 2(lg(N) + 2) array references.
Consider the results of this for a few values of N:
N (linear search) | 2(lg(N) + 2) (bsearches)
-------------------+---------------------------
1 | 4
2 | 6
4 | 8
8 | 10
16 | 12
32 | 14
1000 | 24
1,000,000 | 44
1,000,000,000 | 64
Note that for small values of N the linear search is actually better,
but for large values of N the binary searches are much better.
- Best case behaviour: what about analysing the best case
performance of all three algorithms? For all three algorithms we might
`luck out' and find the key in the first place we look (i.e. arr[lower]
for the linear search, or arr[(upper+lower)/2] for the binary searches).
Thus all three have a best case of one array reference.
- Average case behaviour: how might we determine average case
behaviour?
Here's one possibility: consider values a_1 < a_2 < a_3 < ... < a_N to be
the N elements in the array. Try searching for each of those values, and
also try searching for a value a_0 < a_1, and for values a_2', ..., a_N'
where a_(i-1) < a_i' < a_i.
- Picking an algorithm:
If you had to choose an algorithm to implement and use, which of the
three would you use and why? Points to consider include:
- best case, average case, worst case behaviour
- is the data sorted (only linear search works on unsorted data)
- how large will N usually be