Frequently when examining computational problems we wish to compare different
problem solutions to each other in terms of efficiency,
as well as considering theoretical limits on attainable efficiency.
To do so, we require a few basic tools and techniques, as outlined below.
Time and space efficiency of algorithms
- An algorithm is a process or series of steps by which we determine a
solution
to a particular instance of a problem.
For our purposes `algorithms'
are often just pseudo-code.
- Example: the problem is to sort an array of N integers into increasing order:
An algorithm might be
   SlowSort:
      for i = 1 to N do
         for j = 0 to N-2 do
            if arr[j] > arr[j+1]
            then swap arr[j] with arr[j+1]
         endfor
      endfor
While an instance of the problem might be:
N = 9, and arr[] contains 7 3 -19 27 0 1 4 5 23
- There can be "easy" instances, e.g.
1 2 3 4
,
and "harder" instances, e.g. 9 -12 137 1
(though what is easy and what is hard sometimes depends on the
way your solution works)
- Each problem has many
different algorithms which produce correct solutions.
As programmers,
we must select which algorithm is `best' to implement.
- When choosing an algorithm there are
many influencing factors:
- the perceived `difficulty' of the algorithm,
- our familiarity with it,
- the adaptability of the algorithm,
- the existence of pre-packaged software,
- the resource requirements of the algorithm,
- the kinds of instances we expect to encounter, and which algorithms
perform particularly well or particularly poorly on these.
- In this course we are primarily concerned with analyzing the last
two points.
- Resources can include many different things: CPU time,
real time, memory requirements, disk space, I/O requirements, etc.
For the sake of simplicity we usually focus on
- the space requirements
(i.e. how much memory does the algorithm force us to use)
- the time requirements
(i.e. how much CPU time will be used in calculating a solution)
- Ideally, we want to
calculate an algorithm's resource requirements without having
to implement it first
- Example: suppose we are given the following algorithm
and a problem instance
   x = 6
      y = 0;
      sum = 0;
      while (y < x) {
         sum = sum + y;
         y = y + 1;
      }
In a ``typical computing environment'' (we'll discuss the weaknesses
of this statement later) how many operations are performed?
- 7 comparisons (6 that succeed, plus the final one that fails)
- 12 additions (6 updating sum, 6 updating y)
- 14 assignments (2 initializations, plus 12 inside the loop)
- 6 branches (from bottom of while to top)
The total time taken will depend on the time required for each of these
operations.
The total space (i.e. memory) requirements will include the
storage locations for x, y, sum, and possibly one or more
registers.
- The process of exactly calculating the resource
requirements of an algorithm is extremely difficult, especially if we want
a general description rather than one which requires a separate analysis
for each different instance.
It requires
extensive knowledge about the algorithm, the data set, and
the system on which the algorithm is run.
- To get around this problem, usually we will settle for
an approximation of the resource requirements -
how to develop the approximations is a major topic for this course.
Why time trials are inadequate
- If calculating the resource requirements is so difficult,
then why not just
- code the algorithm,
- run the program, timing it and measuring the amount of memory?
- If we're
trying to choose which algorithm to implement, we would have to
implement both, time them, and then throw one away.
This is not practical
given the expense associated with implementing large software projects.
- Suppose we're asked to compare two algorithms which
have
already been implemented:
For a `fair' test, we would need
- both programs implemented
and run on the same system (often not available),
- by programmers with comparable skill levels (tough to ensure),
- running on a good representative data set (which can be extremely
difficult to develop),
- under normal working conditions (difficult to even define, never mind
obtain).
- Our goal is to develop a set of analysis techniques
for evaluating the general behaviour of
algorithms.
These should be expressed as a function of the size of the problem
instance
(some algorithms might be good for small problems but poor for large ones)
e.g. Given any set of N numbers, perhaps algorithm 1 will require
at most f(N) steps to sort the numbers, while algorithm 2
will require at most g(N) steps.
If we know roughly how big N is for the problems we'll
be working with, we can compare f(N) and g(N)
to decide which we should use.
Of course, figuring out f(N) and g(N) is the tricky
part.
Why worry about algorithm efficiency?
The basics of asymptotic notation
- Asymptotic notation is the standard framework which
allows us to:
- Define how we will represent the size of a problem
(e.g. for sorting, the size might be the number of things to be sorted)
- Define what an elementary operation is
(e.g. for sorting, we might consider the number of comparisons we make
while sorting)
- Express an approximation of the total number of elementary operations
performed
as a function of the problem size (e.g. we may determine that
sorting algorithm X requires at most N^2
comparisons to sort a list of N numbers)
- Express an approximation of the total amount of data storage needed
as a function of the problem size
- Ideally, we will determine three bounds on the time (and/or
space) requirements:
- the best case: e.g. the smallest number of operations the
algorithm could use for a problem of size N
- the worst case: e.g. the largest number of operations the
algorithm could use for a problem of size N
- the expected case: e.g. over all possible problems of size N,
what is the average number of operations the algorithm uses
- Note that these all describe how well a specific algorithm
performs.
It would also be useful to determine (if possible) what the best possible
performance
of any algorithm could be.
This is a much more difficult problem, since
obviously we
cannot list and evaluate every possible algorithm, but in some cases very
useful bounds
can be calculated.
- When using asymptotic notation, we will calculate
values as approximations: very often we will drop constants
(e.g. 3.5N + 20000 will be rounded off to N,
and similarly 0.1N + 100000000 will be rounded off to N)
The rationale for using asymptotic notation is, firstly,
as the value of N gets large the contribution of the
constants becomes relatively unimportant when comparing the behaviour of two
or more algorithms, and secondly, some functions show very erratic behaviour
in some data ranges so an overall view gives a clearer picture than
we would get from using a specific set of values for N.
With respect to the first point, some functions grow
much more quickly than others once the value of N becomes large.
For instance, N^2 grows much more quickly than lg(N).
Given sufficiently large values of N, even the addition of extra
terms would have relatively little impact: e.g. N^2 - N
would still grow much more quickly than 100*lg(N), even for large values
of N.
For this reason, we will discuss the order of functions -
look at only the largest relevant non-constant terms when
comparing functions.
Thus we will group large sets of functions into orders,
which have approximately the same relative growth.
To be precise, we will say function f(N) is in order O(g(N)) if
there exist constants C and n0 such that, for all N ≥ n0,
f(N) ≤ Cg(N).
Here are a few examples of functions in different orders:
O(1)     | O(lg(N))          | O(N)                 | O(N^2)                       | O(2^N)
---------+-------------------+----------------------+------------------------------+------------------------------------
0.00001  | lg(N) / 17        | N + 1/N              | 3N^2 + 10000N + lg(N) + 900  | 2^N + N^2 + √N + N + lg(N) + 1
1        | 100000 lg(N)      | 10000N + 99999 lg(N) | 1000 N lg(N)                 | 2^N + 1.5^N + N^100
1000000  | 100 lg(N) + 99999 | N/100000             | 30000000 N^2 + 10000         | 0.00001
         | 1                 | 99999999             | N^1.9                        |
Observe that the "smaller" orders are completely
contained within the larger ones.
When we determine the running time of two algorithms as some functions,
f(N) and g(N), we will usually want to show that one algorithm is
better (or worse, or no better, etc) than another.
We use asymptotic complexity as the broad categories to group functions
in terms of "equivalent efficiency".
Examples:
- Suppose the running times we have identified for two algorithms are
f(N) = 6N^2 and g(N) = N^3, and we want to show that the
f(N) algorithm is no slower (asymptotically) than the g(N) algorithm.
This is equivalent to showing that function f(N) is in the order of g(N),
i.e. that there exist constants C and n0 such that for all N ≥ n0,
f(N) ≤ Cg(N).
In this case, that means showing there exist constant values n0
and C such that for all N ≥ n0, 6N^2 ≤
CN^3.
If we use n0 = 6 and C = 1 we can see this is a true statement,
i.e. 6N^2 ≤ N^3 for all N ≥ 6.
- Suppose f(N) = 2N^2 + 4 and g(N) = N^2.
Let C = 4 and n0 = 2 and we will again see this is a true statement,
i.e. 2N^2 + 4 ≤ 4N^2 for all N ≥ 2.
- Suppose f(N) = (N+1)! and g(N) = N!
We can show that the f(N) algorithm is slower, i.e. that (N+1)! grows
asymptotically faster than N!, using proof by contradiction:
- assume (N+1)! is in O(N!)
- this would mean there exist C, n0 such that for all N ≥
n0, (N+1)! ≤ CN!
- simplify by dividing both sides by N!, giving
(N+1) ≤ C for some constant C and for all values of N ≥ n0
- Clearly this is false, since regardless of the choice of constant C there
exists some N ≥ n0 with N + 1 > C
- Since a logical impossibility has arisen, our assumption must be incorrect,
and hence (N+1)! is NOT in O(N!)
A few basic orders of complexity, in increasing order: (assuming integer values for k)
O(1), O(lg(lg(N))), O(lg(N)), O((lg(N))^k),
O(N), O(N lg(N)), O(N^k),
O(2^N), O(N 2^N), O(N!), O((N+k)!),
O(2^(2^N))
What are the elementary operations?
To actually compute the efficiency of a specific algorithm, we need
to define what we use as indicators of running time - i.e. what are the
fundamental operations performed, and how many of them are performed on
a problem of size N?
How do you measure problem size?
- Our goal in analyzing an algorithm
is to produce a function which describes the algorithm's
use of resources as a function of the size of the problem
- We must formulate some way of
describing the `size' of a problem: this is something of
an art, but experience and examples help:
- When working with a database, the size of the problem
might be the number of items in the database
- When sorting data, the size of the problem might be
the number of items in the list to be sorted
- When trying to determine if a large number is prime,
the size of the problem might be the number of digits in the
input number
- When trying to multiply two matrices together, the size
of the problem might be expressed as the dimensions of the matrices,
or might be expressed as the number of data items in the matrices
Example: binary search
Suppose we are given an array of N integers, sorted in increasing order,
and a search routine which determines if a specific key is found
anywhere in the array between two specific points (lower and upper).
Below we give one version which uses a linear search through the data,
and two different versions which use a binary search, one iterative and one
recursive:
int lin_search(int arr[], int key, int lower, int upper)
// return the position of the key if found
// otherwise return -1
{
int pos;
pos = lower;
while ((pos <= upper) && (arr[pos] != key)) pos++;
if (pos <= upper) return(pos);
else return(-1);
}
int it_bsearch(int arr[], int key, int lower, int upper)
// return the position of the key if found
// otherwise return -1
{
int low, upp, mid, found;
found = 0; low = lower; upp = upper;
while ((low <= upp) && (found == 0)) {
mid = (low + upp) / 2;
if (arr[mid] == key) found = 1;
else {
if (arr[mid] < key) low = mid + 1;
else upp = mid - 1;
}
}
if (found == 1) return(mid);
else return(-1);
}
int rec_bsearch(int arr[], int key, int lower, int upper)
// return the position of the key if found
// otherwise return -1
{
int mid;
if (lower > upper) return(-1);
mid = (lower + upper) / 2;
if (arr[mid] == key) return(mid);
if (arr[mid] < key) return(rec_bsearch(arr, key, mid+1, upper));
else return(rec_bsearch(arr, key, lower, mid-1));
}
- Size of the problem: we could describe the size, N, of the
problem
as the number of data items to be searched - i.e. N = upper + 1 -
lower
- Elementary operations: selecting the elementary data operations
is less obvious. Suppose, since function calls are very expensive, we just
decided to count function calls as the basic elementary operation.
The first two algorithms only have one function call, regardless of
the data set, whereas the recursive algorithm can have up to
lg(N) + 2 calls:
- worst case behaviour occurs when the key is not in the array
- the first call has N elements to search
- the second call has (N/2) - 1 elements to search
- the third call has (N/4) - 1 elements to search
- the fourth call has (N/8) - 1 elements to search
- ... etc ...
- the k-th call has (N/2^(k-1)) - 1 elements to search
- this process continues until we run out of elements to search,
i.e. eventually lower > upper, which occurs once 2^k > N,
i.e. after roughly lg(N) calls (at most lg(N) + 2, counting the
final call that fails the bounds test)
What are the problems with this choice of elementary operation?
- It doesn't allow us to distinguish between the first two algorithms
(they both always just perform one elementary operation)
- It doesn't allow for the possibility that the first two algorithms
perform so many other operations (i.e. other than function calls) that
they may actually be worse than the recursive binary search.
As an alternative choice of elementary operation, let us consider how many
times each algorithm has to look at elements inside the array. That is, every
time
we see something of the form arr[...] we will consider that to be one
elementary operation.
- Calculating worst case behaviour:
- Linear search: the array references are all within the while loop,
one per pass, and the most passes we can make are from lower through
upper inclusive, i.e. N total array references
- Recursive binary search: for each call to rec_bsearch we make at most
two array references (in the two if statements), and we calculated earlier
that there are at most lg(N) + 2 function calls, therefore the recursive
version requires at most 2(lg(N) + 2) total array references
- Iterative binary search: each pass through the while loop, the
iterative
binary search routine makes at most 2 array references.
You should be able to
show
that at most lg(N) + 2 passes through the while loop are required
(by a very similar logic to showing that at most lg(N) + 2 function
calls
were required for the recursive version).
Thus the iterative version also
requires
at worst 2(lg(N) + 2) array references.
Consider the results of this for a few values of N:
N (linear search) | 2(lg(N) + 2) (bsearches)
-------------------+---------------------------
1 | 4
2 | 6
4 | 8
8 | 10
16 | 12
32 | 14
1000 | 24
1,000,000 | 44
1,000,000,000 | 64
Note that for small values of N the linear search is actually better,
but for large values of N the binary searches are much better.
- Best case behaviour: what about analysing the best case
performance of all three algorithms? For all three algorithms we might
`luck out' and find the key in the first place we look (i.e. arr[lower]
for the linear search, or arr[(upper+lower)/2] for the binary searches).
Thus all three have a best case of one array reference.
- Average case behaviour: how might we determine average case
behaviour?
Here's one possibility: consider values a_1 < a_2 < a_3 < ... < a_N to be
the N elements in the array. Try searching for each of those values, and
also try searching for a value a_0 < a_1, and for values a_2', ..., a_N'
where a_(i-1) < a_i' < a_i.
- Picking an algorithm:
If you had to choose an algorithm to implement and use, which of the
three would you use and why? Points to consider include:
- best case, average case, worst case behaviour
- is the data sorted (only linear search works on unsorted data)
- how large will N usually be