For each proof, we begin with a set of known facts and rules and a claim we wish to prove or disprove.
The most common facts we typically begin with (for proofs about complexities and efficiency) are the standard definitions of the complexity classes, e.g. for big-O:

f(N) ∈ O(g(N)) if and only if ∃ c, n0 | f(N) ≤ c g(N) ∀ N ≥ n0

Notation: ∃ means there exists, ∀ means for all, | means such that.

Thus the "standard definition" above would read: f(N) is in O(g(N)) if and only if there exist c and n0 such that, for all N greater than or equal to n0, f(N) is less than or equal to c times g(N). Essentially this is saying that the complexity of f(N) is not more than a constant factor worse than g(N) once we get into large enough values of N.
The biggest decision in trying to develop a proof is (usually) "which proof technique should be used?". Unfortunately, determining in advance which technique will be best is something of an art, requiring a mix of insight and experience.
There are a great many proof techniques, each of which is applicable in a variety of cases. For this course (and this lab) we will focus on just a handful.
In a constructive proof we repeatedly apply the known rules to the known facts to prove the truth of new facts, until eventually we arrive at a proof of the target claim.
While this technique often results in the clearest, most intuitive proofs, it relies on our ability to recognize ways to combine existing facts and rules in a sequence that will (hopefully) produce the desired result.
For instance: if f(N) = 1000N and g(N) = 0.001N², then prove f(N) ∈ O(g(N)).
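One way the constructive argument could proceed (the constants chosen here are just one workable choice): by the definition above we need c and n0 such that 1000N ≤ c (0.001N²) for all N ≥ n0. Trying c = 1, the inequality 1000N ≤ 0.001N² holds exactly when 1000 ≤ 0.001N, i.e. when N ≥ 1000000. So c = 1 and n0 = 1000000 satisfy the definition, proving f(N) ∈ O(g(N)).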
Proof by contradiction attempts to prove that something is true essentially by proving it is impossible for it to be false.
This is done by assuming the opposite of the claim and then proving that some logical impossibility arises.
For instance: prove N³ ∉ O(N²).
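A sketch of how the contradiction could be derived: assume the opposite of the claim, i.e. N³ ∈ O(N²). Then by the definition above, ∃ c, n0 | N³ ≤ c N² ∀ N ≥ n0. Dividing both sides by N² gives N ≤ c for all N ≥ n0. But N grows without bound, so any N larger than both n0 and c violates the inequality. That is a logical impossibility, so our assumption was false and N³ ∉ O(N²).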
While disproving something isn't really a proof technique, it is worthy of some separate discussion.
When asked to disprove something we are trying to prove that the claim is not valid, and often this is simply a matter of finding an instance or fact that contradicts the claim. For instance, if asked to disprove the claim "all dogs are brown" it would suffice to find a single non-brown dog.
In the context of complexity arguments, the claims are more likely to be along the lines of: Disprove the claim that O(f(N)) ⊆ O(g(N)).
The claim states that O(f(N)) is a subset of O(g(N)), i.e. it claims every function in O(f(N)) is also in O(g(N)).
To disprove this claim, it would be sufficient to find a single function in O(f(N)) that is NOT in O(g(N)).
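For a concrete (hypothetical) instance, suppose the claim is that O(N²) ⊆ O(N). The function N² is certainly in O(N²) (take c = 1, n0 = 1), but N² ∉ O(N), which can be shown by contradiction exactly as above. So N² is a function in O(N²) that is not in O(N), and this single counterexample disproves the claim.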
Suppose we wanted to prove that, for all positive integer values of N, N² < (N+1)².
We could attempt to prove this one case at a time, e.g. 1² < 2², then 2² < 3², then 3² < 4², and so on, but since there are infinitely many cases we could never complete the proof this way.
Instead, we can prove a starting point (e.g. N = 1 in the example above) and come up with general proofs that take us from any one case to the next.
E.g. in the example above, we would try to create a proof that, if the theory held for N = 1, ... , k then we could prove it also held for N = (k+1).
If our rule is valid then, given enough time, we could carry out the proof for any specific value N = k+1 in k+1 individual stages:
- start with N = 1 (our base case, for which we have already proven the claim is true)
- apply our rule starting from N = 1, which proves the N = 2 case
- apply our rule starting from N = 2, which proves the N = 3 case
- etc., etc.
This provides a valid mechanism for establishing the truth of the claim for any larger value of N, as the sketch below illustrates.
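To illustrate what such a rule could look like for this claim (a sketch): suppose the claim holds for N = k, i.e. k² < (k+1)². For N = k+1 we need (k+1)² < (k+2)². Expanding, (k+2)² = (k+1)² + 2(k+1) + 1, and since 2(k+1) + 1 > 0 for every positive k, it follows that (k+1)² < (k+2)². (In this simple example the step happens to hold without even using the assumption, but it still demonstrates the case-to-case mechanism.)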
A complete inductive proof is therefore laid out in three parts:
- proofs of one or more specific starting values of n (called the base case(s))
- an assumption that the claim holds for all values of n from the base cases through to a variable value, k (in fact, assuming it for that entire range rather than just for n = k is termed strong induction)
- a proof that, given that assumption, the claim also holds for n = k + 1
Again, these three parts are sufficient for a valid proof because, given sufficient time, we could use them to build a constructive proof of the property for any specific value of n.
Example: prove that 1 + 2 + 3 + ... + n = n(n+1)/2, for all positive integers n.
Our inductive proof would be laid out as follows:
Base case (n = 1): the claim states that 1 = 1(1+1)/2. Checking: 1(1+1)/2 = 1(2)/2 = 2/2 = 1, thus the claim is true for the base case.

Induction hypothesis: assume that, for every n in the range 1..k, the sum of 1 through n is n(n+1)/2.

Inductive step: prove that 1 + 2 + ... + k + (k+1) = (k+1)((k+1)+1)/2.

1 + 2 + ... + k + (k+1)
= (1 + 2 + ... + k) + (k+1)
= k(k+1)/2 + (k+1)        [by the induction hypothesis]
= (k+1)(k/2 + 1)
= (k+1)((k+2)/2)
= (k+1)(k+2)/2
= (k+1)((k+1)+1)/2
which proves our target!
NOTE: in an inductive proof it is very important to point out clearly exactly where and how the induction hypothesis is being applied.
We will often be called upon to evaluate the efficiency of a recursive algorithm.
Sometimes we will be able to use constructive reasoning and proofs to conclusively demonstrate the efficiency, but in many cases such proofs are difficult to arrive at.
There are several other approaches to solving recurrences, but for now we'll focus on a single one: substitution.
In this method, we guess what the correct running time will be, and then use proof by induction to show that our guess was correct.
Needless to say, this only works if we guessed correctly, so it is most useful when we have a "gut feel" for what the running time is. Of course, we can make very pessimistic guesses initially (which should be easy to prove correct) and gradually refine them to be more and more accurate, until eventually we hit a point where we no longer succeed in finding a proof. (Which doesn't mean there isn't a proof; it could mean the proof has just eluded us so far.)
Example: suppose the running time of an algorithm is expressed by the recurrence f(N) = 2 f(N/2) + N (with f(1) being some constant).
This looks like a divide-and-conquer type of efficiency, where the running time for a problem of size N is based on some constant times the running time for half the problem, plus a linear stage to "put the pieces together".
Based on that observation, we guess that the running time is O(N lg(N)).
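Before attempting the proof, a quick numerical sanity check can build confidence in the guess. The sketch below assumes the recurrence f(N) = 2 f(N/2) + N from above, with an illustrative base case f(1) = 1 and an illustrative constant c = 2; it restricts N to powers of two so that N/2 is always an integer.

import math

# Evaluate the assumed recurrence f(N) = 2 f(N/2) + N, with f(1) = 1.
# N is restricted to powers of two so that N/2 is always an integer.
def f(n):
    if n == 1:
        return 1
    return 2 * f(n // 2) + n

# Check the guess f(N) <= c * N * lg(N) for N = 2, 4, 8, ..., 2^20.
c = 2
for k in range(1, 21):
    n = 2 ** k
    assert f(n) <= c * n * math.log2(n), f"guess fails at N = {n}"
print("f(N) <= 2 N lg(N) held for every tested power of two")

Of course, a check over finitely many values is only evidence; the inductive proof is what actually establishes the bound.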
Now we try to use an inductive proof to show that f(N) ∈ O(N lg(N)), i.e. ∃ c, n0 | f(N) ≤ c N lg(N) ∀ N ≥ n0.
.... TO BE COMPLETED ....
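A sketch of how that inductive argument would typically proceed (assuming the recurrence f(N) = 2 f(N/2) + N from above, and glossing over whether N/2 is an integer): assume as the induction hypothesis that f(M) ≤ c M lg(M) for every M smaller than N, in particular for M = N/2. Then

f(N) = 2 f(N/2) + N
     ≤ 2 c (N/2) lg(N/2) + N      [by the induction hypothesis]
     = c N (lg(N) - 1) + N
     = c N lg(N) - c N + N
     ≤ c N lg(N)                   [whenever c ≥ 1]

so any constant c ≥ 1, together with an n0 large enough to cover the base case(s), satisfies the definition.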
Problem: in fact, there is a weakness in this proof, in that it assumes N/2 is an integer value. A similar but more thorough approach is taken in the example at the end of the course notes on substitution proofs.