Algorithm Analysis
The performance of a program usually depends on:
- Data structures and algorithms;
- Characteristics of input data;
- Computer hardware, including but not limited to CPU, memory and disk;
- Programming language used to develop the program;
- Compiler/interpreter;
- Software environment; and
- Network communication protocols and connections.
An algorithm is a step-by-step procedure for solving a problem
in a finite amount of time. Time efficiency of an algorithm indicates
how fast an algorithm runs; space efficiency deals with the extra
space the algorithm requires. Other analyses include correctness,
robustness, and maintainability.
The running time of an algorithm typically grows with the input
size. Average-case time efficiency is often difficult to determine,
so we usually focus on worst-case analysis: it is easier to carry
out, and it matters most to applications.
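For example, linear search makes the gap between cases concrete.
Below is a minimal Python sketch (the function name find and the
comments are ours, for illustration):

    def find(items, target):
        """Linear search: return the index of target in items, or -1."""
        for i, value in enumerate(items):
            if value == target:
                return i      # best case: target is first, 1 comparison
        return -1             # worst case: target absent, all n items compared

The best case ends after one comparison, the worst case after n; the
average case depends on where (and whether) the target appears, which is
exactly the input-distribution assumption that makes average-case
analysis hard.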
We can use experiments to measure an algorithm's running time as
a function of the input size (a minimal sketch follows this list).
But there are limitations with this method:
- The algorithm needs to be implemented first, which may be
difficult and costly;
- The measured results may not be comprehensive enough to
capture the characteristics of all types of inputs;
- The results are only valid for the specific hardware and
software environment measured.
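Here is the promised timing sketch (it assumes the find function from
the earlier example; time.perf_counter is Python's high-resolution
wall-clock timer):

    import time

    def measure(func, make_input, sizes):
        # Time func on inputs of increasing size. The numbers hold only
        # for this machine, this interpreter, and the current system load.
        for n in sizes:
            data = make_input(n)
            start = time.perf_counter()
            func(data, -1)        # -1 never occurs in the data: worst case
            print(f"n = {n:>9}: {time.perf_counter() - start:.6f} s")

    measure(find, lambda n: list(range(n)), [10_000, 100_000, 1_000_000])

The third limitation shows up immediately: rerunning the same script on
different hardware, or even on a busy machine, changes every number.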
The preferred way is to analyze the running time of an algorithm
theoretically, characterizing the running time as a function of
the input size, N. This method allows us to evaluate the
time efficiency of an algorithm independently of the hardware
and software environment.
General steps of asymptotic analysis of an algorithm:
- Define the problem size (input size), N;
- Count the number of primitive operations as a function of N;
- Determine the growth rate of the function.
It is obvious that almost all algorithms take longer to run
on larger inputs. We usually use a parameter, N, to indicate
the input size of an algorithm. Sometimes several parameters
combined together are used to define the input size.
Primitive Operations include:
- assigning a value to a variable;
- calling a method;
- performing an arithmetic operation;
- comparing two numbers;
- accessing an array element;
- following a pointer or a reference;
- returning from a method.
Treating all primitive operations the same and ignoring differences
in the hardware and software environment may affect the estimate of
the running time of an algorithm by a constant factor, but
it does not alter the growth rate of the running-time function.
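As an example, count the primitive operations of finding the maximum of
an array (a sketch; the exact tally depends on what one counts as
primitive, but the growth rate does not):

    def array_max(items):                 # input size n = len(items), n >= 1
        current = items[0]                # 1 array access + 1 assignment
        for i in range(1, len(items)):    # about n assignments/tests of i
            if items[i] > current:        # n - 1 accesses and comparisons
                current = items[i]        # at most n - 1 assignments (worst case)
        return current                    # 1 return

Summing the counts gives at most c1*n + c2 primitive operations in the
worst case for some constants c1 and c2, so the running time grows
linearly with n no matter what the constants turn out to be.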
For large input size N, it is the function's order of growth that
matters:
N    | log N | N    | N log N | N^2   | N^3   | 2^N  | N!
-----|-------|------|---------|-------|-------|------|------
10   | 3     | 10   | 33      | 100   | 1K    | 1K   | 3.6M
32   | 5     | 32   | 160     | 1K    | 32K   | 4B   | large
64   | 6     | 64   | 384     | 4K    | 256K  | 16BB | large
10^3 | 10    | 10^3 | 10^4    | 10^6  | 10^9  | -    | -
10^6 | 20    | 10^6 | 2×10^7  | 10^12 | 10^18 | -    | -
(K = thousand, M = million, B = billion, BB = billion billion.)
As a comparison, 16 billion billion seconds is 543 billion years.
The Earth's age is about 4.5 billion years, and the universe's age
is about 14 billion years.
Asymptotic Notations
- The upper bound Ο definition:
Let f(n) and g(n) be functions mapping non-negative integers to
real numbers. f(n) is Ο(g(n)) if there exists a real constant
c > 0 and an integer constant n0 >= 1, such that
f(n) <= cg(n) for every integer n >= n0.
In other words, the growth rate of f(n) is no more than that of g(n).
- Ω and Θ:
If g(n) is Ο(f(n)), then f(n) is Ω(g(n)).
If f(n) is Ο(g(n)) and Ω(g(n)), then
f(n) is Θ(g(n)).
- ο and ω:
f(n) is ο(g(n)) if for every real constant
c > 0, there exists an integer constant n0 >= 1, such that
f(n) <= cg(n) for every integer n >= n0.
In other words, the growth rate of f(n) is less than that of g(n).
If g(n) is ο(f(n)), then f(n) is ω(g(n)).
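Examples: f(n) = 3n + 5 is Ο(n), since taking c = 4 and n0 = 5 gives
3n + 5 <= 4n for every n >= 5; n is then Ω(3n + 5), and because 3n + 5
is both Ο(n) and Ω(n), it is Θ(n). Similarly, n is ο(n^2): for any
c > 0, n <= c*n^2 whenever n >= 1/c; consequently n^2 is ω(n).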
The Ο notation denotes a class of functions; the growth rates
of all functions in the same class are the same. The Ο families
provide us a convenient way to analyze algorithms because
they let us focus on the big picture rather than the details.
Ο Rules
- If f(n) is a polynomial of degree d, then f(n) is Ο(n^d).
I.e., drop the lower-order terms and the constant factors.
- Use the smallest possible class of functions.
For example, 2n is Ο(n^2), but we usually say 2n is Ο(n).
- Use the simplest expression of the class.
Ο(3n^2) should be written as Ο(n^2).
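Applying these rules: f(n) = 5n^3 + 2n^2 + 10 is a polynomial of
degree 3, so f(n) is Ο(n^3); we would not write Ο(5n^3), nor the
technically true but needlessly loose Ο(n^4).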
Rules for Ο Analysis of the running time of an algorithm:
- Sequential Composition:
S_1; S_2; ...; S_k
T_S = Ο(max(T_1, T_2, ..., T_k))
- Iteration:
for i from 1 to k { Exp }
T_I = Ο(max(k, T_Exp * k))
- Conditional Execution:
if cond then Exp_1 else Exp_2
T_C = Ο(max(T_cond, T_Exp1, T_Exp2))
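Putting the three rules together on a small, hypothetical snippet:

    def tally(items):          # n = len(items)
        total = 0              # S1: Ο(1)
        for x in items:        # iteration: n passes over the body
            if x > 0:          # conditional: cond is Ο(1)
                total += x     # then-branch: Ο(1)
            else:
                total -= x     # else-branch: Ο(1)
        return total           # S3: Ο(1)

By the conditional rule the loop body is Ο(1); by the iteration rule
the loop is Ο(max(n, 1 * n)) = Ο(n); by sequential composition the
whole function is Ο(max(1, n, 1)) = Ο(n).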
Ο Analysis for recursive algorithms:
- Write a recurrence equation for the running-time function;
- Solve the recurrence equation;
- Classify the result into a Θ(f(n)) family.
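For example, recursive binary search over a sorted list (a standard
sketch; the names are ours):

    def binary_search(items, target, lo, hi):
        # Search sorted items[lo:hi]. Recurrence: T(n) = T(n/2) + c, T(1) = c'.
        if lo >= hi:
            return -1                    # base case: constant work
        mid = (lo + hi) // 2             # constant work per call
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            return binary_search(items, target, mid + 1, hi)  # size n/2
        else:
            return binary_search(items, target, lo, mid)      # size n/2

Unrolling T(n) = T(n/2) + c gives T(n) = c*log2(n) + c', so binary
search falls in the Θ(log n) family. (Call it as
binary_search(data, target, 0, len(data)).)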
The Master Theorem:
T(n) = aT(n/b) + f(n) if n >= d, with T(n) constant for n < d.
- If there is a small constant ε > 0 such that f(n)
is Ο(n^(log_b a - ε)),
then T(n) is Θ(n^(log_b a)).
- If there is a constant k >= 0 such that f(n) is
Θ(n^(log_b a) log^k n),
then T(n) is Θ(n^(log_b a) log^(k+1) n).
- If there are small constants ε > 0 and δ < 1
such that f(n) is Ω(n^(log_b a + ε))
and a*f(n/b) <= δ*f(n) for n >= d, then
T(n) is Θ(f(n)).
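Worked examples of the three cases:
- Merge sort: T(n) = 2T(n/2) + n. Here n^(log_2 2) = n and
f(n) = n is Θ(n log^0 n), so the second case (k = 0) gives Θ(n log n).
- T(n) = 4T(n/2) + n. Here n^(log_2 4) = n^2 and f(n) = n is
Ο(n^(2 - ε)) with ε = 1, so the first case gives Θ(n^2).
- T(n) = T(n/2) + n. Here n^(log_2 1) = 1, f(n) = n is Ω(n^(0 + ε))
with ε = 1, and a*f(n/b) = n/2 = (1/2)*f(n), so the third case
(δ = 1/2) gives Θ(n).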