External Sorting Algorithm
Internal sorting conditions:
- data stored in contiguous storage space (an array)
- keys drawn from a totally ordered type (any two keys can be compared)
- random access to every element (all data resides in main memory)
External sorting conditions:
- input data too large to fit in main memory
- input data stored in sequential storage
- sorted output written to sequential storage
- I/O cost is significantly larger than the cost of in-memory operations
(such as comparisons)
External sorting algorithm criteria:
- access data in sequential order
- minimize the number of times each item is read from or written to disk
External sorting algorithm:
1. break a large data file into shorter "runs" of data, so that each
run fits in main memory and can be sorted there (using an internal sorting algorithm).
2. merge two or more runs (how many depends on the number of input buffers
available in memory) into one longer run.
3. repeat step 2 until only one sorted run, the output file, remains.
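The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes the records are integers stored one per line in text files, uses the standard library's `heapq.merge` for the T-way merge, and the function names (`make_runs`, `merge_pass`, `external_sort`) and parameters `M` (records sorted in memory at once) and `T` (runs merged at once) are hypothetical choices for this example.

```python
# Sketch only. Assumptions: records are ints, stored one per line in text
# files; all function names here are hypothetical.
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple


def write_run(records: Iterable[int], directory: str) -> str:
    """Write records (already in sorted order) sequentially to a new temp file."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "w") as f:
        for r in records:
            f.write(f"{r}\n")
    return path


def read_run(path: str) -> Iterator[int]:
    """Stream a run back in sequential order, one record at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def make_runs(records: Iterable[int], M: int, directory: str) -> List[str]:
    """Step 1: cut the input into chunks of at most M records, sort each in memory."""
    runs: List[str] = []
    buf: List[int] = []
    for r in records:
        buf.append(r)
        if len(buf) == M:
            runs.append(write_run(sorted(buf), directory))
            buf = []
    if buf:  # last, possibly shorter, run
        runs.append(write_run(sorted(buf), directory))
    return runs


def merge_pass(runs: List[str], T: int, directory: str) -> List[str]:
    """Step 2: merge each group of up to T runs into one longer run."""
    merged: List[str] = []
    for i in range(0, len(runs), T):
        group = runs[i:i + T]
        # heapq.merge holds only one head record per input run in its heap
        merged.append(write_run(heapq.merge(*(read_run(p) for p in group)), directory))
        for p in group:
            os.remove(p)  # the input runs are no longer needed
    return merged


def external_sort(records: Iterable[int], M: int, T: int) -> Tuple[List[int], int]:
    """Form runs, then merge until one run remains.
    Returns the sorted records and the number of merge passes performed."""
    if M < 1 or T < 2:
        raise ValueError("need M >= 1 and T >= 2")
    with tempfile.TemporaryDirectory() as d:
        runs = make_runs(records, M, d)
        passes = 0
        while len(runs) > 1:  # Step 3: repeat step 2 until a single run remains
            runs = merge_pass(runs, T, d)
            passes += 1
        # For the demo the final run is loaded into a list; a real external
        # sort would leave it on disk as the output file.
        result = list(read_run(runs[0])) if runs else []
    return result, passes


if __name__ == "__main__":
    data = [(i * 37) % 1000 for i in range(1000)]  # a permutation of 0..999
    out, passes = external_sort(data, M=50, T=4)
    print(out == sorted(data), passes)
```

With M = 50 and T = 4, the 1,000 records form 20 initial runs, and the merge passes reduce them to 5, then 2, then 1, so three passes are needed. Each run is only ever read and written front to back, matching the sequential-access criterion.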
Performance of the external sorting algorithm:
- If M records can be sorted in memory, and the file has N records,
the number of initial runs is ceiling(N/M).
- If we can merge T runs in each pass, then we need
ceiling(log_T(ceiling(N/M))) passes to merge all runs together,
since each pass reduces the number of runs by a factor of T.
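The two formulas above can be checked with a small calculation. The helper `merge_plan` below is hypothetical; it computes the initial run count and the number of merge passes with integer arithmetic, repeatedly dividing the run count by T and rounding up, rather than calling `math.log`, because floating-point rounding can push an exact power of T across an integer boundary.

```python
# Sketch only; merge_plan and ceil_div are hypothetical helpers.


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling(a / b) for a >= 0, b > 0, without floating point."""
    return -(-a // b)


def merge_plan(N: int, M: int, T: int) -> tuple:
    """Return (initial_runs, merge_passes) for N records, M records sorted
    in memory at a time, and T-way merging."""
    if N < 0 or M < 1 or T < 2:
        raise ValueError("need N >= 0, M >= 1 and T >= 2")
    initial_runs = ceil_div(N, M)
    runs, passes = initial_runs, 0
    while runs > 1:  # each pass divides the run count by T, rounding up
        runs = ceil_div(runs, T)
        passes += 1
    return initial_runs, passes


if __name__ == "__main__":
    print(merge_plan(1_000_000, 1_000, 10))  # (1000, 3)
```

For example, with N = 1,000,000, M = 1,000 and T = 10 there are 1,000 initial runs, which merge to 100, then 10, then 1, giving ceiling(log_10(1000)) = 3 passes. Assuming each pass, including run formation, reads and writes every record once, total disk traffic is roughly 2N(1 + passes) record transfers, which is why a larger T (fewer passes) reduces I/O.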