One of the most common, and most useful, data types is the tree.
A tree is composed of many connected nodes or elements, just as in stacks, lists, and queues, but the interconnection arrangement is different.
In a typical tree organization, each node consists of some data and some pointers to other nodes, referred to as children of the current node.
If the tree held integer data, then a typical graphical representation might look something like:
              3
             /|\
            / | \
           1  4  9
          /  / \  \
         /  /   \  \
        6  0     8  5
        |       /
        |      /
        2     7

(The origin of the term "tree" is obvious if you look at the structure upside down.)

We often refer to the "top" node in such a representation as the root of the tree. The root is the only node which is not the child of some other node.
At the other end, any node which does not have any children is often referred to as a leaf.
The level of a particular node is its distance from the root, while the height (or depth) of the tree is the number of levels in the tree.
In the tree shown above, the root is the node containing 3, the leaves are the nodes containing 2, 0, 7, and 5, the node containing 4 is at level 1, and the height of the tree is 4 (it has levels 0 through 3).
Each child of a particular node is also said to represent a particular subtree - i.e. a smaller tree "underneath" the current node.
Trees are typically organized so that different kinds of data are stored in different subtrees, allowing us to categorize and access our data in a more efficient fashion.
The most common operations associated with trees are inserting new values, searching for a particular value, traversing the tree to process all of its values, and deleting values.
The most common type of tree is the binary tree.
A tree is said to be a binary tree if every node in the tree has at most two children. (I.e. each node has zero, one, or two children.)
In binary trees we refer to the two children (where they exist) as the left child and the right child.
For example, the following is a valid binary tree:
        9
       / \
      /   \
     3     7
    / \     \
  11   1     4
            /
           6
          / \
         4   1

In this tree, node 7 is the right child of node 9, node 11 is the left child of node 3, etc.
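To make the representation concrete, a binary tree node might be declared in C++ along the following lines. This is only an illustrative sketch: the names Node, data, left, and right are arbitrary choices, and the build function simply hand-links the top three nodes of the tree shown above.

   struct Node {
       int data;          // the data stored in this node
       Node* left;        // pointer to the left child (null if none)
       Node* right;       // pointer to the right child (null if none)
   };

   // Hand-link the top of the example tree: 9 with left child 3 and right child 7.
   Node* buildTop() {
       Node* root  = new Node{9, nullptr, nullptr};
       root->left  = new Node{3, nullptr, nullptr};
       root->right = new Node{7, nullptr, nullptr};
       return root;
   }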
To organize our stored data for efficient searching, we will use a special category of binary trees called binary search trees, or BSTs.
A tree is a binary search tree if the following holds for every node in the tree: all of the values stored in the node's left subtree are less than the value stored in the node itself, and all of the values stored in the node's right subtree are greater than or equal to the value stored in the node itself.
Examples of valid binary search trees:
          9              0
         / \
        2   12
       / \    \
      1   4    14
              /
             13

Examples of invalid binary search trees:
      5                0
     / \              / \
    8   0            2   12
                    / \    \
                   1   5    14
                           /
                          13

(In the first tree, 8 and 0 are on the wrong sides of 5; in the second, the values 2, 1, and 5 are all larger than the root 0 but appear in its left subtree.)
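One way to make the BST property concrete is a recursive validity check. The following C++ sketch is illustrative only; it assumes the kind of simple Node struct shown earlier and enforces the convention used in these notes (values in a node's left subtree are strictly smaller than the node's value, values in its right subtree are greater than or equal).

   #include <climits>

   struct Node { int data; Node* left; Node* right; };

   // Every value in the subtree rooted at n must satisfy lo <= value < hi.
   bool validBST(const Node* n, long long lo, long long hi) {
       if (n == nullptr) return true;                     // an empty subtree is always valid
       if (n->data < lo || n->data >= hi) return false;   // value outside its allowed range
       return validBST(n->left,  lo,      n->data) &&     // left:  values <  n->data
              validBST(n->right, n->data, hi);            // right: values >= n->data
   }

   bool validBST(const Node* root) {
       return validBST(root, LLONG_MIN, LLONG_MAX);
   }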
Initially, we will consider the following operations for BSTs: inserting a new value into the tree, and searching the tree for a particular value.
As we insert new values, we need to ensure that we maintain the BST properties, i.e. we always insert each element on the correct "side" of the current element.
The algorithm to do this is recursive, starting with the root node of the tree:
   Let C be the data value of the current node, and D be the data value being inserted.

   If the tree is currently empty then simply create a node to contain D and make that the root of the tree, otherwise perform the following:

   If D is less than C then
      if the left subtree is currently empty
         create a new node containing D and make it the left child of the current node
      otherwise
         call insert, with value D, on the left child of the current node
   Otherwise (D is greater than or equal to C)
      if the right subtree is currently empty
         create a new node containing D and make it the right child of the current node
      otherwise
         call insert, with value D, on the right child of the current node

Observe: suppose we were to insert values 3, 5, 2, 4, 1, 6, and 7 in that order:
   Initially we store 3 as the only tree value:

      3

   Then, since 5 is greater than 3 it becomes the right child of 3:

      3
       \
        5

   Next, since 2 is less than 3 it becomes the left child of 3:

        3
       / \
      2   5

   Next, since 4 is greater than 3 it is inserted in the right subtree of 3, and since 4 is then less than 5 it becomes the left child of 5:

        3
       / \
      2   5
         /
        4

   Next, since 1 is less than 3 it is inserted in the left subtree of 3, and since it is less than 2 it becomes the left child of 2:

        3
       / \
      2   5
     /   /
    1   4

   Next, since 6 is greater than 3 it is inserted in the right subtree of 3, and since it is greater than 5 it becomes the right child of 5:

        3
       / \
      2   5
     /   / \
    1   4   6

   Finally, since 7 is greater than 3 it is inserted in the right subtree of 3, and since it is greater than 5 it is inserted in the right subtree of 5, and since it is greater than 6 it becomes the right child of 6:

        3
       / \
      2   5
     /   / \
    1   4   6
             \
              7

A note on the efficiency of insert:
Since we only examine one node at each level while we search for a spot to insert a new value, the number of calls to insert is less than or equal to the height of the tree.
If the tree is very dense (i.e. most nodes have two children) then the height of the tree is roughly log2(N), where N is the number of nodes in the tree. (Specifically, a completely full tree of height h contains 2^h - 1 nodes.)
This means that, for a full tree of a million elements, the number of recursive calls is still only about 20 ... far better than having to do a search/insert on a sorted list, where we would have to traverse (on average) a half-million elements!
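As a rough C++ sketch of the insert algorithm above (again assuming an illustrative Node struct, and written in the common style where insert returns the root of the updated subtree rather than testing the child for null before recursing):

   struct Node { int data; Node* left; Node* right; };

   // Insert 'value' into the subtree rooted at 'root' and return the
   // (possibly new) root of that subtree: smaller values go left,
   // equal or larger values go right.
   Node* insert(Node* root, int value) {
       if (root == nullptr)                        // empty subtree: the new value goes here
           return new Node{value, nullptr, nullptr};
       if (value < root->data)
           root->left = insert(root->left, value);
       else                                        // value >= root->data
           root->right = insert(root->right, value);
       return root;
   }

   // Building the example above: insert 3, 5, 2, 4, 1, 6 and 7 in that order.
   Node* buildExample() {
       int values[] = {3, 5, 2, 4, 1, 6, 7};
       Node* root = nullptr;
       for (int v : values)
           root = insert(root, v);
       return root;
   }

The only difference from the pseudocode is that the null test is performed at the top of the call instead of before making the recursive call; the resulting tree is the same.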
Unfortunately the insert algorithm we've discussed so far does not ensure that the tree created will be "almost full": suppose we were given the elements in sorted order, then the tree would look like
      1
       \
        2
         \
          3
           \
            4
             etc.

In this case, each new insert would involve traversing all the previously entered values, which would be about the same as the old "sorted list" insert.
To search the tree for a particular key, we again start at the root and work recursively:

   If the data in the current node matches the search key then we are done, otherwise:
      if the data in the current node is greater than the search key then we must search the left subtree:
         if the left subtree is empty then there is no match
         otherwise make a recursive call on the left child and key
      otherwise we must search the right subtree:
         if the right subtree is empty then there is no match
         otherwise make a recursive call on the right child and key

Efficiency: as with the insertion process, the search process can at worst involve following a single path from the root node to a leaf. Thus the time taken to search the tree is bounded by the height of the tree.
Again, if the tree is (close to) full, the search time is on the order of log2(N) for a tree with N elements, but if the tree is very sparse the search time is on the order of N.
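A matching C++ sketch of the search (find is an arbitrary name; it returns a pointer to a matching node, or a null pointer when there is no match):

   struct Node { int data; Node* left; Node* right; };

   Node* find(Node* root, int key) {
       if (root == nullptr) return nullptr;   // empty subtree: no match
       if (root->data == key) return root;    // found it
       if (key < root->data)
           return find(root->left, key);      // key can only be in the left subtree
       else
           return find(root->right, key);     // key can only be in the right subtree
   }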
Sometimes we want to visit every node in the tree and process its data, an operation referred to as a traversal of the tree. Typical applications include printing all of the stored values and evaluating expression trees.
Tree traversals can be performed very efficiently with simple recursive procedures, and there are three primary types of traversal:
Preorder traversal:
   process the current node's data
   make a recursive call to process the left subtree
   make a recursive call to process the right subtree

Inorder traversal:
   make a recursive call to process the left subtree
   process the current node's data
   make a recursive call to process the right subtree

Postorder traversal:
   make a recursive call to process the left subtree
   make a recursive call to process the right subtree
   process the current node's data
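For instance, using printing as the "processing" step, the three traversals might be sketched in C++ as follows (Node is the same kind of illustrative struct used earlier):

   #include <iostream>

   struct Node { int data; Node* left; Node* right; };

   void preorder(const Node* n) {
       if (n == nullptr) return;
       std::cout << n->data << ' ';   // process the node first ...
       preorder(n->left);             // ... then its left subtree ...
       preorder(n->right);            // ... then its right subtree
   }

   void inorder(const Node* n) {
       if (n == nullptr) return;
       inorder(n->left);              // left subtree first ...
       std::cout << n->data << ' ';   // ... then the node itself ...
       inorder(n->right);             // ... then the right subtree
   }

   void postorder(const Node* n) {
       if (n == nullptr) return;
       postorder(n->left);            // both subtrees first ...
       postorder(n->right);
       std::cout << n->data << ' ';   // ... then the node itself
   }

On the binary search tree built in the earlier insert example (3, 5, 2, 4, 1, 6, 7), inorder prints 1 2 3 4 5 6 7.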
Note that the only difference in the three traversals is the time at which we process the data for the current element.
We will return to preorder and postorder traversals when we look at the evaluation of expression trees, but for binary search trees we are primarily interested in the inorder traversal.
If an inorder traversal is applied to a binary search tree then the data values contained in the tree are processed in sorted order.
This makes sense if we consider the tree structure and the nature of the traversal:
   process the left subtree - i.e. process elements smaller than the current one
   process the current element
   process the right subtree - i.e. process elements larger than the current one
To determine the size of a tree, we simply determine the size of the two subtrees (using recursive calls), add these together, then add one (for the current node).
   size = 1
   if the left child is not null, then make a recursive call to determine the size of the left subtree, and add this to size
   if the right child is not null, then make a recursive call to determine the size of the right subtree, and add this to size
   return the size

To determine the height of the tree, we determine which of the two subtrees has greater height and add one to this value:
   current height = 0
   if the left child is not null, then make a recursive call to determine its height, and add this value to current height
   if the right child is not null, then make a recursive call to determine its height, and if this value is greater than the current height then update the current height
   add one to the current height and return this value

In terms of efficiency, the running time for each of these algorithms is proportional to the size of the tree.
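Both might be sketched in C++ as follows (illustrative Node struct again; here the null test is folded into the recursive call, which simply reports 0 for an empty subtree):

   struct Node { int data; Node* left; Node* right; };

   // Number of nodes in the subtree rooted at n.
   int size(const Node* n) {
       if (n == nullptr) return 0;                 // an empty subtree has size 0
       return 1 + size(n->left) + size(n->right);  // this node plus both subtrees
   }

   // Number of levels in the subtree rooted at n (0 for an empty tree).
   int height(const Node* n) {
       if (n == nullptr) return 0;
       int leftHeight  = height(n->left);
       int rightHeight = height(n->right);
       return 1 + (leftHeight > rightHeight ? leftHeight : rightHeight);
   }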
Consider the following trees: look for the largest and smallest values in each one, and for a pattern that determines where they appear:
3       3          8             4
       / \        / \           / \
      2   4      6   10        2   6
                     /        / \ / \
                    9        1  3 5  7

By noting that larger values (if any) must always be to the right, and smaller values (if any) must always be to the left, we arrive at the following algorithms:
FindLargest:
   if there is no right child then return the current element
   otherwise call recursively on the right child

FindSmallest:
   if there is no left child then return the current element
   otherwise call recursively on the left child
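In C++ these might look like the following sketch (both functions assume the tree passed to them is not empty, and Node is the illustrative struct used earlier):

   struct Node { int data; Node* left; Node* right; };

   int findLargest(const Node* n) {
       if (n->right == nullptr) return n->data;   // no right child: this is the largest value
       return findLargest(n->right);              // otherwise keep following right children
   }

   int findSmallest(const Node* n) {
       if (n->left == nullptr) return n->data;    // no left child: this is the smallest value
       return findSmallest(n->left);              // otherwise keep following left children
   }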
Creating the mirror image of a tree

Suppose we decided that we wanted to reverse the structure of a tree - creating its mirror image.
Again, this is possible with a relatively simple recursive function:
- swap the current node's left subtree with its right subtree
- call recursively on the left child
- call recursively on the right child
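A C++ sketch of this (mirror is an arbitrary name; std::swap exchanges the two subtree pointers in place):

   #include <utility>   // for std::swap

   struct Node { int data; Node* left; Node* right; };

   void mirror(Node* n) {
       if (n == nullptr) return;
       std::swap(n->left, n->right);   // swap this node's left and right subtrees
       mirror(n->left);                // then mirror what is now the left subtree
       mirror(n->right);               // and what is now the right subtree
   }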
Consider the tree below: what needs to happen to delete value 6?
            5
           / \
          /   \
         /     \
        3       6
         \     / \
          4   5   8
                 / \
                7   8

If we delete 6, we can't simply move one of the children up a level; we need to do something more advanced.
Observe that the smallest value in the right subtree of 6 will still be larger than every value in the left subtree of 6, and will be at least as small as everything else in the right subtree of 6.
Suppose, then, we find the smallest value in the right subtree, replace 6 with that value, then go delete the old copy of that value?
In the above example we would wind up with another valid binary search tree:
            5
           / \
          /   \
         /     \
        3       7
         \     / \
          4   5   8
                   \
                    8

Note that if the right subtree of 6 had been empty, we could instead have simply replaced 6 with its left child.
This leads us to the following simple algorithm for deletion:
   - if the node has no children then simply delete it
   - otherwise, if the node has only one child then replace the deleted node with its child
   - otherwise:
        look in the right subtree of the node to be deleted, finding the node with the smallest value, call it N
        replace the node-to-be-deleted with the newly selected node N, and make sure you remove N from its old location down the right subtree
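Putting the pieces together, deletion might be sketched in C++ as follows. This is illustrative only: remove is a hypothetical name (chosen to avoid clashing with the C++ delete keyword), and the function returns the possibly new root of the subtree it was given.

   struct Node { int data; Node* left; Node* right; };

   // Smallest value in a non-empty subtree: follow left children to the end.
   int smallestValue(const Node* n) {
       return (n->left == nullptr) ? n->data : smallestValue(n->left);
   }

   // Delete one occurrence of 'value' from the subtree rooted at 'root',
   // returning the (possibly new) root of that subtree.
   Node* remove(Node* root, int value) {
       if (root == nullptr) return nullptr;                // value not found: nothing to do
       if (value < root->data) {
           root->left = remove(root->left, value);         // keep looking in the left subtree
       } else if (value > root->data) {
           root->right = remove(root->right, value);       // keep looking in the right subtree
       } else if (root->left == nullptr && root->right == nullptr) {
           delete root;                                    // no children: simply delete the node
           return nullptr;
       } else if (root->left == nullptr || root->right == nullptr) {
           Node* child = (root->left != nullptr) ? root->left : root->right;
           delete root;                                    // one child: the child takes its place
           return child;
       } else {
           root->data = smallestValue(root->right);        // two children: copy up the smallest value
           root->right = remove(root->right, root->data);  // in the right subtree, then remove its
       }                                                   // old copy from that subtree
       return root;
   }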