Tree data structures

Arbitrary trees
A tree is a widely used data structure that emulates a hierarchical tree structure with a set of linked nodes. A tree is most commonly represented graphically as in Picture 1: the circles are the nodes and the edges are the links between them.
Trees are usually used to store and represent data in some hierarchical order. The data are stored in the nodes, of which the tree consists.
A node may contain a value or a condition, or represent a separate data structure or a tree of its own. Each node in a tree has zero or more child nodes, which are one level lower in the tree hierarchy (by convention, trees grow down, not up as they do in nature). A node that has a child is called the child's parent node (or ancestor node, or superior). A node has at most one parent. A node that has no children is called a leaf; such a node is at the bottommost level of its branch of the tree. The height of a node is the length of the longest path from that node to a leaf. The height of the root is the height of the tree. In other words, the "height" of a tree is the "number of levels" in the tree. More formally, the height of a tree is defined as follows:
- The height of a tree with no elements is 0
- The height of a tree with 1 element is 1
- The height of a tree with > 1 element is equal to 1 + the height of its tallest subtree.
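The recursive definition above translates directly into code. Here is a minimal sketch in C for binary nodes; the function and helper names (height, mk) are illustrative, and a node structure with two child links is assumed, like the one declared later in the text:

```c
#include <assert.h>
#include <stdlib.h>

typedef int info_t;

struct node {
    info_t info;
    struct node *llink, *rlink;
};
typedef struct node *nodep;

/* Height as defined above: 0 for an empty tree, 1 for a single node,
   otherwise 1 + the height of the tallest subtree. */
int height(nodep p) {
    if (p == NULL)
        return 0;
    int hl = height(p->llink);
    int hr = height(p->rlink);
    return 1 + (hl > hr ? hl : hr);
}

/* Small helper used only to build example trees. */
nodep mk(info_t v, nodep l, nodep r) {
    nodep p = malloc(sizeof(struct node));
    p->info = v;
    p->llink = l;
    p->rlink = r;
    return p;
}
```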
The depth of a node is the length of the path to its root (i.e., its root path). Every child node is always one level lower than its parent.
The topmost node in a tree is called the root node. Being the topmost node, the root node has no parent. It is the node at which operations on the tree commonly begin (although some algorithms begin with the leaf nodes and work up, ending at the root). All other nodes can be reached from it by following edges or links. (In the formal definition, the path from the root to any given node is unique.) In diagrams, the root is typically drawn at the top.
In some trees, such as heaps, the root
node has special properties.
A subtree is a portion of a tree data structure that can be viewed as a complete tree in itself. Any node in a tree T, together with all the nodes below it that are reachable from that node, comprises a subtree of T. The subtree corresponding to the root node is the entire tree; the subtree corresponding to any other node is called a proper subtree (by analogy with the term proper subset).
Every node in a tree can be seen as the
root node of the subtree rooted at that node.
Picture 1. An example of a tree
An internal node or inner node is any node
of a tree that has child nodes and is thus not a leaf node.
There are two basic types of trees. In an unordered tree, a tree is a tree in a purely structural sense; that is to say, given a node, there is no order among the children of that node. A tree on which an order is imposed (for example, by assigning different natural numbers to each child of each node) is called an ordered tree, and data structures built on them are called ordered tree data structures. Ordered trees are by far the most common form of tree data structure. Binary search trees are one kind of ordered tree.
Representations
There are many different ways to represent trees. Common representations store the nodes as records dynamically allocated on the heap, with pointers to their children, their parents, or both; the children in turn point to their own children, and so on recursively until a leaf is reached. Alternatively, the nodes can be stored as items in an array, with the relationships between them determined by their positions in the array (e.g., binary heap).
Below is an example of the declaration of the structure of a tree node written in C, in which the node is limited to at most two children (in this case a left and a right child). This kind of tree is called a binary tree:
typedef type_of_data_in_the_node info_t;

struct node {
    info_t info;
    struct node *llink;
    struct node *rlink;
};

typedef struct node *nodep;
info stores the data in the node (for example, an integer); the pointers llink and rlink point at the left and right child nodes of the node, which are dynamically allocated. The left and right child nodes can be dynamically allocated and llink and rlink made to point at them, and those child nodes can point to further nodes in the same way, so that a continual recursive chain structure is formed between the root (or subroots) and its (or their) subtrees. This is how the hierarchical order of the binary tree is actually formed, and there are well-known algorithms for creating tree structures consisting of nodes and links. We can traverse the nodes of the tree by following the links between them, that is, by reading the data at the addresses at which the llink and rlink pointers point.
If a node does not have a left or a right child, or both, then the corresponding link (or both links) should have the value NULL. This value of the pointer indicates that the node does not have a left or right child node. The structure of the node can also contain a pointer to its parent.
In languages like FORTRAN that do not support records or pointers, pointers can be implemented by means of arrays of subscripts: for example, ILB(K) would be the subscript of the left child of node K, and IRB(K) would be the subscript of the right child. K is an integer equal to the index in the array of node values, and corresponds to the pointers of the addresses of the nodes.
Binary trees are not ordinary trees; they are a special type of tree with specific characteristics and uses that do not apply to ordinary trees: for example, the existence of, and the distinction between, a left and a right branch (link) and child of a node, rather than simply branches and children; and their use in binary searching, in parsing mathematical formulas in programming languages, and so on.
Unlike binary trees, there are structural inconveniences in representing ordinary trees, in which each node has an arbitrary number of children, such as the tree in Picture 1. A node in that kind of tree is free to have zero, one, two, a hundred, a thousand or any other number of child nodes. This causes problems with the declaration of the node structure in programming languages, and with the limited memory of the computer. It is not a good decision to define some maximal number of links (or pointers) to the children in the declaration of the node structure, because of the limits and the memory waste.
In this case, the structure of a binary tree can be used to represent an ordinary tree, with no need to waste large amounts of memory declaring a maximal number of links for the children. The nodes are dynamically allocated, and only two pointers per node are used.
The picture below illustrates this idea well:
Picture 2:
The left link of some node N tells whether that node has children. If the left link (pointer) of N is NULL, then N has no children; otherwise the llink pointer points to one of the children. The node reached by the llink pointer is one of the children of N, and if we then recursively follow the right links from this node, we can visit all the remaining children of N, until an rlink that is NULL is reached. The representation and traversal of the children of a parent node resemble a linked linear list. We can recursively repeat this procedure for every node, from the root down to the leaves. A logical conclusion is that rlink in this representation points to the siblings of a node.
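A small illustrative sketch of this "llink = first child, rlink = next sibling" representation in C; mk and count_children are hypothetical helper names:

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *llink;  /* first child */
    struct node *rlink;  /* next sibling */
};
typedef struct node *nodep;

/* Helper to build example nodes. */
nodep mk(int v, nodep first_child, nodep next_sibling) {
    nodep p = malloc(sizeof(struct node));
    p->info = v;
    p->llink = first_child;
    p->rlink = next_sibling;
    return p;
}

/* Visit all children of n: follow one llink, then the rlink chain,
   exactly as described above.  Here we just count them. */
int count_children(nodep n) {
    int c = 0;
    for (nodep ch = n->llink; ch != NULL; ch = ch->rlink)
        c++;
    return c;
}
```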
Usual graph representations can also be used to represent arbitrary tree structures.
In graph theory, a tree is a connected
acyclic graph (or sometimes, a connected directed acyclic graph). A rooted tree
is such a graph with a vertex singled out as the root. In this case, any two
vertices connected by an edge inherit a parent-child relationship. An acyclic
graph with multiple connected components or a set of rooted trees is sometimes
called a forest.
Traversals
Stepping through the items of a tree by means of the connections between parents and children is called walking the tree, and the action is a walk (traversal) of the tree. Often, an operation is performed when a pointer arrives at a particular node (visiting the node; for example, printing the value(s) that the node contains). A walk in which each parent node is traversed before its children (or its subtrees) is called a pre-order walk; a walk in which the children are traversed before their respective parents is called a post-order walk. There is also the level-order traversal: the nodes of one level (beginning from level zero, the root) are all traversed before the traversal of the nodes on the next level is started. Every tree traversal has O(n) complexity, where n is the number of nodes in the tree.
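The level-order traversal is naturally implemented with a queue instead of a stack: a node is dequeued and visited, and its children are enqueued behind everything already waiting. A sketch under the binary-node representation, with a fixed-size array as the queue, and the visited keys collected into an array rather than printed so the order is easy to inspect; all names are illustrative:

```c
#include <assert.h>
#include <stdlib.h>

typedef int info_t;
struct node { info_t info; struct node *llink, *rlink; };
typedef struct node *nodep;

nodep mk(info_t v, nodep l, nodep r) {
    nodep p = malloc(sizeof(struct node));
    p->info = v; p->llink = l; p->rlink = r;
    return p;
}

/* Level-order traversal with a simple array-backed queue.  Writes the
   visited keys into out[] and returns how many nodes were visited. */
int level_order(nodep root, info_t out[], int max) {
    nodep queue[256];           /* assumed big enough for the example */
    int head = 0, tail = 0, n = 0;
    if (root != NULL)
        queue[tail++] = root;
    while (head < tail && n < max) {
        nodep p = queue[head++];
        out[n++] = p->info;     /* visit */
        if (p->llink) queue[tail++] = p->llink;
        if (p->rlink) queue[tail++] = p->rlink;
    }
    return n;
}
```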
Here are more formal definitions of the preorder and postorder walks of an arbitrary tree. Let T be the tree and R its root, and let the tree rooted at R contain the subtrees T_1, T_2, ..., T_n.
A preorder traversal of the nodes of T is an array that contains the information from the root R, followed by the information from the nodes of the subtree T_1 traversed in preorder, then the information from the nodes of the subtree T_2 traversed in preorder, ..., and so on up to T_n in preorder.
A postorder traversal of the nodes of T is an array that contains the arrays of information from the nodes of the subtrees T_1, T_2, ..., T_n, each traversed in postorder, in that order, and at the end the information from the root R.
Here are recursive realizations of the preorder and postorder algorithms in C, for traversing an ordinary (arbitrary) tree represented by a binary tree; the same code performs preorder and postorder traversal of a binary tree:

void preorder(nodep p){
    if (p != NULL) {
        printf("%d ", p->info);
        preorder(p->llink);
        preorder(p->rlink);
    }
}

void postorder(nodep p){
    if (p != NULL) {
        postorder(p->llink);
        postorder(p->rlink);
        printf("%d ", p->info);
    }
}
Because every tree can be represented by a binary tree, the traversals for binary trees can be used to traverse an arbitrary tree structure, if the tree is represented by a binary tree.
When we traverse a tree in postorder, every child (subtree) must be visited before its parent (root) is visited; a node cannot be visited before its children (subtrees) have been visited (printed on the screen). If we want to implement the postorder traversal iteratively, we need a stack that stores the chain of ancestor nodes already encountered but not yet actually visited (printed on the screen), because they all have children (subtrees) that must be visited first. Beginning at the root: if it has children, the root is pushed on top of the stack, and then we attempt to visit its children, starting from the first one. If that child also has children, then, since the rule "first the children, then the root" still holds, it is pushed onto the stack too, and this activity goes on until we reach a node whose children are leaves. The leaves are not pushed on the stack, just visited; then the parent of these nodes, now on top of the stack, is visited and popped, and we attempt to visit the remaining child nodes of the node now on top of the stack, which is the parent of the previously visited node. The same recursive procedure is executed for all the nodes of the tree (on top of the stack), until the stack is empty.
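One common way to realize an iterative postorder is with two stacks rather than the single-stack bookkeeping described above: the first stack emits nodes in root-right-left order, and the second reverses that into left-right-root, i.e. postorder. A sketch for binary nodes (names illustrative), collecting keys into an array instead of printing:

```c
#include <assert.h>
#include <stdlib.h>

typedef int info_t;
struct node { info_t info; struct node *llink, *rlink; };
typedef struct node *nodep;

nodep mk(info_t v, nodep l, nodep r) {
    nodep p = malloc(sizeof(struct node));
    p->info = v; p->llink = l; p->rlink = r;
    return p;
}

/* Iterative postorder with two stacks: s1 produces root-right-left,
   s2 reverses it into left-right-root. */
int postorder_iter(nodep root, info_t out[], int max) {
    nodep s1[256], s2[256];     /* assumed deep enough for the example */
    int t1 = 0, t2 = 0, n = 0;
    if (root)
        s1[t1++] = root;
    while (t1 > 0) {
        nodep p = s1[--t1];
        s2[t2++] = p;
        if (p->llink) s1[t1++] = p->llink;
        if (p->rlink) s1[t1++] = p->rlink;
    }
    while (t2 > 0 && n < max)
        out[n++] = s2[--t2]->info;
    return n;
}
```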
An iterative version of preorder is also implemented using a stack. This time, after every visited node (starting from the root), the children (subtrees) must be visited, and this holds for every subsequently visited node. This does not mean that after visiting a node all its children are visited at once; rather, the same recursive procedure is applied to its children (subtrees) one by one. After the root is visited, we attempt to visit its children, but if a visited child turns out to have children of its own, the rule still holds: after the root, its children must be visited. The other children (subtrees) of the root are visited after we are done with the first one. The node currently being visited (starting from the root) is not pushed on the stack; after visiting it, all its children are pushed on the stack. Then we attempt to visit them all, starting from the one on top of the stack: that one is visited and popped, because we are done with it, and then its children are pushed on the stack, and the same recursive procedure is applied to the node on top of the stack. The procedure stops when the stack becomes empty.
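A compact variant of the iterative preorder for binary nodes pushes the right child before the left, so that the left subtree is always processed first. Its bookkeeping differs slightly from the description above, but it produces the same preorder sequence; names are illustrative, and keys are collected into an array instead of printed:

```c
#include <assert.h>
#include <stdlib.h>

typedef int info_t;
struct node { info_t info; struct node *llink, *rlink; };
typedef struct node *nodep;

nodep mk(info_t v, nodep l, nodep r) {
    nodep p = malloc(sizeof(struct node));
    p->info = v; p->llink = l; p->rlink = r;
    return p;
}

/* Iterative preorder: visit the popped node, then push right before
   left so that the left subtree is processed first. */
int preorder_iter(nodep root, info_t out[], int max) {
    nodep stack[256];           /* assumed deep enough for the example */
    int top = 0, n = 0;
    if (root)
        stack[top++] = root;
    while (top > 0 && n < max) {
        nodep p = stack[--top];
        out[n++] = p->info;     /* visit */
        if (p->rlink) stack[top++] = p->rlink;
        if (p->llink) stack[top++] = p->llink;
    }
    return n;
}
```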
A stack must be used in the iterative traversals. For postorder, it remembers the nodes in the right hierarchical order, because the visiting (printing) starts at the bottom, from the leaves, and then goes up the tree hierarchy level by level, following the nodes (an ordered chain of ancestors) in the stack; it also remembers the children that have not yet been traversed recursively.
For preorder, the stack is used to store, in the right hierarchical order, the children that have not yet been traversed.
The detailed explanation of the traversals above practically explains the need for the stack.
The recursive implementations also use a stack (the call stack) to remember the positions where the procedures stopped because of the recursive calls, so that they can be continued later, in the right order, after the recursively called functions complete.
Finally, if the nodes also had pointers to their parents, there would be no need for a stack.
Here is an example of preorder and postorder traversal
of the tree on Picture 1:
When a node is visited, it is printed on the screen as the output of the traversal algorithm.
Preorder:
5, 4, 9, 10, 1, 2, 9, 1, 8.
Postorder:
8, 1, 9, 2, 1, 10, 9, 4, 5.
Common operations and uses of all kinds of trees:
Common operations:
- Visiting all the items of a tree
- Searching for an item
- Adding a new item at a certain
position on the tree
- Deleting an item
- Removing a whole section of a tree
(called pruning)
- Adding a whole section to a tree
(called grafting)
- Finding the root for any node
Common uses:
- Manipulate hierarchical data
- Make information easy to search
- Manipulate sorted lists of data
Binary search trees
A binary search tree (BST), or lexicographic tree, is a binary tree data structure that has the following binary search tree properties:
- Each node has a value.
- The key value of the left child of a node is less than the parent's key value.
- The key value of the right child of a node is greater than (or equal to) the parent's key value.
- These properties hold true for every node in the tree.
If a BST allows duplicate values, then it
represents a multiset. This kind of tree uses non-strict inequalities (<=,
>=). Everything in the left subtree of a node is strictly less than the
value of the node, but everything in the right subtree is either greater than
or equal to the value of the node.
If a BST doesn't allow duplicate values,
then the tree represents a set with unique values, like the mathematical set.
Trees without duplicate values use strict inequalities, meaning that the left
subtree of a node only contains nodes with values that are less than the value
of the node, and the right subtree only contains values that are greater.
The choice of storing equal values in the
right subtree only is arbitrary; the left would work just as well. One can also
permit non-strict equality in both sides. This allows a tree containing many
duplicate values to be balanced better, but it makes searching more complex.
Minimum and maximum
An element of a binary search tree whose key is a minimum can always be found by following the left child pointers from the root until a NULL is encountered. The maximum element can always be found by following the right child pointers from the root until a NULL is encountered. The binary-search-tree property guarantees that these operations are correct.
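These two operations are one-line loops. A sketch in C under the node structure used earlier (both functions assume a non-empty tree; mk is an illustrative helper):

```c
#include <assert.h>
#include <stdlib.h>

typedef int info_t;
struct node { info_t info; struct node *llink, *rlink; };
typedef struct node *nodep;

nodep mk(info_t v, nodep l, nodep r) {
    nodep p = malloc(sizeof(struct node));
    p->info = v; p->llink = l; p->rlink = r;
    return p;
}

/* Minimum: follow left links until a NULL is encountered. */
info_t tree_minimum(nodep p) {
    while (p->llink != NULL)
        p = p->llink;
    return p->info;
}

/* Maximum: follow right links until a NULL is encountered. */
info_t tree_maximum(nodep p) {
    while (p->rlink != NULL)
        p = p->rlink;
    return p->info;
}
```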
Traversal
The binary search tree property allows us to obtain all the keys of a binary search tree in sorted order by a simple traversal algorithm, called an inorder tree walk, which traverses the left subtree of the root in inorder, then accesses the root node itself, then traverses the right subtree of the root in inorder. The tree may also be traversed in preorder or postorder: in preorder, the root is accessed first, and then the left and right subtrees (or the right and then the left) are traversed in preorder; the opposite holds for postorder. And of course the level-order traversal can also be applied to this tree.
Here is an inorder traversal implemented as a recursive function in C (using the llink/rlink node structure from before):

void inorder(nodep p){
    if (p != NULL) {
        inorder(p->llink);
        printf("%d ", p->info);
        inorder(p->rlink);
    }
}
The iterative version of the inorder algorithm also uses a stack, and follows an analogy similar to the iterative implementations of the preorder and postorder algorithms.
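For completeness, here is one common shape of that iterative inorder: run down the left spine pushing nodes, then pop, visit, and switch to the right subtree. A sketch with illustrative names, collecting the keys into an array so the sorted order can be checked:

```c
#include <assert.h>
#include <stdlib.h>

typedef int info_t;
struct node { info_t info; struct node *llink, *rlink; };
typedef struct node *nodep;

nodep mk(info_t v, nodep l, nodep r) {
    nodep p = malloc(sizeof(struct node));
    p->info = v; p->llink = l; p->rlink = r;
    return p;
}

/* Iterative inorder: push the whole left spine, then pop, visit, and
   continue in the right subtree.  Keys are written to out[]. */
int inorder_iter(nodep root, info_t out[], int max) {
    nodep stack[256];           /* assumed deep enough for the example */
    int top = 0, n = 0;
    nodep p = root;
    while ((p != NULL || top > 0) && n < max) {
        while (p != NULL) {
            stack[top++] = p;
            p = p->llink;
        }
        p = stack[--top];
        out[n++] = p->info;     /* visit */
        p = p->rlink;
    }
    return n;
}
```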
Successor and predecessor
Given a node in a binary search tree, it is sometimes important to be able to find its successor in the sorted order determined by an inorder tree walk. If all keys are distinct, the successor of a node x is the node with the smallest key greater than key[x]. The structure of a binary search tree allows us to determine the successor of a node without ever comparing keys. The following procedure returns the successor of a node x in a binary search tree if it exists, and NULL if x has the largest key in the tree.
// x is the node whose successor we seek; TREE-MINIMUM(y) returns the leftmost (smallest) node of the tree rooted at y.
TREE-SUCCESSOR(x)
1  if right[x] ≠ NULL
2      then return TREE-MINIMUM(right[x])
3  y ← p[x]
4  while y ≠ NULL and x = right[y]
5      do x ← y
6         y ← p[y]
7  return y
The code for TREE-SUCCESSOR is broken into two cases.
If the right subtree of node x is nonempty, then the
successor of x is just the leftmost node in
the right subtree, which is found in line 2 by calling TREE-MINIMUM(right[x]).
On the other hand, if the right subtree of node x
is empty and x has a successor y, then y
is the lowest ancestor of x whose left child is also an
ancestor of x. To find y,
we simply
go up the tree from x until we encounter a node
that is the left child of its parent; this is accomplished by lines 3–7 of
TREE-SUCCESSOR.
The running
time of TREE-SUCCESSOR on a tree of height h is O(h), since we either follow a path up the tree or
follow a path down the tree. The procedure TREE-PREDECESSOR, which is symmetric
to TREE-SUCCESSOR, also runs in time O(h). Even if keys are not distinct, we define the
successor and predecessor of any node x as
the node returned by calls made to TREE-SUCCESSOR(x) and TREE-PREDECESSOR(x), respectively.
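The pseudocode above can be translated almost line by line into C, provided the node structure carries a parent pointer (the p[] field of the pseudocode); the struct and function names below are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Node with a parent pointer; with it, the successor can be found
   without a stack.  Field and function names are illustrative. */
struct pnode {
    int info;
    struct pnode *llink, *rlink, *parent;
};

static struct pnode *tree_minimum_node(struct pnode *x) {
    while (x->llink != NULL)
        x = x->llink;
    return x;
}

/* Direct translation of TREE-SUCCESSOR: either the leftmost node of
   the right subtree, or the lowest ancestor whose left child is also
   an ancestor of x. */
struct pnode *tree_successor(struct pnode *x) {
    if (x->rlink != NULL)
        return tree_minimum_node(x->rlink);
    struct pnode *y = x->parent;
    while (y != NULL && x == y->rlink) {
        x = y;
        y = y->parent;
    }
    return y;
}
```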
Insertion
Insertion begins as a search would begin;
if the root is not equal to the value, we search the left or right subtrees as
before. Eventually, we will reach a leaf and add the value as its right or left
child, depending on the node's value. The Insert algorithm works like this: if
the tree is empty we insert the node as a root of the tree, if not, then we
examine the root value and recursively Insert the new node to the left subtree
if the new value is less than the root, or the right subtree if the new value
is greater than or equal to the root.
Here is how a typical binary search tree insertion might be performed in C. This insertion algorithm maintains the binary search tree properties with every insertion, so the operations on the BST (like binary searching) always produce correct results (note that this version rejects duplicate keys and returns 0 in that case):

int insert(nodep *n, info_t x){
    nodep p = *n;
    if (p == NULL) {
        *n = (nodep)malloc(sizeof(struct node));
        (*n)->info = x;
        (*n)->llink = (*n)->rlink = NULL;
        return 1;
    }
    else if (x < p->info)
        return insert(&(p->llink), x);
    else if (x > p->info)
        return insert(&(p->rlink), x);
    else
        return 0;
}
The complexity of this algorithm is O(h), where h is the height; on average, if the elements are inserted in random order, the complexity is around O(lg n), where n is the total number of nodes in the tree.
O(h) is equivalent to O(lg n) here because the height depends on the number of nodes (h = f(n)). We'll talk about this in more detail later.
To insert a node into an empty tree, we simply make that node the root, no matter what its value is. If we want to insert another node in order to build up the tree, we must insert that node as a child of the previously added root node. If the key value of that node is less than the value of the root, then the node is inserted to the left of the root, as its left child; the symmetrical operation is performed if the new node is greater than the root.
Picture 4: Inserting an item with key 13 into a binary search tree.
Lightly shaded nodes indicate the path from the root down to the position where
the item is inserted. The dashed line indicates the link in the tree that is
added to insert the item.
If we insert the items (7, 3, 10, 12, 2, 1, 13) in that order into a BST, using the above algorithm, we get this tree:
Picture 5.
Searching
A binary search of a sorted array of n
elements is a fast algorithm for searching, with complexity O(lgn), but
the static size of the array can make the task sometimes uncomfortable. Linked
list can increase the complexity of the binary searching from O(lgn) to
the recursive comlexity function T(n) = n/2 + T(n/2), so linked lists are not
used in binary searching. That’s why a binary search tree can be used for BS.
There is no need for sorting, because the elements are already ordered in
sorted order, and the complexity of a binary searching on a BST is averagely O(lgn),
where n is the number of nodes.
Searching a binary tree for a specific
value is a process that can be performed recursively because of the order in
which values are stored. We begin by examining the root. If the value we are
searching for equals the root, the value exists in the tree. If it is less than
the root, then it must be in the left subtree, so we recursively search the
left subtree in the same manner. Similarly, if it is greater than the root,
then it must be in the right subtree, so we recursively search the right
subtree. If we reach a leaf and have not found the value, then the item is not
where it would be if it were present, so it is not contained in the tree at
all. A comparison may be made with binary search, which operates in nearly the
same way but using random access on an array instead of following links.
This operation requires O(h) time (h is the height of the tree). In the average case, when the items of the tree are inserted in random (not sorted) order, the height is around lg n (n is the number of items) and the operation requires O(lg n) time. But it needs O(n) time in the worst case, when the unbalanced tree resembles a linked list, that is, when the items are inserted in sorted order.
Here is a recursive implementation of binary searching in a BST, written in pseudocode:

// x is the root node of the tree, k is the key value being searched for.
TREE-SEARCH(x, k)
    if x = NULL or k = key[x]
        then return x
    if k < key[x]
        then return TREE-SEARCH(left[x], k)
        else return TREE-SEARCH(right[x], k)
With every iteration, the search and insertion algorithms go one level lower, getting closer to the level equal to the height, and that is why they have complexity O(h).
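Since TREE-SEARCH is tail-recursive, it can also be written as a simple loop in C; a sketch under the node structure used earlier (tree_search and mk are illustrative names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int info_t;
struct node { info_t info; struct node *llink, *rlink; };
typedef struct node *nodep;

nodep mk(info_t v, nodep l, nodep r) {
    nodep p = malloc(sizeof(struct node));
    p->info = v; p->llink = l; p->rlink = r;
    return p;
}

/* Iterative TREE-SEARCH: descend left or right depending on the
   comparison, until the key is found or a NULL link is reached. */
nodep tree_search(nodep x, info_t k) {
    while (x != NULL && x->info != k) {
        if (k < x->info)
            x = x->llink;
        else
            x = x->rlink;
    }
    return x;
}
```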
Deletion
There are several cases to be considered:
- Deleting a leaf: Deleting a node with no children is easy, as we can simply remove it from the tree. The node is deallocated, and the pointer in the parent node that pointed to the deleted node is set to NULL.
- Deleting a node with one child: Delete it and replace the pointer in the parent with the address of the child node of the deleted node.
- Deleting a node with two children: A node with two children cannot be deleted as easily as a node with one child. We will call this node D. Obviously, after the deletion of their parent, the two children cannot both be attached to the parent of their parent. The correct solution to the problem is the following:
D is smaller than all the nodes in its right subtree. The leftmost node in the right subtree of D, which we will call N, is certainly the smallest of all the nodes in that subtree, and is also greater than all the nodes in the left subtree of D. So, to preserve the binary structure and the lexicographic order of the tree, we can remove N from its place (as we would remove a node with one or zero children) and let N take the place of D: the pointers of N take the values of the pointers of D, so that N points to the left and right subtrees of D, and then D can be deallocated. The same, but mirrored, procedure can be applied with the left subtree of D.
The node N is the successor of D in an inorder traversal of the tree. The symmetrical node in the left subtree is the inorder predecessor of D.
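A common way to code the three cases is a recursive function that returns the (possibly new) root of the subtree. Note one deliberate simplification in the sketch below: for the two-children case it copies N's key into D and then deletes N (which has at most one child), instead of relinking N's pointers as described above; the resulting ordering is the same. All names are illustrative:

```c
#include <assert.h>
#include <stdlib.h>

typedef int info_t;
struct node { info_t info; struct node *llink, *rlink; };
typedef struct node *nodep;

nodep mk(info_t v, nodep l, nodep r) {
    nodep p = malloc(sizeof(struct node));
    p->info = v; p->llink = l; p->rlink = r;
    return p;
}

/* Delete key x from the subtree rooted at p; returns the new root of
   that subtree.  The two-children case copies the key of N (leftmost
   node of the right subtree) into D, then deletes N. */
nodep delete_node(nodep p, info_t x) {
    if (p == NULL)
        return NULL;
    if (x < p->info)
        p->llink = delete_node(p->llink, x);
    else if (x > p->info)
        p->rlink = delete_node(p->rlink, x);
    else if (p->llink != NULL && p->rlink != NULL) {
        nodep n = p->rlink;                /* find N */
        while (n->llink != NULL)
            n = n->llink;
        p->info = n->info;                 /* D takes N's key */
        p->rlink = delete_node(p->rlink, n->info);
    } else {                               /* zero or one child */
        nodep child = (p->llink != NULL) ? p->llink : p->rlink;
        free(p);
        return child;
    }
    return p;
}

/* Collect keys in inorder, to check the ordering after a deletion. */
int inorder_collect(nodep p, info_t out[], int n) {
    if (p == NULL)
        return n;
    n = inorder_collect(p->llink, out, n);
    out[n++] = p->info;
    return inorder_collect(p->rlink, out, n);
}
```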
Threaded binary trees
There are many NULL pointers in binary trees; these pointers exist, but they do not point anywhere. They can be put to use by introducing a special kind of pointer, the so-called thread. Threads are pointers that point to the predecessor or the successor of a given node in one of the traversal methods. A thread in an otherwise empty left link of a node points to the predecessor; a thread in an unused right link points to the successor in the chosen notation (traversal).
In this way binary trees can be threaded in preorder, postorder, or inorder notation.
Here
is an example of inorder threaded binary tree:
Picture 6.
It can be noticed from the picture that the first and the last node in the chosen traversal do not point at any node of the tree, while all other links in the tree are used. In this way the procedures for traversing the tree are simplified.
For a simpler programming implementation of tree traversal (knowing when to stop), and to make use of the two unused links, a guide (header) node can be introduced: it points at the root of the tree, and the two unused links point at it.
The second problem that has to be solved after the introduction of threads is how a program working with the tree will distinguish moving across a real link from moving across a thread, because both are realized with the same pointer. The simplest way is to add an additional variable for each of the two pointers. These variables tell whether the corresponding pointer is a link or a thread. In this case, besides info, llink and rlink, the node contains two additional fields that act as flags (labels), commonly designated ltag and rtag (or lbit and rbit). It is commonly taken that if rtag is 1, then the right link is a real link, while if it is 0, then the right link is a thread. The same applies to ltag.
Here is a typical representation of the threaded binary tree in C:

typedef int info_t;

struct node {
    info_t info;
    struct node *llink, *rlink;
    int ltag, rtag;
};

typedef struct node *nodep;
If the binary tree is empty, the guide node points to itself.
Here below is C code for adding a node to a binary tree that is inorder threaded:

int addNode(nodep p, int n, char where)
{
    nodep temp;
    if ('l' == where) {
        if (p->ltag) {
            printf("There already exists a left subtree\n");
            return -1;
        }
        else {
            temp = (nodep)malloc(sizeof(struct node));
            temp->info = n;
            temp->rtag = temp->ltag = 0;
            temp->llink = p->llink;  /* inherit p's old left thread */
            temp->rlink = p;         /* thread to p, the inorder successor */
            p->llink = temp;
            p->ltag = 1;
            return 0;
        }
    }
    else if ('r' == where) {
        if (p->rtag) {
            printf("There already exists a right subtree\n");
            return -1;
        }
        else {
            temp = (nodep)malloc(sizeof(struct node));
            temp->info = n;
            temp->rtag = temp->ltag = 0;
            temp->rlink = p->rlink;  /* inherit p's old right thread */
            temp->llink = p;         /* thread to p, the inorder predecessor */
            p->rlink = temp;
            p->rtag = 1;
            return 0;
        }
    }
    return -1; /* unknown direction */
}
It is possible to discover the parent of a node in an inorder threaded binary tree, without explicit use of parent pointers or a stack, albeit slowly. This can be useful where stack space is limited, or where a stack of parent pointers is unavailable.
This is possible because if a node k has a right child m, then m's left pointer must be either a child or a thread back to k. In the case of a left child, that left child must also have a left child or a thread back to k, and so we can follow m's left children until we find a thread pointing back to k. The situation is similar when m is the left child of k, except that in this case we have to follow the right children.
A similar analysis can be made to find all the cases in which discovering the parent is possible in preorder and postorder threaded binary trees.
Balanced binary trees
What is a complete tree:
The maximal number of nodes that can be found on the highest level (level 0) is one (2^0): the root. The maximal number of nodes on level 1 is two (2^1), on level 2 four (2^2); in general, on level m there are 2^m nodes. So the maximal possible number of nodes in a tree with height h is:

2^0 + 2^1 + 2^2 + ... + 2^h = 2^(h+1) - 1

If a tree has this property, then all non-terminal nodes have two children, and the tree is called a complete tree; its height h is then always approximately lg n. So if the height of a tree is [lg n], where n is the number of nodes, the tree is maximally filled, with the minimal possible height.
A complete tree has the property that all leaves are on the same level.
In every complete tree, the number of nodes in the left subtree of the root (or of any node) is equal to the number of nodes in the right subtree. The algorithms for binary searching, insertion, and finding the predecessor or successor have the nice O(lg n) worst-case running time when the tree is complete. Every subtree of a complete tree is also a complete tree.
A complete tree also has the property of being balanced. A balanced tree is any tree in which the difference between the heights of the two subtrees of any node is zero or at most one. The tree in Picture 7, a), is a balanced tree. But the tree in b) is not: the subtrees of the node 2 are very unbalanced; on one side we have an empty, totally unexploited left subtree, and on the other side a right subtree that extends to height five. The subtrees of the nodes 3 and 10 in Picture 5 are also unbalanced, which is obvious.
The tree in b) has 6 nodes, but its height is 5, while lg 6 is around 2.59, so the search algorithm will not be optimal. To make it optimal we have to transform the tree into a balanced binary tree with the same items: see a). In that case the search algorithm will have complexity O(lg n). The tree in a) is only nearly complete (hence the 2.59), but it is balanced.
a) b) Picture 7.
Also, with a balanced tree, in every next iteration of the binary search algorithm around half of all the nodes are skipped (taken out of consideration), and only about half of the nodes remain to be searched, because one of the two subtrees of the current root node is discarded in each iteration. The iteration stops when there is only one node left in consideration. From this we can also conclude why binary searching in a balanced tree is always around O(lg n); the text below explains this in more detail:
The balanced property forces every node (beginning from the root) whose left (or right) subtree has height greater than 1 to have a non-empty opposite subtree whose height is equal to, or one less than, the height of the first subtree. This holds recursively for every node at every lower level. It also implies that the two subtrees of every node share nearly the same number of nodes, and therefore that a balanced binary tree is always nearly complete. That is why binary searching a balanced tree takes around O(lg n) in the worst case.
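As a sketch of the halving just described, a minimal BST lookup in C could look like this (the node type with key/left/right fields is an assumption for illustration, not taken from the text):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: a key plus left/right child pointers. */
typedef struct node {
    int key;
    struct node *left;
    struct node *right;
} node;

/* Each comparison discards one whole subtree, so on a balanced
   tree about half of the remaining nodes are skipped per step,
   which gives the O(lg n) worst-case running time. */
node *search(node *root, int key) {
    while (root != NULL && root->key != key) {
        if (key < root->key)
            root = root->left;   /* right subtree taken out of consideration */
        else
            root = root->right;  /* left subtree taken out of consideration */
    }
    return root;                 /* NULL when the key is not present */
}
```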
To prove that every BST can be balanced, consider the following:
The n items of any BST can be arranged in increasing sorted order. If the median of this sorted array is made the root of the tree, then the left and right subtrees of the root are left with an almost equal number of nodes: the left subtree gets the elements 0, ..., n/2 - 1, and the right subtree the elements n/2 + 1, ..., n. If we take the medians of these two subarrays as the roots of the left and right subtrees, that is, as the left and right children of the root (call these nodes R_{l} and R_{r}), then the subtrees of R_{l} and R_{r} again contain equal numbers of nodes in sorted order. We continue this pattern until all nodes are inserted. In this way we obtain a nearly complete tree, i.e. a balanced tree, from any binary search tree, because we always try to fill both children of every node on the current level before moving to the next level.
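The median procedure can be sketched in C roughly as follows (the node type and the use of malloc are assumptions for illustration):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int key;
    struct node *left;
    struct node *right;
} node;

/* Build a balanced BST from the sorted items a[lo..hi]:
   the median of the range becomes the root, and the two
   halves of the array recursively become the left and
   right subtrees. */
node *build_balanced(const int a[], int lo, int hi) {
    if (lo > hi)
        return NULL;                    /* empty subtree */
    int mid = lo + (hi - lo) / 2;       /* median of the range */
    node *root = malloc(sizeof *root);
    root->key = a[mid];
    root->left = build_balanced(a, lo, mid - 1);
    root->right = build_balanced(a, mid + 1, hi);
    return root;
}
```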
To implement this procedure in a programming language, we need some additional dynamically allocated structure (a binary tree or a linked list) for saving the sorted array of nodes of the binary tree. The complexity of this procedure can be expressed by the recurrence T(n) = n/2 + 2T(n/2), which solves to O(n lg n).
Binary search trees that use algorithms for
balancing are called self-balancing binary search trees.
AVL tree
AVL tree is a self-balancing binary search
tree, and the first such data structure to be invented. The AVL algorithm is a
dynamic method of balancing binary search trees. The AVL tree is named after its two
inventors, Georgii Adel’son-Velsky and Yevgeniy Landis, who published it in
their 1962 paper "An algorithm for the organization of information."
In an AVL tree the heights of the two child subtrees of any node differ by at
most one, therefore it is also called height-balanced. Additions and deletions
of tree items may require the tree to be rebalanced by one or more tree
rotations.
The balance factor of a node is the height of its
right subtree minus the height of its left subtree. A node with balance factor
1, 0, or -1 is considered balanced. A node with any other balance factor is
considered unbalanced and requires rebalancing the tree. The balance factor is
either stored directly at each node or computed from the heights of the
subtrees.
This is a function in C for getting the height of
a node (a max macro is added here, since C has no built-in max):
/* level starts from -1; call as h(root, -1) */
#define max(a, b) ((a) > (b) ? (a) : (b))

int h(node *root, int level) {
    if (root == NULL)
        return level + 1;
    else
        return max(h(root->left, level + 1), h(root->right, level + 1));
}
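Using this height function, the balance factor from the definition above can be computed directly. A sketch (the node type is assumed, and h is repeated so the fragment is self-contained):

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *left;
    struct node *right;
} node;

#define max(a, b) ((a) > (b) ? (a) : (b))

/* height of a subtree; the level parameter starts from -1,
   so an empty tree has height 0 and a single node height 1 */
int h(node *root, int level) {
    if (root == NULL)
        return level + 1;
    return max(h(root->left, level + 1), h(root->right, level + 1));
}

/* balance factor = height(right subtree) - height(left subtree);
   a value outside {-1, 0, 1} means the node needs rebalancing */
int balance_factor(node *root) {
    return h(root->right, -1) - h(root->left, -1);
}
```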
AVL trees are often compared with
red-black trees because red-black trees also take O(lg n) time for the
basic operations. AVL trees perform better than red-black trees for
lookup-intensive applications. The AVL tree balancing algorithm appears in many
computer science curricula.
The AVL tree guarantees a worst-case height of at most
about 1.44 * lg(n) (often loosely bounded by (3/2) * lg(n)), so searches stay O(lg n) even in the worst case.
The idea behind maintaining the balance of an AVL tree is that whenever we insert or delete an item we may disturb the balance of the tree, and we must then restore it by performing a set of manipulations, called rotations, on the tree. Rotations come in two flavors: single rotations and double rotations. Each flavor has a corresponding left and right version, and these versions are symmetric to each other.
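As an illustration, a single right rotation could be sketched in C like this (the node type is an assumption; a single left rotation is the mirror image):

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int key;
    struct node *left;
    struct node *right;
} node;

/* Single right rotation: the left child x of y is lifted into
   y's place, y becomes x's right child, and x's old right
   subtree becomes y's new left subtree. Returns the new root
   of the rotated subtree. */
node *rotate_right(node *y) {
    node *x = y->left;
    y->left = x->right;
    x->right = y;
    return x;
}
```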
APPENDIX
Parsing binary trees
Binary trees are used in computer programs for calculating arithmetical operations. They are used to parse an arithmetical expression in the right order, so that the correct result can be obtained.
Before we explain the use of the binary tree, we note the following: every arithmetical expression consists of binary operations, and every binary operation consists of two operands. The operands belong to one of the sets of numbers: N, R, C, ...
We can define a general model for a binary
operation with the following function:
Let OpN belong to one of the number sets: N, R, C, ...
OpN(Operand1, Operand2, Operation) = Operand1
Operation Operand2;
where Operation belongs to the set S = {+, -, *, /, ...},
and Operand1 and Operand2 belong to one of the number sets (N, R, C, ...).
If Operation = '+' then: OpN(Operand1, Operand2, +) = Operand1 + Operand2;
If Operation = '-' then: OpN(Operand1, Operand2, -) = Operand1 - Operand2;
And so on...
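The OpN model can be sketched as a C function (the name opn and the use of double are illustrative assumptions):

```c
#include <assert.h>

/* OpN(Operand1, Operand2, Operation): apply a binary arithmetic
   operation to two operands from the same number set (here: double);
   the result belongs to that same set. */
double opn(double op1, double op2, char operation) {
    switch (operation) {
    case '+': return op1 + op2;
    case '-': return op1 - op2;
    case '*': return op1 * op2;
    case '/': return op1 / op2;  /* assumes op2 != 0 */
    default:  return 0;          /* unsupported operation */
    }
}
```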
Because OpN belongs to the same set of numbers as its two operands, we can use this model to define a general recursive model for any arithmetical expression. Every arithmetical expression can be evaluated recursively.
Here is the model:
OpN(OpN_{left}, OpN_{right},
operation) = OpN_{left} operation OpN_{right}
OpN_{left}
and OpN_{right} are also binary operations that belong to the same
number set as their operands.
So it is actually the same function, represented recursively. This function must have terminating conditions: it terminates when some of the OpN's is a constant number C from the same set as the operands. If we properly define all the suboperations of an operation, then we preserve the priorities of executing the binary operations.
Let’s represent the following arithmetical
expression:
(4+8)*3/12 + 5 - (3+2)
Here is the hierarchical order of calculation of the expression:
First we can calculate 4+8 or 3+2.
If we choose to calculate 4+8 first, we get 12*3/12 + 5 - (3+2); now we can calculate 12*3 or 3+2. If we choose 3+2, we get 12*3/12 + 5 - 5 => 36/12 + 0 => 3 + 0.
So OpN_{left} = 36/12 = 3, and OpN_{right} = 5 - 5 = 0.
OpN_{leftleft} = 12*3 = 36, and OpN_{leftright} = 12 (this is a terminal).
We can continue like this until all the terminals are reached...
Because different members of an arithmetical expression can be calculated at the same moment with equal priority, we can choose different paths for calculating the final result of an expression, so the recursive functional model can differ from case to case.
We can graphically represent this arithmetical expression, using the recursive model, as follows:
Picture 8.
So 4, 8, 3, 12, 5, 3, and 2 will be the terminating nodes.
So this is actually a binary tree. A binary tree is itself a recursive structure, so intuitively we can conclude that a binary tree can be used to represent arithmetical expressions.
As in the functional model, so also in the binary tree, we can notice that every operation (except the topmost one) is an operand of some higher operation (its parent node).
A binary tree can in fact be defined recursively, like the functional model above, using structural induction, as follows: if a binary tree is not empty (and it may be empty, which is the terminating case), then it consists of a root node and two binary (sub)trees: a left and a right binary (sub)tree. This rule always holds.
The binary tree is actually the functional model; they are both the same thing, just visualized differently. For example, the operation 5 + 7 can be represented with a binary tree like this one:
Picture 9.
With the help of a preorder, postorder, or inorder traversal we can calculate the arithmetical expression in the proper hierarchical order and obtain the correct result.
For example, when we use postorder: after two terminating nodes are visited, their values are pushed onto a stack; after that, their parent (operation) node is visited. When it is visited, its operation is performed on the two values currently on top of the stack; these two values are then popped from the stack and the result is pushed onto the top of the stack. One of the operands of the parent of this operation is now obtained: the operation has been transformed into an operand for the higher operation. If we continue the same procedure for every next visited node, at the end the correct result of the arithmetical expression will be on the top of the stack, and the stack will have no other elements.
By using a relatively simple iterative or recursive algorithm that uses a stack structure to preserve the hierarchy while parsing the expression, we can create the proper binary tree for any given arithmetical expression. Whether the algorithm is recursive or iterative is basically the same; they are both inductive functional patterns, like the binary tree itself.
Permutation trees
Trees can also be used to represent permutations, both without and with repetition. Not only do they represent permutations, but their tree traversals can be used to construct algorithms that generate all permutations (with or without repetition) of a given set of elements, exactly once for each different permutation. In a traversal (preorder, postorder, inorder) of a permutation tree, every time a leaf is reached, the leaf together with the path of nodes from that leaf to the root is printed, and that is how all the permutations are generated.
It is easier to generate the permutations on the fly (and put them in some linear structure) while the permutation tree is being built.
Here is an example of a tree of permutations without repetition for the set {1, 2, 3, 4}:
Picture 10.
A similar (possibly infinite) tree can be made for permutations with repetition.
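The traversal of a permutation tree for the set {1, 2, 3, 4} corresponds to the following backtracking sketch in C (names are illustrative); removing the used check would generate permutations with repetition of length N:

```c
#include <assert.h>
#include <stdio.h>

#define N 4

int perm[N];      /* current root-to-leaf path = permutation prefix */
int used[N + 1];  /* which values of 1..N are already on the path */
int count;        /* number of leaves (permutations) reached */

/* Depth-first walk of the permutation tree: level i chooses the
   i-th element of the permutation among the still-unused values;
   reaching level N means reaching a leaf, so the whole path
   (one permutation) is printed. */
void permute(int level) {
    if (level == N) {            /* a leaf: the path is one permutation */
        count++;
        for (int i = 0; i < N; i++)
            printf("%d ", perm[i]);
        printf("\n");
        return;
    }
    for (int v = 1; v <= N; v++) {
        if (!used[v]) {          /* children of this node: unused values */
            used[v] = 1;
            perm[level] = v;
            permute(level + 1);  /* go one level down the tree */
            used[v] = 0;         /* backtrack to try the next child */
        }
    }
}
```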