Free Web Hosting Provider - Web Hosting - E-commerce - High Speed Internet - Free Web Page
 Search the Web

# Tree data structures

## Tree data structures

a tree is a widely-used data structure that emulates a tree structure with a set of linked nodes

#### Arbitrary trees

Tree is a widely-used data structure that emulates a tree structure with a set of linked nodes. The tree graphicaly is represented most commonly as on Picture 1. The circles are the nodes and the edges are the links between them.

Trees are usualy used to store and represent data in some hierarhical order. The data are stored in the nodes, from which the tree is consisted of.

A node may contain a value or a condition or represent a separate data structure or a tree of its own. Each node in a tree has zero or more child nodes, which are one level lower in the tree hierarchy (by convention, trees grow down, not up as they do in nature). A node that has a child is called the child's parent node (or ancestor node, or superior). A node has at most one parent. A node that has no childs is called a leaf, and that node is of course at the bottommost level of the tree. The height of a node is the length of the longest path to a leaf from that node. The height of the root is the height of the tree. In other words, the "height" of tree is the "number of levels" in the tree. Or more formaly, the height of a tree is defined as follows:

1. The height of a tree with no elements is 0
2. The height of a tree with 1 element is 1
3. The height of a tree with > 1 element is equal to 1 + the height of its tallest subtree.

The depth of a node is the length of the path to its root (i.e., its root path). Every child node is always one level lower than his parent.

The topmost node in a tree is called the root node. Being the topmost node, the root node will not have parents. It is the node at which operations on the tree commonly begin (although some algorithms begin with the leaf nodes and work up ending at the root). All other nodes can be reached from it by following edges or links. (In the formal definition, a path from a root to a node, for each different node is always unique). In diagrams, it is typically drawn at the top.

In some trees, such as heaps, the root node has special properties.

A subtree is a portion of a tree data structure that can be viewed as a complete tree in itself. Any node in a tree T, together with all the nodes below his height, that are reachable from the node, comprise a subtree of T. The subtree corresponding to the root node is the entire tree; the subtree corresponding to any other node is called a proper subtree (in analogy to the term proper subset).

Every node in a tree can be seen as the root node of the subtree rooted at that node.

Picture 1. An example of a tree

An internal node or inner node is any node of a tree that has child nodes and is thus not a leaf node.

There are two basic types of trees. In an unordered tree, a tree is a tree in a purely structural sense — that is to say, given a node, there is no order for the children of that node. A tree on which an order is imposed — for example, by assigning different natural numbers to each child of each node — is called an ordered tree, and data structures built on them are called ordered tree data structures. Ordered trees are by far the most common form of tree data structure. Binary search trees are one kind of ordered tree.

Representations

There are many different ways to represent trees; common representations represent the nodes as records dynamically allocated on the heap memory with pointers to their children, their parents, or both, and the childen will point to their children and etc., and this repeats recursevly until a leaf is reached or as items in an array, with relationships between them determined by their positions in the array (e.g., binary heap).

Here below is an example of the declaration of the structure of a tree node written in C, in which the node will be limited to have mostly two children (in this case a left and right child). This kind of tree is called binary tree:

typedef type_of_data_in_the_node info_t;

struct node{

info_t info;

};

typedef node * nodep;

If a node does not have a left or a right child or both, then the left or right link or both should have value NULL. In this case this value of the pointer tells that the node does not have a left or right child node. The structure of the node can have also a pointer to his parent.

In languages like FORTRAN, that do not support records or pointers. Pointers can be implemented by means of arrays of subscripts, for example, ILB(K) would be the subscript for the left child to node K, and IRB(K) would be the subscript for the right child. K is an integer number equal to the index of the array for the nodes values, and corresponds to the pointers of the addresses of the nodes.

Binary trees are not ordinary trees, they are special type of trees that have some specific characteristics and uses that do not apply to the ordinary trees.

For example: the existence, and the difference between a left and a right branch (link) and child of a node and not simply branches and childs, the use in binary searching, parsing math formulas in program languages, and etc.

Unlike the binary trees, there are structural uncommodities when representing customary (ordinary) in structural meaning trees in which each node has arbitrary number of children nodes, as the tree on the Picture 1. One node in that kind of tree would be free to have zero, one, two, hundred, thousand or what ever number of child nodes. This makes problems in the declaration of the node structure in the program languages, and the limited memory of the computer. It is not a good decision to define some maximal number of links (or pointers) to the children in the declaration of the node structure, because of the limits, and memory waste.

In this case, a the structure of a binary tree can be used to represent a customary tree, where there will be no need of wasting large memory for declarating a maximal number of links for the children. The nodes will be dynamically allocated, and only two pointers for each node will be used.

The picture below explains this idea pretty good:

Picture 2:

>

The left link of some node N will tell if that node has children nodes. If the left link (pointer) of N is NULL then this tells that the N has no children nodes. In the other case the llink pointer will point to one of the children. The node that is reached by the llink pointer is the one of the children nodes of N, and then if we recursively follow the right links of this node, we can visit all the rest of child nodes of N until rlink is not NULL. The representation and traversing of the children nodes of a parent node, resembles a linked linear list. From the root down to the leaves we can recursively repeat this procedure for every node. A logical conclusion is that rlink in this representation points to the siblings of a node.

Usual graph representations can also be used to represent arbitrary tree structure.

In graph theory, a tree is a connected acyclic graph (or sometimes, a connected directed acyclic graph). A rooted tree is such a graph with a vertex singled out as the root. In this case, any two vertices connected by an edge inherit a parent-child relationship. An acyclic graph with multiple connected components or a set of rooted trees is sometimes called a forest.

Traversals

Stepping through the items of a tree, by means of the connections between parents and children, is called walking the tree, and the action is a walk of the tree (traverse). Often, an operation might be performed when a pointer arrives at a particular node (visiting the node – for example, printing the value/s that the node contains). A walk in which each parent node is traversed before his children (or his subtrees) is called a pre-order walk; a walk in which the children are traversed before their respective parents are traversed, is called a postorder walk. There is also a level order traversal. In this traversal, until all nodes of one level (beginning from the zero level – the root) are not traversed, the traversing of the nodes on the next level is not started. Every tree traversal is of O(n) complexity, where n is the number of nodes in a tree.

Here are the more formal definitions of the preorder and postorder walks of a arbitrary tree: Let T be the tree, R the root. Let we have the structure of the tree, whose root is R, and it contains the subtrees T1, T2, ..., Tn. Then:

Preorder traverse of the nodes of T is an array, that contains the information from the root R, by which are followed the information from the nodes of the subtree T1 traversed in preorder, and then the information from the nodes of the subtree T2 traversed in preorder, ..., then T3 in preorder, T4, ..., to Tn in preorder..

Postorder traverse of the nodes of T is an array that contains the arrays of information from the nodes of the subtrees T1, T2, ..., Tn, traversed in postorder, starting from T1, T2, ..., to Tn, and on the end the information from the root R.

Here are recursive realizations of the preorder and postorder algorithms in C for traversing a customary (arbitrary) tree represented by a binary tree, and it's the same for preorder and postorder traversal of a binary tree:

Because every tree can be represented with a binary tree, the traversals for binary trees can be used for traversing arbitrary tree structure, if the tree is represented by a binary tree.

When we traverse a tree in postorder every child (subtree) must first be visited before his parent (root) is visited, and a node cannot be visited, if previously his children (subtrees) are not visited (printed on the screen). If we want to implement the postorder traversal iteratively, we need to use stack, that will store the chain of ancestor nodes, previously encountered but not actually visited (printed on the screen), because they all have children (subtrees) that must be visited first before they are visited. First beginning from the root, if he has children then the root is pushed in top of the stack, then, after that, his children are attempted to be visited, starting from the first one, but if that one also has children (subtrees), and the rule holds first the children and then the root, then this one is also pushed on the top of the stack too, and then this activity goes one, until we reach a node that have children leaves, they are not pushed in the stack, just visited, and then the parent node of these nodes is visited, which is on the top of the stack now, then he is poped out, and then there is an attempt to visit the rest child nodes of the node that is now on the top of the stack, that is the parent of the previously visited node. So the same recursive procedure is executed on all the nodes in the tree (on the top of stack), until the stack is not empty.

An iterative version of preorder is implemented also by using stack. Only this time, after every visited node (starting from the root) the children (subtrees) must be visited, and this holds true for every next visited node. This means that after visiting a node, all his children are not visited at once. There is actually an attempt of applying the same recursive procedure on the his children (subtrees), one by one. So after a visited root, the children are attempted to be visited, but after a visited child, it turns out that, that node has children too, and the rule still holds, after the root, the children must be visited. The other children (subtrees) of the root will be visited after is done with the first one. The node (starting from the root) that is visited in the moment is not pushed in the stack. After visiting the root, all his children are pushed in the stack. Then there is an attempt to visit them all, starting from the one on top of the stack, that one is visited, but if it was turned out that that one has children too, and according to the rule: after a visited node, there must be a visit to his children, the node is after that poped out of the stack, because its done with it (it was visited), and then his children are pushed in the stack, and then the same recursive procedure is applied to the node on the top of the stack. The procedure stops when the stack gets empty.

A stack must be used in the iterative traversals because, for postorder: to remember the nodes in the right hierarhical order, because the visiting (printing) starts from the bottom, from the leaves and then level by level, it goes up in the tree hierarchy following the nodes (ordered chain from ancestors) in the stack, and to remember the children that are not yet traversed recursively.

For the preorder, a stack is used to store the children in the right hierarhical order, that are not yet traversed.

The detailed explanation of the traversals above, practicaly explains the need for stack.

The recursive implementations use stack also to remember the positions of where the procedures stoped, because of the recursive calls, so they can be continued later, in right order, after the recursively called functions complete.

.

And finally saying, if the nodes had also pointer to their parent, then there would be no need for stack.

Here is an example of preorder and postorder traversal of the tree on Picture 1:

When a node is visited, it is printed on the screen, as an output of the traverse algorithm.

Preorder:

5, 4, 9, 10, 1, 2, 9, 1, 8.

Postorder:

8, 1, 9, 2, 1, 10, 9, 4, 5.

Common operation and uses of all kind of trees:

## Common operations:

• Visiting all the items of a tree
• Searching for an item
• Adding a new item at a certain position on the tree
• Deleting an item
• Removing a whole section of a tree (called pruning)
• Adding a whole section to a tree (called grafting)
• Finding the root for any node

## Common uses:

• Manipulate hierarchical data
• Make information easy to search
• Manipulate sorted lists of data

##### Binary search trees

Binary search tree (BST) or a lexicographic tree is a binary tree data structure which has the following binary search tree properties :

• Each node has a value.
• The key value of the left child of a node is less than to the parent's key value.
• The key value of the right child of a node is greater than (or equal) to the parent's key value.
• And these properties holds true for every node in the tree.

If a BST allows duplicate values, then it represents a multiset. This kind of tree uses non-strict inequalities (<=, >=). Everything in the left subtree of a node is strictly less than the value of the node, but everything in the right subtree is either greater than or equal to the value of the node.

If a BST doesn't allow duplicate values, then the tree represents a set with unique values, like the mathematical set. Trees without duplicate values use strict inequalities, meaning that the left subtree of a node only contains nodes with values that are less than the value of the node, and the right subtree only contains values that are greater.

The choice of storing equal values in the right subtree only is arbitrary; the left would work just as well. One can also permit non-strict equality in both sides. This allows a tree containing many duplicate values to be balanced better, but it makes searching more complex.

Minimum and maximum

An element in a binary search tree whose key is a minimum can always be found by following the left child pointers from the root until a NULL is encountered. The maximal element can always be found by following the right child pointers from the root until NULL is encountered. The binary-search-tree property guarantees that these operations are correct.

### Traversal

The binary search tree property allows us to obtain all the keys in a binary search tree in a sorted order by a simple traversing algorithm, called an inorder tree walk, that traverses the left subtree of the root in inorder traverse, then accessing the root node itself, then traversing in inorder the right subtree of the root node. The tree may also be traversed in preorder or postorder traversals. By first accessing the root, and then the left and the right subtree or the right and then the left subtree to be traversed in preorder. And the opposite for the postorder. And of course the level order traversal can be applied to this tree.

Here is an inorder traverse implementation with recursive function in C:

void inorder(nodep p){

if(p!=NULL){

inorder(p->left);

printf("%d ",p->info);

inorder(p->right);

}

}

The iterative version of the inorder algorithm, uses stack also, and follows the similar analogy as the iterative implementations of the preorder and postorder algorithms.

Successor and predecessor

Given a node in a binary search tree, it is sometimes important to be able to find its successor in sorted meaning of the word, which is determined by an inorder tree traverse. If all keys are distinct, the successor of a node x is the node with the smallest key greater than key[x]. The structure of a binary search tree allows us to determine the successor of a node without ever comparing keys. The following procedure returns the successor of a node x in a binary search tree if it exists, and NULL if x has the largest key in the tree.

//x is the root, TREE-MINIMUM(y) returns the mostleft node (smallest) of the tree named y.

TREE-SUCCESSOR(x)

1 if right[x] ≠ NULL

2 then return TREE-MINIMUM (right[x])

3 y p[x]

4 while y ≠ NULL and x = right[y]

5 do x y

6 y p[y]

7 return y

The code for TREE-SUCCESSOR is broken into two cases. If the right subtree of node x is nonempty, then the successor of x is just the leftmost node in the right subtree, which is found in line 2 by calling TREE-MINIMUM(right[x]).

On the other hand, if the right subtree of node x is empty and x has a successor y, then y is the lowest ancestor of x whose left child is also an ancestor of x. To find y,

we simply go up the tree from x until we encounter a node that is the left child of its parent; this is accomplished by lines 3–7 of TREE-SUCCESSOR.

The running time of TREE-SUCCESSOR on a tree of height h is O(h), since we either follow a path up the tree or follow a path down the tree. The procedure TREE-PREDECESSOR, which is symmetric to TREE-SUCCESSOR, also runs in time O(h). Even if keys are not distinct, we define the successor and predecessor of any node x as the node returned by calls made to TREE-SUCCESSOR(x) and TREE-PREDECESSOR(x), respectively.

### Insertion

Insertion begins as a search would begin; if the root is not equal to the value, we search the left or right subtrees as before. Eventually, we will reach a leaf and add the value as its right or left child, depending on the node's value. The Insert algorithm works like this: if the tree is empty we insert the node as a root of the tree, if not, then we examine the root value and recursively Insert the new node to the left subtree if the new value is less than the root, or the right subtree if the new value is greater than or equal to the root.

Here's how a typical binary search tree insertion might be performed in C, this insertion algorithm maintains the binary search tree properties in every next insertion, so the operations on the BST (like binary searching) can produce always correct results:

int insert(nodep * n, info_t x){

nodep p = *n;

if(p==NULL){

*n=(nodep)malloc(sizeof(node));

(*n)->info=x;

(*n)->left=(*n)->right=NULL;

return(1);

}

else if(x< p->info)

return(insert(&(p->left),x));

else if(x> p->info)

return(insert(&(p->right),x));

else

return(0);

}

The complexity of this algorithm is O(h), h is the height, and in average if the insertions of elements are in random series the complexity is around O(lgn), where n is total the number of nodes in the tree, and belongs to the set N.

O(h) is equivalent to O(lgn), because the height depends on the number of nodes (h=f(n)). We’ll talk about this in more details later.

To insert a node in an empty tree, we simply make that node as a root, no mater what his value is. If we want to insert another node in order to build up the tree, we must insert that node as a child to the previosly added root node. If the key value of that node is less than the value of the root, then the node is inserted left of the root, as his left child, symmetrical operation is performed if the new node is greater than the root.

Picture 4: Inserting an item with key 13 into a binary search tree. Lightly shaded nodes indicate the path from the root down to the position where the item is inserted. The dashed line indicates the link in the tree that is added to insert the item.

If we want to insert the items (7, 3, 10, 12, 2, 1, 13,) by that order, in a BST, by using the above algorithm, we’ll get this tree:

Picture 5.

### Searching

A binary search of a sorted array of n elements is a fast algorithm for searching, with complexity O(lgn), but the static size of the array can make the task sometimes uncomfortable. Linked list can increase the complexity of the binary searching from O(lgn) to the recursive comlexity function T(n) = n/2 + T(n/2), so linked lists are not used in binary searching. That’s why a binary search tree can be used for BS. There is no need for sorting, because the elements are already ordered in sorted order, and the complexity of a binary searching on a BST is averagely O(lgn), where n is the number of nodes.

Searching a binary tree for a specific value is a process that can be performed recursively because of the order in which values are stored. We begin by examining the root. If the value we are searching for equals the root, the value exists in the tree. If it is less than the root, then it must be in the left subtree, so we recursively search the left subtree in the same manner. Similarly, if it is greater than the root, then it must be in the right subtree, so we recursively search the right subtree. If we reach a leaf and have not found the value, then the item is not where it would be if it were present, so it is not contained in the tree at all. A comparison may be made with binary search, which operates in nearly the same way but using random access on an array instead of following links.

This operation requires O(h) (h is the height of the root). In average cases when the items of the tree are inserted in random (not sorted) series, the height is around lgn (n – number of items) and the operation requires O(lgn) time. But needs O(n) time in the worst-case, when the unbalanced tree resembles a linked list, that is, when the items are sorted while they are inserted.

Here is recursive implementation of binary searching in a BST, written in pseudocode:

//x is the root node of the tree, k is the key value that is in the interest of searching for.

TREE-SEARCH (x, k)

if x= NULL or k = key[x]

then return x

if k < key[x]

then return TREE-SEARCH(left[x], k)

else return TREE-SEARCH(right[x], k)

The search and insertion algorithms in every next iteration go one level lower, thus getting closer to the level which is equal to the height, and that’s why they have complexity of O(h).

### Deletion:

There are several cases to be considered:

• Deleting a leaf: Deleting a node with no children is easy, as we can simply remove it from the tree. The node is deallocated and the pointer from the parent node, that pointed to the deleted node, will now have to get value NULL.
• Deleting a node with one child: Delete it and replace the value of the parent pointer with the address of the child node of the deleted node.
• Deleting a node with two children: A node with two children cannot be deleted easy as a node with one child. We well call this node D. Its obvious that the two children after the deletion of their parent cannot be attach with the parent of their parent. The correct solution of the problem is the following:

D is the smallest in respect of all the nodes in his right subtree. And also the most-left node in the right subtree of D is for sure the smallest node of all nodes in the right subtree of D, and is also greater than all the nodes in the left subtree of D, we will call this node N. So to keep the binarity and the lexicographical order of the tree we can remove the N node from his place as we remove a node with one or zero children and make the N node take the place of D, the pointers of N can take the values of the pointers of D, so N can point to the left and right subtrees of D, and then D can be deallocated. The same, but opposite analogy can be applied with the left subtree of D.

The node N is the succesor of D in inorder traversal of the tree. The symmetcal node in the left subtree is the inorder predecessor of D.

There are a lot of NULL pointers in the binary trees, these pointers exists, but they don’t point anywhere. They can be used in that maner, that they introduce a special kind of pointers, a so called threads. The threads are pointers that point on the predecessor, that is, on the successor of a given node in some of the traversal methods. In this case, the tread points to the predecessor if it is a empty left link of the node, if the thread is unused right link of the node, then it points to the successor in the chosen notation (traversal).

This way the binary trees can be threaded in preorder, postorder, and inorder notation.

Here is an example of inorder threaded binary tree:

Picture 6.

It can be noticed from the picture that the first and last node in the chosen traversal do not point at any node from the tree, while all other links in the tree are used. This way the procedures for traversing the tree are simplified.

Because of simpler programming implementation of the traversing of the tree (to know when to stop), and the two unused links to be used, it can be introduced a guide node, that will point at the root of the tree, and at him the two unused links will point.

Here is a typical representation of the treaded binary tree in C:

typedef int info_t;

struct node{

info_t info;

int lbit, rbit;

};

typedef struct node * nodep;

If the binary tree is empty then the guide node will point to himself.

Here below is a programming code in C for adding node in a binary tree that is inorder treaded:

int addNode(node * p, int n, char where)

{

node * temp;

if(‘l’ == where){

if(p->ltag){

printf(“There already exists a left subtree\n”);

return (-1);

}

else{

temp=(node *)malloc(sizeof(node));

temp->info=n;

temp->rtag=temp->ltag=0;

p->ltag=1;

return 0;

}

}

else if(‘r’==where){

if(p->rtag){

printf(“There already exists a right subtree\n”);

return(-1);

}

else{

temp=(node *)malloc(sizeof(node));

temp->info=n;

temp->rtag=temp->ltag=0;

p->rtag=1;

return 0;

}

}

}

It is possible to discover the parent of a node from a inorder threaded binary tree, without explicit use of parent pointers or a stack, albeit slowly. This can be useful however where stack space is limited, or where a stack of parent pointers is unavailable.

This is possible, because if a node (`k`) has a right child (`m`) then `m`'s left pointer must be either a child, or a thread back to `k`. In the case of a left child, that left child must also have a left child or a thread back to `k`, and so we can follow `m`'s left children until we find a thread, pointing back to `k`. The situation is similar for when `m` is the left child of `k````, except in this case we have to follow the right children.```

```Similar analisys can be made for finding all the possible cases in which the discovering of the parents in preorder, and postorder treaded binary trees is possible.```

` `

`Balanced binary trees`

` `

` What is a complete tree:`

```The maximal number of nodes that can be found on the highest level (level 0) is one – 20, the root. The maximal number of nodes on level 1 are two – 21, on level 2 – four – 22, generaly on level m, there are 2m number of nodes. So the maximal possible number of nodes in a tree with height h is:```

` `

`20 + 21 + 2``2`` + ... + 2h = 2h+1 – 1`

` `

```If the tree have this property, then all unterminating nodes have two children, and this tree is then called a complete tree and then always the height h will be approximately lgn. So if the height of a tree is [lgn], and the number of nodes is n, then the tree is maximaly filled with the most minimal possible height. ``````The completed tree has the property that all leaves are in the same level. ``````In every complete tree the number of nodes in the left subtree of the root (or every node) is equal to the number of nodes in the right subtree. The algorithms for binary searching, insertion, finding predecessor, successor can have the nice O(lgn) worst case of running time when the tree is complete. Every subtree in a complete tree is also a complete tree.```

```Also this tree has the property of being balanced. A balanced tree is every tree in which the difference of the heights between the two subtrees of any node in the tree, is zero or at most one. The tree on Picture 6, a), is a balanced tree. But the tree on b) is not, the subtrees of the node 2 are very unbalanced, on one side we have an empty, totaly unexploited left subtree, and on the other side, we have a right subtree, that spreads to height five. The subtrees of the nodes 3 and 10, on Picture 5 are also unbalanced, which is obvious. ```

```The tree on b) has 6 nodes, but the height is 5, and lg6 is around 2.59, so the search alg. will not be optimal. To make it optimal we have to transform the tree to a balanced bin. tree with the same items: see a). In that case the search alg. will have compx. O(lgn). The tree is nearly completed that’s why that 2.59, but it is balanced.```

 a)                                                                                        b) Picture 7.

Also with a balanced tree, in every next iteration of the binary search algorithm, around 1/2 of the all the nodes are skiped (taken out of consideration), and there are only 1/2 nodes left over to be searched, because one of the two subtrees of a root node is taken out of consideration in every next iteration. The iteration stopes when there is only 1 node left in consideration. From this we can also conclude why the binary searching of a balanced tree is always around O(lgn), and the text below in more details explains this:

The property of a balanced binary tree compulses the need of every node (begining from the root) that has a left (or right) subtree that has height greater than 1, to have an opposite unempty subtree that will have height as (or one level less) of the height of the opposite subtree. And this recursively holds true for every other lower level node. This also implies the two subtrees of every node to share nearly the same number of nodes. Everything above said also implies that the balanced binary tree is always nearly completed. That’s why the binary searching with balanced tree lasts in the worst case around O(lgn).

To proof that every BST can be made balanced we’ll say the following:

The n items in every BST can be permuted in increasing sorted order. If the median of the array of the sorted items is made the root of the tree, then for the left and right subtrees of the root there are left over almost equal number of nodes. For the left subtree we have the elements from 0, ...,n/2 – 1, and for the right subtree the elements from n/2 + 1, ...,n. If we take the median items of these two arrays as the roots of the left and right subtrees of the root, that is, left and right children of the root (we will name these nodes as Rl and Rr), then for the left and right subtrees of Rl and Rr we have of course also equal number of nodes in sorted order. We’ll continue this pattern until we don’t insert all the nodes. This is how we can get a nearly completed tree – a balanced tree from every binary search tree, because we always trie to fill the two children of all nodes of the current level, and then go to the next level.

To implement this procedure in a programming language, we need to use some additional dinamically allocated structure (binary tree or a linked list) for saving the sorted array of nodes from the binary tree. We can express the complexity of this procedure by the recursive relation T(n)=n/2+2T(n/2).

Binary search trees that use algorithms for balancing are called self-balancing binary search trees.

AVL tree

AVL tree is a self-balancing binary search tree, and the first such data structure to be invented. The AVL algorithm is a dynamic method of balancing binary search trees. The AVL tree is named after its two inventors, Georgii Adel’son-Velsky and Yevgeniy Landis, who published it in their 1962 paper "An algorithm for the organization of information." In an AVL tree the heights of the two child subtrees of any node differ by at most one, therefore it is also called height-balanced. Additions and deletions of tree items may require the tree to be rebalanced by one or more tree rotations.

The balance factor of a node is the height of its right subtree minus the height of its left subtree. A node with balance factor 1, 0, or -1 is considered balanced. A node with any other balance factor is considered unbalanced and requires rebalancing the tree. The balance factor is either stored directly at each node or computed from the heights of the subtrees.

This is a function in C for getting the height of some node:

/*level starts from –1*/

int h(node * root, int level){

if(root == NULL)

return level+1;

else

return max(h(root->left,level+1),h(root->right,level+1));

}

AVL trees are often compared with red-black trees because red-black trees also take O(lg n) time for the basic operations. AVL trees perform better than red-black trees for lookup-intensive applications. The AVL tree balancing algorithm appears in many computer science curricula.

The AVL tree ensures at most (3 / 2) * lg(n) worst case scenario of binary search trees.

The idea behind maintaining the balanceness of an AVL tree is that whenever we insert or delete an item, we have disturbed the balanceness of the tree, and then we must restore it by performing a set of manipulations (called rotations) on the tree. These rotations come in two flavors: single rotations and double rotations. And each flavor has its corresponding left and right versions, these versions are symmetric to each other.

APPENDIX

Parsing binary trees

Binary trees are used in computer programs for calculating arithmetical operations. They are used to parse, in right order, an arithmetical expressions, so the correct result in correct order can be obtained.

Before we explain the use of binary tree, we’ll say the following:

Every arithmetical expression is consisted from binary operations, every binary operation is consisted from two operands. The operands belong to some of the set of numbers: N, R, C...

We can define a general model for a binary operation with the following function:

let OpN belongs to one of the number sets: N, R, C, ...

OpN(Operand1, Operand2, Operation) = Operand1 Operation Operand2;

Where Operation belongs to the set S={+,-,*,/,...}, and Operand1 and Operand2 belongs to one of the number sets (N, R, C,...).

If Operation= ’+’ then: OpN(Operand1, Operand2, +) = Operand1 + Operand2;

If Operation= ’–’ then: OpN(Operand1, Operand2, –) = Operand1 – Operand2;

And etc...

Because OpN belongs to the same set of numbers as the two operands we can use this model to define a general recursive model for any arithmetical expresion. Every arithmetical expression can be performed recursively.

Here is the model:

OpN(OpNleft, OpNright, operation) = OpNleft operation OpNright

OpNleft and OpNright are also binary operations that belong to the same number set as their operands.

So it’s actually the same function represented recursively. This function must have terminating conditions, the function will be terminating if some of the OpN’s is C, where C is a constant number from the same set as the operands. If we properly define all the suboperations of an operation then we can preserve the priorities of executing the binary operations.

Let’s represent the following arithmetical expression:

(4+8)*3/12 + 5 – (3+2)

Here is the hierarhical order of the calculation of the expression:

First can be calculated 4+8, or 3/12, or 3+2

If we choose to calculate first 4+8 then we get this 12*3/12 + 5 – (3+2), now we can calculate 12*3, or 3/12 or 3+2, if we choose 3+2 then we get 12*3/12 + 5 – 5 => 36/12 + 0 => 3 + 0.

So OpNleft = 3 = 36/12, and OpNright = 0 = 5 – 5

OpNleftleft = 36 = 12*3, and 12 = OpNleftright = 12 (this is a terminal).

We can continue like this until we don’t reach all the terminals...

Because different members of an arithmetical expression have equal privileges of performing, in the same moment, we can choose different paths for calculating the final result of an expression, so the recursive functional model can be different in different cases.

We can graphicaly represent this arithmetical expression by using the recursive model as this following:

Picture 8.

So 4, 8, 3, 12, 3, 2 will be terminating characters.

So this is actually a binary tree. A binary tree is also a recursive structure. Actually, intuitively we can conclude that a binary tree can be used to represent arithmetical expressions.

As in the functional model so also with the binary tree we can notice that every operation (except for the highest) is an operand for some higher operation (his parent node).

A binary tree can be in fact defined recursively as the functional model above, by using structural induction as the following:

If a Binary Tree is not empty, terminating (and it could be) then: A Binary Tree is consisted from a root node, and two Binary (sub)Trees: left and right Binary (sub)Trees. This is a rule, tautology.

The binary tree is actually the functional model, they are both the same thing, just visualized differently. For example the operation 5 + 7 can be represented with a binary tree as this one:

Picture 9.

By the help of the preorder, postorder or inorder traversal we can calculate the arithmetical operation in the proper hierarhical way and obtain the correct result.

For example when we use postorder: after a visit of two terminating nodes, these nodes’s values are pushed in a stack, after that their parent (operation) node is visited. When it is visited, his operation is performed on the two values that are in the top of the stack right now, after this, the two values are poped from the stack and the result is pushed on the top of the stack, so one of the operands of the parent operation of the operation is right now obtained, the operation is transformed in a operand for the higher operation, if we continue with this same procedure for every next visited node, on the end we will obtain the correct result of the arithmetical expression on the top of stack, and the stack will have no other elements.

By using a relatively simple iterative, or recursive algorithm, that uses stack structure for preserving the hierarchy in the parsing of the expression, we can create the proper binary tree for any given arithmetical expression.

To say recursive or iterative algorithm is basically the same, they are both inductive functional patterns, as the binary tree.

Permutation trees

Trees can also be used to represent permutations, without repetition, and with repetitions. Not only that they are used to represent permutations, their tree traversals can be used for constructing algorithms that generate all permutations (with or without repetitions) for a given set of elements, only once for each different permutation. In a traversal (preorder, postorder, inorder) of a permutation tree, every time when leaf is reached, the leaf together with the path of nodes from that leaf to the root is printed, and that’s how all the permutations are generated.

Its more easy to generate in fly the permutations (and put them in some linear structures) while the process of building the permutation tree is going on.

Here is an example of a permutation without repetition tree for the set: 1, 2, 3, 4:

Picture 10.

A similar (it could be an infinite) tree can be made for permutations with repetitions.