Programming, Data Structures and Algorithms: Trees
Anton Biasizzo

Preliminaries
- The linear access time of linked lists is often prohibitive.
- Trees: the running time of most operations is O(log n) on average.
  - File system implementation
  - Arithmetic expression evaluation
- Definition of a tree (recursive):
  - A tree is a collection of nodes.
  - The collection can be empty.
  - A non-empty tree consists of a distinguished node r (the root) and zero or more (sub)trees T_1, T_2, …, T_k, each of whose roots is connected by an edge to r.

Tree
- A tree is a collection of n nodes, one of which is the root, and n-1 edges.
- An example of a tree: [figure]
  - Node A is the root.
  - Node F has a parent (A) and three children (K, L, and M).
  - Nodes with no children are leaves.

Properties of a tree
- A path from node n_1 to n_k is a sequence of nodes n_1, n_2, …, n_k such that n_{i-1} is the parent of n_i.
- The length of a path is the number of edges on the path.
- There is exactly one path from the root to each node.
- The depth of node n_i is the length of the path from the root to n_i.
- The depth of a tree is equal to the depth of its deepest leaf.
- The height of node n_i is the length of the longest path from n_i to a leaf.
- The height of a tree is the height of its root.
- If there is a path from n_i to n_k, then n_i is an ancestor of n_k and n_k is a descendant of n_i.

Example of Tree structure
- A directory structure is a common use of trees.

Implementation of Trees
- Straightforward implementation: each node stores
  - Data
  - Pointers to each child
- The number of children per node may vary greatly.
  - Direct links to every child are infeasible: too much wasted space.
- Solution: keep the children of each node in a linked list.

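The linked-list-of-children idea can be sketched as a first-child/next-sibling node. This is only an illustration: the names `gnode`, `first_child`, `next_sibling` and `gnode_height` are assumptions, not taken from the slides. The height function follows the definition given under "Properties of a tree".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical general-tree node: the children form a singly linked list. */
struct gnode {
    int element;
    struct gnode *first_child;   /* head of the list of children, NULL for a leaf */
    struct gnode *next_sibling;  /* next child of the same parent, NULL if last */
};

/* Height of node n: length (in edges) of the longest path from n to a leaf. */
int gnode_height(const struct gnode *n)
{
    assert(n != NULL);
    int best = -1;               /* stays -1 when n has no children */
    for (const struct gnode *c = n->first_child; c != NULL; c = c->next_sibling) {
        int h = gnode_height(c);
        if (h > best)
            best = h;
    }
    return best + 1;             /* a leaf has height 0 */
}
```

Each node needs exactly two pointers regardless of how many children it has, which avoids the wasted space of a fixed-size child array.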
Binary Trees
- A binary tree is a tree in which no node has more than two children.
- A binary tree consists of a root and two subtrees, T_L and T_R.
- The average depth of a binary tree is considerably smaller than n: it is O(√n).
- For binary search trees the average depth is O(log n).
- In the worst case the depth can be n-1.

Implementation
- Use direct pointers to the children.
- The structure is similar to that of doubly linked lists.
- Binary search tree type declaration:

    typedef struct tree_node *tree_ptr;

    struct tree_node {
        element_type element;  /* key stored in the node */
        tree_ptr left;         /* left subtree, NULL if empty */
        tree_ptr right;        /* right subtree, NULL if empty */
    };

    typedef tree_ptr SEARCH_TREE;  /* name used by the routines that follow */

The Search Tree
- Searching is an important application of binary trees.
- Each node is assigned a key value (e.g. an integer).
- Keys are distinct.
- For any node X:
  - the values of all keys in its left subtree are smaller than the key value of X,
  - the values of all keys in its right subtree are larger than the key value of X.
- The recursive definition of a tree leads naturally to recursive functions.
- Recursion is efficient because the average depth of the tree is small.

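The ordering property can be checked mechanically. The sketch below assumes integer keys and a hypothetical checker `is_search_tree` (not part of the slides); it passes the allowed key range down the tree, so a key that is misplaced several levels below an ancestor is also caught.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Int-keyed version of the slides' tree_node (assumption: element_type is int). */
struct tree_node {
    int element;
    struct tree_node *left;
    struct tree_node *right;
};

/* True if every key in T lies strictly between *lo and *hi (NULL means unbounded). */
static bool in_range(const struct tree_node *T, const int *lo, const int *hi)
{
    if (T == NULL)
        return true;
    if (lo != NULL && T->element <= *lo)
        return false;
    if (hi != NULL && T->element >= *hi)
        return false;
    /* keys on the left must stay below T->element, keys on the right above it */
    return in_range(T->left, lo, &T->element) &&
           in_range(T->right, &T->element, hi);
}

/* Hypothetical helper: does T satisfy the search tree property with distinct keys? */
bool is_search_tree(const struct tree_node *T)
{
    return in_range(T, NULL, NULL);
}
```

Comparing each node only with its two children would not be enough: a key in the left subtree can be larger than the root while still being larger than its own parent.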
The Search Tree ADT
- The search tree ADT:
  - Find, FindMin, FindMax,
  - Insert,
  - Delete.
- Find function

    tree_ptr find(element_type x, SEARCH_TREE T)
    {
        if (T == NULL)
            return NULL;                  /* x is not in the tree */
        if (x < T->element)
            return find(x, T->left);
        if (x > T->element)
            return find(x, T->right);
        return T;                         /* x == T->element */
    }

Find routines
- FindMin function

    tree_ptr find_min(SEARCH_TREE T)
    {
        if (T == NULL)
            return NULL;
        if (T->left == NULL)
            return T;                     /* leftmost node holds the smallest key */
        else
            return find_min(T->left);
    }

- FindMax function

    tree_ptr find_max(SEARCH_TREE T)
    {
        if (T == NULL)
            return NULL;
        if (T->right == NULL)
            return T;                     /* rightmost node holds the largest key */
        else
            return find_max(T->right);
    }

Insert
- Insert function: to insert x into tree T,
  - proceed down the tree as in find;
  - if x is found, do nothing or "update" the node (duplicate handling);
  - otherwise insert x at the last position on the traversed path.
- Binary search tree implementation
  - Duplicates can be handled by an extra field in the node (at the cost of extra space).
  - Putting duplicates in the tree tends to make the tree very deep.
  - If keys are part of a larger record, keep the records in an auxiliary data structure (table, list, tree, …).

Insert routine
- Insert function

    #include <stdlib.h>   /* malloc */

    tree_ptr insert(element_type x, SEARCH_TREE T)
    {
        if (T == NULL) {
            /* we have gone past a leaf: create a one-node tree */
            T = (SEARCH_TREE) malloc(sizeof(struct tree_node));
            if (T == NULL)
                fatal_error("Out of space!!!");
            else {
                T->element = x;
                T->left = T->right = NULL;
            }
        } else if (x < T->element)
            T->left = insert(x, T->left);
        else if (x > T->element)
            T->right = insert(x, T->right);
        /* else x is already in the tree: do nothing */
        return T;
    }

Delete
- In many data structures the hardest operation is deletion.
- Find the node with key x:
  - If the node is a leaf, delete it and also remove the reference to it in its parent.
  - If the node has one child (e.g. node 4), its parent is adjusted to bypass the node, and then the node is deleted.

Delete
  - If the node has two children (e.g. node 2), several options are possible.
  - Proposed strategy: replace the key of this node with the smallest key of its right subtree, then delete that smallest node.
  - The second delete is simpler, since the smallest node of the right subtree has no left child.

Delete routine
- Delete function

    tree_ptr delete(element_type x, SEARCH_TREE T)
    {
        tree_ptr tmp_cell, child;

        if (T == NULL)
            error("Element not found");
        else if (x < T->element)
            T->left = delete(x, T->left);
        else if (x > T->element)
            T->right = delete(x, T->right);
        else if (T->left && T->right) {
            /* Two children: copy the smallest key of the right subtree
               into this node, then delete that key from the right subtree */
            tmp_cell = find_min(T->right);
            T->element = tmp_cell->element;
            T->right = delete(T->element, T->right);
        } else {
            /* Zero or one child: the parent bypasses this node */
            if (T->left)
                child = T->left;
            else
                child = T->right;         /* NULL if T is a leaf */
            free(T);
            return child;
        }
        return T;
    }

Balanced Tree
- We expect operations to take O(log n) time, because each step descends one level in constant time and the remaining subtree is roughly half as large.
- The average depth of a node is O(log n), taken over all possible binary search trees.
- Insertions and deletions may alter a binary search tree so that it becomes much deeper.
- Pre-sorted input degenerates the tree into a linked list.
- Solutions:
  1. Insist on an additional structural condition called balance (balanced trees, e.g. AVL trees).
  2. Allow arbitrary depths, but apply a restructuring step after every operation (self-adjusting trees, e.g. splay trees).

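The effect of pre-sorted input can be observed directly. The sketch below re-implements the insert logic for integer keys; `bst_node`, `bst_insert`, `bst_height` and `bst_build` are hypothetical names chosen for this illustration, and nodes are never freed to keep it short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical int-keyed node, mirroring the slides' tree_node. */
struct bst_node {
    int element;
    struct bst_node *left;
    struct bst_node *right;
};

/* Same logic as the slides' insert routine; duplicates are ignored. */
struct bst_node *bst_insert(int x, struct bst_node *T)
{
    if (T == NULL) {
        T = malloc(sizeof *T);
        assert(T != NULL);           /* sketch only: abort when out of memory */
        T->element = x;
        T->left = T->right = NULL;
    } else if (x < T->element) {
        T->left = bst_insert(x, T->left);
    } else if (x > T->element) {
        T->right = bst_insert(x, T->right);
    }
    return T;
}

/* Height in edges; the empty tree has height -1. */
int bst_height(const struct bst_node *T)
{
    if (T == NULL)
        return -1;
    int hl = bst_height(T->left);
    int hr = bst_height(T->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Build a tree from keys[0..n-1], inserted in the given order. */
struct bst_node *bst_build(const int *keys, int n)
{
    struct bst_node *T = NULL;
    for (int i = 0; i < n; i++)
        T = bst_insert(keys[i], T);
    return T;
}
```

Inserting 1, 2, …, 7 in sorted order gives a tree of height 6 (that is, n-1: every node has only a right child), while the order 4, 2, 6, 1, 3, 5, 7 gives height 2.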
AVL Tree
- An AVL (Adelson-Velskii and Landis) tree is a binary search tree with a balance condition.
- For every node, the heights of the left and right subtrees can differ by at most 1.
- All tree operations can be performed in O(log n) time.

AVL Tree
- An insertion might destroy the balance condition.
- In such cases the tree must be modified (rotation).
- For fast operations, store balance information (or the subtree height) in the nodes.
- After an insertion, only nodes on the path from the insertion point to the root might have their balance altered.

AVL Tree
- Let α be the node that must be rebalanced. There are four cases:
  1. Insertion into the left subtree of the left child of α
  2. Insertion into the right subtree of the left child of α
  3. Insertion into the left subtree of the right child of α
  4. Insertion into the right subtree of the right child of α
- A single rotation fixes cases 1 and 4.
- A more complex double rotation fixes cases 2 and 3.

Single Rotation
- Node k_2 violates the AVL balance condition if subtree A has grown by an extra level.
- The single rotation requires only a few pointer changes.
- Subtree A moves up one level.
- Subtree B stays at the same level.
- Subtree C moves down one level.
- The height of the new subtree is the same as before the insertion.

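The pointer changes of a single rotation for case 1 can be sketched as follows. The node layout with a stored `height` field and the names `avl_node`, `avl_height` and `rotate_with_left` are assumptions made for this illustration; k1 is the left child of k2, with subtrees A and B below k1 and subtree C to the right of k2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AVL node that stores the height of its subtree. */
struct avl_node {
    int element;
    int height;
    struct avl_node *left;
    struct avl_node *right;
};

/* Height of a possibly empty subtree; the empty tree has height -1. */
static int avl_height(const struct avl_node *n)
{
    return n == NULL ? -1 : n->height;
}

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Case 1: rotate k2 with its left child k1; returns the new subtree root. */
struct avl_node *rotate_with_left(struct avl_node *k2)
{
    assert(k2 != NULL && k2->left != NULL);
    struct avl_node *k1 = k2->left;

    k2->left = k1->right;    /* subtree B changes parent but keeps its level */
    k1->right = k2;          /* k2, together with subtree C, moves down one level */

    /* k2 is now below k1, so its height must be recomputed first */
    k2->height = 1 + max_int(avl_height(k2->left), avl_height(k2->right));
    k1->height = 1 + max_int(avl_height(k1->left), k2->height);
    return k1;               /* the caller stores this in the parent's link */
}
```

Case 4 is the mirror image, with `left` and `right` exchanged.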
Single Rotation Example
- Inserting 8 into the left AVL tree destroys the balance condition at node k_2 (value 15).
- After a single rotation a balanced tree is obtained.

Double Rotation
- A single rotation does not work for cases 2 and 3.
- The node causing the imbalance is inserted in the middle.
- A double rotation involves four subtrees.
- A double rotation restores the height of the tree to what it was before the insertion.

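A double rotation can be written as two single rotations. The sketch below handles case 2 (insertion into the right subtree of the left child); all names (`avl_node`, `fix_height`, `rotate_with_left`, `rotate_with_right`, `double_rotate_with_left`) are assumptions for this illustration rather than the slides' own code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AVL node that stores the height of its subtree. */
struct avl_node {
    int element;
    int height;
    struct avl_node *left;
    struct avl_node *right;
};

static int avl_height(const struct avl_node *n)
{
    return n == NULL ? -1 : n->height;
}

/* Recompute n's height from the (already correct) heights of its children. */
static void fix_height(struct avl_node *n)
{
    int hl = avl_height(n->left);
    int hr = avl_height(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

/* Single rotation: the left child of k2 becomes the subtree root. */
static struct avl_node *rotate_with_left(struct avl_node *k2)
{
    struct avl_node *k1 = k2->left;
    k2->left = k1->right;
    k1->right = k2;
    fix_height(k2);
    fix_height(k1);
    return k1;
}

/* Mirror image: the right child of k1 becomes the subtree root. */
static struct avl_node *rotate_with_right(struct avl_node *k1)
{
    struct avl_node *k2 = k1->right;
    k1->right = k2->left;
    k2->left = k1;
    fix_height(k1);
    fix_height(k2);
    return k2;
}

/* Case 2: first lift the middle node above k3's left child, then above k3. */
struct avl_node *double_rotate_with_left(struct avl_node *k3)
{
    assert(k3 != NULL && k3->left != NULL && k3->left->right != NULL);
    k3->left = rotate_with_right(k3->left);
    return rotate_with_left(k3);
}
```

The middle node ends up as the subtree root, with the old left child and k3 as its children; case 3 is symmetric.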
Double Rotation Example
- Inserting 10 causes a violation of the balance condition at node k_1 (value 6).
- After a double rotation a balanced tree is obtained.

Splay Tree
- Guarantee: any M consecutive tree operations starting from an empty tree take at most O(M log n) time.
- A single operation might still take O(n) time.
- This is weaker than an O(log n) worst-case bound per operation.
- There are no bad input sequences.
- O(n) worst-case time per operation is acceptable as long as it occurs relatively infrequently.
- Basic idea: after a node is accessed, it is pushed to the root by a series of AVL rotations.
- In practice, a node that has been accessed is likely to be accessed again.
- A splay tree does not require maintaining height or balance information.

Splaying – single rotations
- Single (zig) rotations on the outer path.
- A plain bottom-up approach using single (zig) rotations does not work well, because it pushes nodes on the path too deep.

Splaying – single rotations
- Initial tree; zig operations are performed on node 1.

Splaying – single rotations
- Zig operations push nodes on the path (e.g. node 2) deep.

Splaying
- The problem is that the structure of the outer path remains intact.
- Two zig operations: [figure]
- The splaying strategy is more selective about how rotations are performed (still bottom-up).

Splaying zig-zig
- Zig-zig case: when x is on the same side of both its parent and its grandparent, two single rotations are performed.

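The difference between a zig-zig step and two plain zig rotations is only the order of the rotations. The sketch below covers the left-left configuration; `splay_node`, `rotate_right`, `zig_zig_left` and `two_zigs_left` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical splay tree node; no height or balance field is needed. */
struct splay_node {
    int element;
    struct splay_node *left;
    struct splay_node *right;
};

/* Single rotation: the left child of p becomes the subtree root. */
static struct splay_node *rotate_right(struct splay_node *p)
{
    struct splay_node *x = p->left;
    p->left = x->right;
    x->right = p;
    return x;
}

/* Zig-zig for x = g->left->left: rotate at the grandparent first,
   then at the parent. Returns x, the new subtree root. */
struct splay_node *zig_zig_left(struct splay_node *g)
{
    assert(g != NULL && g->left != NULL && g->left->left != NULL);
    struct splay_node *p = rotate_right(g);  /* parent rises above grandparent */
    return rotate_right(p);                  /* x rises above parent */
}

/* For contrast: two plain zig rotations, bottom-up. x is rotated above
   its parent first, then above its grandparent. */
struct splay_node *two_zigs_left(struct splay_node *g)
{
    assert(g != NULL && g->left != NULL && g->left->left != NULL);
    g->left = rotate_right(g->left);         /* x rises above parent */
    return rotate_right(g);                  /* x rises above grandparent */
}
```

On the chain 3, 2, 1 (each node the left child of the previous one), `zig_zig_left` yields 1 with right child 2 and 2 with right child 3, whereas `two_zigs_left` yields 1 with right child 3 and 3 with left child 2, so 2 stays below 3 as before.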
Splaying – zig-zig
- Initial tree; zig-zig operations are performed on node 1.

Splaying – zig-zig
- Nodes are packed together in pairs, e.g. (2, 3), (4, 5), …
- The height is reduced by slightly less than 1/2 of the depth of node 1.

Splaying zig-zag
- Zig-zag case: when x lies between its parent and its grandparent, a double rotation is performed.
- Because rotations for splay trees are performed in pairs from the bottom up, a recursive implementation does not work.
- When access paths are long, the rotations tend to be good for future operations.

B-Trees
- A B-tree is not a binary tree.
- A B-tree of order M has the following structural properties:
  1. The root is either a leaf or has between 2 and M children.
  2. All non-leaf nodes (except the root) have between ⌈M/2⌉ and M children.
  3. All leaves are at the same depth.
- All data is stored at the leaves.
- Each interior node contains pointers P_1, P_2, …, P_M to its children, and values k_1, k_2, …, k_{M-1} representing the smallest key found in subtrees P_2, …, P_M respectively.
- Some pointers might be NULL, and the corresponding k_i undefined.
- For every node, all the keys in subtree P_1 are smaller than the keys in subtree P_2, and so on.

Example of B-Tree of order 4
- A B-tree of order 4 is known as a 2-3-4 tree.

Operations of B-Trees
- A B-tree of order 3 is known as a 2-3 tree.
- Interior nodes are drawn as ellipses; the keys in the leaves are ordered.

Find Operation of B-Trees
- Start at the root and branch in one of (at most) three directions, depending on the relation of the searched key (e.g. X = 23) to the two values stored at the current interior node.

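The find operation on a 2-3 tree can be sketched as below. The node layout is an assumption made for this illustration (the slides give no declaration): one struct serves for both kinds of node, an interior node stores in `key[i]` the smallest key under `child[i+1]`, and a leaf is assumed to hold up to three keys. The names `b23_node` and `b23_contains` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2-3 tree node (B-tree of order 3). */
struct b23_node {
    int is_leaf;                 /* nonzero for a leaf */
    int n;                       /* leaf: number of keys; interior: number of children */
    int key[3];                  /* leaf: the stored keys;
                                    interior: key[i] = smallest key under child[i+1] */
    struct b23_node *child[3];   /* used by interior nodes only */
};

/* Returns 1 if x is stored in the tree rooted at t, 0 otherwise. */
int b23_contains(const struct b23_node *t, int x)
{
    /* Descend: at each interior node branch in one of at most three directions. */
    while (t != NULL && !t->is_leaf) {
        int i = 0;
        while (i < t->n - 1 && x >= t->key[i])
            i++;                 /* x is not smaller than the smallest key of the next subtree */
        t = t->child[i];
    }
    if (t == NULL)
        return 0;
    /* All data lives in the leaves: scan the few keys stored there. */
    for (int i = 0; i < t->n; i++)
        if (t->key[i] == x)
            return 1;
    return 0;
}
```

For a root with separator 22 over the leaves {8, 11} and {22, 23}, searching for 23 branches right because 23 ≥ 22 and then finds the key in the second leaf.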
Insert Operation of B-Trees
- Follow the path as in the find operation (X = 18).
- When we reach a leaf node, we have found the correct place to put the element.

Insert Operation of B-Trees
- If there is no room for the new element (X = 1), a new leaf is created and the parent is updated.

Insert Operation of B-Trees
- If there is no room for the new element (X = 19), a new leaf is created and the parent is updated.

Insert Operation of B-Trees
- Since the previous insertion violates the limit of three children, the node is split into two nodes.
- If the root node is split, a new root node is created and the tree gains height.

Delete operation of B-Trees
- Find the key.
- Delete and remove it.
- If necessary, combine two parents into a single one.

Properties of B-Trees
- The depth of a B-tree is at most ⌈log_⌈M/2⌉ N⌉.
- Insert and delete take O(M log_M N) time.
- Find takes only O(log N) time.
- B-trees are used when data is kept on external storage:
  - Slow memory access (latency)
  - High data throughput
- B-trees are used in database systems and file systems.

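One way to see where these bounds come from, assuming that find locates the branch inside a node by binary search over its at most M-1 values and that insert and delete may do O(M) work per node on the path (the slides do not state either assumption):

```latex
\begin{align*}
\text{depth} &= O\!\left(\log_{\lceil M/2 \rceil} N\right) = O(\log_M N) \\
T_{\text{find}} &= O(\log_M N) \cdot O(\log M)
  = O\!\left(\frac{\log N}{\log M} \cdot \log M\right) = O(\log N) \\
T_{\text{insert/delete}} &= O(\log_M N) \cdot O(M) = O(M \log_M N)
\end{align*}
```

A larger M makes the tree shallower, which matters when every visited node costs one slow external-storage access.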
Red-Black Tree
- A red-black tree is a self-balancing binary search tree.
- Introduced by Leonidas J. Guibas and Robert Sedgewick (1978), adopted by Arne Andersson (1993) and Chris Okasaki (1999).
- Balance is preserved by painting each node red or black.
- When the tree is modified, the new tree is efficiently rearranged and repainted.
- The balancing is not perfect, but it guarantees O(log n) search time.
- Insertion and deletion, including the required rearrangement, are performed in O(log n) time.

Red-Black Tree
- Leaf nodes (NIL nodes) do not contain data; they stand for null child pointers.
- To save running time, one sentinel node can be used for all leaf nodes.
- Constraints:
  1. Each node is either red or black.
  2. The root is black.
  3. All leaves are black.
  4. If a node is red, then both its children are black.
  5. Every path from a given node to any descendant leaf node contains the same number of black nodes.

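The constraints can be turned into a small checker. The sketch below represents NIL leaves by NULL pointers instead of a sentinel node, and the names `rb_node`, `rb_black_height` and `rb_is_valid` are assumptions for this illustration. It checks constraints 2, 4 and 5 (constraints 1 and 3 hold by construction) and does not check the search tree ordering.

```c
#include <assert.h>
#include <stddef.h>

enum rb_color { RB_RED, RB_BLACK };

/* Hypothetical red-black node; a NULL child stands for a black NIL leaf. */
struct rb_node {
    int key;
    enum rb_color color;
    struct rb_node *left;
    struct rb_node *right;
};

/* Returns the number of black nodes on every path from n down to a NIL leaf
   (counting n and the leaf), or -1 if constraint 4 or 5 is violated below n. */
static int rb_black_height(const struct rb_node *n)
{
    if (n == NULL)
        return 1;                            /* NIL leaf is black (constraint 3) */

    if (n->color == RB_RED) {                /* constraint 4: red node, black children */
        if (n->left != NULL && n->left->color == RB_RED)
            return -1;
        if (n->right != NULL && n->right->color == RB_RED)
            return -1;
    }

    int hl = rb_black_height(n->left);
    int hr = rb_black_height(n->right);
    if (hl < 0 || hr < 0 || hl != hr)        /* constraint 5: equal black counts */
        return -1;

    return hl + (n->color == RB_BLACK ? 1 : 0);
}

/* Returns 1 if the tree satisfies constraints 2, 4 and 5, 0 otherwise. */
int rb_is_valid(const struct rb_node *root)
{
    if (root != NULL && root->color != RB_BLACK)
        return 0;                            /* constraint 2: the root is black */
    return rb_black_height(root) > 0;
}
```

A checker of this kind is mainly useful as a debugging aid after each insertion or deletion while developing the rebalancing code.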
Red-Black Tree
- Black depth of a node: the number of black nodes on the path from the root to the node.
- Black height of a tree: the number of black nodes on any path from the root to a leaf node.
- Property: the path from the root to the farthest leaf node is no more than twice as long as the path from the root to the nearest leaf node.
  - This follows from constraints 4 and 5: if the shortest path from the root to a leaf node consists of B black nodes, longer paths can be constructed only by inserting at most B red nodes, since a red node is always followed by a black node.
- The worst-case running time of operations like insert, delete, and find is proportional to the height of the tree.

Inserting into Red-Black Tree
- Insert the new node and colour it red.
  - The new node cannot be black, since this would increase the number of black nodes on this path.
- The new node replaces a leaf and adds two leaves.
- Constraints might be violated.
  - Constraints may be violated when the root of the current subtree becomes red.
1. The added node is the root. The root must be black (constraint 2).

Inserting into Red-Black Tree
2. The parent P of the added node N is black.
3. The parent P is red (P is not the root), the grandparent G is black, and N's uncle U is red.

Inserting into Red-Black Tree
4. The parent P is red (P is not the root), the grandparent G is black, and N's uncle U is black.
  - The added node is in the outer subtree.

Inserting into Red-Black Tree
4. The parent P is red (P is not the root), the grandparent G is black, and N's uncle U is black.
  - The added node is in the inner subtree.
  - Use the previous operation to resolve the constraint violation.

Deleting from Red-Black Tree
- If node D (the node to delete) has two non-leaf children:
  - Select the smallest node in the right subtree, or the largest node in the left subtree.
  - Move the value from the selected node into node D.
  - Remove the selected node instead (it does not have two non-leaf children).

Deleting from Red-Black Tree
- Node D is red:
  - Both children are leaves.
  - Replace it with its child C (a black leaf node).
- Node D is black and its child C is red:
  - Replace it with its child C.
  - Repaint C black.
- Node D is black and its child C is black:
  - Both children are leaves.

Deleting from Red-Black Tree
- Node D is the root node:
  - Delete node D; C is the new root.
- Node D has a parent P and a sibling S: several situations are possible, depending on S.
  - Sibling S cannot be a leaf node; S_L is S's left child and S_R is S's right child.
- Sibling S is red:
  - Reverse the colours of P and S.
  - Move S to the top by a rotation.
- Sibling S is black:
  - Several situations are possible…

Other operations on Red-Black Tree
- Set operations:
  - Union
  - Intersection
  - Set difference
- Join operation
- Split operation
- Algorithms on red-black trees can be parallelized.
  - Construction from a linked list can be done in O(log n) time.