BST Data Structure
  A BST node contains:
    – A key (used to search)
    – The data associated with that key
    – Pointers to children and parent
      • Leaf nodes have NULL pointers for children
  A BST contains:
    – A pointer to the root of the tree
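A minimal C++ sketch of this layout. The slides do not fix any types, so the int key, the std::string payload and the names Node and BST are assumptions made for illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical node layout: int key, std::string payload (both assumed).
struct Node {
    int key;                  // used to search
    std::string data;         // data associated with the key
    Node* left = nullptr;     // both child pointers are null at a leaf
    Node* right = nullptr;
    Node* parent = nullptr;   // null at the root

    Node(int k, std::string d) : key(k), data(std::move(d)) {}
};

// The tree itself only stores a pointer to the root.
struct BST {
    Node* root = nullptr;     // null means the tree is empty
};
```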
BST Operations: Insert
  The BST property must be maintained.
  Algorithm sketch:
    – To insert data with key k
    – Compare k to root.key
    – If k < root.key, go left
    – If k > root.key, go right
    – Repeat until you reach a leaf. That's where the new node should be inserted.
      • Note: keep track of the prospective parent along the way.
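The algorithm sketch might look as follows in C++. This is one possible iterative version, not necessarily the one intended by the slides; the Node layout and the choice to ignore a key that is already present are assumptions.

```cpp
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
    explicit Node(int k) : key(k) {}
};

// Inserts key k and returns the (possibly new) root of the tree.
// Assumption: a key that is already present is left alone.
Node* insert(Node* root, int k) {
    Node* parent = nullptr;              // prospective parent of the new node
    Node* cur = root;
    while (cur != nullptr) {
        if (k == cur->key) return root;  // already present, nothing to do
        parent = cur;
        cur = (k < cur->key) ? cur->left : cur->right;
    }
    Node* fresh = new Node(k);
    fresh->parent = parent;
    if (parent == nullptr) return fresh; // the tree was empty
    if (k < parent->key) parent->left = fresh;
    else                 parent->right = fresh;
    return root;
}
```

The loop only walks one root-to-leaf path, which is why the cost depends on the height of the tree.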
BST Operations: Insert
  Running time:
    – The new node is inserted at a leaf position, so this depends on the height of the tree.
  Worst case:
    – Inserting keys 1, 2, 3, ... in this order will result in a tree that looks like a chain (1, then 2 as its right child, then 3, and so on):
      • The tree has degenerated to a list
      • Height: linear
      • Note also that such a tree is worse than a linked list since it takes up more space (more pointers)
BST Operations: Insert
  Running time:
    – The new node is inserted at a leaf position, so this depends on the height of the tree.
  Best case:
    – The top levels of the tree are filled up completely
    – The height is then approximately log n, where n is the number of nodes in the tree.
  [Figure: example BST containing the keys 2, 4, 8, 12, 14, 16]
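To see the two extremes side by side, the sketch below builds a tree from a given insertion order and measures its height. The helper names (height, height_after_inserting) and the convention that height is counted in edges are assumptions for illustration.

```cpp
#include <algorithm>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Plain recursive BST insert with no balancing; duplicates are ignored.
Node* insert(Node* root, int k) {
    if (root == nullptr) return new Node(k);
    if (k < root->key)      root->left  = insert(root->left, k);
    else if (k > root->key) root->right = insert(root->right, k);
    return root;
}

// Height counted in edges; an empty tree has height -1.
int height(const Node* n) {
    if (n == nullptr) return -1;
    return 1 + std::max(height(n->left), height(n->right));
}

// Inserts the keys in the given order and returns the resulting height.
// (Nodes are not freed; this is a throwaway measurement helper.)
int height_after_inserting(const std::vector<int>& keys) {
    Node* root = nullptr;
    for (int k : keys) root = insert(root, k);
    return height(root);
}
```

With seven keys, the sorted order 1, 2, ..., 7 gives height 6 (a chain), while the order 4, 2, 6, 1, 3, 5, 7 gives height 2 (all levels full).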
BST Operations: Insert
  The height of a complete (i.e. all levels filled up) BST with n nodes is logarithmic. Why?
    – Level $i$ has $2^i$ nodes, for $i = 0$ (top level) through $h$ (= height)
    – The total number of nodes, $n$, is then:
        $n = 2^0 + 2^1 + \dots + 2^h = \frac{2^{h+1} - 1}{2 - 1} = 2^{h+1} - 1$
    – Solving for $h$ gives $h = \log_2(n + 1) - 1$, so $h \approx \log n$
BST Operations: Insert
  Analysis conclusion:
    – An insert operation consists of two parts:
      • Search for the position
        – best case logarithmic
        – worst case linear
      • Physically insert the node
        – constant
BST Operations: Insert
  What if we allow duplicate keys?
    – Idea #1: Always insert in the right subtree
      • Results in a very unbalanced tree
    – Idea #2: Insert in alternate subtrees
      • Makes it difficult to search for all occurrences
    – Idea #3: All elements with the same key are inserted in a single node
      • Good idea!
        – Easy to search, does not affect balance any more than non-duplicate insertion.
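Idea #3 could be sketched as below. As a simplifying assumption, each node keeps only a counter of how many times its key was inserted; if duplicates carry different data, the node would need a list of payloads instead. The names count and occurrences are hypothetical.

```cpp
struct Node {
    int key;
    int count = 1;            // how many elements share this key (assumed representation)
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Inserts k; a duplicate only bumps the counter, so the tree shape is unchanged.
Node* insert(Node* root, int k) {
    if (root == nullptr) return new Node(k);
    if (k < root->key)      root->left  = insert(root->left, k);
    else if (k > root->key) root->right = insert(root->right, k);
    else                    ++root->count;
    return root;
}

// All occurrences of k live in one node, so one ordinary search finds them.
int occurrences(const Node* root, int k) {
    while (root != nullptr && root->key != k)
        root = (k < root->key) ? root->left : root->right;
    return root != nullptr ? root->count : 0;
}
```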
BST Operations: Insert
  What if we allow a variable number of children? (n-ary tree)
    – Idea: Use a vector/list of pointers to children.
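One way to realize this idea in C++ is a std::vector of child pointers. The NaryNode type, its add_child helper and the ownership rule (a parent deletes its children) are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <vector>

struct NaryNode {
    int key;
    std::vector<NaryNode*> children;   // any number of children, kept in order

    explicit NaryNode(int k) : key(k) {}
    NaryNode(const NaryNode&) = delete;             // avoid double deletion
    NaryNode& operator=(const NaryNode&) = delete;
    ~NaryNode() { for (NaryNode* c : children) delete c; }

    // Appends a new child and returns it; the parent owns the child.
    NaryNode* add_child(int k) {
        children.push_back(new NaryNode(k));
        return children.back();
    }
};

// Counts the nodes in the subtree rooted at n.
std::size_t count_nodes(const NaryNode& n) {
    std::size_t total = 1;
    for (const NaryNode* c : n.children) total += count_nodes(*c);
    return total;
}
```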
BST Operations: Search
  Take advantage of the BST property.
  Algorithm sketch:
    – Compare target to root
    – If equal, return success
    – If target < root, search left
    – If target > root, search right
    – If the subtree to search is empty, return failure
  Running time:
    – Similar to insert
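A possible iterative version of this search, assuming a bare Node with an int key. It returns a pointer to the matching node, or a null pointer when the target is absent.

```cpp
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the node holding target, or nullptr if the key is not in the tree.
const Node* search(const Node* root, int target) {
    while (root != nullptr) {
        if (target == root->key) return root;   // success
        // The BST property tells us which single subtree can contain target.
        root = (target < root->key) ? root->left : root->right;
    }
    return nullptr;                             // fell off the tree: not found
}
```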
BST Operations: Delete
  The Delete operation consists of two parts:
    – Search for the node to be deleted
      • best case constant (deleting the root)
      • worst case linear
    – Delete the node
      • best case?
      • worst case?
BST Operations: Delete
  CASE #1
    – The node to be deleted is a leaf node.
    – Easy!
      • Physically remove the node.
      • Constant time
        – We are just resetting its parent's child pointer and deallocating memory
BST Operations: Delete
  CASE #2
    – The node to be deleted has exactly one child
    – Easy!
      • Physically remove the node.
      • Constant time
        – We are just resetting its parent's child pointer, its child's parent pointer and deallocating memory
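Cases #1 and #2 can share one routine, since a leaf is just the situation where the single child is null. The sketch below assumes nodes with parent pointers; the name splice_out is hypothetical.

```cpp
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
    explicit Node(int k) : key(k) {}
};

// Removes node z, which must have at most one child, in constant time.
// Returns the root of the tree after the removal.
Node* splice_out(Node* root, Node* z) {
    Node* child = (z->left != nullptr) ? z->left : z->right;  // null for a leaf
    if (child != nullptr) child->parent = z->parent;          // case #2 only
    if (z->parent == nullptr)       root = child;             // z was the root
    else if (z->parent->left == z)  z->parent->left = child;
    else                            z->parent->right = child;
    delete z;                                                  // deallocate
    return root;
}
```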
BST Operations: Delete
  CASE #3
    – The node to be deleted has two children
    – Not so easy
      • If we physically delete the node, we'll have to place its two children somewhere. This seems to require too much tree restructuring.
      • But we know it's easy to delete a node that has at most one child. What if we find such a node whose contents can be copied over without violating the BST property and then physically delete that node?
BST Operations: Delete
  CASE #3, continued
    – The node to be deleted, x, has two children
    – Idea:
      • Find x's immediate successor, y. It is guaranteed to have at most one child (it has no left child).
      • Copy y's contents over to x
      • Physically delete y.
BST Operations: Delete
  Finding the immediate successor:
    – We know that the node has two children. Due to the BST property, the immediate successor will be in the right subtree.
    – In particular, the immediate successor will be the smallest element in the right subtree.
    – The smallest element in a BST is always the leftmost node: start at the root and follow left pointers until there is none. It has no left child, but it may have a right child, so it is not necessarily a leaf.
BST Operations: Delete
  Finding the immediate successor:
    – It requires traveling down the tree from the current node along left pointers, so it may take up to linear time in the worst case (degenerate tree).
    – In a balanced tree it takes at most logarithmic time. If the right child has no left child, the successor is the right child itself and is found in constant time.
    – The time to perform the copy and delete the successor is constant.
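Putting the three cases together, a delete could be sketched recursively as follows. This version works without parent pointers and copies only the key, which are simplifications; the names erase_key and insert are chosen for illustration.

```cpp
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Plain BST insert, included only so that example trees can be built.
Node* insert(Node* root, int k) {
    if (root == nullptr) return new Node(k);
    if (k < root->key)      root->left  = insert(root->left, k);
    else if (k > root->key) root->right = insert(root->right, k);
    return root;
}

// Removes key k from the subtree rooted at root; returns the new subtree root.
Node* erase_key(Node* root, int k) {
    if (root == nullptr) return nullptr;                  // key not found
    if (k < root->key) { root->left  = erase_key(root->left, k);  return root; }
    if (k > root->key) { root->right = erase_key(root->right, k); return root; }

    if (root->left != nullptr && root->right != nullptr) {
        // Case #3: the successor y is the leftmost node of the right subtree.
        Node* y = root->right;
        while (y->left != nullptr) y = y->left;
        root->key = y->key;                               // copy y's contents into x
        root->right = erase_key(root->right, y->key);     // y has at most one child
        return root;
    }
    // Cases #1 and #2: replace the node by its only child (or by nothing).
    Node* child = (root->left != nullptr) ? root->left : root->right;
    delete root;
    return child;
}
```

For a tree built by inserting 12, 4, 14, 2, 8, 16, 6, 10, deleting 4 copies its successor 6 into that node and then removes the old 6 as a leaf.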
Binary Search Trees
  Traversing a tree = visiting its nodes
  Three major ways to traverse a binary tree:
    • preorder
      • visit root
      • visit left subtree
      • visit right subtree
    • inorder
      • visit left subtree
      • visit root
      • visit right subtree
    • postorder
      • visit left subtree
      • visit right subtree
      • visit root
  When applied to a BST, an inorder traversal visits the nodes in order from smaller to larger.
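Binary Search Trees
    // Prints the data of every node in the subtree, in inorder.
    void print_inorder(Node *subroot) {
        if (subroot != NULL) {
            print_inorder(subroot->left);    // everything smaller first
            cout << subroot->data;           // then the node itself
            print_inorder(subroot->right);   // then everything larger
        }
    }
  How long does this take?
    – There is one call to print_inorder() for each node of the tree, plus one constant-time call for each NULL child pointer (there are n + 1 of those).
    – There are n nodes, so the running time of this operation is $\Theta(n)$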
Binary Search Trees
  A tree may also be traversed one "level" at a time (top to bottom, left to right). This is usually called a level-order traversal.
    – It requires the use of a temporary queue:
        enqueue root
        while (queue is not empty) {
            get the front element, f
            print f
            enqueue f's children
            dequeue
        }
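The queue-based pseudocode translates almost directly to std::queue. In this sketch the keys are collected into a vector instead of being printed, and the front element is dequeued before its children are enqueued; both are small deviations from the pseudocode that do not change the visiting order. The Node layout is assumed.

```cpp
#include <queue>
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the keys level by level, top to bottom and left to right.
std::vector<int> level_order(const Node* root) {
    std::vector<int> out;
    if (root == nullptr) return out;
    std::queue<const Node*> q;
    q.push(root);                               // enqueue root
    while (!q.empty()) {
        const Node* f = q.front();              // get the front element
        q.pop();                                // dequeue
        out.push_back(f->key);                  // "print f"
        if (f->left != nullptr)  q.push(f->left);   // enqueue f's children
        if (f->right != nullptr) q.push(f->right);
    }
    return out;
}
```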
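Binary Search Trees
  [Figure: BST with root 12; 12 has children 4 and 14; 4 has children 2 and 8; 8 has children 6 and 10; 14 has right child 16]
  in-order:    2 - 4 - 6 - 8 - 10 - 12 - 14 - 16
  pre-order:   12 - 4 - 2 - 8 - 6 - 10 - 14 - 16
  post-order:  2 - 6 - 10 - 8 - 4 - 16 - 14 - 12
  level-order: 12 - 4 - 14 - 2 - 8 - 16 - 6 - 10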
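The pre-order and post-order listings can be checked with two short recursive functions. This is a sketch under the assumption of a bare Node with an int key; the keys are appended to a vector rather than printed so that the result can be compared.

```cpp
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Root first, then the left subtree, then the right subtree.
void preorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    out.push_back(n->key);
    preorder(n->left, out);
    preorder(n->right, out);
}

// Left subtree, then right subtree, and the root last.
void postorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    postorder(n->left, out);
    postorder(n->right, out);
    out.push_back(n->key);
}
```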
Binary Search Trees
  Idea for sorting algorithm:
    – Given a sequence of integers, insert each one in a BST
    – Perform an inorder traversal. The elements will be accessed in sorted order.
  Running time:
    – In the worst case, the tree will degenerate to a list. Creation will take quadratic time and traversal will be linear. Total: $O(n^2)$
    – On average, the tree will be mostly balanced. Creation will take $O(n \log n)$ and traversal will again be linear. Total: $O(n \log n)$
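The sorting idea might be sketched as below. Two choices here are assumptions rather than part of the slides: equal keys are sent to the right subtree so that duplicates are kept, and the traversal frees each node after visiting it. The names tree_sort and drain_inorder are hypothetical.

```cpp
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Unbalanced BST insert; equal keys go right so duplicates survive the sort.
Node* insert(Node* root, int k) {
    if (root == nullptr) return new Node(k);
    if (k < root->key) root->left  = insert(root->left, k);
    else               root->right = insert(root->right, k);
    return root;
}

// Inorder traversal: appends keys in ascending order and frees visited nodes.
void drain_inorder(Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    drain_inorder(n->left, out);
    out.push_back(n->key);
    Node* right = n->right;
    delete n;
    drain_inorder(right, out);
}

// Step 1: insert every element. Step 2: read the tree back in order.
std::vector<int> tree_sort(const std::vector<int>& input) {
    Node* root = nullptr;
    for (int k : input) root = insert(root, k);
    std::vector<int> sorted;
    sorted.reserve(input.size());
    drain_inorder(root, sorted);
    return sorted;
}
```

Because the tree is not balanced, an already sorted input produces the chain-shaped worst case, and the recursion depth then grows linearly with the input size.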
BSTs vs. Lists
  Time
    – In the worst case, all dictionary operations are linear.
    – On average, BSTs are expected to do better.
  Space
    – BSTs store an additional pointer per node.
  The BST seemed like a good idea, but in the end it doesn't offer much improvement in the worst case.
    – We must find a way to keep the tree balanced and guarantee logarithmic height.
Balanced Trees
  There are several ways to define balance
  Examples:
    – Force the subtrees of each node to have almost equal heights
    – Place upper and lower bounds on the heights of the subtrees of each node.
    – Force the subtrees of each node to have similar sizes (= number of nodes)
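As an illustration of the first definition, the sketch below checks whether the two subtrees of every node differ in height by at most 1. The threshold of 1 is an assumption (the slides only say "almost equal"), and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdlib>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the height in edges (-1 for an empty tree), or -2 as soon as
// some node has subtrees whose heights differ by more than 1.
int checked_height(const Node* n) {
    if (n == nullptr) return -1;
    int lh = checked_height(n->left);
    if (lh == -2) return -2;
    int rh = checked_height(n->right);
    if (rh == -2) return -2;
    if (std::abs(lh - rh) > 1) return -2;       // this node is out of balance
    return 1 + std::max(lh, rh);
}

// True if every node satisfies the (assumed) height-difference bound of 1.
bool is_height_balanced(const Node* root) {
    return checked_height(root) != -2;
}
```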