Binary Search Trees
15-211 Fundamental Structures of Computer Science
Ananda Guna, Jan. 23, 2003
Based on lectures given by Peter Lee, Avrim Blum, Danny Sleator, William Scherlis, Ananda Guna & Klaus Sutner
First a Review of Stacks and Queues
A Stack interface
public interface Stack {
    public void push(Object x);
    public void pop();
    public Object top();
    public boolean isEmpty();
    public void clear();
}
Stacks are LIFO
Push operations place each new element on top (figure: a stack holding e, d, c, b, a, with e on top).
Pop operation: removes the top element. The last element pushed is the first to be popped.
A Queue interface
public interface Queue {
    public void enqueue(Object x);
    public Object dequeue();
    public boolean isEmpty();
    public void clear();
}
Queues are FIFO (figure: a queue holding k, r, q, c, m, listed from back to front).
- Enqueue operation: y joins at the back, giving y, k, r, q, c, m.
- Dequeue operation: the element at the front (m) is removed.
Implementing stacks, 1
Linked representation. All operations take constant time, O(1). (Figure: a linked list of nodes c, b, a, with c at the top.)
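A minimal sketch of the linked representation, assuming a generic class named `LinkedStack` (a hypothetical name; the slide's interface uses `Object` rather than generics). Every operation touches only the top node, which is why each one runs in O(1) time.

```java
import java.util.NoSuchElementException;

// Sketch of a linked-list stack; LinkedStack is a hypothetical name.
class LinkedStack<T> {
    // Each node holds one element and a link to the node below it.
    private static final class Node<E> {
        final E value;
        final Node<E> next;
        Node(E value, Node<E> next) { this.value = value; this.next = next; }
    }

    private Node<T> top; // null when the stack is empty

    // O(1): the new node becomes the top and points at the old top.
    void push(T x) { top = new Node<>(x, top); }

    // O(1): unlink the top node.
    void pop() {
        if (top == null) throw new NoSuchElementException("stack is empty");
        top = top.next;
    }

    // O(1): read the top element without removing it.
    T top() {
        if (top == null) throw new NoSuchElementException("stack is empty");
        return top.value;
    }

    boolean isEmpty() { return top == null; }

    void clear() { top = null; }
}
```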
Implementing stacks, 2
- An alternative is to use an array-based representation (figure: an array holding a, b, c, with a top marker at the last element).
- What are some advantages and disadvantages of an array-based representation?
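For comparison, a sketch of the array-based representation, under the assumption that the array doubles when it fills up (the class name `ArrayStack` and the initial capacity of 4 are illustrative choices, not from the slides). Push is amortized O(1); the push that triggers a resize copies every element.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Sketch of an array-backed stack; elements live in data[0..size-1], top at data[size-1].
class ArrayStack<T> {
    private Object[] data = new Object[4];
    private int size = 0;

    void push(T x) {
        // Amortized O(1): double the array when it is full.
        if (size == data.length) data = Arrays.copyOf(data, 2 * data.length);
        data[size++] = x;
    }

    void pop() {
        if (size == 0) throw new NoSuchElementException("stack is empty");
        data[--size] = null; // clear the slot so the old element can be garbage-collected
    }

    @SuppressWarnings("unchecked")
    T top() {
        if (size == 0) throw new NoSuchElementException("stack is empty");
        return (T) data[size - 1];
    }

    boolean isEmpty() { return size == 0; }

    void clear() {
        Arrays.fill(data, 0, size, null);
        size = 0;
    }
}
```

One trade-off visible here: the array version allocates no node per push and keeps elements contiguous, but it may hold unused capacity and occasionally pays O(n) for a resize.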
A queue from two stacks (figure: two stacks labeled Enqueue and Dequeue, together holding the elements a through j). What happens when the stack on the right becomes empty?
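A sketch of one common way to build the queue, assuming `java.util.ArrayDeque` is used as each stack (the class name `TwoStackQueue` is hypothetical). Enqueue pushes onto an input stack; dequeue pops from an output stack, refilling it from the input stack only when it runs empty. Each element moves between stacks at most once, so both operations are amortized O(1).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// Sketch of a FIFO queue built from two LIFO stacks.
class TwoStackQueue<T> {
    private final Deque<T> in = new ArrayDeque<>();  // receives enqueued elements
    private final Deque<T> out = new ArrayDeque<>(); // serves dequeues, oldest element on top

    void enqueue(T x) { in.push(x); } // note: ArrayDeque rejects null elements

    T dequeue() {
        // When the output stack is empty, pour the input stack into it;
        // this reverses the order, so the oldest element ends up on top.
        if (out.isEmpty()) {
            while (!in.isEmpty()) out.push(in.pop());
        }
        if (out.isEmpty()) throw new NoSuchElementException("queue is empty");
        return out.pop();
    }

    boolean isEmpty() { return in.isEmpty() && out.isEmpty(); }

    void clear() { in.clear(); out.clear(); }
}
```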
Now to Trees
In CS, trees are drawn upside down: the root at the top, the leaves at the bottom.
Trees are everywhere
- Trees are everywhere in life.
- As a result, trees turn out to be one of the most commonly used data structures in computer programs.
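Arithmetic Expressions (figure: an expression tree with the operators + and * as internal nodes and the operands 2, 5 and 7 as leaves)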
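To make the expression-tree idea concrete, here is a small sketch that evaluates such a tree with a post-order walk: evaluate both subtrees, then apply the operator at the root. The class `Expr` and its factory methods are hypothetical, and the example tree for 2 * 5 + 7 is our reading of the figure.

```java
// Sketch of an expression tree: leaves hold integers, internal nodes hold an operator.
class Expr {
    final char op;        // operator for internal nodes; unused for leaves
    final int value;      // operand for leaves; unused for internal nodes
    final Expr left, right;

    private Expr(char op, int value, Expr left, Expr right) {
        this.op = op; this.value = value; this.left = left; this.right = right;
    }

    static Expr num(int v) { return new Expr(' ', v, null, null); }

    static Expr node(char op, Expr l, Expr r) { return new Expr(op, 0, l, r); }

    // Post-order evaluation: evaluate both subtrees, then apply the root's operator.
    int eval() {
        if (left == null && right == null) return value; // leaf
        int a = left.eval();
        int b = right.eval();
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b; // integer division in this sketch
            default: throw new IllegalStateException("unknown operator: " + op);
        }
    }
}
```

For example, `Expr.node('+', Expr.node('*', Expr.num(2), Expr.num(5)), Expr.num(7)).eval()` evaluates 2 * 5 + 7.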
Game trees
Directory structure (figure: the /afs directory tree, with subdirectories such as cs, andrew, acs, usr and course directories)
Tree Definitions
- A tree is a set of nodes and a set of directed edges that connect pairs of nodes.
- A tree is a directed acyclic graph (DAG) with the following properties:
  - one vertex is distinguished as the root; no edges enter this vertex;
  - every other vertex has exactly one entering edge.
Trees, more abstractly
- A tree is a directed graph with the following characteristics:
  - There is a distinguished node called the root node.
  - Every non-root node has exactly one parent node (the root has none).
  - There are no cycles, so following parent links from any node always leads back to the root.
A closer look at Trees (figure: a root R with subtrees T1, T2, T3 below it; the roots of T1, T2, T3 are siblings)
Unique parents (figure: a tree on nodes a through f, with its root marked, in which every non-root node has exactly one parent)
Implementation of Trees
- How do we implement a general tree, e.g., a file system?
- Each node has two links:
  - one to its leftmost child;
  - one to its right sibling.
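A sketch of the leftmost-child / right-sibling representation, assuming string-labeled nodes like the directory example (the class `GTNode` and its methods are hypothetical). A node can have any number of children, yet it stores only two links; the children are found by following the sibling chain.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a general-tree node with leftmost-child and right-sibling links.
class GTNode {
    final String name;
    GTNode firstChild;   // leftmost child, or null for a leaf
    GTNode nextSibling;  // next sibling to the right, or null

    GTNode(String name) { this.name = name; }

    // Append a child at the end of this node's sibling chain; returns the child.
    GTNode addChild(GTNode child) {
        if (firstChild == null) {
            firstChild = child;
        } else {
            GTNode c = firstChild;
            while (c.nextSibling != null) c = c.nextSibling;
            c.nextSibling = child;
        }
        return child;
    }

    // Pre-order listing with two spaces of indentation per level, like a directory listing.
    void list(int depth, List<String> out) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("  ");
        out.add(sb.append(name).toString());
        for (GTNode c = firstChild; c != null; c = c.nextSibling) c.list(depth + 1, out);
    }
}
```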
Implementation of a binary tree with an array
- Assume that the left child of node i (i = 1, 2, …) is stored at index 2i and the right child of node i is stored at index 2i + 1.
- Draw the tree represented by the following array (assume indices start from 1):
  12 10 15 8 11 14 18
- Question: What is the minimum height of a binary tree with n nodes? What is the maximum height?
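A sketch of the index arithmetic, assuming every index from 1 to the end of the array holds a node (a tree with gaps would need a marker for missing nodes, which this sketch does not handle). Slot 0 is unused, so the example fills it with a placeholder 0; the class name `ArrayTree` is hypothetical. An in-order walk over the slide's array visits the keys in sorted order.

```java
import java.util.ArrayList;
import java.util.List;

// 1-indexed array tree: left child of i at 2i, right child at 2i + 1, parent of i (i > 1) at i / 2.
class ArrayTree {
    static int left(int i)   { return 2 * i; }
    static int right(int i)  { return 2 * i + 1; }
    static int parent(int i) { return i / 2; }

    // In-order walk; an index past the end of the array means "no child".
    static void inorder(int[] a, int i, List<Integer> out) {
        if (i >= a.length) return;
        inorder(a, left(i), out);
        out.add(a[i]);
        inorder(a, right(i), out);
    }
}
```

Usage: with `int[] a = {0, 12, 10, 15, 8, 11, 14, 18}`, calling `ArrayTree.inorder(a, 1, out)` fills `out` with 8, 10, 11, 12, 14, 15, 18.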
Binary Tree Traversals
- Inorder: Left-Root-Right. Use a stack or recursion.
- Preorder: Root-Left-Right. Use a stack or recursion.
- Postorder: Left-Right-Root. Use a stack or recursion.
- Level-order traversal: use a queue.
- What is the output of each of the traversals? (See the next slide for BFS in a tree.)
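The three depth-first orders differ only in where the root is visited relative to the recursive calls. This sketch (class and node names hypothetical) shows the recursive versions plus an inorder that uses an explicit stack instead of recursion. On the tree with root 12, children 10 and 15, and leaves 8, 11, 14, 18, inorder gives 8 10 11 12 14 15 18, preorder 12 10 8 11 15 14 18, and postorder 8 11 10 14 18 15 12.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of binary tree traversals; Traversals and Node are hypothetical names.
class Traversals {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    // Left, Root, Right
    static void inorder(Node t, List<Integer> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }

    // Root, Left, Right
    static void preorder(Node t, List<Integer> out) {
        if (t == null) return;
        out.add(t.key);
        preorder(t.left, out);
        preorder(t.right, out);
    }

    // Left, Right, Root
    static void postorder(Node t, List<Integer> out) {
        if (t == null) return;
        postorder(t.left, out);
        postorder(t.right, out);
        out.add(t.key);
    }

    // Inorder with an explicit stack instead of recursion.
    static List<Integer> inorderWithStack(Node t) {
        List<Integer> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node cur = t;
        while (cur != null || !stack.isEmpty()) {
            while (cur != null) {  // walk as far left as possible,
                stack.push(cur);   // remembering the nodes we pass
                cur = cur.left;
            }
            cur = stack.pop();     // visit the most recently passed node
            out.add(cur.key);
            cur = cur.right;       // then traverse its right subtree
        }
        return out;
    }
}
```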
Algorithm for Breadth-first traversal (of a tree, using a queue)
enqueue the root
while (the queue is not empty) {
    dequeue the front element
    print it
    enqueue its left child (if present)
    enqueue its right child (if present)
}
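The same algorithm as a Java sketch, assuming `java.util.ArrayDeque` as the queue and collecting keys into a list instead of printing them (the class `LevelOrder` is hypothetical). On the tree with root 12, children 10 and 15, and leaves 8, 11, 14, 18, it visits 12, 10, 15, 8, 11, 14, 18: exactly the array order from the array-representation slide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Breadth-first (level-order) traversal of a binary tree using a queue.
class LevelOrder {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        if (root == null) return out;
        Queue<Node> q = new ArrayDeque<>();
        q.add(root);                              // enqueue the root
        while (!q.isEmpty()) {
            Node n = q.remove();                  // dequeue the front element
            out.add(n.key);                       // "print" it
            if (n.left != null) q.add(n.left);    // enqueue its left child (if present)
            if (n.right != null) q.add(n.right);  // enqueue its right child (if present)
        }
        return out;
    }
}
```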
Facts and Questions About Trees
- A path from node $n_1$ to $n_k$ is a sequence of nodes $n_1, n_2, \ldots, n_k$ such that $n_i$ is the parent of $n_{i+1}$ for $1 \le i < k$. Its length is the number of edges on it, $k - 1$.
- The depth of a node is the length of the path from the root to that node. What is the depth of the root? What is the maximum depth of a tree with N nodes?
- What is the number of edges in a tree with N nodes?
- The height of a node is the length of the longest path from that node down to a leaf. The height of the tree is the ________?
- Let T(n) be the number of null pointers in a binary tree of n nodes. Show that T(n) = n + 1.
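One way to see the null-pointer fact: n nodes carry 2n child pointers in total, and every node except the root is the target of exactly one of them, so n - 1 pointers are non-null and 2n - (n - 1) = n + 1 are null. The sketch below (class `NullCount` is hypothetical; plain BST insertion is used only to build some tree shape) counts both quantities so the identity can be checked on examples.

```java
// Counts nodes and null child pointers in a binary tree.
class NullCount {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain (unbalanced) BST insertion, used only to build a tree.
    static Node insert(Node t, int key) {
        if (t == null) return new Node(key);
        if (key < t.key) t.left = insert(t.left, key);
        else if (key > t.key) t.right = insert(t.right, key);
        return t;
    }

    static int size(Node t) { return t == null ? 0 : 1 + size(t.left) + size(t.right); }

    // Each null pointer counts once; an empty tree is itself a single null pointer.
    static int nulls(Node t) { return t == null ? 1 : nulls(t.left) + nulls(t.right); }
}
```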
Time to think about complexity of algorithms
- Considering algorithms:
  - Is the approach correct?
  - How fast does it run?
  - How much memory does it use?
  - Can I finish writing the code in the next 8 hours?
- What is most important?
- Consider fib(n) = fib(n-1) + fib(n-2) for n >= 2, with fib(0) = fib(1) = 1.
- Let's look at a simple algorithm.
Fibonacci Tree (Closed form)
public static long fib(int n) {
    if (n <= 1) return 1;
    return fib(n-1) + fib(n-2);
}
(Figure: the recursion tree for F(5). F(5) calls F(4) and F(3), and smaller subproblems such as F(3), F(2), F(1) and F(0) are recomputed many times.)
- It turns out the number of function calls is proportional to fib(n) itself! In fact, it is exactly 2*fib(n) - 1.
- fib(90) takes about 7000 years on a 1 GHz machine.
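The call-count claim can be checked by instrumenting the same recursion with a counter. The class `FibCalls` and its static `calls` field are hypothetical helpers for this sketch. The identity follows by induction: calls(n) = 1 + calls(n-1) + calls(n-2) = 1 + (2 fib(n-1) - 1) + (2 fib(n-2) - 1) = 2 fib(n) - 1.

```java
// Instrumented copy of the slide's fib, counting how many calls it makes.
class FibCalls {
    static long calls = 0;

    // Same recursion as the slide, with fib(0) = fib(1) = 1.
    static long fib(int n) {
        calls++;
        if (n <= 1) return 1;
        return fib(n - 1) + fib(n - 2);
    }
}
```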
Making Fibonacci more efficient
- Can we write a better algorithm?
- Can we reuse some of the parts of the recursion?
// call initially as fastfib(0, 1, n)
public static long fastfib(long prev, long current, int togo) {
    if (togo <= 0) return current;
    return fastfib(current, current+prev, togo-1);
}
- What is the complexity of this algorithm?
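Each call of `fastfib` does constant work and decrements `togo`, so it makes n + 1 calls: O(n) time. Standard Java implementations do not eliminate tail calls, though, so the recursive version also uses O(n) stack frames. A sketch of an equivalent loop (the name `fibLoop` is hypothetical) keeps the same `prev`/`current` pair and runs in O(1) extra space. With `long`, results stay exact up to n = 91 under this indexing; larger n overflow.

```java
// fastfib from the slide, plus an equivalent loop (same convention: fib(0) = fib(1) = 1).
class FastFib {
    static long fastfib(long prev, long current, int togo) {
        if (togo <= 0) return current;
        return fastfib(current, current + prev, togo - 1);
    }

    // Each iteration performs one "recursive step" of fastfib, starting from (0, 1).
    static long fibLoop(int n) {
        long prev = 0, current = 1;
        for (int togo = n; togo > 0; togo--) {
            long next = current + prev;
            prev = current;
            current = next;
        }
        return current;
    }
}
```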
A question about height and number of nodes in a binary tree
- Suppose we have n nodes in a completely full binary tree of height h. What is the relation between n and h?
- The number of nodes in level i is $2^i$ ($i = 0, 1, \ldots, h$).
- Therefore the total number of nodes in all levels is $\sum_{i=0}^{h} 2^i = 2^{h+1} - 1$.
- So what is the relation between n and h?
- A binary tree is completely full if it has height h and $2^{h+1} - 1$ nodes.
A bit about asymptotic analysis
- O notation: T(n) is $O(f(n))$ if there exist two positive constants $c$ and $n_0$ such that $T(n) \le c\,f(n)$ for all $n > n_0$.
- Omega notation: T(n) is $\Omega(f(n))$ if there exist two positive constants $c$ and $n_0$ such that $T(n) \ge c\,f(n)$ for all $n > n_0$.
- Theta notation: T(n) is $\Theta(f(n))$ if it is both $O(f(n))$ and $\Omega(f(n))$.
“Big-Oh” notation: T(N) = O(f(N)), read “T(N) is order f(N)”. (Figure: the running time T(N) plotted against N; beyond the point $n_0$, the curve c f(N) stays above T(N).)
Some examples
- If $f(n) = 10n + 5$ and $g(n) = n$, show that f(n) is $O(g(n))$.
- $f(n) = 3n^2 + 4n + 1$. Show that f(n) is $O(n^2)$.
- Show that $5 \log n$ is $O(n)$.
- $f(n) = 3n^2 + 4n + 1$. Show that f(n) is $\Omega(n^2)$.
- Therefore $f(n) = \Theta(n^2)$.
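One way to supply the constants these examples ask for, using base-2 logarithms as on the logarithm slides (the particular constants are one valid choice among many):

```latex
\begin{align*}
10n + 5 &\le 10n + 5n = 15n && (n \ge 1), \text{ so } c = 15,\ n_0 = 1 \\
3n^2 + 4n + 1 &\le 3n^2 + 4n^2 + n^2 = 8n^2 && (n \ge 1), \text{ so } c = 8,\ n_0 = 1 \\
5\log n &\le 5n && (n \ge 1), \text{ since } \log n < n \\
3n^2 + 4n + 1 &\ge 3n^2 && (n \ge 1), \text{ so } f(n) = \Omega(n^2) \text{ with } c = 3
\end{align*}
```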
Logarithms and exponents
- Logarithms and exponents are everywhere in algorithm analysis: $\log_b a = c$ if $a = b^c$.
Logarithms and exponents
- We usually leave off the base b when b = 2, so, for example, $\log 1024 = 10$.
Some useful equalities
- $\log_b(ac) = \log_b a + \log_b c$
- $\log_b(a/c) = \log_b a - \log_b c$
- $\log_b(a^c) = c \log_b a$
- $\log_b a = (\log_c a) / (\log_c b)$
- $(b^a)^c = b^{ac}$
- $b^a b^c = b^{a+c}$
- $b^a / b^c = b^{a-c}$
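In code, the change-of-base rule is how a base-2 logarithm is usually computed, since `Math.log` returns the natural logarithm. The sketch below (class `Logs` and its methods are hypothetical) also shows an exact integer alternative for $\lfloor \log_2 n \rfloor$ based on the position of the highest set bit, which avoids floating-point rounding.

```java
// Change of base: log_b a = (ln a) / (ln b), using Math.log (natural log).
class Logs {
    static double log(double base, double x) { return Math.log(x) / Math.log(base); }

    // Exact floor(log2 n) for n >= 1: index of the highest set bit of n.
    static int floorLog2(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        return 31 - Integer.numberOfLeadingZeros(n);
    }
}
```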
Big-Oh again
- When T(N) = O(f(N)), we are saying that T(N) grows no faster than f(N).
  - I.e., f(N) describes an upper bound on T(N).
- Put another way:
  - For “large enough” inputs, c f(N) always dominates T(N).
- This is called the asymptotic behavior.
Big-O characteristic
- If T(N) = c f(N), then
  - T(N) = O(f(N));
  - constant factors “don’t matter”.
- Because of this, when T(N) = O(c g(N)), we usually drop the constant and just say O(g(N)).
Big-O characteristic
- Suppose T(N) = k, for some constant k.
- Then T(N) = O(1).
Big-O characteristic
- More interesting:
  - Suppose $T(N) = 20N^3 + 10N\log N + 5$.
  - Then $T(N) = O(N^3)$.
  - Lower-order terms “don’t matter”.
- Question: What constants c and $n_0$ can be used to show that the above is true?
- Answer: c = 35, $n_0$ = 1.
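A term-by-term check that c = 35 and $n_0 = 1$ work, using base-2 logarithms: each lower-order term is bounded by a multiple of $N^3$, and the multiples add up to 35.

```latex
\begin{align*}
T(N) &= 20N^3 + 10N\log N + 5 \\
     &\le 20N^3 + 10N \cdot N^2 + 5N^3 && \text{since } \log N \le N^2 \text{ and } 1 \le N^3 \text{ for } N \ge 1 \\
     &= 35N^3 .
\end{align*}
```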
Big-O characteristic
- If $T_1(N) = O(f(N))$ and $T_2(N) = O(g(N))$, then
  - $T_1(N) + T_2(N) = O(\max(f(N), g(N)))$;
  - the bigger task always dominates eventually.
- Also: $T_1(N) \cdot T_2(N) = O(f(N)\,g(N))$.
Some common functions
BST: An Inductive Perspective
Let's focus on binary trees (left/right child only). A binary tree is either
- empty (we'll write nil for clarity), or
- of the form (x, L, R), where x is the element stored at the root, and L, R are the left and right subtrees of the root.
In Pictures (figure: the empty tree, and a tree with root x, left subtree L and right subtree R)
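Flattening a BT
flat(T) lists the elements of T from left to right: flat(nil) is the empty sequence, and flat((x, L, R)) = flat(L), x, flat(R). For the tree T pictured (root a, with nodes b, d, e, f, g below it): flat(T) = e, b, f, a, d, g.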
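A direct transcription of the inductive definition of flat, with Java's `null` standing in for nil (the class `Flat` and its generic `Node` are hypothetical names). The test tree is our reading of the figure: a at the root, b and d as its children, e and f under b, and g as the right child of d.

```java
import java.util.ArrayList;
import java.util.List;

// flat(nil) = empty sequence; flat((x, L, R)) = flat(L), x, flat(R).
class Flat {
    static final class Node<T> {
        final T x;
        final Node<T> left, right;
        Node(T x, Node<T> left, Node<T> right) { this.x = x; this.left = left; this.right = right; }
    }

    static <T> List<T> flat(Node<T> t) {
        List<T> out = new ArrayList<>();
        if (t == null) return out;   // flat(nil) is empty
        out.addAll(flat(t.left));    // everything in L,
        out.add(t.x);                // then the root,
        out.addAll(flat(t.right));   // then everything in R
        return out;
    }
}
```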
Def: Binary Search Tree
A binary tree T is a binary search tree (BST) iff flat(T) is an ordered (strictly increasing) sequence. Equivalently, for every subtree (x, L, R) of T, all the nodes in L are less than x, and all the nodes in R are larger than x.
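Comparing each node only with its immediate children is not enough: a key deep in L must still be less than x. One way to check the definition, sketched below with hypothetical names, is to pass down the open interval of allowed keys, tightening it at each step. In the test, the tree with root 5, left child 3 and 3's right child 6 fails, because 6 sits in the left subtree of 5.

```java
// BST check: every key must lie strictly inside the interval inherited from its ancestors.
class BstCheck {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static boolean isBST(Node t) { return isBST(t, Long.MIN_VALUE, Long.MAX_VALUE); }

    // All keys in t must satisfy lo < key < hi.
    private static boolean isBST(Node t, long lo, long hi) {
        if (t == null) return true;
        if (t.key <= lo || t.key >= hi) return false;
        // Keys in L must also be < t.key; keys in R must also be > t.key.
        return isBST(t.left, lo, t.key) && isBST(t.right, t.key, hi);
    }
}
```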
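Example (figure: a tree T with root 5; the children of 5 are 3 and 7; the children of 3 are 2 and 4; the children of 7 are 6 and 9)
flat(T) = 2, 3, 4, 5, 6, 7, 9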
Why do we care? (figure: two tree shapes compared side by side)
Binary Search
How does one search in a BST?
search(x, nil) = false
search(x, (x, L, R)) = true
search(x, (a, L, R)) = search(x, L)   if x < a
search(x, (a, L, R)) = search(x, R)   if x > a
In practice, search should return the stored value rather than just true or false.
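The four equations translate almost line for line into Java, with `null` standing in for nil (class and method names are hypothetical). The sketch returns a boolean to match the equations; returning the stored value instead would be a small change. Because each step descends exactly one level, the recursion is easy to turn into a loop.

```java
// BST search following the equations: nil -> false, match -> true, else go left or right.
class BstSearch {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static boolean search(int x, Node t) {
        if (t == null) return false;             // search(x, nil) = false
        if (x == t.key) return true;             // search(x, (x, L, R)) = true
        return x < t.key ? search(x, t.left)     // x < a: only L can contain x
                         : search(x, t.right);   // x > a: only R can contain x
    }

    // The same walk as a loop; each iteration descends one level.
    static boolean searchLoop(int x, Node t) {
        while (t != null) {
            if (x == t.key) return true;
            t = x < t.key ? t.left : t.right;
        }
        return false;
    }
}
```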
Correctness
Clearly, search() can never return a false positive answer. But search() only walks down one branch, so how do we know we don't get false negative answers? Suppose T is a BST that contains x. Claim: search(x, T) properly returns "true".
Proof
T cannot be nil, so suppose T = (a, L, R).
- Case 1: x = a: done.
- Case 2: x < a: since T is a BST, x must be in L. By induction (on trees), search(x, L) returns true. Done.
- Case 3: x > a: symmetric to case 2, with R in place of L.
Insertions in a BST are very similar to searching: find the right spot, and then put down the new element as a new leaf. We will not allow multiple insertions of the same element, so there is always exactly one place for the new guy.
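A sketch of insertion as just described: walk down exactly as in search, and attach a new leaf at the null link where the search falls off the tree. Inserting a key that is already present leaves the tree unchanged. The class `BstInsert` and its helper `inorder` are hypothetical names.

```java
import java.util.List;

// BST insertion: follow the search path, place the new element at the null link reached.
class BstInsert {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Returns the (possibly new) root of the subtree.
    static Node insert(Node t, int x) {
        if (t == null) return new Node(x);           // the unique spot: a new leaf
        if (x < t.key) t.left = insert(t.left, x);
        else if (x > t.key) t.right = insert(t.right, x);
        // x == t.key: already present, nothing to do
        return t;
    }

    // Inorder listing, used to confirm the result is still a BST.
    static void inorder(Node t, List<Integer> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }
}
```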
How Many?
How many decisions do we have to make before we have either found the element or know it is not in the tree? We walk down one branch of the tree, so the worst-case running time for search is O(depth of T) = O(# nodes).
Good Tree
But in a "good" BST we have depth of T = O(log # nodes).
Theorem: If the tree is constructed from n inputs given in random order, then we can expect the depth of the tree to be O(log n); the expected average depth of a node is about $2 \ln n \approx 1.39 \log_2 n$. But if the input is already (nearly, reverse, …) sorted, we are in trouble.
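A small experiment makes the contrast visible: insert the keys 1 to 1000 in sorted order and the tree degenerates into a path of height 999; shuffle them first and the height drops to a small multiple of log2 1000 ≈ 10. The class name, the fixed seed 42, and n = 1000 are arbitrary choices for this sketch, and the exact random height depends on the shuffle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Compares BST height for sorted versus shuffled insertion order.
class HeightDemo {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Iterative standard insertion; duplicates are ignored.
    static Node insert(Node root, int x) {
        Node fresh = new Node(x);
        if (root == null) return fresh;
        Node t = root;
        while (true) {
            if (x < t.key) {
                if (t.left == null) { t.left = fresh; return root; }
                t = t.left;
            } else if (x > t.key) {
                if (t.right == null) { t.right = fresh; return root; }
                t = t.right;
            } else {
                return root;
            }
        }
    }

    // Height in edges: -1 for the empty tree, 0 for a single node.
    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    static Node build(List<Integer> keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) keys.add(i);
        System.out.println("sorted:   " + height(build(keys)));
        Collections.shuffle(keys, new Random(42));
        System.out.println("shuffled: " + height(build(keys)));
    }
}
```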
Forcing good behavior
It is clear (?) that for any n inputs, there always is a BST containing these elements of logarithmic depth. But if we just insert the standard way, we may build a very unbalanced, deep tree. Can we somehow force the tree to remain shallow? At low cost?
AVL-Trees
G. M. Adelson-Velskii and E. M. Landis, 1962
(Figure: at every node, the heights of the left and right subtrees differ by 1 or less.)
Next Week
More about AVL trees on Tuesday. Homework 1 is due Monday the 27th. This is a good time to catch up on any Java deficiencies.