CPSC 221 Data Structures Lecture 5 Branching Out

CPSC 221: Data Structures
Lecture #5: Branching Out
Steve Wolfman, 2014W1

Today’s Outline
• Binary Trees
• Dictionary ADT
• Binary Search Trees
• Deletion
• Some troubling questions

Binary Trees
• A binary tree is either
  – empty (NULL for us), or
  – a datum, a left subtree, and a right subtree
• Properties
  – max # of leaves:
  – max # of nodes:
• Representation: each node stores its data, a left pointer, and a right pointer
[Figure: example binary tree with nodes A through J]

Representation
    struct Node {
      KTYPE key;
      DTYPE data;
      Node * left;
      Node * right;
    };
[Figure: a tree with nodes A through F, drawn once as boxes holding left and right pointers and once as a plain tree diagram]


What We Can Do So Far
• Stack
  – Push
  – Pop
• Queue
  – Enqueue
  – Dequeue
• List
  – Insert
  – Remove
  – Find
• Priority Queue (skipped!)
  – Insert
  – DeleteMin
What’s wrong with Lists?

Dictionary ADT
• Dictionary operations
  – create
  – destroy
  – insert
  – find
  – delete
• Stores values associated with user-specified keys
  – values may be any (homogeneous) type
  – keys may be any (homogeneous) comparable type
• Example dictionary contents
  – midterm: would be tastier with brownies
  – prog-project: so painful… who designed this language?
  – wolf: the perfect mix of oomph and Scrabble value
• insert(brownies, tasty) adds the entry "brownies: tasty"
• find(wolf) returns "wolf: the perfect mix of oomph and Scrabble value"

Search/Set ADT
• Dictionary operations
  – create
  – destroy
  – insert
  – find
  – delete
• Stores keys
  – keys may be any (homogeneous) comparable type
  – quickly tests for membership
• Example set contents: Berner, Whippet, Alsatian, Sarplaninac, Beardie, Sarloos, Malamute, Poodle
• insert(Min Pin)
• find(Wolf) returns NOT FOUND

A Modest Few Uses
• Arrays and “Associative” Arrays
• Sets
• Dictionaries
• Router tables
• Page tables
• Symbol tables
• C++ Structures

Desiderata
• Fast insertion
  – runtime:
• Fast searching
  – runtime:
• Fast deletion
  – runtime:

Naïve Implementations
                      insert    find    delete
• Linked list
• Unsorted array
• Sorted array
(slide annotation: worst one… yet so close!)


Binary Search Tree Dictionary Data Structure
• Binary tree property
  – each node has at most 2 children
  – result:
    • storage is small
    • operations are simple
    • average depth is small*
• Search tree property
  – all keys in left subtree smaller than root’s key
  – all keys in right subtree larger than root’s key
  – result:
    • easy to find any given key
*Technically: a result of both properties.
[Figure: example BST with keys 8, 5, 2, 11, 6, 4, 10, 7, 9, 12, 14, 13]

Example and Counter-Example
[Figure: two trees, one labelled BINARY SEARCH TREE and one labelled NOT A BINARY SEARCH TREE]

In Order Listing
    struct Node {
      KTYPE key;
      DTYPE data;
      Node * left;
      Node * right;
    };
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]
In order listing: 2 5 7 9 10 15 17 20 30
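The in-order listing comes from visiting the left subtree, then the node, then the right subtree. A minimal sketch, assuming int keys and no data field (the slides use KTYPE/DTYPE); `inOrder` and `makeSlideTree` are hypothetical names, and the tree shape is the one the slide figure appears to show:

```cpp
#include <cassert>
#include <vector>

// Assumption: int keys and no data field, to keep the sketch short.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Append the keys of the subtree rooted at `root` to `out`:
// everything in the left subtree, then the root, then the right subtree.
// For a binary search tree this yields the keys in sorted order.
void inOrder(const Node* root, std::vector<int>& out) {
    if (root == nullptr) return;
    inOrder(root->left, out);
    out.push_back(root->key);
    inOrder(root->right, out);
}

// Hypothetical helper: builds the example tree by hand.
// Assumed shape: 10 has children 5 and 15; 5 has children 2 and 9;
// 9 has left child 7; 15 has right child 20; 20 has children 17 and 30.
Node* makeSlideTree() {
    Node* n7  = new Node{7, nullptr, nullptr};
    Node* n2  = new Node{2, nullptr, nullptr};
    Node* n9  = new Node{9, n7, nullptr};
    Node* n5  = new Node{5, n2, n9};
    Node* n17 = new Node{17, nullptr, nullptr};
    Node* n30 = new Node{30, nullptr, nullptr};
    Node* n20 = new Node{20, n17, n30};
    Node* n15 = new Node{15, nullptr, n20};
    return new Node{10, n5, n15};
}
```

Calling `inOrder(makeSlideTree(), out)` on an empty vector fills it with the listing from the slide.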

Finding a Node
    Node *& find(Comparable key, Node *& root) {
      if (root == NULL)
        return root;
      else if (key < root->key)
        return find(key, root->left);
      else if (key > root->key)
        return find(key, root->right);
      else
        return root;
    }
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]
runtime (options that survive from the slide's multiple-choice list):
• O(1)
• O(lg n)
• O(n lg n)
• None of these

Finding a Node
WARNING: Much fancy footwork with refs (&) coming. You can do all of this without refs... just watch out for special cases.

Iterative Find
    Node * find(Comparable key, Node * root) {
      while (root != NULL && root->key != key) {
        if (key < root->key)
          root = root->left;
        else
          root = root->right;
      }
      return root;
    }
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]
Look familiar? (It’s trickier to get the ref return to work here.)
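One way to get the effect of the ref return in a loop is to walk a pointer to a pointer, since a C++ reference cannot be reseated. This is only a sketch under assumptions, not the lecture's code: `findSlot` is a hypothetical name and the keys are plain ints.

```cpp
#include <cassert>

// Assumption: int keys and no data field.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the address of the pointer that points at the node holding `key`,
// or the address of the NULL pointer where such a node would be attached.
// `slot` starts as the address of the root pointer.
Node** findSlot(int key, Node** slot) {
    while (*slot != nullptr && (*slot)->key != key) {
        if (key < (*slot)->key)
            slot = &(*slot)->left;    // step to the left child's slot
        else
            slot = &(*slot)->right;   // step to the right child's slot
    }
    return slot;
}
```

Because the result is the slot itself, a caller can attach a new node with `*findSlot(k, &root) = new Node{k, nullptr, nullptr};` without special-casing the empty tree.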

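Insert
    // Precondition: key is not
    // already in the tree!
    // root is taken by reference so that inserting into an
    // empty tree updates the caller's root pointer.
    void insert(Comparable key, Node *& root) {
      Node *& target(find(key, root));
      assert(target == NULL);
      target = new Node(key);
    }
Funky game we can play with the *& version.
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]
runtime: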
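The *& trick works because find returns a reference to the very pointer that is NULL where the key belongs, so assigning through it links the new node into the tree. A self-contained sketch, assuming int keys and a one-argument Node constructor (the slide calls `new Node(key)` without showing the constructor):

```cpp
#include <cassert>

// Assumption: int keys; the constructor is a guess at what
// `new Node(key)` on the slide relies on.
struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Returns a reference to the pointer holding the node with `key`,
// or to the NULL pointer where that node would hang.
Node*& find(int key, Node*& root) {
    if (root == nullptr) return root;
    else if (key < root->key) return find(key, root->left);
    else if (key > root->key) return find(key, root->right);
    else return root;
}

// Precondition: key is not already in the tree.
// root is a reference, so an insert into an empty tree
// changes the caller's root pointer.
void insert(int key, Node*& root) {
    Node*& target = find(key, root);
    assert(target == nullptr);
    target = new Node(key);
}
```

If `root` were passed by value instead, the first insert into an empty tree would only change a local copy and the caller's tree would stay empty.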

Digression: Value vs. Reference Parameters
• Value parameters (Object foo)
  – copies parameter
  – no side effects
• Reference parameters (Object & foo)
  – shares parameter
  – can affect actual value
  – use when the value needs to be changed
• Const reference parameters (Object const & foo)
  – shares parameter
  – cannot affect actual value
  – use when the value is too big for copying in pass-by-value
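A small sketch of the three parameter kinds, plus a reference to a pointer like the Node *& used for trees. All function names here are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value parameter: the function works on its own copy.
void byValue(int x) { x = 42; }

// Reference parameter: the function changes the caller's variable.
void byReference(int& x) { x = 42; }

// Const reference parameter: shares the (possibly large) argument
// without copying it, and cannot modify it.
std::size_t byConstReference(const std::string& s) { return s.size(); }

// Reference to a pointer: the function can change where the
// caller's pointer points.
void resetPointer(int*& p) { p = nullptr; }
```

After `int a = 1; byValue(a);` the variable `a` is still 1, while `byReference(a)` leaves it at 42.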

BuildTree for BSTs
• Suppose the data 1, 2, 3, 4, 5, 6, 7, 8, 9 is inserted into an initially empty BST:
  – in order
  – in reverse order
  – median first, then left median, right median, etc., so: 5, 3, 8, 2, 4, 7, 9, 1, 6

Analysis of BuildTree
• Worst case: O(n^2) as we’ve seen
• Average case, assuming all orderings are equally likely, turns out to be O(n lg n).
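The worst case shows up when keys arrive sorted: every insert walks the whole chain built so far, so n inserts cost 0 + 1 + ... + (n-1) steps, which is O(n^2). The sketch below measures the resulting height for the three insertion orders of 1 through 9. It assumes int keys; `heightAfterInserting` is a hypothetical helper, and nodes are leaked for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumption: int keys and no data field.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Plain BST insert; assumes the keys are distinct.
void insert(int key, Node*& root) {
    if (root == nullptr) root = new Node{key, nullptr, nullptr};
    else if (key < root->key) insert(key, root->left);
    else insert(key, root->right);
}

// Height counted in edges; the empty tree has height -1.
int height(const Node* root) {
    if (root == nullptr) return -1;
    return 1 + std::max(height(root->left), height(root->right));
}

// Builds a BST by inserting the keys in the given order and
// returns its height. Nodes are not freed in this sketch.
int heightAfterInserting(const std::vector<int>& keys) {
    Node* root = nullptr;
    for (int k : keys) insert(k, root);
    return height(root);
}
```

Inserting in order or in reverse order gives a chain of height 8, while the median-first order 5, 3, 8, 2, 4, 7, 9, 1, 6 gives height 3.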

Bonus: FindMin/FindMax
• Find minimum
• Find maximum
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]
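In a BST the minimum is the leftmost node and the maximum is the rightmost node. A minimal iterative sketch, assuming int keys and a non-empty tree; the names `findMin` and `findMax` follow the slide title, but the signatures are a guess:

```cpp
#include <cassert>

// Assumption: int keys and no data field.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Precondition: root is not NULL.
// Keep going left until there is no left child.
Node* findMin(Node* root) {
    while (root->left != nullptr) root = root->left;
    return root;
}

// Precondition: root is not NULL.
// Keep going right until there is no right child.
Node* findMax(Node* root) {
    while (root->right != nullptr) root = root->right;
    return root;
}
```

Both run in time proportional to the height of the tree.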

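Double Bonus: Successor
Find the next larger node in this node’s subtree.
    // min is defined first so that succ can call it.
    Node *& min(Node *& root) {
      if (root->left == NULL)
        return root;
      else
        return min(root->left);
    }

    // Returns a reference to a NULL pointer if there is
    // no larger node in this node's subtree.
    Node *& succ(Node *& root) {
      if (root->right == NULL)
        return root->right;
      else
        return min(root->right);
    }
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]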

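More Double Bonus: Predecessor
Find the next smaller node in this node’s subtree.
    // max is defined first so that pred can call it.
    Node *& max(Node *& root) {
      if (root->right == NULL)
        return root;
      else
        return max(root->right);
    }

    // Returns a reference to a NULL pointer if there is
    // no smaller node in this node's subtree.
    Node *& pred(Node *& root) {
      if (root->left == NULL)
        return root->left;
      else
        return max(root->left);
    }
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]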
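To see succ and pred at work, the sketch below runs them on the example tree. It assumes int keys and the tree shape the figure appears to show; `makeSlideTree` is a hypothetical helper.

```cpp
#include <cassert>

// Assumption: int keys and no data field.
struct Node {
    int key;
    Node* left;
    Node* right;
};

Node*& min(Node*& root) {
    if (root->left == nullptr) return root;
    return min(root->left);
}

Node*& max(Node*& root) {
    if (root->right == nullptr) return root;
    return max(root->right);
}

// Smallest key in the right subtree, or a NULL slot if there is none.
Node*& succ(Node*& root) {
    if (root->right == nullptr) return root->right;
    return min(root->right);
}

// Largest key in the left subtree, or a NULL slot if there is none.
Node*& pred(Node*& root) {
    if (root->left == nullptr) return root->left;
    return max(root->left);
}

// Hypothetical helper. Assumed shape: 10 has children 5 and 15;
// 5 has children 2 and 9; 9 has left child 7; 15 has right child 20;
// 20 has children 17 and 30.
Node* makeSlideTree() {
    Node* n9  = new Node{9, new Node{7, nullptr, nullptr}, nullptr};
    Node* n5  = new Node{5, new Node{2, nullptr, nullptr}, n9};
    Node* n20 = new Node{20, new Node{17, nullptr, nullptr},
                         new Node{30, nullptr, nullptr}};
    Node* n15 = new Node{15, nullptr, n20};
    return new Node{10, n5, n15};
}
```

Note that these only look inside the node's own subtree: for the leaf 2, succ returns a NULL slot even though 5 is the next larger key in the whole tree.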

Today’s Outline
• Some Tree Review (here for reference, not discussed)
• Binary Trees
• Dictionary ADT
• Binary Search Trees
• Deletion
• Some troubling questions

Deletion
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]
Why might deletion be harder than insertion?

Lazy Deletion
• Instead of physically deleting nodes, just mark them as deleted (with a “tombstone”)
  + simpler
  + physical deletions done in batches
  + some adds just flip deleted flag
  – small amount of extra memory for deleted flag
  – many tombstones slow finds
  – some operations may have to be modified (e.g., min and max)
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]

Lazy Deletion
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]
• Delete(17)
• Delete(15)
• Delete(5)
• Find(9)
• Find(16)
• Insert(5)
• Find(17)
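The sequence of operations above can be traced with a small sketch of lazy deletion. It assumes int keys and a `deleted` flag on each node; all function names are hypothetical, and batch cleanup of tombstones is left out.

```cpp
#include <cassert>

// Assumption: int keys; `deleted` is the tombstone flag.
struct Node {
    int key;
    bool deleted;
    Node* left;
    Node* right;
};

// Finds the slot for `key`. Tombstoned nodes still guide the search.
Node*& findSlot(int key, Node*& root) {
    if (root == nullptr || root->key == key) return root;
    if (key < root->key) return findSlot(key, root->left);
    return findSlot(key, root->right);
}

// A key counts as present only if its node exists and is not tombstoned.
bool contains(int key, Node* root) {
    Node* n = findSlot(key, root);
    return n != nullptr && !n->deleted;
}

// Re-inserting a tombstoned key just flips the flag back.
void lazyInsert(int key, Node*& root) {
    Node*& slot = findSlot(key, root);
    if (slot == nullptr) slot = new Node{key, false, nullptr, nullptr};
    else slot->deleted = false;
}

// Marks the node as deleted instead of unlinking it.
void lazyDelete(int key, Node*& root) {
    Node* n = findSlot(key, root);
    if (n != nullptr) n->deleted = true;
}
```

In this sketch Find(9) still succeeds after Delete(5) because the tombstoned 5 keeps routing the search, and Insert(5) reuses the existing node rather than allocating a new one.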

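Deletion - Leaf Case
Delete(17)
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 17, 30]

Deletion - One Child Case
Delete(15)
[Figure: BST with keys 10, 5, 15, 2, 9, 7, 20, 30]

Deletion - Two Child Case
Delete(5)
[Figure: BST with keys 10, 5, 20, 2, 9, 30, 7]

Finally…
[Figure: resulting BST with keys 10, 7, 2, 20, 9, 30]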

Delete Code
    // Named deleteKey here because delete is a reserved word in C++.
    void deleteKey(Comparable key, Node *& root) {
      Node *& handle(find(key, root));
      Node * toDelete = handle;
      if (handle != NULL) {
        if (handle->left == NULL) {          // Leaf or one child
          handle = handle->right;
        } else if (handle->right == NULL) {  // One child
          handle = handle->left;
        } else {                             // Two child case
          Node *& successor(succ(handle));
          handle->key = successor->key;      // copy the key along with the data
          handle->data = successor->data;
          toDelete = successor;
          successor = successor->right;      // Succ has <= 1 child
        }
        delete toDelete;
      }
    }
Refs make this short and “elegant”… but could be done without them with a bit more work.
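The three deletion cases can be exercised end to end with a self-contained sketch. It assumes int keys and std::string data (the slides use Comparable, KTYPE, and DTYPE), names the function `deleteKey` since `delete` is a C++ keyword, and copies the successor's key as well as its data in the two-child case.

```cpp
#include <cassert>
#include <string>

// Assumption: int keys and std::string data.
struct Node {
    int key;
    std::string data;
    Node* left;
    Node* right;
};

Node*& find(int key, Node*& root) {
    if (root == nullptr || root->key == key) return root;
    if (key < root->key) return find(key, root->left);
    return find(key, root->right);
}

Node*& min(Node*& root) {
    if (root->left == nullptr) return root;
    return min(root->left);
}

Node*& succ(Node*& root) {
    if (root->right == nullptr) return root->right;
    return min(root->right);
}

// Precondition: key is not already in the tree.
void insert(int key, const std::string& data, Node*& root) {
    Node*& target = find(key, root);
    assert(target == nullptr);
    target = new Node{key, data, nullptr, nullptr};
}

void deleteKey(int key, Node*& root) {
    Node*& handle = find(key, root);
    Node* toDelete = handle;
    if (handle == nullptr) return;          // key not present
    if (handle->left == nullptr) {          // leaf or right child only
        handle = handle->right;
    } else if (handle->right == nullptr) {  // left child only
        handle = handle->left;
    } else {                                // two children
        Node*& successor = succ(handle);
        handle->key = successor->key;       // move the successor's entry up
        handle->data = successor->data;
        toDelete = successor;
        successor = successor->right;       // successor has no left child
    }
    delete toDelete;
}
```

Running Delete(17), Delete(15), and Delete(5) on the example tree leaves 10 at the root with 7 (children 2 and 9) on the left and 20 (right child 30) on the right, which matches the final figure.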


Thinking about Binary Search Trees
• Observations
  – Each operation views two new elements at a time
  – Elements (even siblings) may be scattered in memory
  – Binary search trees are fast if they’re shallow
• Realities
  – For large data sets, disk accesses dominate runtime
  – Some deep and some shallow BSTs exist for any data

Solutions?
• Reduce disk accesses?
• Keep BSTs shallow?

To Do
• Continue readings on website!

Coming Up
• cilk_spawn Parallelism and Concurrency
• cilk_spawn Self-balancing Binary Search Trees
• cilk_spawn Priority Queues
• cilk_spawn Sorting (most likely!)
• Huge Search Tree Data Structure
• cilk_join
(cilk_spawn spawns a parallel task. Since we have only one classroom, one of these goes first!)