Compsci 201 Trees towards Autocomplete Jeff Forbes March 21, 2018 2/23/2021 Comp. Sci 201, Trees 1
P is for …
• Parallel
  • Better and more complicated than serial?
• Planning
  • Come up with a set of actions that will achieve a goal
• Probabilistic Inference
  • Markovian text recognition?
• PriorityQueue
  • Efficiently finding minimum
Plan for the Week
• Considering different kinds of tree structures
• Towards Autocomplete
  • Binary Search
  • Tries
• Upcoming
  • PriorityQueue implementation with Heaps
Towards Trees
• Trees are a powerful recurring structure
  • Quick-union and SpreadingNews have tree structure
  • Binary search trees are used in TreeSet and TreeMap
  • Priority queues are implemented with a binary tree structure to allow for O(log n) add and removeMin
  • Tries are useful in retrieving Strings in O(w) time, where w is the number of characters
Binary Tree
(Figure: binary tree diagram with labels 0, 1, 2.)
Binary Search Tree
• Each node has a value
• Every value in a node's left subtree is less than the node's value
• Every value in a node's right subtree is greater than the node's value
• What about equal?
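The ordering rule above can be sketched as insert and lookup code. This is a minimal illustration, not the course's implementation: the class name `BstSketch`, its `Node` type, and the choice to ignore duplicate values (one possible answer to "What about equal?") are all assumptions.

```java
class BstSketch {
    // Hypothetical node type for this sketch; holds one String value.
    static class Node {
        String info;
        Node left, right;
        Node(String s) { info = s; }
    }

    // Returns the root of the tree after inserting value.
    // Assumption: equal values are ignored, so the tree holds no duplicates.
    static Node insert(Node root, String value) {
        if (root == null) return new Node(value);
        int cmp = value.compareTo(root.info);
        if (cmp < 0) root.left = insert(root.left, value);
        else if (cmp > 0) root.right = insert(root.right, value);
        return root;
    }

    // Walks down from the root, going left or right at each node.
    static boolean contains(Node root, String value) {
        while (root != null) {
            int cmp = value.compareTo(root.info);
            if (cmp == 0) return true;
            root = (cmp < 0) ? root.left : root.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = null;
        for (String s : new String[] {"llama", "giraffe", "tiger", "llama"}) {
            root = insert(root, s);
        }
        System.out.println(contains(root, "tiger"));  // prints true
        System.out.println(contains(root, "zebra"));  // prints false
    }
}
```

Each step discards one subtree, so both operations take time proportional to the height of the tree.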
Binary Search Tree?
(Figure: three candidate trees labeled A, B, and C.)
http://bit.ly/201-f17-1027-1
Binary Search Tree
(Figure: three candidate trees labeled A, B, and C.)
http://bit.ly/201-s18-0321-1
• A TreeNode by any other name… What does this look like? Doubly linked list?
    public class TreeNode {
        TreeNode left;
        TreeNode right;
        String info;

        TreeNode(String s, TreeNode llink, TreeNode rlink) {
            info = s;
            left = llink;
            right = rlink;
        }
    }
• (Diagram: a node holding “llama” with children “giraffe” and “tiger”.)
Tree function: Tree height
• Compute tree height (longest root-to-leaf path)
    int height(TreeNode root) {
        if (root == null) return 0;   // empty tree has height 0
        else {
            return 1 + Math.max(height(root.left), height(root.right));
        }
    }
• Find height of left subtree, height of right subtree
• Use results to determine height of tree
Tree function: Leaf Count
• Calculate number of leaf nodes
    int leafCount(TreeNode root) {
        if (root == null) return 0;                               // empty tree
        if (root.left == null && root.right == null) return 1;    // a leaf
        return leafCount(root.left) + leafCount(root.right);
    }
• Similar to height, but has two base cases
• Use results of recursive calls to determine # leaves
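To see both functions run, here is a small self-contained sketch that builds a three-node llama/giraffe/tiger tree and calls them. The wrapper class `TreeFunctions` and the exact arrangement of the three strings are assumptions for illustration.

```java
class TreeFunctions {
    static class TreeNode {
        TreeNode left, right;
        String info;
        TreeNode(String s, TreeNode llink, TreeNode rlink) {
            info = s; left = llink; right = rlink;
        }
    }

    // Height counted in nodes: an empty tree has height 0.
    static int height(TreeNode root) {
        if (root == null) return 0;
        return 1 + Math.max(height(root.left), height(root.right));
    }

    // A leaf is a node with no children.
    static int leafCount(TreeNode root) {
        if (root == null) return 0;
        if (root.left == null && root.right == null) return 1;
        return leafCount(root.left) + leafCount(root.right);
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("llama",
                new TreeNode("giraffe", null, null),
                new TreeNode("tiger", null, null));
        System.out.println(height(root));     // prints 2
        System.out.println(leafCount(root));  // prints 2
    }
}
```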
Tree functions Analyzed
    int height(TreeNode root) {
        if (root == null) return 0;
        else {
            return 1 + Math.max(height(root.left), height(root.right));
        }
    }
• Let T(n) be time for height to run on n-node tree
    T(n) = 2T(n/2) + O(1)                        (roughly balanced)
    T(n) = T(n-1) + T(1) + O(1) = T(n-1) + O(1)  (unbalanced)
Good Search Trees and Bad Trees
http://www.9wy.net/onlinebook/CPrimerPlus5/ch17lev1sec7.html
What is complexity?
• Assume trees “balanced” in analyzing complexity
  • Roughly half the nodes in each subtree
  • Leads to easier analysis
• How to develop recurrence relation?
  • What is T(n)? Time func executes on n-node tree
  • What other work? Express recurrence, solve it
• How to solve recurrence relation
  • Plug, expand, plug, expand, find pattern
  • Proof requires induction to verify correctness
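As a worked instance of "plug, expand, find pattern", the two recurrences for height can be unrolled as follows. The constant $c$ stands in for the O(1) work per call, and $n$ is taken to be a power of 2 in the balanced case; both are simplifying assumptions.

```latex
\begin{align*}
\text{Balanced: } T(n) &= 2T(n/2) + c \\
  &= 4T(n/4) + 2c + c \\
  &= 8T(n/8) + 4c + 2c + c \\
  &= 2^k\,T(n/2^k) + (2^k - 1)\,c \\
  &= n\,T(1) + (n - 1)\,c = O(n) \qquad (k = \log_2 n) \\[1ex]
\text{Unbalanced: } T(n) &= T(n-1) + c \\
  &= T(n-2) + 2c \\
  &= T(n-k) + kc \\
  &= T(1) + (n-1)\,c = O(n) \qquad (k = n-1)
\end{align*}
```

Either way the function visits every node once, so height is O(n) regardless of the tree's shape.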
Tree Questions
http://bit.ly/201-s18-0321-2
What’s in Common?
Priority!
• Applications of Priority Queues
  • Shortest Path: Google Maps to Internet Routing
  • Event based simulation: Predicting collisions
  • Best-first search, game-playing, AI
• Java code below sorts list. How? Why? Big-O?
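The code the slide refers to is not preserved in this transcript. One common way to sort a list with a priority queue, offered here as an assumption about what was shown, is to add every element and then remove them in order. With a heap-backed `PriorityQueue`, each add and remove is O(log n), so the whole sort is O(n log n). The names `PQSort` and `pqSort` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class PQSort {
    // Returns a new list with the elements of input in ascending order.
    static List<Integer> pqSort(List<Integer> input) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int x : input) {
            pq.add(x);                    // n adds, O(log n) each
        }
        List<Integer> sorted = new ArrayList<>();
        while (!pq.isEmpty()) {
            sorted.add(pq.remove());      // remove() returns the current minimum
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(pqSort(Arrays.asList(5, 1, 4, 2)));  // prints [1, 2, 4, 5]
    }
}
```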
PQDemo Problems
• https://coursework.cs.duke.edu/201spring18/classwork/blob/master/src/PQDemo.java
• Keep track of top M of N elements (N >> M) in file.
    PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
    while (s.hasNextInt()) {
        pq.add(s.nextInt());
        if (pq.size() > M) pq.remove();
    }
• Big Oh if PriorityQueue implemented with Heap?
• What if we wanted min M elements?
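A runnable variant of the loop above, as a sketch rather than the contents of PQDemo.java: it reads from a `List` instead of a `Scanner`, and the names `TopM`, `largest`, and `smallest` are hypothetical. Because the queue never holds more than M + 1 elements, each add or remove costs O(log M), giving O(N log M) overall. For the minimum M elements, one option is to reverse the ordering so that `remove()` discards the largest value instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopM {
    // Keeps the m largest values: the min-heap's remove() discards the
    // smallest value whenever the queue grows past m.
    static List<Integer> largest(List<Integer> values, int m) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int v : values) {
            pq.add(v);
            if (pq.size() > m) pq.remove();
        }
        return sortedCopy(pq);
    }

    // Keeps the m smallest values: with the ordering reversed, remove()
    // discards the largest value instead.
    static List<Integer> smallest(List<Integer> values, int m) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) {
            pq.add(v);
            if (pq.size() > m) pq.remove();
        }
        return sortedCopy(pq);
    }

    // A PriorityQueue iterates in heap order, so sort a copy for display.
    private static List<Integer> sortedCopy(PriorityQueue<Integer> pq) {
        List<Integer> out = new ArrayList<>(pq);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(7, 2, 9, 4, 1, 8);
        System.out.println(largest(data, 3));   // prints [7, 8, 9]
        System.out.println(smallest(data, 3));  // prints [1, 2, 4]
    }
}
```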
What is autocomplete?
• As user types in search box
  • Give potential completions. How?
• Efficiency is key
  • 50 ms or go home
• Data (Terms)
  1. Possible words/phrases
  2. Weights
What will you do?
• Write classes
  1. Term: encapsulates a word/term and its corresponding weight
  2. BinarySearchAutocomplete: finds Terms with a given prefix by performing a binary search on a sorted array of Terms.
  3. TrieAutocomplete: finds Terms with a given prefix by building a trie to store the Terms.
• Analyze and benchmark implementations
Searching
    public interface Autocompletor {
        // Returns the top k matching terms in descending order of weight.
        public Iterable<String> topMatches(String prefix, int k);

        // Returns the single top matching term
        public String topMatch(String prefix);

        // Return the weight of a given term
        public double weightOf(String term);
    }
https://coursework.cs.duke.edu/201spring18/autocomplete-start
The Term class
• The Term class encapsulates a Comparable word-weight pair.
• Includes completed compareTo method, which sorts lexicographically.
• You are responsible for implementing three Term Comparators:
  • WeightOrder, which sorts in ascending weight order
  • ReverseWeightOrder, which sorts in descending weight order
  • PrefixOrder, which sorts by the first r characters
PrefixOrder
• The goal of PrefixOrder is to sort lexicographically, but only considering the first r characters.
• e.g., normally we would put “energy” before “entropy” lexicographically. However, PrefixOrder with r = 2 considers them equal (PrefixOrder with r = 3 would still put “energy” before “entropy”).
• If one or both of the words are shorter than r characters, we just use normal lexicographic sorting.
• For full credit, PrefixOrder’s compare method should take O(r).
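One way the comparison described above could look, as a sketch under stated assumptions rather than the assignment's reference solution: the `Term` class here is a hypothetical stand-in with public `word` and `weight` fields, and `PrefixOrderSketch` is just a wrapper name. The loop runs at most r times, which keeps `compare` at O(r).

```java
import java.util.Comparator;

class PrefixOrderSketch {
    // Hypothetical stand-in for the assignment's Term class.
    static class Term {
        final String word;
        final double weight;
        Term(String word, double weight) { this.word = word; this.weight = weight; }
    }

    // Compares terms by at most the first r characters of their words.
    static class PrefixOrder implements Comparator<Term> {
        private final int r;
        PrefixOrder(int r) { this.r = r; }

        @Override
        public int compare(Term a, Term b) {
            String v = a.word, w = b.word;
            int limit = Math.min(r, Math.min(v.length(), w.length()));
            for (int i = 0; i < limit; i++) {
                if (v.charAt(i) != w.charAt(i)) return v.charAt(i) - w.charAt(i);
            }
            // No difference in the compared characters: the words tie if both
            // reach r characters; otherwise the shorter word sorts first.
            return Math.min(v.length(), r) - Math.min(w.length(), r);
        }
    }

    public static void main(String[] args) {
        Term energy = new Term("energy", 1.0);
        Term entropy = new Term("entropy", 2.0);
        System.out.println(new PrefixOrder(2).compare(energy, entropy) == 0);  // prints true
        System.out.println(new PrefixOrder(3).compare(energy, entropy) < 0);   // prints true
    }
}
```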
BruteAutocomplete
• Naïve approach to autocomplete
• Store data as a Term array.
• Find the top k matches:
  • Iterate through the array
  • Push all terms starting with the prefix onto a max-priority queue sorted by weight
  • Return the top k terms off that priority queue
• Find top match is similar
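The steps above can be sketched as follows. This is an illustration under assumptions, not the provided BruteAutocomplete class: `Term` is a hypothetical stand-in with public fields, and `BruteSketch.topMatches` is a static helper rather than an Autocompletor implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class BruteSketch {
    // Hypothetical stand-in for the assignment's Term class.
    static class Term {
        final String word;
        final double weight;
        Term(String word, double weight) { this.word = word; this.weight = weight; }
    }

    // Scans every term, queues the ones matching prefix by descending
    // weight, then removes up to k of them.
    static List<String> topMatches(Term[] terms, String prefix, int k) {
        PriorityQueue<Term> pq = new PriorityQueue<>(
                Comparator.comparingDouble((Term t) -> t.weight).reversed());
        for (Term t : terms) {
            if (t.word.startsWith(prefix)) pq.add(t);
        }
        List<String> result = new ArrayList<>();
        while (result.size() < k && !pq.isEmpty()) {
            result.add(pq.remove().word);   // heaviest remaining match
        }
        return result;
    }

    public static void main(String[] args) {
        Term[] terms = {
            new Term("apple", 5), new Term("app", 10),
            new Term("apply", 7), new Term("bat", 20)
        };
        System.out.println(topMatches(terms, "ap", 2));  // prints [app, apply]
    }
}
```

The scan touches all n terms even when only a handful match, which is the cost the later approaches try to avoid.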
Why is BruteAutocomplete bad?
• If we have n terms, m of which start with the prefix, then topKMatches is O(n + m log m) and topMatch is O(n).
• Why is this bad?
  • Imagine Google!
• So, we wish to improve upon BruteAutocomplete by only considering those terms that start with the prefix
Improving BruteAutocomplete
• BruteAutocomplete had to iterate through every single term in the array because it had no prior knowledge of where terms starting with the prefix could be located, i.e., the array was unsorted.
• If we sort the array lexicographically, then all terms which start with the prefix will be adjacent.
• Sorting takes O(n log n), but we only have to do it once
  • Every call to topMatch or topKMatches, regardless of inputs, can use the same sorted Term array.
• Need to locate terms starting with the prefix quickly.
  • Use binary search.
Binary Search
• Search for 5?
• Where to go?
• How to reduce problem size?
Binary Search
• How did problem change?
Binary Search
BinarySearchAutocomplete
• For Autocomplete: find the range of all terms the comparator considers equal to key
  • e.g., all terms that match the prefix auto
• BinarySearchAutocomplete is the 2nd class you should implement.
• BinarySearchAutocomplete implements Autocompletor plus:
  • public static int firstIndexOf(Term[] a, Term key, Comparator<Term> comp)
  • public static int lastIndexOf(Term[] a, Term key, Comparator<Term> comp)
• Use binary search to quickly return the first and last index, respectively, of an element in the input array which the comparator considers equal to key.
• We specify first and last index because there could be multiple Terms in a which the comparator considers equal to key.
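A possible shape for the two range searches, sketched generically so it runs without the Term class. The class name `RangeSearch`, the generic type parameter, and returning -1 when nothing matches are assumptions; the assignment's required signatures use `Term` and may specify different behavior. The key idea is to keep searching after a match instead of stopping, so each call stays O(log n) even when many elements compare equal.

```java
import java.util.Comparator;

class RangeSearch {
    // Index of the first element comp considers equal to key, or -1 if none.
    // Assumes a is sorted according to comp.
    static <T> int firstIndexOf(T[] a, T key, Comparator<T> comp) {
        int low = 0, high = a.length - 1, found = -1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            int cmp = comp.compare(a[mid], key);
            if (cmp < 0) {
                low = mid + 1;
            } else {
                if (cmp == 0) found = mid;  // remember the match, keep looking left
                high = mid - 1;
            }
        }
        return found;
    }

    // Index of the last element comp considers equal to key, or -1 if none.
    static <T> int lastIndexOf(T[] a, T key, Comparator<T> comp) {
        int low = 0, high = a.length - 1, found = -1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            int cmp = comp.compare(a[mid], key);
            if (cmp > 0) {
                high = mid - 1;
            } else {
                if (cmp == 0) found = mid;  // remember the match, keep looking right
                low = mid + 1;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Integer[] a = {1, 3, 5, 5, 5, 8};
        Comparator<Integer> order = Comparator.naturalOrder();
        System.out.println(firstIndexOf(a, 5, order));  // prints 2
        System.out.println(lastIndexOf(a, 5, order));   // prints 4
    }
}
```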
Using tries for autocomplete
• To completely eliminate our dependence on the terms which don’t start with our prefix, let’s use a trie instead of an array to store terms
• Why is this an improvement? Consider trying to find the top k matches starting with some string str.
  • The very first thing we should do is navigate to the node representing str. The trie rooted at this node only contains words starting with str.
  • No matter how many words are in our trie, navigating to this node …
Tries
• How to search for Terms?
• Consider a trie (also called prefix tree), utilized in Trie.java:
• This trie represents:
  • Dog
  • Doting
  • Drastic
  • Top
  • Torn
  • Trap
Tries
• This trie supports queries (add, contains, delete) in O(w) time for words of length w.
• Each node in a trie has one subtrie for every next valid letter that can follow
• How would you add “duke” to the trie? How many new nodes created?
• Red dots indicate nodes holding the final letter of a word
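A minimal trie sketch that makes the “duke” question concrete. It is not the course's Trie.java: the class name `TrieSketch`, the use of a `HashMap` for children, the `isWord` flag (playing the role of the red dots), and lowercasing the six words are all assumptions. `add` returns how many nodes it had to create.

```java
import java.util.HashMap;
import java.util.Map;

class TrieSketch {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isWord;   // true if a word ends at this node
    }

    private final Node root = new Node();

    // Adds word and returns how many new nodes were created.
    int add(String word) {
        Node cur = root;
        int created = 0;
        for (char ch : word.toCharArray()) {
            Node next = cur.children.get(ch);
            if (next == null) {
                next = new Node();
                cur.children.put(ch, next);
                created++;
            }
            cur = next;
        }
        cur.isWord = true;
        return created;
    }

    // Follows one child per character, so the work is proportional to w.
    boolean contains(String word) {
        Node cur = root;
        for (char ch : word.toCharArray()) {
            cur = cur.children.get(ch);
            if (cur == null) return false;
        }
        return cur.isWord;
    }

    public static void main(String[] args) {
        TrieSketch trie = new TrieSketch();
        for (String s : new String[] {"dog", "doting", "drastic", "top", "torn", "trap"}) {
            trie.add(s);
        }
        System.out.println(trie.add("duke"));      // prints 3
        System.out.println(trie.contains("do"));   // prints false
    }
}
```

The node for "d" already exists, so adding "duke" creates nodes only for "u", "k", and "e". The prefix "do" has a node but is not marked as a word, so `contains` rejects it.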