Compsci 201 Priority Queues Heaps Autocomplete Owen Astrachan

Compsci 201
Priority Queues + Heaps, Autocomplete
Owen Astrachan, ola@cs.duke.edu
November 14, 2018
11/14/2018 Compsci 201, Fall 2018, Autocomplete + PQs + Heaps

T is for …
• Alan Turing, Turing Test, Turing award
  • From WWII to philosophy to math to computing
• Trie
  • O(#characters) search

Alan Turing
• 2:46 marathon, 15:20 three mile
• Enigma machine and WWII
• Entscheidungsproblem
• "Sometimes it is the people no one can imagine anything of who do the things no one can imagine"

Plan for Today
• Review Autocomplete
  • Big picture, little picture, what's dated but …
  • Motivate with application of Priority Queue
• Priority Queues from client perspective
  • See use in BruteAutocomplete and more
• Heaps: Priority Queues, implementor POV
  • Some trees can be stored in arrays

But First, Optional APTs
• Professors have grand plans for students
  • You're at Duke to learn, you'll just do problems
  • Excuse me? What's that you said?
• Challenge problems, make-up problems
  • Some are very hard, some already done
• What will you see again, maybe
  • GridGame, FloodRelief, MedalTable, BSTCount

Autocomplete
• 70,000 queries/second, thousands of computers, 0.2 seconds to answer query, …

Geolocating Heaven

Tradeoffs in Autocomplete
• Like search in Google, we want the best or top or most-weighty matches
  • Each term is a (word, weight) pair
  • Sort by weight, we want only top 10 or top k
• Don't sort everything if we only need 10
  • PriorityQueues can help, so can binary search
  • Don't sort 10^9 items if 10^3 match "duke b"

BruteAutocomplete
• There are N terms (word, weight)
  • M of these match a prefix, e.g., "auto"
  • We want the top k of these M matches
  • N is millions, M is thousands/hundreds, k is 10
• Naïve: find and store M terms in array, sort them, choose the heaviest k: N + M log M + k
  • Where does M log M term come from?
  • Where does N term come from?
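The naïve approach can be sketched as follows. This is a minimal illustration, not the course's BruteAutocomplete code: the Term class here is a hypothetical stand-in for a (word, weight) pair, and topMatches is an assumed method name. The comments mark where the N, M log M, and k terms come from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NaiveBruteSketch {
    // Hypothetical stand-in for the course's Term class: a (word, weight) pair.
    static class Term {
        final String word;
        final double weight;
        Term(String word, double weight) { this.word = word; this.weight = weight; }
        String getWord() { return word; }
        double getWeight() { return weight; }
    }

    // Returns the k heaviest terms whose word starts with prefix, heaviest first.
    static List<Term> topMatches(List<Term> terms, String prefix, int k) {
        List<Term> matches = new ArrayList<>();
        for (Term t : terms) {                      // N: examine every term
            if (t.getWord().startsWith(prefix)) {
                matches.add(t);
            }
        }
        // M log M: sort only the M matching terms, heaviest first
        matches.sort(Comparator.comparingDouble(Term::getWeight).reversed());
        // k: copy out the first k of the sorted matches
        return new ArrayList<>(matches.subList(0, Math.min(k, matches.size())));
    }

    public static void main(String[] args) {
        List<Term> terms = Arrays.asList(
            new Term("auto", 5), new Term("autumn", 9),
            new Term("bee", 7), new Term("author", 2));
        for (Term t : topMatches(terms, "au", 2)) {
            System.out.println(t.getWord() + " " + t.getWeight());
        }
    }
}
```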

Can we do better than Naïve Brute?
• We can change M log M to M log k
  • Use a priority queue
  • Who cares? Does this make a difference?
• We can use binary search, after sorting
  • Find first and last of M matching terms: log N
  • Changes time to log N + M log M + k

Binary Search in Autocomplete
• Given "beenie" and prefix of 3, find M matches
  • Find first "bee.." and last "bee.."
  • Sort these M elements by weight! Done
• O(log N) to find first and last, O(M log M) to sort

BinarySearchAutocomplete
• Construct with N terms, sort them: O(N log N)
  • Sort once, at construction, lexicographically
  • Leverage this sort for each prefix query
• Find first and last of M matching prefixes
  • O(log N), prefix search leverages sorted order
• Sort these matching terms
  • O(M log M) if we just sort, then take top k
  • O(M log k) if we use limited priority queue

Tradeoffs Summarized
• Brute: O(N + M log k) since it uses a priority queue
• Binary: O(log N + M log M) or O(log N + M log k)
  • Requires sorting once: O(N log N)
  • Amortize this cost over many queries, ignore?
• If we make Q queries:
  • Brute: Q x N
  • Binary: Q x log N, recoup sorting cost?

Something Old, Something New
• Sort list of Terms by weight? Or reverse order?
    public static class WeightOrder implements Comparator<Term>
    public static class ReverseWeightOrder implements …
  • Call v.getWeight(), w.getWeight()
• Before Java 8 and after Java 8
    Collections.sort(list, new Term.ReverseWeightOrder());
    Collections.sort(list, Comparator.comparing(Term::getWeight).reversed());

WOTO
http://bit.ly/201fall18-nov14-1

From Tree to Retrieval
• Better than O(log N)? O(w) where prefix-length == w
• Tradeoff? Lots of memory, fast run-time

BinarySearchAutocomplete
• Why are loop invariants provided for code?
  • Help you reason about code, think about code
  • What do you know when loop exits?
  • firstIndex: list.get(high) might be value
      compare(list.get(high), target) == 0
• Loop development: art, science, trial-and-error
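To make the role of the invariant concrete, here is one possible sketch of a first-index and last-index binary search. It is not the code provided with the assignment: the class name, the prefixOrder helper, and the exact invariants are assumptions chosen for illustration. Each loop keeps its invariant true, so when the loop exits only one index remains to be checked.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PrefixSearchSketch {
    // Index of the first element equal to target under comp, or -1 if none.
    static <T> int firstIndexOf(List<T> list, T target, Comparator<T> comp) {
        int low = -1;
        int high = list.size() - 1;
        // Invariant: every index <= low holds an element less than target,
        // and every index > high holds an element >= target.
        while (high - low > 1) {
            int mid = low + (high - low) / 2;
            if (comp.compare(list.get(mid), target) < 0) {
                low = mid;
            } else {
                high = mid;
            }
        }
        // On exit only index high can hold the first match.
        if (high >= 0 && comp.compare(list.get(high), target) == 0) {
            return high;
        }
        return -1;
    }

    // Index of the last element equal to target under comp, or -1 if none.
    static <T> int lastIndexOf(List<T> list, T target, Comparator<T> comp) {
        int low = 0;
        int high = list.size();
        // Invariant: every index < low holds an element <= target,
        // and every index >= high holds an element greater than target.
        while (high - low > 1) {
            int mid = low + (high - low) / 2;
            if (comp.compare(list.get(mid), target) > 0) {
                high = mid;
            } else {
                low = mid;
            }
        }
        if (low < list.size() && comp.compare(list.get(low), target) == 0) {
            return low;
        }
        return -1;
    }

    // Hypothetical helper: compares strings by their first r characters only.
    static Comparator<String> prefixOrder(int r) {
        return (a, b) -> a.substring(0, Math.min(r, a.length()))
                          .compareTo(b.substring(0, Math.min(r, b.length())));
    }

    public static void main(String[] args) {
        // The list must already be sorted lexicographically.
        List<String> words = Arrays.asList(
            "apple", "bead", "bee", "beef", "beenie", "beer", "cat");
        Comparator<String> byPrefix = prefixOrder(3);
        System.out.println(firstIndexOf(words, "beenie", byPrefix)); // 2
        System.out.println(lastIndexOf(words, "beenie", byPrefix));  // 5
    }
}
```

Both searches make O(log N) comparisons, and the M matches are exactly the indexes from first through last.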

Start with code and change, …
• Do not use this reference to achieve O(log N)
  http://stackoverflow.com/questions/6676360/first-occurrence-in-a-binary-search
• One idea: find standard code and mess with it until it works
  • Not Einstein maybe, but

How to develop loops
• David Gries: The Science of Programming
• Edsger Dijkstra: A Discipline of Programming

Reasoning about code
A. Runs forever
B. Exhausts memory and stops
C. Prints ~ (2 billion)
D. Prints ~ (-2 billion)
    public class Looper {
        public static void main(String[] args) {
            int x = 0;
            while (x < x + 1) {
                x = x + 1;
            }
            System.out.println("value of x = " + x);
        }
    }
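One way to check your answer is to run the same loop from a starting value close to the largest int, so it finishes immediately. This is a small sketch, with runLoop as an assumed helper name. Java int arithmetic wraps around on overflow, so x + 1 eventually becomes negative and the loop test fails.

```java
class OverflowSketch {
    // Same loop as Looper, but the starting value is a parameter so the
    // demo does not need roughly two billion iterations.
    static int runLoop(int start) {
        int x = start;
        while (x < x + 1) {
            x = x + 1;
        }
        return x;
    }

    public static void main(String[] args) {
        // int addition wraps: the largest int plus one is the smallest int.
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true
        System.out.println("value of x = " + runLoop(Integer.MAX_VALUE - 5));
        // prints: value of x = 2147483647
    }
}
```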

PriorityQueues top to bottom
• All operations are O(log N) where N is the size of the PQ
  • This is for add and remove; can peek in O(1)
• Always remove the smallest element, minPQ
  • Can change by providing a Comparator
• Shortest-path, e.g., Google Maps. Best-first search in games
  • Best element removed from queue, not first
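From the client side this looks roughly like the sketch below, which uses java.util.PriorityQueue. The class name and the drain helper are assumptions for illustration. With no Comparator the smallest element comes out first; passing Comparator.reverseOrder() flips that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class PQClientSketch {
    // Removes everything from pq, returning the elements in removal order.
    static List<Integer> drain(PriorityQueue<Integer> pq) {
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            out.add(pq.remove());   // O(log N) per removal
        }
        return out;
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> minPQ = new PriorityQueue<>();
        minPQ.addAll(Arrays.asList(17, 6, 21, 9));
        System.out.println(minPQ.peek());   // 6, without removing it
        System.out.println(drain(minPQ));   // [6, 9, 17, 21]

        // Supplying a Comparator changes which element counts as "best".
        PriorityQueue<Integer> maxPQ = new PriorityQueue<>(Comparator.reverseOrder());
        maxPQ.addAll(Arrays.asList(17, 6, 21, 9));
        System.out.println(drain(maxPQ));   // [21, 17, 9, 6]
    }
}
```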

PriorityQueues top to bottom
• How can we sort elements using Priority Queue?
  • Add all elements to pq, then remove them
  • Every operation is O(log N), so this sort?
  • O(N log N), basis for heap sort
https://github.com/astrachano/sorting-fall18
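The add-all-then-remove-all idea can be sketched like this. It is an illustration only, not the code in the linked repository, and pqSort is an assumed name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class PQSortSketch {
    // Returns a new sorted list; the parameter is left unchanged.
    static <T extends Comparable<T>> List<T> pqSort(List<T> items) {
        PriorityQueue<T> pq = new PriorityQueue<>();
        for (T item : items) {
            pq.add(item);                 // N adds, O(log N) each
        }
        List<T> sorted = new ArrayList<>();
        while (!pq.isEmpty()) {
            sorted.add(pq.remove());      // N removes, smallest first
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(pqSort(Arrays.asList("heap", "trie", "array", "queue")));
        // [array, heap, queue, trie]
    }
}
```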

Finding top M of N
• Sort all and get first (or last) M
  • O(N log N) to sort, then O(M), typically N >> M
• Code below doesn't alter list parameter
  • Why is comp.reversed() used?
https://github.com/astrachano/sorting-fall18
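The slide's own code did not survive in this transcript. A sketch in the same spirit, with topMSorted as an assumed name, might look like the following: it sorts a copy so the parameter is untouched, and uses comp.reversed() so that the largest elements land at the front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopMSortSketch {
    // Returns the m largest elements under comp, largest first.
    static <T> List<T> topMSorted(List<T> list, int m, Comparator<T> comp) {
        List<T> copy = new ArrayList<>(list);   // leave the parameter unchanged
        copy.sort(comp.reversed());             // O(N log N), largest first
        return new ArrayList<>(copy.subList(0, Math.min(m, copy.size()))); // O(M)
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(5, 1, 9, 3, 7, 8);
        System.out.println(topMSorted(data, 3, Comparator.<Integer>naturalOrder()));
        // [9, 8, 7]
        System.out.println(data); // still [5, 1, 9, 3, 7, 8]
    }
}
```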

Finding top M of N
• Can do this in O(N log M) using priority queue
• Get largest M using min Priority Queue?

Details for M of N
• Keep only M elements in the priority queue
  • Every time one removed? It's the smallest
  • When done? Top M remain, removed smallest!
• First element removed? Smallest, so …
  • Why is LinkedList used? O(1) add to front
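A minimal sketch of these details, assuming a method named topMPQ (not the course's code): the min priority queue never holds more than M elements, so each add or remove costs O(log M), and the final removals come out smallest first, which is why each one is added to the front of a LinkedList.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.PriorityQueue;

class TopMPQSketch {
    // Returns the m largest elements under comp, largest first.
    static <T> List<T> topMPQ(List<T> list, int m, Comparator<T> comp) {
        PriorityQueue<T> pq = new PriorityQueue<>(comp);   // min PQ under comp
        for (T item : list) {
            pq.add(item);
            if (pq.size() > m) {
                pq.remove();        // evict the smallest; top m so far remain
            }
        }
        LinkedList<T> result = new LinkedList<>();
        while (!pq.isEmpty()) {
            result.addFirst(pq.remove());   // smallest removed first, O(1) add to front
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(5, 1, 9, 3, 7, 8);
        System.out.println(topMPQ(data, 3, Comparator.<Integer>naturalOrder()));
        // [9, 8, 7]
    }
}
```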

Heap: PQ implementation
• Binary tree stored in an array (not a search tree)
  • Conceptually a tree
  • Actually stored in array
• Value/node at index k
  • Left child: 2*k, Right: 2*k+1
  • Root at index 1
• Finding parent?
  • Divide by 2 (remember / truncates)
index:  0   1   2   3   4   5   6   7   8   9  10
value:  -   6  10   7  17  13   9  21  19  25   -
Tree, level by level: 6 | 10 7 | 17 13 9 21 | 19 25
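The index arithmetic can be tried directly. In this small sketch the array holds the slide's example heap with index 0 unused; the class and method names are assumptions for illustration.

```java
class HeapIndexSketch {
    static int left(int k)   { return 2 * k; }
    static int right(int k)  { return 2 * k + 1; }
    static int parent(int k) { return k / 2; }   // integer division truncates

    // True if heap[1..size] satisfies the min-heap property:
    // no node is smaller than its parent.
    static boolean isMinHeap(int[] heap, int size) {
        for (int k = 2; k <= size; k++) {
            if (heap[parent(k)] > heap[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] heap = {0, 6, 10, 7, 17, 13, 9, 21, 19, 25};  // index 0 unused
        int size = 9;
        System.out.println(heap[left(2)] + " " + heap[right(2)]); // 17 13
        System.out.println(heap[parent(9)]);                      // 17
        System.out.println(isMinHeap(heap, size));                // true
    }
}
```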

Heap Concepts
• Tree always has the heap shape
  • All leaves added left-to-right in bottom row
  • Array elements are contiguous
  • Add new element? End of array…
• Tree always has heap property
  • Node less than children
[Figure: example heap drawn as a tree]

Thinking about Heaps
• Where is the smallest element?
  • Heap shape or heap property
• Where is the biggest element?
  • Heap shape or heap property
• How many leaves in heap of size N?
  • Where are the leaves?
• Where is second smallest?
[Figure: example heap drawn as a tree]

Adding to a Heap
• Must add at end of array
  • Child in last row
  • Violates heap property
• Drag parent down as needed
  • Aka bubble up
  • When to stop?
• Why is this O(log N)
[Figure: inserting 8 into the example heap and bubbling it up]
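A minimal sketch of bubble up on an int array with the root at index 1. The add method and its signature are assumptions for illustration, and the caller is assumed to provide an array with spare capacity. The loop walks one level toward the root per step, which is where O(log N) comes from.

```java
import java.util.Arrays;

class HeapAddSketch {
    // heap[1..size] holds a min-heap; heap[0] is unused.
    // Assumes heap.length > size + 1. Returns the new size.
    static int add(int[] heap, int size, int value) {
        size++;
        int k = size;                           // the hole starts at the end of the array
        while (k > 1 && heap[k / 2] > value) {
            heap[k] = heap[k / 2];              // drag the parent down
            k /= 2;
        }
        heap[k] = value;                        // stop: at the root, or parent is not larger
        return size;
    }

    public static void main(String[] args) {
        int[] heap = {0, 6, 10, 7, 17, 13, 9, 21, 19, 25, 0};
        int size = add(heap, 9, 8);
        System.out.println(size);                  // 10
        System.out.println(Arrays.toString(heap));
        // [0, 6, 8, 7, 17, 10, 9, 21, 19, 25, 13]
    }
}
```

In the run above, 8 starts at index 10, then 13 and 10 are dragged down, and 8 settles at index 2 because its parent 6 is smaller.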

Removing from Heap
• Min at root, remove it
  • Replace with last
  • Maintains shape
  • Violates property
• Choose minimal child
  • Pull up or bubble down
  • When to stop?
• Why is this O(log N)
[Figure: removing 6 from the example heap; 25 moves to the root and bubbles down]
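Bubble down can be sketched the same way. The class below is a hypothetical illustration, not java.util.PriorityQueue's implementation: it assumes the values passed to the constructor are already in heap order, and it keeps the root at index 1.

```java
import java.util.NoSuchElementException;

class HeapRemoveSketch {
    int[] heap;   // heap[1..size] holds a min-heap; heap[0] is unused
    int size;

    // Assumes values are already arranged in heap order.
    HeapRemoveSketch(int[] values) {
        heap = new int[values.length + 1];
        System.arraycopy(values, 0, heap, 1, values.length);
        size = values.length;
    }

    int removeMin() {
        if (size == 0) {
            throw new NoSuchElementException("heap is empty");
        }
        int min = heap[1];
        int last = heap[size];       // last element will refill the hole at the root
        size--;
        int k = 1;
        while (2 * k <= size) {      // while the hole has at least one child
            int child = 2 * k;
            if (child < size && heap[child + 1] < heap[child]) {
                child++;             // choose the minimal child
            }
            if (last <= heap[child]) {
                break;               // stop: heap property holds here
            }
            heap[k] = heap[child];   // pull the smaller child up
            k = child;
        }
        heap[k] = last;
        return min;
    }

    public static void main(String[] args) {
        HeapRemoveSketch h = new HeapRemoveSketch(
            new int[]{6, 10, 7, 17, 13, 9, 21, 19, 25});
        while (h.size > 0) {
            System.out.print(h.removeMin() + " ");   // values come out in increasing order
        }
        System.out.println();
    }
}
```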

Array-based heap
• We can build heap from N elements in O(N)
  • Better than log(1) + log(2) + … + log(N)
  • That's log(1 x 2 x … x N) or log(N!) or N log N
• We can find minimal element in O(1), index 1
• We can remove in O(log N)
• Arrays are contiguous, parent/child *2 and /2
  • These divisions are very, very fast, bit-shifts
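The step from log(N!) to N log N can be spelled out as follows. This is a standard bound, added here as a worked aside rather than taken from the slides; the lower bound keeps only the largest N/2 terms of the sum.

```latex
\[
\sum_{i=1}^{N} \log i \;=\; \log\Bigl(\prod_{i=1}^{N} i\Bigr) \;=\; \log N!
\]
\[
\frac{N}{2}\,\log\frac{N}{2} \;\le\; \log N! \;\le\; N \log N
\quad\Longrightarrow\quad
\log N! \;=\; \Theta(N \log N)
\]
```

So adding N elements one at a time costs on the order of N log N in the worst case, which is what the O(N) build improves on.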

What you should know?
• Heap shape and heap property are key ideas
  • Adding or removing element O(log N)
  • Conceptual understanding is important
• Java source: http://bit.ly/javapq-source
  • See siftUp and siftDown for add/remove
  • Root at index 0: children 2*k+1 and 2*k+2

WOTO
http://bit.ly/201fall18-nov14-2