Priority Queues and Heapsort (9.1-9.4)
- Priority queues are used for many purposes, such as job scheduling, shortest paths, file compression, ...
- Recall the definition of a priority queue:
  - operations insert(), delete_max()
  - also max(), change_priority(), join()
- How would I sort a list using a priority queue?
      for (i=0; i<n; i++) insert(A[i]);
      for (i=0; i<n; i++) cout << delete_max();
- How would I implement a priority queue?
  - How fast a sorting algorithm would your implementation yield?
  - Can we do better?
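The two loops above can be tried directly with the standard library's priority queue. This is only a sketch: `pq_sort_descending` is a name made up for this example, and `std::priority_queue` stands in for whatever implementation the slides go on to build.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical helper: sort by n inserts followed by n delete_max operations.
// The output comes out largest first, exactly as the cout loop would print it.
std::vector<int> pq_sort_descending(const std::vector<int>& input) {
    std::priority_queue<int> pq;          // max-priority queue by default
    for (int x : input) pq.push(x);       // insert(A[i])
    std::vector<int> out;
    out.reserve(input.size());
    while (!pq.empty()) {                 // delete_max() = top() then pop()
        out.push_back(pq.top());
        pq.pop();
    }
    assert(out.size() == input.size());   // nothing lost, nothing duplicated
    return out;
}
```

The cost of the sort is n inserts plus n delete_max calls, so the speed of those two operations decides how good a sorting algorithm the priority queue yields.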

Priority Queue Implementations

                   insert   delete max   delete   change priority   join
  sorted array     n        1            n        n                 n
  unsorted array   1        n            1        1                 n
  heap             lg n     lg n         lg n     lg n              n
  binomial queue   lg n     lg n         lg n     lg n              lg n
  (best)           1        lg n         lg n     1                 1

Heaps
- How can we build a data structure to do this?
- Hints:
  - We want to find the largest element quickly.
  - We want to be able to remove an element quickly.
  - Tree of some sort?
- Heap:
  - a complete binary tree (every level full except possibly the last, whose leaves are packed to the left)
  - each element is at least as large as its children (note: this is not a BST!)
- How to delete the maximum?
- How to add a number to a heap?
- How to build a heap out of a list of numbers?

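Insert
- Implicit representation: store the tree level by level in an array, e.g. XTOGSMNAERAI (children of i at 2i and 2i+1, so the parent of i is at i/2).
- How would I do an insert()?
  - Add to the end of the array.
  - Repeat: if larger than parent, swap.
- Example, inserting P:
      XTOGSMNAERAIP
      XTOGSPNAERAIM
      XTPGSONAERAIM
- Code (a[] is indexed from 1):
      template <class Item>
      void insert(Item a[], Item newItem, int items) {
          int n = ++items;              // position of the new last element
          a[n] = newItem;
          while (n > 1 && a[n/2] < a[n]) {
              exch(a[n], a[n/2]);       // swap with parent
              n /= 2;
          }
      }
- Runtime? Θ(log n)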
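To experiment with the insert step, here is a minimal self-contained sketch. It assumes the heap is kept in a `std::string` with an unused padding character at index 0 so that the 2i / 2i+1 indexing carries over unchanged; `heap_insert` is a name chosen for this example, not something from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Assumption: heap[1..size-1] is a max-heap and heap[0] is unused padding.
void heap_insert(std::string& heap, char item) {
    assert(!heap.empty());                     // the padding slot must exist
    heap.push_back(item);                      // add to the end of the array
    std::size_t n = heap.size() - 1;           // index of the new element
    while (n > 1 && heap[n / 2] < heap[n]) {   // larger than parent?
        std::swap(heap[n], heap[n / 2]);       // swap upward
        n /= 2;
    }
}
```

Starting from `" XTOGSMNAERAI"` (note the leading padding space) and inserting `'P'`, the new letter is swapped past M and then O and comes to rest at position 3, just below the root X.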

DeleteMax
- How would I delete X?
  - Move the last element to the root.
  - If it is smaller than either child, swap it with the larger child; repeat.
- Code:
      template <class Item>
      void reHeapify(Item a[], int items) {
          int n = 1;
          while (2*n <= items) {
              int j = 2*n;
              if (j < items && a[j] < a[j+1]) j++;   // j = larger child
              if (a[n] >= a[j]) break;               // heap order restored
              exch(a[n], a[j]);
              n = j;
          }
      }
      template <class Item>
      Item DeleteMax(Item a[], int items) {
          exch(a[1], a[items--]);    // old max now sits just past the heap
          reHeapify(a, items);
          return a[items+1];
      }
- Example:
      XTOGSMNAERAI
      ITOGSMNAERAX
      TIOGSMNAERAX
      TSOGIMNAERAX
      TSOGRMNAEIAX
- Runtime? Θ(log n)
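The delete step can be sketched in the same self-contained style. As an assumption for this example only, the heap lives in a `std::string` with a padding character at index 0, and `sift_down` takes the starting position as a parameter; neither name comes from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Move heap[n] down until neither child is larger. Heap occupies heap[1..items].
void sift_down(std::string& heap, std::size_t n, std::size_t items) {
    while (2 * n <= items) {
        std::size_t j = 2 * n;
        if (j < items && heap[j] < heap[j + 1]) ++j;  // pick the larger child
        if (heap[n] >= heap[j]) break;
        std::swap(heap[n], heap[j]);
        n = j;
    }
}

// Assumption: heap[0] is padding, heap[1..] is a non-empty max-heap.
char delete_max(std::string& heap) {
    assert(heap.size() > 1);
    std::size_t items = heap.size() - 1;
    std::swap(heap[1], heap[items]);   // move last element to the root
    char max = heap[items];
    heap.pop_back();                   // shrink the heap by one
    sift_down(heap, 1, items - 1);
    return max;
}
```

Unlike the slide version, this sketch physically removes the old maximum from the container instead of leaving it just past the end of the heap; leaving it in place is what makes in-place heapsort possible.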

BuildHeap (top down)
- Given an array, e.g. ASORTINGEXAMPLE, how do I make it a heap?
- Top-down:
      for (i=2; i<=items; i++) insert(a, a[i], i-1);
- Runtime: Θ(n log n)
- Can we do better?

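BuildHeap (bottom up)
- Suppose we use the reHeapify() function instead and work bottom-up, sifting down from each internal node i rather than only from the root:
      for (i=items/2; i>=1; i--)
          reHeapify(a, i, items);   // variant of reHeapify that starts at node i
- Example on ASORTINGEX:
      ASORTINGEX
      ASORXINGET
      AXORSINGET
      AXORTINGES
      XAORTINGES
      XTORAINGES
      XTORSINGEA
- Runtime? A node costs at most its height in swaps; about n/4 nodes have height 1, n/8 have height 2, and so on:
      n/4 + 2(n/8) + 3(n/16) + 4(n/32) + ...
      = n(1/4 + 2/8 + 3/16 + 4/32 + ...)
      = n * 1
      = Θ(n)
- Top-down was Θ(n log n); bottom-up is Θ(n). Cool!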
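A minimal sketch of the bottom-up construction follows. It assumes a 1-based heap stored in a `std::string` with a padding character at index 0, and a sift-down routine that accepts the starting node; `build_heap` and `sift_down` are names picked for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Move heap[n] down until neither child is larger. Heap occupies heap[1..items].
void sift_down(std::string& heap, std::size_t n, std::size_t items) {
    while (2 * n <= items) {
        std::size_t j = 2 * n;
        if (j < items && heap[j] < heap[j + 1]) ++j;  // pick the larger child
        if (heap[n] >= heap[j]) break;
        std::swap(heap[n], heap[j]);
        n = j;
    }
}

// Assumption: heap[0] is padding; heap[1..] holds the letters in any order.
void build_heap(std::string& heap) {
    assert(!heap.empty());
    std::size_t items = heap.size() - 1;
    // Leaves are already one-element heaps, so start at the last internal node.
    for (std::size_t i = items / 2; i >= 1; --i) {
        sift_down(heap, i, items);
    }
}
```

Calling `build_heap` on `" ASORTINGEX"` performs the same swaps as the trace above and ends with `" XTORSINGEA"`.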

Heapsort
- Algorithm:
      BuildHeap();
      for (i=1; i<=n; i++) DeleteMax();
- Runtime? Θ(n log n)
- Almost competitive with quicksort.
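Put together, the two phases give an in-place sort. The sketch below is one possible rendering, not the slides' own code: it uses 0-based indexing (children of i at 2i+1 and 2i+2) because that is the natural layout for `std::vector`, and the function names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sift a[i] down within the heap a[0..n-1] (0-based: children at 2i+1, 2i+2).
void sift_down0(std::vector<int>& a, std::size_t i, std::size_t n) {
    while (2 * i + 1 < n) {
        std::size_t j = 2 * i + 1;
        if (j + 1 < n && a[j] < a[j + 1]) ++j;   // larger child
        if (a[i] >= a[j]) break;
        std::swap(a[i], a[j]);
        i = j;
    }
}

void heapsort(std::vector<int>& a) {
    std::size_t n = a.size();
    // Phase 1: bottom-up BuildHeap over the internal nodes n/2-1 .. 0.
    for (std::size_t i = n / 2; i-- > 0; ) sift_down0(a, i, n);
    // Phase 2: repeated DeleteMax; each max is parked just past the heap.
    for (std::size_t end = n; end > 1; --end) {
        std::swap(a[0], a[end - 1]);
        sift_down0(a, 0, end - 1);
    }
}
```

Because each deleted maximum is stored in the slot the heap just vacated, the array ends up in ascending order with no extra storage.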

Priority Queue
- Operations: insert(), max(), deleteMax()
- Could implement with a heap.
- Runtime for each operation?
  - insert(), deleteMax(): O(log n)
  - max(): O(1)
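As an illustration of the three operations, here is a small wrapper built on the standard heap algorithms `std::push_heap` and `std::pop_heap`. The class name `MaxPQ` and its interface are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical max-priority queue of ints backed by a binary heap in a vector.
class MaxPQ {
public:
    void insert(int x) {                       // O(log n)
        data_.push_back(x);
        std::push_heap(data_.begin(), data_.end());
    }
    int max() const {                          // O(1): the root is the maximum
        assert(!data_.empty());
        return data_.front();
    }
    int delete_max() {                         // O(log n)
        assert(!data_.empty());
        std::pop_heap(data_.begin(), data_.end());  // moves the max to the back
        int top = data_.back();
        data_.pop_back();
        return top;
    }
    bool empty() const { return data_.empty(); }

private:
    std::vector<int> data_;
};
```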

Example Application
- Suppose you have a text, abracadabra, and want to compress it.
- How many bits are required? At 3 bits per letter, 33 bits.
- Can we do better? How about variable-length codes?
- To be able to decode the file again, we need a prefix code: no codeword is a prefix of another.
- How do we make a prefix code that compresses the text?

Huffman Coding
- Note: put the letters at the leaves of a binary tree, with left = 0 and right = 1. Voila! A prefix code.
- Huffman coding: an optimal prefix code.
- Algorithm: use a priority queue.
      insert all letters according to frequency
      repeat:
          if there is only one tree left, done
          else
              a = deleteMin(); b = deleteMin();
              make tree t out of a and b with weight a.weight() + b.weight();
              insert(t)

Huffman coding example
- Text: abracadabra
- Frequencies: a: 5, b: 2, c: 1, d: 1, r: 2
- Huffman code: a: 0, b: 100, c: 1010, d: 1011, r: 11
- Bits: 5*1 + 2*3 + 1*4 + 1*4 + 2*2 = 23
- Finite automaton to decode: Θ(n)
- Time to encode?
  - Compute frequencies: O(n)
  - Build heap: O(1), assuming the alphabet has constant size
  - Encode: O(n)
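The 23-bit total can be checked with a short program that builds the tree using a min-priority queue. This is a sketch under stated assumptions: the `Node` layout and the function names are invented here, `std::priority_queue` with `std::greater` plays the role of deleteMin(), and ties between equal weights may be broken differently than on the slide, so individual codeword lengths for b, c, d and r can differ even though the total cannot.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Node { long weight; int left; int right; char symbol; };  // leaf: left == -1

// Returns the codeword length of each symbol in a Huffman code for 'text'.
std::map<char, int> huffman_code_lengths(const std::string& text) {
    std::map<char, long> freq;
    for (char c : text) ++freq[c];                     // compute frequencies

    std::vector<Node> nodes;
    typedef std::pair<long, int> Entry;                // (weight, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > pq;
    for (std::map<char, long>::const_iterator it = freq.begin(); it != freq.end(); ++it) {
        nodes.push_back(Node{it->second, -1, -1, it->first});
        pq.push(Entry(it->second, static_cast<int>(nodes.size()) - 1));
    }
    while (pq.size() > 1) {                            // until one tree is left
        Entry a = pq.top(); pq.pop();                  // a = deleteMin()
        Entry b = pq.top(); pq.pop();                  // b = deleteMin()
        nodes.push_back(Node{a.first + b.first, a.second, b.second, '\0'});
        pq.push(Entry(a.first + b.first, static_cast<int>(nodes.size()) - 1));
    }

    std::map<char, int> lengths;
    if (pq.empty()) return lengths;                    // empty text
    std::vector<std::pair<int, int> > stack;           // (node index, depth)
    stack.push_back(std::make_pair(pq.top().second, 0));
    while (!stack.empty()) {
        std::pair<int, int> cur = stack.back();
        stack.pop_back();
        const Node nd = nodes[cur.first];
        if (nd.left < 0) {
            lengths[nd.symbol] = cur.second;           // leaf depth = code length
        } else {
            stack.push_back(std::make_pair(nd.left, cur.second + 1));
            stack.push_back(std::make_pair(nd.right, cur.second + 1));
        }
    }
    return lengths;
}

// Total number of bits needed to encode 'text' with that code.
long encoded_bits(const std::string& text) {
    std::map<char, int> lengths = huffman_code_lengths(text);
    long bits = 0;
    for (char c : text) bits += lengths[c];
    return bits;
}
```

For `"abracadabra"` this yields 23 bits, with the letter a receiving a one-bit codeword. A text with only one distinct letter gets a zero-length code in this sketch; a real encoder would special-case that.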

Huffman coding summary
- Huffman coding is very frequently used. (You use it every time you watch HDTV or listen to an MP3, for example.)
- Text files often compress to 60% of original size.
- In real life, Huffman coding is usually used in conjunction with a modeling algorithm, e.g.
  - JPEG compression: DCT, quantization, and Huffman coding
  - Text compression: dictionary + Huffman coding

Finite Automata and Regular Expressions
- How can I decode some Huffman-encoded text efficiently? (Hand-design a DFA to recognize it.)
- Or: how can I find all instances of aardvark, aaaardvark, etc., or zyzzyva, zyzzzzyva, etc., in Microsoft Word? In Unix? (grep)
- All words with 2 or more As or Zs?
- Important topic: regular expressions and finite automata.
  - Theoretician: regular expressions are grammars that define regular languages.
  - Programmer: compact patterns for matching and replacing.

DFA for abracadabra
- Huffman code: A=0, B=100, C=1010, D=1011, R=11
- DFA:

      state   input   read out   new state
      0       0       A          0
      0       1                  1
      1       0                  2
      1       1       R          0
      2       0       B          0
      2       1                  3
      3       0       C          0
      3       1       D          0

- (Actually, this looks just like the original tree, doesn't it?)
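The decoding automaton is small enough to write out as a lookup table. The sketch below is one way to do it, assuming the code A=0, B=100, C=1010, D=1011 and 11 for R; the names `Transition`, `kDecodeTable` and `huffman_decode` are invented for the example, and letters are emitted in lowercase.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

struct Transition { char output; int next; };   // output '\0' = emit nothing

// Rows are states 0..3; columns are the input bits 0 and 1.
const Transition kDecodeTable[4][2] = {
    {{'a', 0},  {'\0', 1}},   // state 0: start of a codeword
    {{'\0', 2}, {'r', 0}},    // state 1: seen "1"
    {{'b', 0},  {'\0', 3}},   // state 2: seen "10"
    {{'c', 0},  {'d', 0}},    // state 3: seen "101"
};

// One table lookup per input bit, so decoding takes time linear in the input.
std::string huffman_decode(const std::string& bits) {
    std::string out;
    int state = 0;
    for (char bit : bits) {
        if (bit != '0' && bit != '1') throw std::invalid_argument("not a bit");
        const Transition& t = kDecodeTable[state][bit - '0'];
        if (t.output != '\0') out.push_back(t.output);
        state = t.next;
    }
    if (state != 0) throw std::invalid_argument("input ends mid-codeword");
    return out;
}
```

Feeding it the 23-bit string `01001101010010110100110` gives back `abracadabra`.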

Regular Expressions
- A regular expression is one of:
  - a literal character
  - a regular expression in parentheses
  - a concatenation of two R.E.s
  - the alternation ("or") of two R.E.s, denoted +
  - the closure of an R.E., denoted * (i.e. 0 or more occurrences)
- Possibly additional syntactic sugar
- Examples:
  - abra(cadabra)* = {abra, abracadabra, abracadabracadabra, ...}
  - (a*b + ac)d
  - (a(a+b)b*)*
  - t(w+o)?o [? means 0 or 1 occurrences]
  - aa+rdvark [here the postfix + means 1 or more occurrences]
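These patterns can be tried out with `std::regex`. One caveat when translating: in the default ECMAScript syntax alternation is written `|`, not `+`, and `+` always means "one or more". The helper name `matches` is an assumption for this sketch.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the whole of 'text' matches 'pattern' (ECMAScript syntax).
bool matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// The slide examples, rewritten with | for alternation:
//   abra(cadabra)*   ->  "abra(cadabra)*"
//   (a*b + ac)d      ->  "(a*b|ac)d"
//   t(w+o)?o         ->  "t(w|o)?o"
//   aa+rdvark        ->  "aa+rdvark"   (postfix +, unchanged)
```

For instance, `matches("t(w|o)?o", "two")` and `matches("aa+rdvark", "aaaardvark")` both hold, while `matches("aa+rdvark", "ardvark")` does not because at least two leading a's are required.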

Finite Automata
- Regular language: any language defined by an R.E.
- Finite automata: machines that recognize regular languages.
- Deterministic Finite Automaton (DFA):
  - a set of states, including a start state and one or more accepting states
  - a transition function: given the current state and input letter, what's the new state?
- Non-deterministic Finite Automaton (NDFA): like a DFA, but there may be
  - more than one transition out of a state on the same letter (pick the right one non-deterministically, i.e. via lucky guess!)
  - epsilon-transitions, i.e. optional transitions on no input letter

RE -> NDFA
- Given a regular expression, how can I build a DFA? Work bottom up.
  - Letter: [diagram]
  - Concatenation: [diagram]
  - Or: [diagram]
  - Closure: [diagram]

RE -> NDFA Ex
- Construct an NDFA for the RE (A*B + AC)D, one stage at a time:
  - A
  - A*
  - A*B + AC
  - (A*B + AC)D

NDFA -> DFA
- Keep track of the set of states you are in.
- On each new input letter, compute the new set of states you could be in.
- The set of states for the DFA is the power set of the NDFA states.
  - I.e. up to 2^n states, where there were n in the NDFA.
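The idea of tracking a set of states can be sketched by simulating an NDFA directly, with each set held as a bitmask (which also makes the 2^n bound visible: an n-state NDFA has 2^n possible masks). Everything named here is an assumption for illustration, including the hand-built five-state NDFA for (a*b + ac)d; it is not the automaton the bottom-up construction would produce.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Edge { int from; char letter; int to; };   // letter '\0' = epsilon-transition

struct Nfa {
    int start;
    std::uint32_t accepting;      // bit s set means state s is accepting
    std::vector<Edge> edges;      // assumes fewer than 32 states
};

// Add every state reachable through epsilon-transitions alone.
std::uint32_t epsilon_closure(const Nfa& nfa, std::uint32_t states) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Edge& e : nfa.edges) {
            bool in_from = ((states >> e.from) & 1u) != 0;
            bool in_to = ((states >> e.to) & 1u) != 0;
            if (e.letter == '\0' && in_from && !in_to) {
                states |= 1u << e.to;
                changed = true;
            }
        }
    }
    return states;
}

// The set of states we could be in after reading 'letter' from any state in 'states'.
std::uint32_t step(const Nfa& nfa, std::uint32_t states, char letter) {
    std::uint32_t next = 0;
    if (letter == '\0') return next;              // '\0' is reserved for epsilon
    for (const Edge& e : nfa.edges) {
        if (e.letter == letter && ((states >> e.from) & 1u)) next |= 1u << e.to;
    }
    return epsilon_closure(nfa, next);
}

bool nfa_accepts(const Nfa& nfa, const std::string& input) {
    std::uint32_t states = epsilon_closure(nfa, 1u << nfa.start);
    for (char c : input) states = step(nfa, states, c);
    return (states & nfa.accepting) != 0;
}

// Hand-built NDFA for (a*b + ac)d: state 0 is the start, state 4 accepts.
Nfa example_nfa() {
    return Nfa{0, 1u << 4,
               {{0, '\0', 1}, {1, 'a', 1}, {1, 'b', 3},
                {0, 'a', 2}, {2, 'c', 3}, {3, 'd', 4}}};
}
```

Each distinct mask that shows up during simulation corresponds to one state of the equivalent DFA; building the DFA ahead of time just means enumerating those masks once instead of recomputing them per input.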

Recognizing Regular Languages
- Suppose your language is given by a DFA. How to recognize?
- Build a table: one row for every (state, input letter) pair, giving the resulting state.
- For each letter of the input string, compute the new state.
- When done, check whether the last state is an accepting state.
- Runtime? O(n), where n is the number of input letters.
- Another approach: use a C program to simulate the NDFA with backtracking. Less space, more time. (egrep vs. fgrep?)
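A table-driven recognizer is only a few lines. The sketch below assumes a hand-derived six-state DFA for (a*b + ac)d over the alphabet {a, b, c, d}, laid out as a two-dimensional array indexed by state and letter; the state numbering, the dead state 5, and the names `kDfaTable` and `dfa_accepts` are all choices made for this example.

```cpp
#include <cassert>
#include <string>

// kDfaTable[state][letter - 'a'] = next state. State 4 accepts; state 5 is dead.
const int kDfaTable[6][4] = {
    //  a  b  c  d
    {   1, 2, 5, 5 },   // 0: start
    {   3, 2, 2, 5 },   // 1: seen one 'a' (could be a*b or ac)
    {   5, 5, 5, 4 },   // 2: seen a*b or ac, need 'd'
    {   3, 2, 5, 5 },   // 3: seen two or more 'a's, only a*b remains
    {   5, 5, 5, 5 },   // 4: accepted; any further letter is an error
    {   5, 5, 5, 5 },   // 5: dead state
};

// One table lookup per input letter: O(n) for n letters.
bool dfa_accepts(const std::string& input) {
    int state = 0;
    for (char c : input) {
        if (c < 'a' || c > 'd') return false;   // letter outside the alphabet
        state = kDfaTable[state][c - 'a'];
    }
    return state == 4;                          // is the last state accepting?
}
```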

Examples
- Unix grep
- Perl
      $input =~ s/t[wo]?o/2/;
      $input =~ s|<link[^>]*>\s*||gs;
      $input =~ s|\s*\@font-face\s*{.*?}||gs;
      $input =~ s|\s*mso-[^>"]*"|"|gis;
      $input =~ s/([^ ]+) +([^ ]+)/$2 $1/;
      $input =~ m/^[0-9]+\.?[0-9]*|\.[0-9]+$/;
      ($word1, $word2, $rest) = ($foo =~ m/^ *([^ ]+) +([^ ]+) +(.*)$/);
      $input =~ s|<span[^>]*>\s*<br\s+clear="?all[^>]*>\s*</span>|<br clear="all"/>|gis;