Tries
- Slides: 22
Tries 2/6/2022 Tries 1
Outline and Reading
- Standard tries
- Compressed tries
- Suffix tries
- Huffman encoding tries
Preprocessing Strings
A trie is a compact data structure for representing a set of strings, such as all the words in a text.
Tries are useful for searching text quickly.
They can also be used for text compression, as we will see.
Standard Trie (1)
The standard trie for a set of strings S is an ordered tree such that:
- Every node but the root is labeled with a character
- The children of a node are alphabetically ordered
- The paths from the root to the external nodes yield the strings of S
Example: standard trie for the set of strings S = { bear, bell, bid, bull, buy, sell, stock, stop }
Standard Trie (2)
A standard trie uses O(n) space and supports searches, insertions and deletions in time O(dm), where:
- n: total size of the strings in S
- m: size of the string being searched for, inserted or deleted
- d: size of the alphabet
Standard Trie (3)
Why are searches O(dm)?
Think about searching for “ZZZZZZZ”. At each of the 7 levels we may have to scan the children for the whole alphabet before we find Z. In the worst case that is 7 × 26 = 182 comparisons!
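The O(dm) bound can be checked with a minimal Python sketch of a standard trie. It assumes each node keeps its children in an alphabetically ordered list and finds a child by linear scan, which is where the factor d comes from. The names `TrieNode`, `StandardTrie`, `insert` and `search` are illustrative and not from the slides; `search` also returns the number of character comparisons it made.

```python
class TrieNode:
    def __init__(self):
        self.children = []    # list of (character, TrieNode), alphabetical order
        self.is_word = False  # True if a string of S ends at this node


class StandardTrie:
    def __init__(self):
        self.root = TrieNode()

    def _find_child(self, node, ch):
        """Linear scan of the ordered children; returns (child or None, comparisons)."""
        comparisons = 0
        for label, child in node.children:
            comparisons += 1
            if label == ch:
                return child, comparisons
        return None, comparisons

    def insert(self, word):
        node = self.root
        for ch in word:
            child, _ = self._find_child(node, ch)
            if child is None:
                child = TrieNode()
                node.children.append((ch, child))
                node.children.sort(key=lambda pair: pair[0])  # keep alphabetical order
            node = child
        node.is_word = True

    def search(self, word):
        """Returns (found, total number of character comparisons)."""
        node = self.root
        total = 0
        for ch in word:
            child, comparisons = self._find_child(node, ch)
            total += comparisons
            if child is None:
                return False, total
            node = child
        return node.is_word, total
```

If every node on the path to “zzzzzzz” has all 26 lowercase letters as children, `search("zzzzzzz")` scans 26 children at each of 7 levels. Storing the children in a hash map or a 26-slot array instead would remove the factor d at the cost of extra space.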
Word Matching with a Trie
We insert the words of the text into a trie.
Each leaf stores the occurrences of the associated word in the text.
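A small Python sketch of word matching, under a few assumptions: the trie is a nested dictionary, words are runs of lowercase letters, and the occurrence list is stored under the key `None` in the node where the word ends. The helper names `build_word_index` and `occurrences` are hypothetical.

```python
import re


def build_word_index(text):
    """Insert every word of text into a dict-based trie.

    The node reached at the end of a word stores, under the key None,
    the list of positions where that word starts in the text.
    """
    root = {}
    for match in re.finditer(r"[a-z]+", text.lower()):
        node = root
        for ch in match.group():
            node = node.setdefault(ch, {})
        node.setdefault(None, []).append(match.start())
    return root


def occurrences(root, word):
    """Return the start positions of word in the indexed text (empty if absent)."""
    node = root
    for ch in word:
        if ch not in node:
            return []
        node = node[ch]
    return node.get(None, [])
```

When one word is a prefix of another, its occurrence list ends up in an internal node rather than a leaf; the sketch handles both cases the same way.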
Compressed Trie
A compressed trie has internal nodes of degree at least two.
It is obtained from the standard trie by compressing chains of “redundant” nodes.
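The compression step can be sketched in Python as follows. This is a simplified illustration that represents a trie as nested dictionaries and assumes no string in the set is a proper prefix of another (true for the set S used earlier); `build_trie` and `compress` are illustrative names.

```python
def build_trie(words):
    """Standard trie as nested dicts: one key per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Return a compressed copy: every chain of single-child nodes
    is merged into one edge labeled with the concatenated characters."""
    out = {}
    for label, child in node.items():
        # Follow the chain while the child is a "redundant" node.
        while len(child) == 1:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out
```

For S = { bear, bell, bid, bull, buy, sell, stock, stop } the chain i, d below b collapses into a single node labeled "id", and s, t, o collapses into "s" followed by "to".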
Compact Representation
Compact representation of a compressed trie for an array of strings:
- Nodes store ranges of indices instead of substrings
- Uses O(s) space, where s is the number of strings in the array
- Serves as an auxiliary index structure
Compact Representation
Each node stores the triple (i, l, k), where i is the index of a string in the array S, and l and k are the first and last character positions of the label within that string. For example, (6, 1, 2) means S[6] from position 1 to position 2, or “id”.
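A short Python sketch of how a triple is resolved back to its label. The array `S` below is hypothetical; the slides only imply that positions 1 to 2 of S[6] spell “id”. The function name `node_label` is also an assumption.

```python
# Hypothetical array of strings; only S[6] is constrained by the example triple.
S = ["see", "bear", "sell", "stock", "bull", "buy", "bid", "hear", "bell", "stop"]


def node_label(strings, triple):
    """Resolve a node's (i, l, k) triple to the substring strings[i][l..k], k inclusive."""
    i, l, k = triple
    return strings[i][l:k + 1]
```

Because each node holds three integers rather than a substring, the space per node is constant regardless of how long the label is.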
Suffix Trie (1)
The suffix trie of a string X is the compressed trie of all the suffixes of X.
Suffix Trie (2)
Compact representation of the suffix trie for a string X of size n from an alphabet of size d:
- Uses O(n) space
- Supports arbitrary pattern matching queries in X in O(dm) time, where m is the size of the pattern
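Pattern matching with suffixes can be illustrated with a deliberately naive Python sketch. It builds an uncompressed trie of all suffixes as nested dictionaries, so it uses O(n²) space rather than the O(n) of the compact representation, and dictionary lookups replace the scan over d children. The names `build_suffix_trie` and `contains` are illustrative.

```python
def build_suffix_trie(x):
    """Uncompressed trie of every suffix of x, as nested dicts."""
    root = {}
    for start in range(len(x)):
        node = root
        for ch in x[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(root, pattern):
    """A pattern occurs in x exactly when it is a prefix of some suffix,
    i.e. when it can be traced from the root of the suffix trie."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True
```

The query walks at most m edges, one per pattern character, no matter how long X is.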
Encoding Trie (1)
A code is a mapping of each character of an alphabet to a binary code-word.
A prefix code is a binary code such that no code-word is the prefix of another code-word. (Morse code, by contrast, is not a prefix code: it relies on pauses between letters.)
An encoding trie represents a prefix code:
- Each leaf stores a character
- The code-word of a character is given by the path from the root to the leaf storing the character (0 for a left child and 1 for a right child)
Example code-words: a: 00, b: 010, c: 011, d: 10, e: 11
[Figure: encoding trie with leaves a, b, c, d, e]
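Decoding with an encoding trie can be sketched in Python as below. The code table uses the code-words of the example, taking b = 010 as the one code-word left free by the others (an assumption if the table is incomplete). The trie is a nested dictionary keyed by "0" and "1", and the function names are illustrative.

```python
CODE = {"a": "00", "b": "010", "c": "011", "d": "10", "e": "11"}


def build_encoding_trie(code):
    """Nested dicts keyed by '0'/'1'; each leaf is the character itself."""
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = ch  # leaf stores the character
    return root


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, root):
    """Walk from the root; on reaching a leaf, emit its character and restart."""
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):
            out.append(node)
            node = root
    return "".join(out)
```

Because no code-word is a prefix of another, the decoder never needs a separator between code-words: reaching a leaf is unambiguous.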
Encoding Trie (2)
Given a text string X, we want to find a prefix code for the characters of X that yields a small encoding for X:
- Frequent characters should have short code-words
- Rare characters should have long code-words
Example: X = abracadabra
- T1 encodes X into 29 bits
- T2 encodes X into 24 bits
[Figure: encoding tries T1 and T2 for the characters a, b, c, d, r]
Huffman’s Algorithm
Given a string X, Huffman’s algorithm constructs a prefix code that minimizes the size of the encoding of X.
It runs in O(n + d log d) time, where n is the size of X and d is the number of distinct characters of X.
A heap-based priority queue is used as an auxiliary structure.
Huffman’s Algorithm, the idea
- Store each distinct character of the string X in a priority queue, using its frequency of occurrence as the key.
- Do two removeMin operations. Combine the output into a tree by adding a root node which has the result of the first removeMin as its left child and the result of the second as its right child. The root stores the combined frequencies as its key, and the new tree is inserted back into the priority queue.
- Repeat the previous step until all the data is in a single tree.
Huffman’s Algorithm
HuffmanEncoding(X)
  Input: string X of size n
  Output: optimal encoding trie for X
  C ← distinctCharacters(X)
  computeFrequencies(C, X)
  Q ← new empty heap
  for all c ∈ C
    T ← new single-node tree storing c
    Q.insert(getFrequency(c), T)
  while Q.size() > 1
    f1 ← Q.minKey()
    T1 ← Q.removeMin()
    f2 ← Q.minKey()
    T2 ← Q.removeMin()
    T ← join(T1, T2)
    Q.insert(f1 + f2, T)
  return Q.removeMin()
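The pseudocode translates fairly directly into Python using the standard `heapq` module as the heap-based priority queue. This is a sketch under a few assumptions: X is non-empty, a leaf is represented by the character itself and an internal node by a `(left, right)` tuple, and a running counter breaks ties between equal frequencies so that trees are never compared. The names `huffman_trie` and `codewords` are illustrative.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_trie(text):
    """Build an encoding trie for a non-empty text.

    A leaf is a one-character string; an internal node is a (left, right) tuple.
    """
    freq = Counter(text)
    tiebreak = count()  # unique sequence numbers, so the heap never compares trees
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # first removeMin becomes the left child
        f2, _, t2 = heapq.heappop(heap)  # second removeMin becomes the right child
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    return heap[0][2]


def codewords(tree, prefix=""):
    """Read the code-words off the trie: 0 for a left child, 1 for a right child."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}  # a one-character text still gets one bit
    left, right = tree
    table = codewords(left, prefix + "0")
    table.update(codewords(right, prefix + "1"))
    return table
```

The exact code-words depend on how ties are broken, but the total encoded length does not: for X = abracadabra any Huffman code uses 23 bits.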
Example
X = abracadabra
Frequencies:
  a  b  c  d  r
  5  2  1  1  2
[Figure: step-by-step construction of the Huffman trie for X; the merged subtrees have weights 2, 4, 6 and 11]
Another Example
Frequencies: E: 29, T: 10, N: 9, I: 5, S: 4
Huffman Coding
- (4, 5, 9, 10, 29): merge 4 + 5 = 9, giving (9, 9, 10, 29)
- (9, 9, 10, 29): merge 9 + 9 = 18, giving (18, 10, 29), reordered as (10, 18, 29)
- (10, 18, 29): merge 10 + 18 = 28, giving (28, 29)
- (28, 29): merge 28 + 29 = 57, the root
In each merge the left branch is labeled 0 and the right branch is labeled 1.
Resulting code-words:
  Character  Frequency  Code-word
  E          29         1
  T          10         00
  N          9          011
  I          5          0101
  S          4          0100
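As a check on this example, the following Python sketch computes the size of the encoding from the frequencies and code-words above and confirms that the code is prefix-free. The helper names `encoded_bits` and `is_prefix_free` are illustrative.

```python
FREQ = {"E": 29, "T": 10, "N": 9, "I": 5, "S": 4}
CODE = {"E": "1", "T": "00", "N": "011", "I": "0101", "S": "0100"}


def encoded_bits(freq, code):
    """Total encoding size: each character contributes frequency x code-word length."""
    return sum(freq[ch] * len(code[ch]) for ch in freq)


def is_prefix_free(code):
    """True if no code-word is a prefix of a different code-word."""
    words = list(code.values())
    return not any(u != v and v.startswith(u) for u in words for v in words)
```

The total is 29·1 + 10·2 + 9·3 + 5·4 + 4·4 = 112 bits, which equals the sum of the merged weights 9 + 18 + 28 + 57. A fixed-length code for five characters needs 3 bits per character, or 3 · 57 = 171 bits.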