Data Structures Algorithms Radix Search Richard Newman based
Data Structures & Algorithms Radix Search Richard Newman based on slides by S. Sahni and book by R. Sedgewick
Radix-based Keys • Key has multiple parts • Each part is an element of some set • • • Character • Numeral Key parts can be accessed (e. g. , string s[i]) Size of set is radix
Advantages of Radix-based Search • Good worst-case performance • Simpler than balanced trees, etc. • Fast access to data • Easy way to handle variable-length keys • Save space (part of key in structure)
Disadvantages of Radix-based Search • May be space-inefficient • Performance depends on access to bytes of keys • Must have distinct keys, or other way to handle duplicate keys
Digital Search Trees • Similar to binary search trees • Difference is that we use bits of the key to determine subtree to search • Path in tree = prefix of key
Digital Search Trees • Insert A-S-E-R-C-H-I-N-G Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H 01000 I 01001 N 01110 G 00111 0 C 0 E 0 1 1 0 0 G 10 I 1 H 1 0 A 1 0 N R S 0 1 1 1 Note that binary tree is not sorted in BST sense
Digital Search Trees Prop 15. 1: A search or insertion into a DST takes about lg N comparisons on average, and about 2 lg N comparisons in the worst case, in a tree built from N keys. The number of comparisons is never more than the number of bits in the search key.
Tries • Use bits of key to guide search like DST • But keep keys in order like BST • Allow recursive sort, etc. • Pronounced “try-ee” or “try” • Keys kept at leaves of a binary tree
Tries • Defn. 15. 1: A trie is a binary tree that has keys associated with each leaf, defined as follows: a trie for an empty set is a null link a trie for a single key is a leaf w/key a trie for > 1 key is an internal node with left link referring to trie for keys that start with 0, right for keys 1 xxx
Tries • Insert A-S-E-R-C-H-I-N-G Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H 01000 I 01001 N 01110 G 00111 A 0 A 1 S Construct tree to point where prefixes match
Tries • Insert A-S-E-R-C-H-I-N-G Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H 01000 I 01001 N 01110 G 00111 0 A A 1 S 0 1 0 1 A E 0 1 R 0 1 S Construct tree to point where prefixes match
Tries • Insert A-S-E-R-C-H-I-N-G Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H 01000 I 01001 N 01110 G 00111 0 0 1 A 0 1 H E C 1 0 1 R 0 1 S
Tries • Insert A-S-E-R-C-H-I-NG Key Repr A S E R C H I N G 00001 10011 00101 10010 00011 01000 01001 01110 00111 0 0 1 A C 0 1 H 0 1 0 1 E 1 01 H 0 1 I R 0 1 S
Tries • Prop. 15. 2: The structure of a trie is independent of key insertion order; there is one unique trie for any given set of distinct keys. • Prop. 15. 3: Insertion or search for a random key in a trie built from N random keys takes about lg N bit comparisons on average, in the worst case, bounded by bits in key
Tries • Annoying feature of tries: • One-way branching when keys have common prefix • Prop. 15. 4: A trie built from N random w-bit keys has about N/lg 2 nodes on the average (about 1. 44 N)
Patricia Tries • Annoying feature of tries: • One-way branching when keys have common prefix • Two different types of nodes in trie • Patricia tries: fix both of these • Practical Algorithm To Retrieve Information Coded In Alphanumeric
Patricia Tries • Avoid one-way branching: • Keep at each node the index of the next bit to test • Skip over common prefix! • Avoid two types of nodes: • Store data in internal nodes • Replace external links with back links
Patricia Tries Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H 01000 I 01001 N 01110 G 00111 0 1 2 E 3 C 4 A H S 4 R
Patricia Tries • Prop 15. 5: Insertion or search in a patricia trie built from N random bitstrings takes about lg N bit comparisons on average, and about 2 lg N in the worst case, but never more than the length of the key.
Map • Radix search • Digital Search Trees • Tries • Patricia Tries • Multiway tries and TSTs • Text string algorithms
Multiway Tries • Like radix sort, can get benefit from comparing more than one bit at a time • Compare r bits, speed up search by a factor of r • What could possibly be bad? • Number of links is now R=2 r • Can waste a lot of space!
Multiway Tries • Structure is (almost) the same as binary tries • Except there are R branches • Search: start at root, leftmost digit • Follow ith link if next R-ary digit is i • If null link, then miss • If reach leaf, it contains only key with prefix matching path to it - compare
Existence Tries • Only keys, no records • Insert/search • Defn. 15. 2: The existence trie for a set of keys is: • Empty set: null link • Non-empty set: internal node with links for each possible digit to tries built with the leading digit omitted
Existence Tries • Convenient to return null on miss, dummy record on hit • Convenient to have no duplicate keys and no key a prefix of another key • Keys of fixed length, or • Use termination character with value NULLdigit, only used as sentinel
Existence Tries • No need to store any data • All keys captured in trie structure • If reach NULLdigit at the same time we run out of key digits, search hit • Otherwise, search miss • Insert: search until find null link, then add nodes for each of the remaining digits in the key
Existence Tries the time is now for f i o s r t n h o w i m e e
Multi-way Tries • R-ary branching • Keys stored at leaves • Path to leaf defines prefix of key stored at leaf • Only build tree downward until prefixes become distinct
Multi-way Tries • Defn. 15. 3: The multiway trie for a set of keys associated with leaves is: • Set empty: null link • Singleton set: leaf with key • Larger set: internal node with links for each possible digit to tries built with the leading digit omitted
Multi-way Tries • Def
Summary • Radix search • Digital Search Trees • Tries • Patricia Tries • Multiway tries and TSTs • Text string algorithms
- Slides: 30