Fast Trie Data Structures Seminar On Advanced Topics

- Slides: 27

Fast Trie Data Structures
Seminar On Advanced Topics In Data Structures
Jacob Katz
December 1, 2001
Dan E. Willard, 1981, “New Trie Data Structures Which Support Very Fast Search Operations”

Agenda
• Problem statement
• Existing solutions and motivation for a new one
• P-Fast tries & their complexity
• Q-Fast tries & their complexity
• X-Fast tries & their complexity
• Y-Fast tries & their complexity
JK 10/21/2021 2

Problem statement
• Let S be a set of N records with distinct integer keys in range [0, M], with the following operations:
  – MEMBER(K) – does the key K belong to the set
  – SUCCESSOR(K) – find the least element which is greater than K
  – PREDECESSOR(K) – find the greatest element which is less than K
  – SUBSET(K1, K2) – produce a list of elements whose keys lie between K1 and K2
• The problem: an efficient data structure supporting this definition
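The four operations can be pinned down with a small baseline. This is only a sketch for reference, not one of the tries from the talk: it assumes a static sorted Python list searched with `bisect`, and it assumes SUBSET is inclusive at both ends. The class name `SortedKeySet` is hypothetical.

```python
from bisect import bisect_left, bisect_right


class SortedKeySet:
    """Baseline for the problem statement: a sorted list, O(log N) queries."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))

    def member(self, k):
        i = bisect_left(self.keys, k)
        return i < len(self.keys) and self.keys[i] == k

    def successor(self, k):
        # Least stored key strictly greater than k, or None.
        i = bisect_right(self.keys, k)
        return self.keys[i] if i < len(self.keys) else None

    def predecessor(self, k):
        # Greatest stored key strictly less than k, or None.
        i = bisect_left(self.keys, k)
        return self.keys[i - 1] if i > 0 else None

    def subset(self, k1, k2):
        # Assumption: both bounds are inclusive.
        return self.keys[bisect_left(self.keys, k1):bisect_right(self.keys, k2)]
```

The tries in the following slides aim to beat this O(log N) bound for bounded integer keys.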

Existing solutions
• AVL trees and 2-3 trees use O(N) space and O(log N) time in the worst case
  – With no restriction on the keys, better performance is impossible
• Expected O(log N) time is possible when keys are uniformly distributed
• Stratified trees use O(M * log M) space and O(log M) time in the worst case for integer keys in range [0, M]
  – Disadvantage: O(M * log M) space is much larger than O(N) if M >> N

Motivation for another solution
• A more space-efficient data structure is wanted for restricted keys, one that still maintains the time efficiency

The way to the solution
• We first define P-Fast Trie:
  – O(√(log M)) time; O(N * √(log M) * 2^√(log M)) space
• Then show Q-Fast Trie
  – improvement to the space requirement to O(N)
• Then show X-Fast Trie
  – O(log M) time; O(N * log M) space; no dynamic operations
• Then show Y-Fast Trie
  – O(log M) time; O(N) space; no dynamic operations

What’s a Trie
• A trie of size (h, b) is a tree of height h and branching factor b
• All keys can be regarded as integers in range [0, b^h - 1]
• Each key K can be represented as an h-digit number in base b: K_1 K_2 K_3 … K_h
• Keys are stored at the leaf level; the path from the root to a leaf spells out the digits of its key
[Figure: example trie; leaves 20, 22, 24, 31, 32, 42, 43]

Trivial Trie
• In each node store a vector of branches
  – MEMBER(K) – O(h)
    • visits O(h) nodes, spends O(1) time in each
  – SUCCESSOR(K)/PREDECESSOR(K) – O(h*b)
    • visits O(h) nodes, spends O(b) time in each node
    • this is too much time
• Observation: increasing b (the base of key representation, the branching factor) decreases h (number of digits required to represent a key, the height of the tree) and vice versa
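A minimal sketch of the trivial trie may make the two costs concrete. It assumes each node is a plain Python list of b slots and that leaves store the key itself. The names `TrivialTrie`, `digits`, and `_min` are illustrative only.

```python
class TrivialTrie:
    """Trie of size (h, b); every node is a vector of b branches."""

    def __init__(self, h, b):
        self.h, self.b = h, b
        self.root = [None] * b

    def digits(self, k):
        # h base-b digits of k, most significant first.
        out = []
        for _ in range(self.h):
            out.append(k % self.b)
            k //= self.b
        return out[::-1]

    def insert(self, k):
        node = self.root
        ds = self.digits(k)
        for d in ds[:-1]:
            if node[d] is None:
                node[d] = [None] * self.b
            node = node[d]
        node[ds[-1]] = k  # leaf slot holds the key

    def member(self, k):
        # O(h): one O(1) vector lookup per level.
        node = self.root
        ds = self.digits(k)
        for d in ds[:-1]:
            node = node[d]
            if node is None:
                return False
        return node[ds[-1]] is not None

    def _min(self, node, depth):
        # Smallest key below `node`, where `depth` digits are already consumed.
        while depth < self.h:
            node = next(c for c in node if c is not None)
            depth += 1
        return node

    def successor(self, k):
        # O(h*b): up to b slots are scanned linearly on each of h levels.
        ds = self.digits(k)
        path = [self.root]
        for d in ds[:-1]:
            nxt = path[-1][d]
            if nxt is None:
                break
            path.append(nxt)
        for depth in range(len(path) - 1, -1, -1):
            node = path[depth]
            for d in range(ds[depth] + 1, self.b):
                if node[d] is not None:
                    return self._min(node[d], depth + 1)
        return None
```

The inner `for d in range(...)` loop is the linear scan that the P-Fast trie replaces with a search in a balanced tree.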

Example for worst case complexity
[Figure: trie with root, branches labeled b-1, and leaf b^h - 1]

P-Fast Trie Idea
• Improve SUCCESSOR(K)/PREDECESSOR(K) time by overcoming the linear search in every intermediate node

P-Fast Trie
• Each internal node v has additional fields:
  – LOWKEY(v) – leaf node containing the smallest key descending from v
  – HIGHKEY(v) – leaf node containing the largest key descending from v
  – INNERTREE(v) – binary tree of worst-case height O(log b) representing the set of digits directly descending from v
• Each leaf node points to its immediate neighbors on the left and on the right
• CLOSEMATCH(K) – query returning the node with key K if it exists in the trie; returning PREDECESSOR(K) or SUCCESSOR(K) otherwise

CLOSEMATCH(k) Algorithm Intuitively
1. Starting from Root, look for k = k_1 k_2 … k_h
   a. If found, return it
   b. If not, then v is the node at depth j from which there’s no way down any more: k_j ∉ INNERTREE(v)
2. Looking for k_j in INNERTREE(v), find D – an existing digit in INNERTREE(v) that is either:
   a. the least digit greater than k_j
   b. the greatest digit less than k_j
3. If D > k_j, then return LOWKEY(D’s child of v), else if D < k_j, then return HIGHKEY(D’s child of v)
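The three steps above can be sketched as follows. This is a simplification under stated assumptions, not the structure from the paper: a sorted Python list searched with `bisect` stands in for INNERTREE(v), the trie is built once from a non-empty key set, and leaves are represented by the keys themselves. `PNode`, `to_digits`, `build`, and `closematch` are hypothetical names.

```python
from bisect import bisect_left


class PNode:
    def __init__(self):
        self.children = {}   # digit -> PNode, or the key itself at leaf level
        self.digits = []     # sorted digits; stand-in for INNERTREE(v)
        self.lowkey = None   # smallest key below this node
        self.highkey = None  # largest key below this node


def to_digits(k, h, b):
    ds = []
    for _ in range(h):
        ds.append(k % b)
        k //= b
    return ds[::-1]


def build(keys, h, b):
    root = PNode()
    for k in sorted(keys):           # ascending order keeps `digits` sorted
        node = root
        for depth, d in enumerate(to_digits(k, h, b)):
            if node.lowkey is None:
                node.lowkey = k
            node.highkey = k
            if d not in node.children:
                node.digits.append(d)
                node.children[d] = k if depth == h - 1 else PNode()
            node = node.children[d]
    return root


def closematch(root, k, h, b):
    node = root
    for depth, d in enumerate(to_digits(k, h, b)):
        if d in node.children:       # step 1: keep descending
            node = node.children[d]
            continue
        # Step 2: find a neighbouring digit D in O(log b).
        i = bisect_left(node.digits, d)
        if i < len(node.digits):     # D > k_j: smallest key under D's child
            child = node.children[node.digits[i]]
            return child if depth == h - 1 else child.lowkey
        child = node.children[node.digits[i - 1]]  # D < k_j: largest key
        return child if depth == h - 1 else child.highkey
    return node                      # exact match: the leaf holds k
```

The result is the key itself, its successor, or its predecessor; the leaf-level neighbor links from the previous slide would then give the other one in O(1).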

P-Fast Trie Complexities
• CLOSEMATCH(K) time complexity is O(h + log b)
• Other queries require O(1) addition to the CLOSEMATCH(K) complexity
• Space complexity of such a trie is O(h*b*N)
• Representing the input keys in base b = 2^√(log M) requires h = √(log M) digits; with such h and b the desired complexities are achieved
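As a worked check of the parameter choice, the derivation below assumes h = √(log M) and b = 2^√(log M), which is the choice that matches the bounds quoted at the start of the talk. It ignores rounding by taking log M to be a perfect square.

```latex
\begin{align*}
h &= \sqrt{\log_2 M}, \qquad b = 2^{\sqrt{\log_2 M}} = 2^{h}
  \;\Rightarrow\; b^{h} = 2^{h^{2}} = 2^{\log_2 M} = M \\
\text{time} &= O(h + \log b)
  = O\bigl(\sqrt{\log M} + \sqrt{\log M}\bigr)
  = O\bigl(\sqrt{\log M}\bigr) \\
\text{space} &= O(h \cdot b \cdot N)
  = O\bigl(N \sqrt{\log M}\; 2^{\sqrt{\log M}}\bigr) \\
\text{e.g. } M &= 2^{16}: \quad h = 4,\; b = 16,\; b^{h} = 65536 = M
\end{align*}
```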

Q-Fast Trie Idea
• Improve space by splitting the set of keys into subsets
• How to split is the problem:
  – To preserve the time complexity
  – To decrease the space complexity

Q-Fast Trie
• Let S’ denote the ordered list of keys from S: 0 = K_1 < K_2 < K_3 < … < K_L < M
• Define:
  S_i = {K ∈ S | K_i ≤ K < K_{i+1}} for i < L
  S_L = {K ∈ S | K ≥ K_L}
• S’ is a c-partition of S iff each S_i has cardinality in range [c, 2c-1]
• Q-Fast Trie of size (h, b, c) is a two-level structure:
  – Upper part: p-fast trie T of size (h, b) representing the set S’, which is a c-partition of S
  – Lower part: forest of 2-3 trees, where the ith tree represents S_i
  – The leaves of the 2-3 trees are connected to form an ordered list
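One simple way to obtain a c-partition for a static key set is sketched below. It assumes at least c keys and uses the smallest key of each group as the separator in S’. The slide fixes K_1 = 0, which this sketch does not enforce. The function name `c_partition` is hypothetical.

```python
def c_partition(keys, c):
    """Split sorted keys into groups whose sizes all lie in [c, 2c-1]."""
    keys = sorted(keys)
    n = len(keys)
    if n < c:
        raise ValueError("need at least c keys")
    full = n - n % c                      # largest multiple of c not above n
    groups = [keys[i:i + c] for i in range(0, full, c)]
    groups[-1].extend(keys[full:])        # fewer than c leftovers join the last group
    separators = [g[0] for g in groups]   # plays the role of S'
    return separators, groups
```

Only the separators go into the upper trie, so it holds at most N/c keys.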

Example of Q-Fast Trie
[Figure: p-fast trie over a forest of 2-3 trees; values shown: 0, 10, 17, 33, 35, 35, 71, 70, 77, 81, 95, 99]

CLOSEMATCH(k) Algorithm Intuitively
1. Look for D = PREDECESSOR(k) in the upper part
   • O(h + log b)
2. Then search D’s 2-3 tree for k
   • O(log c)
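The two-level lookup can be sketched with stand-ins for both parts. The assumptions here are that a sorted list of separators replaces the upper p-fast trie, sorted lists replace the 2-3 trees, and the largest separator not exceeding k is used so that k itself is matched when it is a separator. `q_closematch` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def q_closematch(separators, groups, k):
    # Step 1: locate the group whose separator is the largest one <= k
    # (falls back to the first group when k is below every separator).
    i = max(bisect_right(separators, k) - 1, 0)
    group = groups[i]
    # Step 2: search inside that group only; it holds at most 2c-1 keys.
    j = bisect_left(group, k)
    if j < len(group):
        return group[j]      # k itself, or its successor within the group
    return group[-1]         # every key in the group is smaller: predecessor
```

For example, with separators `[0, 33, 71]` and groups `[[0, 10, 17], [33, 35, 70], [71, 77, 81, 95, 99]]`, a query for 40 lands in the second group and returns 70.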

Q-Fast Trie Complexities
• CLOSEMATCH(K) time complexity is O(h + log b + log c)
• Other queries require O(1) addition to the CLOSEMATCH(K) complexity
• Space complexity is O(N + N*h*b/c)
• By choosing h = √(log M), b = 2^√(log M), c = h*b, the desired complexities are achieved

P/Q-Fast Trie Insertion/Deletion
• P-fast trie
  – Use AVL trees for INNERTREEs
  – O(h + log b) for insertion/deletion
• Q-fast trie
  – O(h + log b + log c) for insertion/deletion
  – Maintenance of the c-partition property through splitting/merging of trees in O(log c) time

X-Fast Trie Idea
• P/Q-Fast trie uses a top-down search to get to the wanted level, doing a binary search in each node on the way
• Thus, P/Q-Fast Trie relies on the balance between the height of the tree and the branching factor
• X-Fast trie idea: use binary search to find the wanted level
  – Requires that the wanted node can be found from its level alone, without a top-down pass
  – For the purpose of worst-case complexity the branching factor is no longer important, since it only affects the base of the log

X-Fast Trie
• Part 1: Trie of height h and branching factor 2 (representing all keys in binary)
  – Each node v has an additional field DESCENDANT(v):
    • If v has only a left branch, it points to the largest leaf descending from v (through the left branch)
    • If v has only a right branch, it points to the smallest leaf descending from v (through the right branch)
  – All leaves form a doubly-linked list
  – Node v at height j may have descending leaves only in range [(i-1)*2^j + 1, i*2^j] for some integer i; this i is called ID(v)
  – Node v at height j is called an ancestor of key K if ⌈K / 2^j⌉ = ID(v)
  – BOTTOM(K) is the lowest ancestor of K

X-Fast Trie
• Part 2: h+1 Level Search Structures (LSS), each of which uses perfect hashing as we have seen in the first lecture:
  – Linear space & constant time

BOTTOM(k) Algorithm Intuitively
• Make a binary search among the h+1 different LSSs
  – Searching each LSS is O(1)
  – h = log M, therefore binary search of h+1 LSSs is O(log M)
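A minimal sketch of this binary search over levels follows. It makes several assumptions: Python sets (ordinary hash tables with expected O(1) lookup) stand in for the perfect-hashing LSSs, keys satisfy 0 <= k < 2^h, the key set is non-empty, and a node at height j is identified by the zero-based prefix `k >> j` rather than by the one-based ID(v) of the earlier slide. `build_lss` and `bottom` are hypothetical names.

```python
def build_lss(keys, h):
    # lss[j] = identifiers of all trie nodes at height j (leaves are height 0).
    return [{k >> j for k in keys} for j in range(h + 1)]


def bottom(lss, k, h):
    """Return (height, prefix) of the lowest ancestor of k present in the trie."""
    lo, hi = 0, h            # the root (height h, prefix 0) is always present
    while lo < hi:
        mid = (lo + hi) // 2
        if (k >> mid) in lss[mid]:
            hi = mid         # ancestor exists here; try to go lower
        else:
            lo = mid + 1     # no ancestor at this height; must go higher
    return lo, k >> lo
```

The search is valid because presence is monotone: if a prefix of k exists at height j, every shorter prefix exists at all greater heights.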

X-Fast Trie Complexities
• BOTTOM(K) is O(log M)
• All queries require O(1) addition to BOTTOM(K), with assistance of the DESCENDANT field and the doubly-linked list:
  – BOTTOM(K) is either K itself, or its DESCENDANT is PREDECESSOR(K)/SUCCESSOR(K)
• Space is O(N * log M)
  – No more than h * N nodes in the trie (h = log M)
  – log M LSSs each using O(N) space

Y-Fast Trie Idea
• Apply the same partitioning technique that turned the P-Fast trie into the Q-Fast trie: c-partitioning of all the keys into L subsets, each containing between c and 2c-1 keys
• Upper part: X-Fast trie representing S’
• Lower part: forest of binary trees of height log c

Y-Fast Trie Complexities
• Upper part can be searched within O(log M) time and occupies no more than O((N/c) * log M) space
• Each binary tree can be searched within O(log c) time, and all together they occupy O(N) space
• Choosing c = log M: O(N) space; O(log M) time
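Substituting c = log M into the two bounds above gives the stated totals. The short derivation below uses only the quantities already on this slide.

```latex
\begin{align*}
\text{upper part space} &= O\!\left(\frac{N}{c}\,\log M\right)
  = O\!\left(\frac{N}{\log M}\,\log M\right) = O(N) \\
\text{total space} &= O(N) + O(N) = O(N) \\
\text{time} &= O(\log M) + O(\log c)
  = O(\log M + \log\log M) = O(\log M)
\end{align*}
```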

X/Y-Fast Trie Insertion/Deletion
• LSSs have practically uncontrolled time complexity for dynamic operations
  – At least at the time the article was presented
• Therefore, X/Y-Fast tries inherit this limitation