Compressed Rank & Select on general strings, Paolo Ferragina
Compressed Rank & Select on general strings
Paolo Ferragina, Dipartimento di Informatica, Università di Pisa
Generalised Rank and Select
- Rank(c, i) = #c in L[1, i], the number of occurrences of c among the first i symbols of L
- Select(c, i) = position of the i-th c in L
Example: L = a b a a a c b c d a b e c d ... gives Rank(a, 7) = 4 and Select(a, 2) = 3.
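A minimal sketch that just restates the two definitions with 1-based positions, as on the slide. The function names `rank` and `select` are illustrative, and both are plain linear scans; they show what is computed, not how a compressed structure computes it. The string is the prefix of L shown on the slide.

```python
def rank(L, c, i):
    """#c in L[1, i]: occurrences of c among the first i symbols (1-based)."""
    return L[:i].count(c)

def select(L, c, i):
    """1-based position of the i-th c in L, or None if c occurs fewer than i times."""
    seen = 0
    for pos, ch in enumerate(L, start=1):
        if ch == c:
            seen += 1
            if seen == i:
                return pos
    return None

L = "abaaacbcdabecd"
print(rank(L, "a", 7))    # 4
print(select(L, "a", 2))  # 3
```

Note that Select is a kind of inverse of Rank: Rank(c, Select(c, i)) = i whenever the i-th c exists.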
Generalised Rank and Select
- If Σ is small (i.e. of constant size):
  - build a binary Rank data structure for each symbol of Σ;
  - Rank then takes O(1) time and |Σ|·n + o(|Σ|·n) bits, which is small when |Σ| is a constant.
- If Σ is large (e.g. words):
  - we need a smarter solution: the Wavelet Tree data structure.
Algorithmic reduction: reduce Rank & Select over arbitrary strings to Rank & Select over binary strings.
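The small-alphabet approach (one rank structure per symbol) can be sketched as follows. This is a simplification under a stated assumption: each symbol keeps a full prefix-count array, so rank is a single O(1) lookup but the space is |Σ|·(n + 1) integers, not the n + o(n) bits per symbol of a real succinct binary Rank structure. The class name `PerSymbolRank` is hypothetical.

```python
class PerSymbolRank:
    """One rank structure per alphabet symbol (hypothetical helper).

    Simplification: prefix[c][i] stores #c in text[1, i] explicitly, instead of
    a bit vector plus o(n) bits of sampled counts.
    """

    def __init__(self, text):
        self.prefix = {}
        for c in set(text):
            counts = [0]
            for ch in text:
                counts.append(counts[-1] + (ch == c))  # bool adds as 0 or 1
            self.prefix[c] = counts

    def rank(self, c, i):
        """#c in text[1, i] in O(1) time; symbols absent from the text give 0."""
        counts = self.prefix.get(c)
        return 0 if counts is None else counts[i]

r = PerSymbolRank("abaaacbcdabecd")
print(r.rank("a", 7))  # 4
```

The cost grows linearly with |Σ|, which is why this is only attractive when the alphabet is small.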
The Wavelet Tree
Example string: abracadabra, over the alphabet {a, b, c, d, r}.
[Figure: a binary tree (in alphabetic order?) whose leaves are the symbols a, b, c, d, r]
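The Wavelet Tree
[Figure: the root abracadabra splits into aacaaa (symbols a, c) and brdbr (symbols b, d, r); aacaaa splits into the leaves a (aaaaa) and c; brdbr splits into brbr and the leaf d; brbr splits into the leaves b (bb) and r (rr).]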
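The Wavelet Tree
Each internal node stores one bit per symbol of its string: 0 if the symbol goes to the left child, 1 if it goes to the right child.
- root, abracadabra: 01100010110
- left child, aacaaa: 001000 (leaves a and c)
- right child, brdbr: 00100 (brbr to the left, leaf d to the right)
- brbr: 0101 (leaves b and r)
In any case, O(|Σ| log |Σ|) bits for the tree. Easier: alphabetic order + heap structure.
Fact. Given the tree and the binary strings, we can recover the original string!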
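Here is a minimal sketch of building the wavelet tree and then recovering the string from the tree plus the binary strings alone. Assumption: the tree shape is passed in as nested tuples (`SHAPE`, an encoding chosen here, read off the figure above), with single-character leaves. The functions `symbols`, `build` and `recover` are hypothetical names, and the bits are kept as Python strings for readability rather than packed bit vectors.

```python
SHAPE = (("a", "c"), (("b", "r"), "d"))  # assumed encoding of the slide's tree

def symbols(shape):
    """Alphabet symbols stored below a shape node."""
    if isinstance(shape, str):
        return {shape}
    return symbols(shape[0]) | symbols(shape[1])

def build(text, shape):
    """Leaf -> None; internal node -> (bits, left_node, right_node)."""
    if isinstance(shape, str):
        return None  # a leaf holds copies of one symbol: nothing to store
    right_syms = symbols(shape[1])
    bits = "".join("1" if ch in right_syms else "0" for ch in text)
    left = build("".join(ch for ch in text if ch not in right_syms), shape[0])
    right = build("".join(ch for ch in text if ch in right_syms), shape[1])
    return (bits, left, right)

def recover(node, shape, length):
    """Rebuild the node's string of the given length from the bits alone."""
    if isinstance(shape, str):
        return shape * length
    bits, left_node, right_node = node
    left = iter(recover(left_node, shape[0], bits.count("0")))
    right = iter(recover(right_node, shape[1], bits.count("1")))
    # read the bits in order, taking the next symbol from the chosen child
    return "".join(next(right) if b == "1" else next(left) for b in bits)

tree = build("abracadabra", SHAPE)
print(tree[0])                   # 01100010110
print(recover(tree, SHAPE, 11))  # abracadabra
```

Every symbol contributes one bit per level it crosses, so a tree of depth ⌈log₂|Σ|⌉ stores at most n⌈log₂|Σ|⌉ bits of binary strings (here 11 + 6 + 5 + 4 = 26 bits).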
The Wavelet Tree
- Rank(b, 8) on abracadabra (01100010110): b is a right symbol, so reduce to the right symbols: Rank(b, 3) on brdbr (00100).
- On brdbr, b is a left symbol, so reduce to the left symbols: Rank(b, 2) on brbr (0101).
It's binary: every step can be turned into a binary Rank.
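The Wavelet Tree
Rank(b, 8), computed with binary ranks only (right move = Rank1, left move = Rank0):
- root (01100010110): right move, Rank1(8) = 3
- brdbr (00100): left move, Rank0(3) = 3 - Rank1(3) = 2
- brbr (0101): left move, Rank0(2) = 2 - Rank1(2) = 1
So Rank(b, 8) = 1; indeed abracadabra[1, 8] = abracada contains a single b.
Select is similar, walking back up from the leaf to the root with binary Select.
Generalised R&S is implemented with log |Σ| binary R&S.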
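A minimal sketch of generalised Rank and Select on top of the same tree, using only binary operations on the node bit strings. Rank goes top-down (right move = Rank1, left move = Rank0 = i - Rank1); Select goes bottom-up, turning a position in a child into a position in its parent with a binary Select. Assumptions as before: the nested-tuple `SHAPE` encoding and all function names are hypothetical, and the binary Rank/Select are naive scans standing in for constant-time structures.

```python
SHAPE = (("a", "c"), (("b", "r"), "d"))  # assumed encoding of the slide's tree

def symbols(shape):
    """Alphabet symbols stored below a shape node."""
    if isinstance(shape, str):
        return {shape}
    return symbols(shape[0]) | symbols(shape[1])

def build(text, shape):
    """Leaf -> None; internal node -> (bits, left_node, right_node)."""
    if isinstance(shape, str):
        return None
    right_syms = symbols(shape[1])
    bits = "".join("1" if ch in right_syms else "0" for ch in text)
    left = build("".join(ch for ch in text if ch not in right_syms), shape[0])
    right = build("".join(ch for ch in text if ch in right_syms), shape[1])
    return (bits, left, right)

def rank(node, shape, c, i):
    """#c in the first i symbols, using one binary rank per level."""
    while not isinstance(shape, str):
        bits, left_node, right_node = node
        ones = bits[:i].count("1")  # binary Rank1(i)
        if c in symbols(shape[1]):
            node, shape, i = right_node, shape[1], ones     # right move = Rank1
        else:
            node, shape, i = left_node, shape[0], i - ones  # left move = Rank0
    return i if shape == c else 0

def select(node, shape, c, i):
    """1-based position of the i-th c, or None if c occurs fewer than i times."""
    if isinstance(shape, str):
        return i if shape == c else None  # in a leaf, the i-th c is at position i
    bits, left_node, right_node = node
    go_right = c in symbols(shape[1])
    j = select(right_node if go_right else left_node,
               shape[1] if go_right else shape[0], c, i)
    if j is None:
        return None
    target, seen = ("1" if go_right else "0"), 0
    for pos, b in enumerate(bits, start=1):  # binary Select: the j-th target bit
        if b == target:
            seen += 1
            if seen == j:
                return pos
    return None

tree = build("abracadabra", SHAPE)
print(rank(tree, SHAPE, "b", 8))    # 1
print(select(tree, SHAPE, "b", 2))  # 9
```

The call `rank(tree, SHAPE, "b", 8)` follows exactly the three steps above (Rank1(8) = 3, Rank0(3) = 2, Rank0(2) = 1), so each query costs one binary operation per level of the tree.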
Representing Trees
Paolo Ferragina, Dipartimento di Informatica, Università di Pisa
Standard representation
Binary tree: each node has two pointers, to its left and right children.
- An n-node tree takes 2n pointers, i.e. 2n lg n bits.
- It supports finding the left or right child of a node in constant time.
- Each extra operation (e.g. parent, subtree size) costs an additional n lg n bits.
Can we improve the space bound?
- There are fewer than 2^{2n} distinct binary trees on n nodes.
- So 2n bits are enough to distinguish between any two different binary trees.
- Can we represent an n-node binary tree using 2n bits?
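The counting argument can be checked numerically. The number of distinct binary trees on n nodes is the n-th Catalan number, C(n) = binom(2n, n) / (n + 1), which is below 4^n = 2^{2n}; since log₂ C(n) = 2n - O(log n), roughly 2n bits are also necessary. The function name `num_binary_trees` is illustrative.

```python
from math import comb, log2

def num_binary_trees(n):
    """Number of distinct binary trees with n nodes: the n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)

for n in (1, 5, 10, 100):
    t = num_binary_trees(n)
    # t < 4**n, so ceil(log2(t)) <= 2n bits suffice to name any n-node tree
    print(n, t, round(log2(t), 1), 2 * n)
```

The gap between log₂ C(n) and 2n grows only logarithmically, so 2n is the right target up to lower-order terms.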
Binary tree representation
A binary tree on n nodes can be represented using 2n + o(n) bits to support, in constant time:
- parent
- left child
- right child
Heap-like notation for a binary tree
1. Add external nodes, so that every original node has exactly two children.
2. Label internal nodes with a 1 and external nodes with a 0.
3. Write the labels in level order: 11110110100100000.
One can reconstruct the tree from this sequence. An n-node binary tree has n + 1 external nodes, so it can be represented in 2n + 1 bits (here n = 8 gives 17 bits). What about the operations?
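The three steps can be sketched as a level-order encoder and the matching decoder. Assumption: a tree is written as nested `(left, right)` tuples, with `None` standing for an external node; `encode`, `decode` and `freeze` are hypothetical names. Decoding works because each bit after the root fills the next pending child slot in breadth-first order.

```python
from collections import deque

def encode(tree):
    """Level-order labels: 1 for an internal node, 0 for an external (None) node."""
    out, queue = [], deque([tree])
    while queue:
        node = queue.popleft()
        if node is None:
            out.append("0")
        else:
            out.append("1")
            queue.extend(node)  # enqueue left child, then right child
    return "".join(out)

def freeze(node):
    """Turn the mutable [left, right] lists built by decode into tuples."""
    return None if node is None else (freeze(node[0]), freeze(node[1]))

def decode(bits):
    """Rebuild the tree from its level-order bit string."""
    root = [None, None] if bits[0] == "1" else None
    slots = deque([(root, 0), (root, 1)] if root is not None else [])
    for b in bits[1:]:
        parent, side = slots.popleft()  # next child slot in level order
        if b == "1":
            child = [None, None]
            parent[side] = child
            slots.extend([(child, 0), (child, 1)])
    return freeze(root)

B = "11110110100100000"
print(encode(decode(B)) == B)  # True
```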
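Heap-like notation for a binary tree
Identify each node x with its position in the bit string (green numbers 1..17); each internal node also gets a red number, its rank among the 1s (1..8).
- Rank(x) = number of 1s up to position x
- Select(x) = position of the x-th 1
- left child(x) = green position 2·Rank(x)
- right child(x) = green position 2·Rank(x) + 1
- parent(x) = the node with red number ⌊x/2⌋, i.e. green position Select(⌊x/2⌋)
Bit string, positions 1..17: 1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0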
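The navigation formulas can be sketched directly on the example bit string. Nodes are 1-based positions in `B`; a child position holding a 0 is an external node. The helpers `rank1` and `select1` are naive scans that stand in for the constant-time, o(n)-bit Rank and Select structures the slides assume, and all function names are illustrative.

```python
B = "11110110100100000"  # level-order bit string from the example

def rank1(bits, x):
    """Number of 1s in bits[1, x] (1-based)."""
    return bits[:x].count("1")

def select1(bits, r):
    """1-based position of the r-th 1 in bits."""
    seen = 0
    for pos, b in enumerate(bits, start=1):
        if b == "1":
            seen += 1
            if seen == r:
                return pos
    raise ValueError("fewer than r ones")

def left_child(bits, x):
    return 2 * rank1(bits, x)      # x must be an internal node (bit 1)

def right_child(bits, x):
    return 2 * rank1(bits, x) + 1

def parent(bits, x):
    return select1(bits, x // 2)   # defined for x >= 2

print(left_child(B, 1), right_child(B, 1))  # 2 3
print(parent(B, 14))                        # 9
```

For every internal node x, parent(left child(x)) = parent(right child(x)) = x, since ⌊2r/2⌋ = ⌊(2r + 1)/2⌋ = r and Select(r) returns x when Rank(x) = r.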