Compressed Rank & Select on general strings (Paolo Ferragina)


Compressed Rank & Select on general strings
Paolo Ferragina
Dipartimento di Informatica, Università di Pisa


Generalised Rank and Select
  • Rank(c, i) = number of occurrences of c in L[1, i]
  • Select(c, i) = position of the i-th c in L
L = a b a a a c b c d a b e c d ...
Select(a, 2) = 3, Rank(a, 7) = 4
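The two definitions can be sketched directly as linear scans. This is only a reference version for checking answers, not the compressed structure the slides build towards; the function names `rank` and `select` are chosen here for illustration, and positions are 1-based as on the slide.

```python
def rank(L, c, i):
    """Number of occurrences of symbol c in L[1..i] (1-based, inclusive)."""
    return L[:i].count(c)


def select(L, c, i):
    """1-based position of the i-th occurrence of c in L, or None if absent."""
    seen = 0
    for pos, ch in enumerate(L, start=1):
        if ch == c:
            seen += 1
            if seen == i:
                return pos
    return None


L = "abaaacbcdabecd"
print(select(L, "a", 2))  # 3
print(rank(L, "a", 7))    # 4
```

The two operations are inverse in the sense that rank(L, c, select(L, c, i)) == i whenever the i-th c exists.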


Generalised Rank and Select
  • If Σ is small (i.e. constant):
      - Build a binary Rank data structure for each symbol of Σ
      - Rank takes O(1) time and small space
  • If Σ is large (words?):
      - Need a smarter solution: the Wavelet Tree data structure
Algorithmic reduction: reduce Rank & Select over arbitrary strings to Rank & Select over binary strings.
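A minimal sketch of the small-alphabet case: one bitmap per symbol, each answering Rank in O(1). As a simplifying assumption, the binary Rank structure is replaced here by a plain prefix-count array, which gives constant-time queries but not the small space of a succinct structure. The class name `PerSymbolRank` is hypothetical.

```python
from itertools import accumulate


class PerSymbolRank:
    """One bitmap per symbol; Rank(c, i) is a single array lookup."""

    def __init__(self, text):
        self.prefix = {}
        for c in set(text):
            # Bitmap of c: 1 where the text holds c, 0 elsewhere.
            bits = [1 if ch == c else 0 for ch in text]
            # prefix[c][i] = number of 1s in bits[0:i], i.e. Rank(c, i).
            self.prefix[c] = [0] + list(accumulate(bits))

    def rank(self, c, i):
        counts = self.prefix.get(c)
        return counts[i] if counts is not None else 0


index = PerSymbolRank("abaaacbcdabecd")
print(index.rank("a", 7))  # 4
```

With |Σ| bitmaps the space grows linearly in the alphabet size, which is why this approach only suits a small Σ.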


The Wavelet Tree
abracadabra
[Figure: an (alphabetic?) tree whose leaves are the symbols a, c, d, b, r]


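The Wavelet Tree
[Figure: each node stores the subsequence of abracadabra formed by the symbols of its subtree. Root: abracadabra; its children: aacaaa and brdbr; child of brdbr: brbr; leaves: a (aaaaa), c, d, b (bb), r (rr). Several labels are still shown as "?".]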


The Wavelet Tree
[Figure: the same tree, with one bit per symbol stored at each internal node]
  • abracadabra: 01100010110
  • aacaaa: 001000
  • brdbr: 00100
  • brbr: 0101
  • leaves: a, c, d, b, r
In any case, O(|Σ| log |Σ|) bits. Easier: alphabetic order + heap structure.
Fact. Given the tree and the binary strings, we can recover the original string!
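The construction and the recovery fact can be sketched as follows. The tree shape is written as nested pairs and is an assumption read off the bit strings above ({a, c} on the left; {b, r} and then d on the right); the names `SHAPE`, `build` and `recover` are illustrative only. Bit 0 sends a symbol to the left child, bit 1 to the right child.

```python
def leaves(shape):
    """Set of symbols under a shape (a single symbol or a (left, right) pair)."""
    if isinstance(shape, str):
        return {shape}
    return leaves(shape[0]) | leaves(shape[1])


def build(text, shape):
    """Return a leaf symbol, or a triple (bits, left_subtree, right_subtree)."""
    if isinstance(shape, str):
        return shape
    right = leaves(shape[1])
    bits = "".join("1" if ch in right else "0" for ch in text)
    left_text = "".join(ch for ch in text if ch not in right)
    right_text = "".join(ch for ch in text if ch in right)
    return (bits, build(left_text, shape[0]), build(right_text, shape[1]))


def recover(node, length):
    """Rebuild the string of `length` symbols stored below `node`."""
    if isinstance(node, str):
        return node * length
    bits, left, right = node
    left_iter = iter(recover(left, bits.count("0")))
    right_iter = iter(recover(right, bits.count("1")))
    # Merge the two children back together, guided by the node's bits.
    return "".join(next(right_iter) if b == "1" else next(left_iter) for b in bits)


SHAPE = (("a", "c"), (("b", "r"), "d"))
tree = build("abracadabra", SHAPE)
print(tree[0])            # 01100010110
print(recover(tree, 11))  # abracadabra
```

Only the bit strings and the tree shape are kept; the intermediate strings such as aacaaa and brdbr are never stored.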


The Wavelet Tree
Rank(b, 8) on abracadabra:
  • Root (abracadabra, 01100010110): b is a right symbol, so reduce to the right symbols. The query becomes Rank(b, 3) on brdbr.
  • brdbr (00100): b is a left symbol, so reduce to the left symbols. The query becomes Rank(b, 2) on brbr.
  • brbr (0101): it's binary.
Every step can be turned into a binary Rank.


The Wavelet Tree
Rank(b, 8) on abracadabra:
  • Root (01100010110), right move = Rank1: Rank1(8) = 3
  • brdbr (00100), left move = Rank0: Rank0(3) = 3 - Rank1(3) = 2
  • brbr (0101), left move = Rank0: Rank0(2) = 2 - Rank1(2) = 1
Select is similar.
Generalised R&S implemented with log |Σ| binary R&S.
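The top-down Rank walk, and the bottom-up Select that mirrors it, can be sketched on the example tree. The tree is hard-coded as nested triples (bits, left, right), and the binary Rank and Select are naive linear scans standing in for constant-time structures; both are assumptions of this sketch, as are all function names. `wt_select` assumes the requested occurrence exists.

```python
# Wavelet tree of "abracadabra": (bits, left, right), leaves are symbols.
TREE = ("01100010110", ("001000", "a", "c"), ("00100", ("0101", "b", "r"), "d"))


def bin_rank(bits, b, i):
    """Occurrences of bit b in bits[1..i]."""
    return bits[:i].count(b)


def bin_select(bits, b, k):
    """1-based position of the k-th occurrence of bit b."""
    pos = -1
    for _ in range(k):
        pos = bits.index(b, pos + 1)
    return pos + 1


def contains(node, c):
    """True if symbol c is a leaf below node."""
    if isinstance(node, str):
        return node == c
    return contains(node[1], c) or contains(node[2], c)


def wt_rank(node, c, i):
    """Rank(c, i): one binary Rank per level, walking root to leaf."""
    while not isinstance(node, str):
        bits, left, right = node
        if contains(right, c):
            i, node = bin_rank(bits, "1", i), right  # right move = Rank1
        else:
            i, node = bin_rank(bits, "0", i), left   # left move = Rank0
    return i if node == c else 0


def wt_select(node, c, k):
    """Select(c, k): one binary Select per level, walking leaf to root."""
    if isinstance(node, str):
        return k
    bits, left, right = node
    if contains(right, c):
        return bin_select(bits, "1", wt_select(right, c, k))
    return bin_select(bits, "0", wt_select(left, c, k))


print(wt_rank(TREE, "b", 8))    # 1
print(wt_select(TREE, "b", 2))  # 9
```

In a real implementation the `contains` test would be replaced by a per-symbol code (the symbol's root-to-leaf bit path), so each level costs a single binary query.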


Representing Trees
Paolo Ferragina
Dipartimento di Informatica, Università di Pisa


Standard representation
Binary tree: each node has two pointers, to its left and right children.
  • An n-node tree takes 2n pointers, or 2n lg n bits.
  • Supports finding the left child or right child of a node in constant time.
  • Each extra operation (e.g. parent, subtree size) costs an additional n lg n bits.


Can we improve the space bound?
  • There are fewer than 2^(2n) distinct binary trees on n nodes.
  • 2n bits are enough to distinguish between any two different binary trees.
  • Can we represent an n-node binary tree using 2n bits?
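The counting bound can be checked numerically. The sketch below relies on a standard fact that the slide does not state: the number of distinct binary trees on n nodes is the n-th Catalan number, C(2n, n) / (n + 1). The function name `binary_trees` is illustrative.

```python
from math import comb, log2


def binary_trees(n):
    """Number of distinct binary trees on n nodes (the n-th Catalan number)."""
    return comb(2 * n, n) // (n + 1)


for n in (1, 2, 3, 8, 100):
    count = binary_trees(n)
    # Fewer than 2^(2n) trees, so 2n bits suffice to tell any two apart.
    assert count < 2 ** (2 * n)
    print(n, count, "bits needed:", round(log2(count), 1), "vs 2n =", 2 * n)
```

The logarithm of the count approaches 2n as n grows, so 2n bits is essentially the best possible and far below the 2n lg n bits of the pointer representation.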


Binary tree representation
  • A binary tree on n nodes can be represented using 2n + o(n) bits to support
      - parent
      - left child
      - right child
    in constant time.


Heap-like notation for a binary tree
  • Add external nodes
  • Label internal nodes with a 1 and external nodes with a 0
  • Write the labels in level order: 11110110100100000
  • One can reconstruct the tree from this sequence
An n-node binary tree can be represented in 2n + 1 bits. What about the operations?
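The encoding steps above, and the reconstruction, can be sketched as follows. A tree is written as a nested pair (left, right), with None for a missing child (which becomes an external node). The example `TREE` is an assumption: it is the 8-node tree obtained by decoding the slide's bit sequence, since the figure itself is not available.

```python
from collections import deque

# 8-node binary tree; None marks a missing child (an external node).
TREE = (((None, (None, None)), None), ((None, None), ((None, None), None)))


def encode(tree):
    """Level-order bit string: 1 for internal nodes, 0 for external nodes."""
    bits = []
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if node is None:
            bits.append("0")
        else:
            bits.append("1")
            queue.extend(node)  # enqueue left child, then right child
    return "".join(bits)


def freeze(node):
    """Convert the mutable [left, right] lists used by decode into tuples."""
    return None if node is None else (freeze(node[0]), freeze(node[1]))


def decode(bits):
    """Rebuild the tree from its level-order bit string."""
    values = iter(bits)
    if next(values) == "0":
        return None
    root = [None, None]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for side in (0, 1):  # the next two bits describe this node's children
            if next(values) == "1":
                child = [None, None]
                node[side] = child
                queue.append(child)
    return freeze(root)


code = encode(TREE)
print(code)       # 11110110100100000
print(len(code))  # 17, i.e. 2n + 1 for n = 8
```

Every internal node contributes exactly two children to the queue, which is why n internal nodes always yield n + 1 external ones and 2n + 1 bits overall.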


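Heap-like notation for a binary tree
[Figure: the tree with every node labelled in two colour-coded numberings (red and green): its position in the level-order bit sequence (1 to 17) and, for internal nodes, its number among the 1's (1 to 8)]
  • position x → internal-node number x: number of 1's up to x (Rank)
  • internal-node number x → position x: position of the x-th 1 (Select)
  • left child(x) = 2x, where x is an internal-node number and the result is a position
  • right child(x) = 2x + 1, where x is an internal-node number and the result is a position
  • parent(x) = ⌊x/2⌋, where x is a position and the result is an internal-node number
Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Bit:      1 1 1 1 0 1 1 0 1  0  0  1  0  0  0  0  0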
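The three navigation operations can be sketched purely in terms of positions in the bit sequence, by combining the formulas above with Rank and Select. The naive `rank1` and `select1` below are linear scans standing in for the o(n)-bit constant-time structures, and all function names are illustrative.

```python
BITS = "11110110100100000"  # level-order sequence; positions are 1-based


def rank1(bits, x):
    """Number of 1s in bits[1..x]."""
    return bits[:x].count("1")


def select1(bits, k):
    """Position of the k-th 1."""
    pos = -1
    for _ in range(k):
        pos = bits.index("1", pos + 1)
    return pos + 1


def left_child(bits, x):
    """Position of the left child of the internal node at position x."""
    return 2 * rank1(bits, x)


def right_child(bits, x):
    """Position of the right child of the internal node at position x."""
    return 2 * rank1(bits, x) + 1


def parent(bits, x):
    """Position of the parent of the node at position x (None for the root)."""
    return select1(bits, x // 2) if x > 1 else None


print(left_child(BITS, 6), right_child(BITS, 6))  # 10 11
print(parent(BITS, 9))                            # 4
```

A child position holding a 0 is an external node, meaning the corresponding child is absent from the original tree.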