RankSensitive Data Structures Iwona BialynickaBirula and Roberto Grossi
Rank-Sensitive Data Structures Iwona Bialynicka-Birula and Roberto Grossi (Università di Pisa) 12 th Symposium on String Processing and Information Retrieval (SPIRE 2005)
Overview • Motivation • Our results • Solution outline
Motivation • Output-sensitive data structures can still be too costly • Most often additional criteria exist • Examples • • • Web pages – Page. Rank or similar Geometrical objects – Z-order Various databases – physical location News items – time stamp Biological databases – biological relevance. . .
Rank-sensitive data structures • Output-sensitive data structures • Data structure holds n items • A query reports all (say l) matching items • Query time is O(t(n)+l) • Rank-sensitive data structures • • Additional, independent, a priori ranking function Additional parameter k at query time Query returns top k matching items sorted by rank Query time is O(t(n)+k)
Rank-sensitive – key features • Returns items sorted in rank order • Query time depends on k and not on l • Even if l is small with respect to n, it can still be much too large to process (e. g. web search) • Can be used in real-time systems • No structure which requires sorting has this feature
Overview • Motivation • Our results • Solution outline
Our model • Tree data structures • Result set is obtained from • An interval of consecutive leaves or • O(polylog(n)) such disjoint intervals
Our model – examples (1) • Suffix tree (trie) 0 1 0 occurrences of „ 010”
Our model – examples (2) • Range tree (1 D) 1 2 3 5 7 11 13 17 Items in range 3; 19 23 29 31
Our model – examples (3) • Range tree (2 D) Items in a 2 D rectangle
Our model – examples (4) • Hierarchy Similar items
Our results in this model • D – data structure as defined above • D – size of D (in memory words) • s(n) – number of items stored in D (incl. copies) • O(t(n)+l) – query time • D – rank-sensitive version of D (static case) • O(t(n)+k) – query time • D D + O(s(n)lg n) – for any 0 1 • D – rank-sensitive version of D (dynamic) • O(t(n)+k) plus O(lgn/lglgn) per interval • D D + O(s(n)lgn/lglgn) • O(lgn) per copy – rank change
Overview • Motivation • Our results • Solution outline
Basic idea 1 6 7 6 1 2 7 8 7 1 6 8 • O(nlgn) space 3 4 5 6 8 3 1 4 7 8 2 3 4 5 4 2 3 5 5 2
Query 1 6 7 6 1 2 7 8 7 1 6 8 3 4 5 6 8 3 1 4 7 8 2 3 4 5 4 2 3 5 5 2 • Reduced to merging O(lgn) lists on the fly
Space reduction in static case 1 6 1 7 6 0 1 0 2 1 7 0 8 1 7 0 1 6 8 • Chazelle 1988 • O(nlg n) space 3 1 4 1 5 1 6 0 8 0 3 1 1 4 7 0 8 0 2 1 3 0 4 0 5 1 4 0 2 1 3 5 5 0 2
Dynamic case • • Store explicit values in lists Weight-balanced B-tree Degree proportional to lgn/lglgn Dynamic fractional cascading • Multi-Q-heaps • Constant-depth hierarchical pipeline of heaps
Multi-Q-heaps • Similar to Q-heap • • Stores up to O(lg. N/lglg. N) integers The integers are from the universe 0. . . O(N) Search, find-min, insert, delete in O(1) time Requires lookup tables of O(N) space • Performs operations on any subset of items • Simplementation, no special instructions
Multi-Q-heaps in our solution (1) 3 2 1 Multi-Q . . . Multi-Q 11 ® lg. N lglg. N 2 6 1 Multi-Q 7 ® lg. N lglg. N 1 26 13 Multi-Q Constant depth 13 16 ® lg. N lglg. N O(lg. N )
Multi-Q-heaps in our solution (2) • Nodes have non-constant degree 3 Multi-Q 21 3 9 1 15
Thank you • Questions?
- Slides: 21