I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries
Second Year Project Presentation
Ke Yi
Advisor: Lars Arge
Committee: Pankaj K. Agarwal and Jun Yang
Problem Definition: Range Max Queries
• Range-aggregate queries: range-count, range-sum, range-max
• N points in R^d
• Each point p is associated with a weight w(p)
• Query rectangle Q
• Compute max{w(p) | p ∈ Q}
• Static and dynamic
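The definition above can be pinned down with a brute-force reference, useful mainly for checking a real structure against. This is a minimal sketch, not one of the structures in this talk: the name `range_max`, the closed-rectangle convention, and the `(coordinates, weight)` tuple layout are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: each point is (coordinates, weight).
Point = Tuple[Tuple[float, ...], float]


def range_max(points: Sequence[Point],
              lo: Sequence[float],
              hi: Sequence[float]) -> Optional[float]:
    """Return max{w(p) | p in Q} for the closed box Q = [lo_1, hi_1] x ... x [lo_d, hi_d].

    Scans all N points, so it only serves as a correctness baseline.
    Returns None when no point lies in Q.
    """
    best = None
    for coords, weight in points:
        inside = all(l <= c <= h for l, c, h in zip(lo, coords, hi))
        if inside and (best is None or weight > best):
            best = weight
    return best


points = [((1, 1), 5.0), ((2, 3), 9.0), ((4, 2), 7.0)]
answer = range_max(points, (0, 0), (3, 3))
```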
Problem Definition: Stabbing Max Queries
• N hyper-rectangles in R^d
• Each rectangle γ is associated with a weight w(γ)
• Query point q
• Compute max{w(γ) | q ∈ γ}
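The dual problem has an equally short brute-force baseline. Again this is only an illustrative sketch: `stabbing_max`, the closed-rectangle convention, and the `(lo, hi, weight)` layout are assumptions, not part of the presented structures.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: each rectangle is (lower corner, upper corner, weight).
Rect = Tuple[Tuple[float, ...], Tuple[float, ...], float]


def stabbing_max(rects: Sequence[Rect], q: Sequence[float]) -> Optional[float]:
    """Return max{w(g) | q in g} over closed hyper-rectangles g.

    Linear scan over all N rectangles; returns None if q stabs none of them.
    """
    best = None
    for lo, hi, weight in rects:
        stabbed = all(l <= x <= h for l, x, h in zip(lo, q, hi))
        if stabbed and (best is None or weight > best):
            best = weight
    return best


rects = [((0, 0), (4, 4), 3.0), ((2, 2), (6, 6), 8.0), ((5, 0), (7, 1), 5.0)]
answer = stabbing_max(rects, (3, 3))
```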
Model
[Figure: disk D, block I/O, main memory M, processor P]
• I/O Model
  – N: elements in structure
  – B: elements per block
  – M: elements in main memory
  – n = N/B
• Assumptions
  – M > B^2
  – Each word holds log_2 N bits
  – Any coordinate or weight can be stored in one word
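To get a feel for the quantities in this model, the sketch below derives n and log_B n from concrete values of N, B and M. The function name `io_parameters` and the sample sizes are made up for illustration, and rounding n up to a whole number of blocks is an assumption (the slide simply writes n = N/B).

```python
import math


def io_parameters(N: int, B: int, M: int) -> dict:
    """Derive the model quantities used in the bounds of this talk."""
    if M <= B * B:
        # The slides assume M > B^2.
        raise ValueError("model assumption violated: need M > B^2")
    n = math.ceil(N / B)  # number of blocks, rounded up (assumption)
    return {
        "n": n,
        "log_B_n": math.log(n, B),             # cost of one B-tree search, up to constants
        "word_bits": math.ceil(math.log2(N)),  # bits per word
    }


params = io_parameters(N=2**30, B=2**10, M=2**21)
```

With these sample values a B-tree search touches about two blocks, which is why a bound such as O(log_B n) is so much smaller than O(log_2 N) in practice.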
Related Work & Our Results: Range Queries
• 1D range queries are easy: B-tree
  * O(n) space, O(log_B n) query & update
• 2D range queries:
  – Poly-logarithmic query: CRB-tree [AAG03]
    * O(n log_B n) space, O(log_B^2 n) query
  – Linear space: kdB-tree, cross-tree, O-tree
    * query, O(log_B n) update
• Our results:
Related Work & Our Results: Stabbing Queries
• 1D stabbing queries
  – SB-tree [YW01]
    * O(n) space, O(log_B n) query & insert
    * Does not allow deletions!
• 2D stabbing queries
  – No structures with worst-case guarantee
• Our results:
2D Range Max Queries
• The external version of Chazelle’s structure [C88]
  – Linear space
  – Static: O(log^{1+ε} N) query
  – Dynamic: O(log^3 N log N) query & update
• Overall structure
  – A normal B-tree Φ on the y-coordinates of all the points
  – A fan-out base B-tree T on the x-coordinates
    * P_v: all points stored in the subtree of v
    * Each internal node v stores two secondary structures C_v and M_v, which hold information about P_v in a compressed manner
    * C_v and M_v have size O(|P_v| / log_B n) → linear size in total
    * Weights of points are stored explicitly at the leaves
2D Range Max Queries
[Figure: node v with children v1, ..., v6]
• C_v is borrowed from the CRB-tree
  – Compute the ranks of the points one level down in O(1) I/Os
  – Identify the weight of a point explicitly in O(log_B n) I/Os
• M_v computes the maximum weight in a multislab in O(log_B n) I/Os
• Answering a query:
  – Use Φ to compute the ranks in the root of T
  – Use M_v to compute the maximum at each level
  – For a total of O(log_B^2 n) I/Os
2D Range Max Queries: M_v
• Divide P_v into chunks of size B log_B N
• Divide each chunk into minichunks of size B
• Three-level structure
  – M_v = (Ψ1, Ψ2, Ψ3)
  – Each of size O(|P_v| / log_B n)
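The two-level partition of P_v can be sketched as follows. This only illustrates the chunk and minichunk sizes; the helper names `ceil_log` and `partition_into_chunks`, the rounding of log_B N up to an integer, and the assumption that P_v is already held in the order the structure uses are all illustrative choices, not details from the slides.

```python
from typing import List, Sequence


def ceil_log(N: int, B: int) -> int:
    """Smallest integer k >= 1 with B**k >= N (integer stand-in for log_B N)."""
    k, power = 1, B
    while power < N:
        power *= B
        k += 1
    return k


def partition_into_chunks(p_v: Sequence, B: int, N: int) -> List[List[list]]:
    """Split p_v into chunks of B * log_B N items, each split into minichunks of B items."""
    chunk_size = B * ceil_log(N, B)
    chunks = []
    for start in range(0, len(p_v), chunk_size):
        chunk = list(p_v[start:start + chunk_size])
        minichunks = [chunk[j:j + B] for j in range(0, len(chunk), B)]
        chunks.append(minichunks)
    return chunks


# 30 items with B = 4 and N = 64: chunk size is 4 * 3 = 12.
chunks = partition_into_chunks(list(range(30)), B=4, N=64)
```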
2D Range Max Queries: M_v
• Basic idea: encode the range max information in a compressed manner; once the rank of the maximum point is found, identify the point using C_v
• Ψ3[l]: for each minichunk, stores a (slab index, weight rank) pair for each point inside the minichunk
  – Find the rank of the maximum-weight point in O(1) I/Os
  – Identify it in O(log_B N) I/Os
• Ψ2[k]: for each chunk, encodes a Cartesian tree on the O(log_B N) minichunks for each of the O(B) multislabs
  – Find the minichunk containing the maximum-weight point in O(1) I/Os
  – Use Ψ3 to find the exact point in O(log_B N) I/Os
• Ψ1: a fan-out B-tree on the O(|P_v| / (B log_B n)) chunks
  – Find the maximum-weight point in O(log_B N) I/Os
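Since Ψ2 relies on Cartesian trees, a plain in-memory version may help recall how such a tree answers range-max queries. This is a minimal sketch of the textbook Cartesian tree over an array of weights, not the compressed per-multislab encoding described above; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def build_cartesian_tree(weights: Sequence[float]) -> Tuple[int, List[int], List[int]]:
    """Build a max-Cartesian tree: in-order is array order, parents outweigh children.

    Returns (root, left, right), where left[i] / right[i] are child indices or -1.
    """
    n = len(weights)
    left, right = [-1] * n, [-1] * n
    stack: List[int] = []  # rightmost path of the tree built so far
    for i in range(n):
        last = -1
        while stack and weights[stack[-1]] < weights[i]:
            last = stack.pop()
        left[i] = last               # everything popped hangs below i on the left
        if stack:
            right[stack[-1]] = i     # i becomes the new right child
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def range_max_index(root: int, left: List[int], right: List[int], i: int, j: int) -> int:
    """Index of the maximum weight among positions i..j (inclusive).

    Walks down from the root; the first node whose index falls in [i, j] is the answer.
    """
    node = root
    while node != -1:
        if node < i:
            node = right[node]
        elif node > j:
            node = left[node]
        else:
            return node
    raise ValueError("empty range")


weights = [3, 8, 2, 6, 5]
root, left, right = build_cartesian_tree(weights)
best = range_max_index(root, left, right, 2, 4)
```

Because the answer depends only on the shape of the tree, the tree can be stored without the weights themselves, which is what makes a compact encoding possible.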
2D Range Max Queries
• Static structures
  – O(n) size, O(log_B^2 N) query, O(n log_B N) construction
  – O(n) size, O(log_B^{1+ε} N) query, O(N log_B N) construction
• Dynamization:
  – Throw away Ψ2 and expand Ψ3
  – O(n log_B N) size
  – O(log_B^3 N) query, worst case
  – O(log_B^2 N · log_{M/B} log_B N) insert, amortized
  – O(log_B^2 N) delete, amortized
• Extending to d dimensions
  – Standard technique
  – Pay an extra O(log_B^{d-2} N) factor on all these bounds
1D Stabbing Max Queries
• Modify the external interval tree [AV96] to support max
• Fan-out base B-tree on x-coordinates
  – Each interval is stored at the highest node v where it contains a slab boundary
  – It is kept in one left (right) slab structure and the multislab structure
• Answering a query
  – Search down the tree and visit O(log_B N) nodes
  – At each node v, compute the maximum weight in the left (right) slab structure and the multislab structure
1D Stabbing Max Queries
• Slab structures are implemented using B-trees
  – Query and update: O(log_B N) I/Os
• Multislab structure: fan-out B-tree
  – At each internal node, we store the maximum weight for each of the slabs and for each of the children
  – Query: O(1) I/Os (only look at the root)
  – Update: O(log_B N) I/Os
• Rebalancing the base tree: O(log_B N) I/Os
  – Weight-balanced B-trees
• Overall cost: size O(n), query O(log_B^2 N), update O(log_B N)
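The idea of keeping a maximum weight at each tree node and combining the maxima along one root-to-leaf path can be illustrated in internal memory with an ordinary binary segment tree. This is a swapped-in technique for intuition only, not the external interval tree of this talk. The class name `StabbingMaxTree`, the half-open intervals [lo, hi), the fixed endpoint set, and insert-only updates are all assumptions of the sketch.

```python
import bisect
from typing import Iterable, List, Optional


class StabbingMaxTree:
    """Insert-only 1D stabbing-max over a fixed set of endpoints (in-memory sketch)."""

    def __init__(self, endpoints: Iterable[float]) -> None:
        self.xs: List[float] = sorted(set(endpoints))
        self.size = 1
        while self.size < max(1, len(self.xs)):
            self.size *= 2
        # best[node] = max weight of intervals assigned to this node, or None
        self.best: List[Optional[float]] = [None] * (2 * self.size)

    def _raise(self, node: int, weight: float) -> None:
        if self.best[node] is None or weight > self.best[node]:
            self.best[node] = weight

    def insert(self, lo: float, hi: float, weight: float) -> None:
        """Insert [lo, hi); lo and hi must be among the constructor's endpoints."""
        l = bisect.bisect_left(self.xs, lo) + self.size
        r = bisect.bisect_left(self.xs, hi) + self.size  # exclusive
        while l < r:  # standard bottom-up decomposition into O(log N) nodes
            if l & 1:
                self._raise(l, weight)
                l += 1
            if r & 1:
                r -= 1
                self._raise(r, weight)
            l >>= 1
            r >>= 1

    def query(self, q: float) -> Optional[float]:
        """Max weight over inserted intervals containing q, or None."""
        slab = bisect.bisect_right(self.xs, q) - 1
        if slab < 0 or slab >= len(self.xs) - 1:
            return None  # q lies outside every elementary slab
        node, best = slab + self.size, None
        while node >= 1:  # combine the maxima on the leaf-to-root path
            w = self.best[node]
            if w is not None and (best is None or w > best):
                best = w
            node >>= 1
        return best


tree = StabbingMaxTree([1, 5, 3, 8, 4, 6])
tree.insert(1, 5, 4.0)
tree.insert(3, 8, 7.0)
tree.insert(4, 6, 2.0)
answer = tree.query(4.5)
```

Deletions are left out on purpose: a single stored maximum per node cannot be undone without extra bookkeeping.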
1D Stabbing Max Queries
• Space-time tradeoff:
  – O(n log_B^ε N) size
  – O(n log_B^{2-ε} N) query
• Can handle general semigroup queries
  – A semigroup (S, +)
  – Each weight w(γ) ∈ S
  – Want to compute ∑_{q ∈ γ} w(γ)
• Ideas can also be used to improve the internal memory algorithm
  – Linear size, O(log^2 N / log N) query and update
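The semigroup generalization only changes how the weights of the stabbed intervals are combined. The brute-force sketch below makes that explicit by taking the semigroup operation as a parameter; `stabbing_semigroup`, the closed-interval convention, and the `(lo, hi, weight)` layout are illustrative assumptions.

```python
import operator
from functools import reduce
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")


def stabbing_semigroup(intervals: Sequence[Tuple[float, float, S]],
                       q: float,
                       combine: Callable[[S, S], S]) -> Optional[S]:
    """Combine the weights of all closed intervals [lo, hi] that contain q.

    `combine` must be associative. A semigroup has no identity element in
    general, so None is returned when no interval is stabbed.
    """
    hits = [w for lo, hi, w in intervals if lo <= q <= hi]
    return reduce(combine, hits) if hits else None


intervals = [(1, 5, 4), (3, 8, 7), (4, 6, 2)]
as_max = stabbing_semigroup(intervals, 4.5, max)           # stabbing-max
as_sum = stabbing_semigroup(intervals, 4.5, operator.add)  # stabbing-sum
```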
2D Stabbing Max Queries
• Extend our 1D stabbing query structure
• Use our 2D range query structure as a building block
• Extending to d dimensions
  – Standard technique
  – Pay an extra O(log_B^{d-2} N) factor on all these bounds
Conclusions and Open Problems
• In this project, we developed I/O-efficient
  – linear space structures with poly-logarithmic query cost for static 2D range max queries
  – near-linear space structures with poly-logarithmic query & update cost for dynamic 2D range max queries
  – linear space structures with poly-logarithmic query cost for dynamic 1D stabbing max queries
  – near-linear space structures with poly-logarithmic query & update cost for dynamic 2D stabbing max queries
• Open problems
  – Linear size dynamic structures for 2D range & stabbing max queries?
  – General semigroup queries?
THE END Thank you!