Optimization of Spatial Joins on Mobile Devices
N. Mamoulis (1), P. Kalnis (2), S. Bakiras (3), X. Li (2)
(1) Department of Computer Science and Information Systems, University of Hong Kong
(2) Department of Computer Science, National University of Singapore
(3) Department of Electrical and Electronic Engineering, University of Hong Kong
Motivation
- Users are equipped with a mobile device (e.g., a PDA)
- Ad-hoc spatial queries
- Combine data from remote servers: "Find hotels which are within 500 m of a seafood restaurant"
- Servers do not collaborate with each other
- The query is executed on the mobile device
[Figure: Hotels, Restaurants]
Cost
- Telecommunication companies typically charge by the volume of transferred data (e.g., GPRS) rather than by connection time.
- Goal: minimize the amount of transferred data.
Mediators?
- Services may only allow end-user connections (e.g., subscribers only)
- Access through mediators may be more expensive
- Requests are ad-hoc; existing mediators may not support them
[Figure: Hotels, Restaurants, Mediator]
Solution
- Integrate the statistics retrieval with the query processing phase
- Ask aggregate queries to estimate the data distribution
- Partition the space recursively to achieve sub-linear transfer cost
- Choose the physical operator independently for each partition
Related Work
- Hash-based methods (e.g., PBSM): require all data to be transferred
- R-tree based methods (e.g., [Tan et al., TKDE, 2000]): require access to the internal index
- Mediators:
  - HERMES: statistics from previous queries
  - DISCO, Garlic: statistics during initialization
  - Tukwila: optimizes parts of the execution tree
Operators
- WINDOW query: return all objects intersecting a window w
- COUNT query: return the number of objects intersecting w
- ε-RANGE query: return all objects within range ε from a point p
- We do not have access to the internal indices!
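The three primitives can be mimicked locally to make their contracts concrete. The following is only a sketch for point data: the class `SpatialServer`, the `Window` type and all method names are hypothetical, and a linear scan stands in for whatever index the real server hides from the client.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Window:
    """Axis-aligned query window (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, p):
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


class SpatialServer:
    """Stand-in for a remote server that exposes only the three query types."""

    def __init__(self, points):
        self._points = list(points)  # hidden from the client, like the index

    def window(self, w):
        """WINDOW query: all objects intersecting w."""
        return [p for p in self._points if w.contains(p)]

    def count(self, w):
        """COUNT query: only the number of objects intersecting w."""
        return sum(1 for p in self._points if w.contains(p))

    def eps_range(self, p, eps):
        """eps-RANGE query: all objects within distance eps of point p."""
        return [q for q in self._points
                if hypot(q[0] - p[0], q[1] - p[1]) <= eps]
```

A COUNT answer is a single number, which is why it is attractive for collecting statistics before deciding whether to download the objects themselves.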
Hash-based spatial join (HBSJ)
- Each partition must fit in memory
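Once the objects of one partition have been downloaded, the join itself runs in the device's memory. The sketch below shows one way to do that for a distance join on points, using a uniform grid as the hash function. It is an illustration, not PBSM and not necessarily the authors' implementation; the function name and the choice of cell side equal to ε are assumptions.

```python
from collections import defaultdict
from math import floor, hypot


def hash_distance_join(r_points, s_points, eps):
    """Return all pairs (r, s) with dist(r, s) <= eps.

    S is hashed into square cells of side eps, so every match of an
    R point lies in the 3x3 block of cells around that point.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    buckets = defaultdict(list)
    for s in s_points:
        buckets[(floor(s[0] / eps), floor(s[1] / eps))].append(s)

    result = []
    for r in r_points:
        cx, cy = floor(r[0] / eps), floor(r[1] / eps)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for s in buckets.get((cx + dx, cy + dy), ()):
                    if hypot(r[0] - s[0], r[1] - s[1]) <= eps:
                        result.append((r, s))
    return result
```

Both inputs are held in memory at once, which is the reason each partition has to fit in the device's buffer.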
Recursive evaluation
- Retrieve statistics for each subpart
Nested loop spatial join (NLSJ)
- Recursive HBSJ: 4 QRY + 2 RCV + 5 RCV
- NLSJ: 2 RCV + 2 SND + 2 RES
Cost Model
- TCP/IP: MTU = MSS + BH (maximum transmission unit = maximum segment size + packet header bytes)
- c1: download |Rw| objects from R and |Sw| objects from S and join them on the PDA
- c2: download |Rw| objects from R, send them as window queries to S and retrieve the results
- c3: presumably the symmetric case of c2, with the roles of R and S swapped
- c4: repartition w, retrieve detailed statistics and apply the algorithm recursively
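The slide's formulas did not survive extraction, so the following is only a sketch of how per-packet header overhead could enter the costs c1 and c2. The defaults (MTU of 1500 bytes, 40 header bytes), the parameters `obj_size`, `query_size` and `avg_matches`, and the function names are all assumptions made for illustration.

```python
from math import ceil


def transfer_bytes(payload, mtu=1500, header=40):
    """Bytes on the wire for `payload` bytes, split into packets that each
    carry at most MSS = MTU - BH payload bytes plus BH header bytes."""
    if payload <= 0:
        return 0
    mss = mtu - header
    return payload + header * ceil(payload / mss)


def cost_c1(n_r, n_s, obj_size, query_size):
    """c1: send one window query to each server, download both answers."""
    return (2 * transfer_bytes(query_size)
            + transfer_bytes(n_r * obj_size)
            + transfer_bytes(n_s * obj_size))


def cost_c2(n_r, obj_size, query_size, avg_matches):
    """c2: download the R objects, then probe S once per R object.

    avg_matches is an assumed estimate of the answers per probe.
    """
    download = transfer_bytes(query_size) + transfer_bytes(n_r * obj_size)
    per_probe = (transfer_bytes(query_size)
                 + transfer_bytes(avg_matches * obj_size))
    return download + n_r * per_probe
```

Under these assumptions, small messages are dominated by header bytes: a 20-byte probe costs 60 bytes on the wire, so a nested loop join pays a fixed overhead per outer object, while a bulk download amortizes headers over full packets.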
MobiJoin algorithm
MobiJoin(w, |Rw|, |Sw|)
    if |Rw| = 0 or |Sw| = 0 then return
    compute c1, c2, c3, c4
    cmin = min(c1, c2, c3, c4)
    if cmin = c4 then
        impose a regular grid over w
        for each cell w' in w
            retrieve |Rw'| and |Sw'|
            MobiJoin(w', |Rw'|, |Sw'|)
    else
        follow the action specified by cmin
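The control flow of the pseudocode can be exercised with a small runnable sketch for a distance join on points. Everything beyond that control flow is an assumption: the `Server` class is a local stand-in for a remote source, `toy_costs` is a placeholder counted in transferred objects (not the cost model of the talk), c3 is taken to be the mirror image of c2, the S window is grown by ε so that no match is lost at cell borders, and `max_depth` is a safety stop.

```python
from math import hypot, inf


class Server:
    """Hypothetical remote server; a linear scan replaces its hidden index."""

    def __init__(self, pts):
        self.pts = list(pts)

    def window(self, w):  # half-open window, so grid cells never overlap
        x0, y0, x1, y1 = w
        return [p for p in self.pts if x0 <= p[0] < x1 and y0 <= p[1] < y1]

    def count(self, w):
        return len(self.window(w))

    def eps_range(self, p, eps):
        return [q for q in self.pts if hypot(p[0] - q[0], p[1] - q[1]) <= eps]


def grow(w, eps):
    """Expand w by eps to cover every S object that can match an R object in w."""
    x0, y0, x1, y1 = w
    return (x0 - eps, y0 - eps, x1 + eps, y1 + eps)


def toy_costs(n_r, n_s, buffer, k):
    """Placeholder costs in transferred objects (assumption, not the real model)."""
    probe = 2  # assumed cost of one probe plus its answer
    c1 = n_r + n_s if n_r + n_s <= buffer else inf  # both sides must fit in memory
    c2 = n_r + n_r * probe
    c3 = n_s + n_s * probe
    c4 = 2 * k * k + min(n_r + n_s, c2, c3)  # COUNT queries + optimistic remainder
    return c1, c2, c3, c4


def mobi_join(w, n_r, n_s, R, S, eps, buffer=4, k=2, depth=0, max_depth=8):
    if n_r == 0 or n_s == 0:
        return []  # prune: nothing can join inside this window
    c1, c2, c3, c4 = toy_costs(n_r, n_s, buffer, k)
    if depth >= max_depth:
        c4 = inf  # safety stop for the sketch
    cmin = min(c1, c2, c3, c4)
    x0, y0, x1, y1 = w
    if cmin == c4:  # repartition and recurse with fresh statistics
        xs = [x0 + i * (x1 - x0) / k for i in range(k)] + [x1]
        ys = [y0 + j * (y1 - y0) / k for j in range(k)] + [y1]
        out = []
        for i in range(k):
            for j in range(k):
                cell = (xs[i], ys[j], xs[i + 1], ys[j + 1])
                out += mobi_join(cell, R.count(cell), S.count(grow(cell, eps)),
                                 R, S, eps, buffer, k, depth + 1, max_depth)
        return out
    if cmin == c1:  # download both sides and join on the device
        s_pts = S.window(grow(w, eps))
        return [(r, s) for r in R.window(w) for s in s_pts
                if hypot(r[0] - s[0], r[1] - s[1]) <= eps]
    if cmin == c2:  # download R, probe S with range queries
        return [(r, s) for r in R.window(w) for s in S.eps_range(r, eps)]
    # c3: download S, probe R, keep only R objects that belong to this window
    return [(r, s) for s in S.window(grow(w, eps)) for r in R.eps_range(s, eps)
            if x0 <= r[0] < x1 and y0 <= r[1] < y1]
```

A call such as `mobi_join(w, R.count(w), S.count(grow(w, eps)), R, S, eps)` starts the recursion from the full data space. Because the operator is chosen per window, dense cells can be repartitioned while sparse ones are answered immediately and empty ones cost only their COUNT queries.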
Iceberg Spatial Semi-Join
SELECT H.id
FROM Hotels H, Restaurants R
WHERE dist(H.location, R.location) ≤ ε
GROUP BY H.id
HAVING COUNT(*) ≥ m
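The semantics of the query above can be spelled out with a brute-force reference implementation. This sketch only defines the expected answer and makes no attempt to reduce data transfer; the function name and the dictionary layout of `hotels` are assumptions.

```python
from math import hypot


def iceberg_semi_join(hotels, restaurants, eps, m):
    """Return the ids of hotels with at least m restaurants within eps.

    hotels:      dict mapping hotel id -> (x, y)
    restaurants: iterable of (x, y) locations
    """
    restaurants = list(restaurants)
    result = []
    for hid, (hx, hy) in hotels.items():
        near = sum(1 for (rx, ry) in restaurants
                   if hypot(hx - rx, hy - ry) <= eps)
        if near >= m:  # the HAVING COUNT(*) >= m filter
            result.append(hid)
    return result
```

For example, with hotels `{"a": (0, 0), "b": (10, 10)}`, restaurants at `(0, 1)`, `(1, 0)` and `(10, 11)`, ε = 1.5 and m = 2, only hotel `"a"` qualifies. Only hotel ids are returned, never the matching restaurants, which is what makes this a semi-join.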
Experimental setup
- Implementation
  - Server: Unix
  - Client: HP iPAQ PDA (WiFi network, 400 MHz RISC CPU, 64 MB RAM, Windows Pocket PC)
- Datasets
  - Synthetic: 1K to 10K points, varying skew
  - Real: roads and railways of Germany
- Algorithms
  - NLSJ: only nested loop spatial join
  - HBSJ: only hash-based spatial join
Varying the distance threshold ε
[Figure: PDA buffer = 5%]
Varying the data skew
- Uniform data => MobiJoin reduces to HBSJ
Varying the PDA's buffer size
[Figure panels: Packets, Bytes]
- Large buffer => HBSJ fails to prune the empty areas
Iceberg queries
[Figure panels: Uniform data, Skewed data]
- Real dataset (35K) joins a synthetic dataset (1K)
Conclusions
- Distributed spatial joins on mobile devices
- No mediator, non-collaborative servers, limited set of supported operators
- MobiJoin
  - Dynamically optimizes the entire process of statistics retrieval and query execution
  - Single ad-hoc query
- Future work
  - Support multi-way spatial joins
  - Improve the accuracy of the cost model