A GPU-accelerated Framework for Processing Trajectory Queries ▪ Bowen Zhang, Yanyan Shen, Yanmin Zhu, Jiadi Yu ▪ Shanghai Jiao Tong University ▪ Contact: shenyy@sjtu.edu.cn

Queries on Large-scale Trajectory Data ▪ Increasing amount of trajectory data ▪ Trajectories are collected in every corner of the world ▪ Increasing demand for trajectory queries ▪ Route planning in popular navigation applications (e.g., Google Maps) ▪ User behavior analysis in bike-sharing systems (e.g., Mobike) ▪ Detour detection for taxis (e.g., Uber)

How to Handle Queries with High Throughput? ▪ Huge numbers of trajectory queries need to be issued at the same time ▪ Detour detection: over 100,000 taxis in Shanghai ▪ Route planning: millions of users waiting for recommended routes ▪ Need for high-throughput query processing ▪ Key idea: leverage GPUs to accelerate batch processing of trajectory queries ▪ GPUs are widely available in modern personal computers and servers ▪ Thousands of cores provide opportunities to run queries in parallel

Two Basic Types of Trajectory Queries ▪ Range query ▪ Find the trajectories that pass through a given region (the purple range in the figure) ▪ In the figure example, two trajectories are returned ▪ Application: find taxis in a region ▪ Similarity query ▪ Find the trajectories that are similar to a query trajectory Tq (the purple trajectory in the figure) ▪ Return the top-k (most similar) results ▪ Application: extract users with similar travel behavior
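To make the range query semantics concrete, the following is a minimal brute-force sketch. It assumes a trajectory is a list of (x, y) points, that the region is an axis-aligned rectangle, and that "passing through" means having at least one sampled point inside the region. The names `in_rect` and `range_query` are hypothetical and only serve as a baseline for what an index has to reproduce.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), an assumed layout


def in_rect(p: Point, r: Rect) -> bool:
    """Return True if point p lies inside rectangle r (borders included)."""
    x, y = p
    xmin, ymin, xmax, ymax = r
    return xmin <= x <= xmax and ymin <= y <= ymax


def range_query(trajectories: Dict[int, List[Point]], region: Rect) -> List[int]:
    """Return the sorted ids of trajectories with at least one point in region.

    This scans every point of every trajectory; it is a reference baseline,
    not the indexed method described in the slides.
    """
    return sorted(
        tid
        for tid, points in trajectories.items()
        if any(in_rect(p, region) for p in points)
    )
```

An index-based method must return the same ids while inspecting far fewer points.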

Trajectory Query Problem ▪ [1] L. Chen, M. T. Özsu, and V. Oria, “Robust and fast similarity search for moving object trajectories,” in SIGMOD, 2005, pp. 491–502.

Existing Trajectory Query Processing Methods ▪ Follow the filtering-and-verification framework ▪ Each is designed for one specific type of trajectory query ▪ Range query ▪ R-tree [2] and its variants ▪ Quadtree [3], kd-tree [4] and their variants: adaptive to skewed distributions ▪ Similarity query ▪ Metric similarity functions: VP-tree [5], MVP-tree [6]… ▪ Other: ad-hoc indexes ▪ Existing GPU-based solutions are optimized for one type of query ▪ We ask the question: since both queries are important, can we develop a unified index that supports both range and similarity queries and is optimized for a CPU-GPU hybrid environment? [2] A. Guttman, “R-trees: A dynamic index structure for spatial searching,” in SIGMOD, 1984, pp. 47–57. [3] P. Cudre-Mauroux, E. Wu, and S. Madden, “TrajStore: An adaptive storage system for very large trajectory data sets,” in ICDE, 2010. [4] H. Doraiswamy, H. T. Vo, C. T. Silva, and J. Freire, “A GPU-based index to support interactive spatio-temporal queries over historical data,” in ICDE, 2016, pp. 1086–1097. [5] P. N. Yianilos, “Data structures and algorithms for nearest neighbor search in general metric spaces,” in SODA, 1993. [6] T. Bozkaya and M. Ozsoyoglu, “Distance-based indexing for high-dimensional metric spaces,” ACM SIGMOD Record, vol. 26, no. 2, 1997.

An Example of GTIDX ▪ Suppose we have two trajectories… (Trajectory Table) (Quadtree-like index) ▪ Benefits ▪ Supports pruning strategies for both types of trajectory queries ▪ Morton-based encoding permits a coalesced memory access pattern when querying trajectories
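As an illustration of Morton-based encoding, the sketch below interleaves the bits of a cell's integer (x, y) grid coordinates into a single cell id. The bit order (x in the even bit positions, y in the odd ones) and the function names are assumptions; the slides do not specify the exact convention used by GTIDX.

```python
from typing import Tuple


def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) code.

    Assumed convention: bit i of x goes to bit 2*i, bit i of y goes to bit 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_decode(code: int, bits: int = 16) -> Tuple[int, int]:
    """Invert morton_encode and recover the (x, y) cell coordinates."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y
```

Under this convention the four cells of any aligned 2x2 block receive four consecutive codes (for example, cells (0,0), (1,0), (0,1), (1,1) map to 0, 1, 2, 3), so the cells under one quadtree node occupy one contiguous id range. Storing points in cell-id order is what allows neighboring threads to read neighboring memory addresses.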

GPU-accelerated Range Query ▪ Range query processing ▪ Input: a region R ▪ Output: the trajectories that pass through R ▪ Three main steps ▪ Step 1: overlap test between R and index nodes on the CPU ▪ BFS from the root to the leaf nodes, identified by Morton-based encoding ▪ Each candidate leaf node has a similar number of points to be verified ▪ These points are stored contiguously because of the Morton-based encoding of the cell id -> coalesced access pattern

GPU-accelerated Range Query ▪ Step 2: transfer the points in candidate blocks into GPU memory ▪ MAT: a table recording which nodes are resident in GPU memory, with LRU swapping ▪ MAT reduces duplicated data transfer ▪ Step 3: verification on the GPU ▪ Balanced task assignment to SMs ▪ Each SM handles similarly sized blocks of points ▪ One SM contains tens of cores to verify points ▪ Output the trajectories that have points in R ▪ Extension ▪ Multiple queries: send the verification tasks generated from all regions to the GPUs ▪ Multiple GPUs: dispatch tasks to SMs on different GPUs
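The role of the MAT in Step 2 can be sketched with a small host-side table that tracks which node blocks are already resident on the device and evicts the least recently used one when space runs out. This is only an illustration under assumptions: the class name `ResidencyTable`, the capacity measured in number of nodes, and the `transfers` counter are hypothetical, and no real GPU memory is touched.

```python
from collections import OrderedDict


class ResidencyTable:
    """Hypothetical stand-in for the MAT: tracks node blocks resident on the GPU."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity          # assumed: max number of resident nodes
        self._resident = OrderedDict()    # node id -> None, ordered from LRU to MRU
        self.transfers = 0                # number of simulated host-to-device copies

    def ensure_resident(self, node_id: int) -> bool:
        """Make node_id resident; return True if a transfer was needed."""
        if node_id in self._resident:
            # Hit: mark as most recently used, no transfer required.
            self._resident.move_to_end(node_id)
            return False
        if len(self._resident) >= self.capacity:
            # Miss with a full table: evict the least recently used node.
            self._resident.popitem(last=False)
        self._resident[node_id] = None
        self.transfers += 1
        return True

    def resident_nodes(self) -> list:
        """Return resident node ids from least to most recently used."""
        return list(self._resident)
```

With a capacity of 2 and the request sequence 1, 2, 1, 3, 2, only the third request is a hit, so four transfers are simulated instead of five. Batches of queries over overlapping regions would see a higher hit rate.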

GPU-accelerated Top-k Similarity Query ▪ Pipeline (from the slide diagram): compute frequency distance -> generate m candidate trajectories -> compute real EDR distances (GPU) -> update top-k results
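The pipeline above can be sketched as a generic filter-and-verify loop. The sketch assumes that the cheap distance (the frequency distance in the slides) never exceeds the exact distance, so it can be used as a lower bound for pruning. The function name `top_k_filter_verify` and its parameters are hypothetical, and the exact distances are computed sequentially here rather than on a GPU.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def top_k_filter_verify(
    query,
    items: Sequence,
    k: int,
    m: int,
    lower_bound: Callable,      # cheap distance, assumed <= exact_distance
    exact_distance: Callable,   # expensive distance (EDR in the slides)
) -> List[Tuple[float, int]]:
    """Return the k smallest (distance, index) pairs, verifying m candidates per round."""
    bounds = [lower_bound(query, item) for item in items]
    order = sorted(range(len(items)), key=lambda i: bounds[i])
    heap: List[Tuple[float, int]] = []  # max-heap emulated with negated entries
    pos = 0
    while pos < len(order):
        # Stop once no remaining candidate can beat the current k-th best result.
        if len(heap) == k and bounds[order[pos]] >= -heap[0][0]:
            break
        batch = order[pos:pos + m]      # the m candidates of this round
        pos += len(batch)
        for i in batch:
            d = exact_distance(query, items[i])
            if len(heap) < k:
                heapq.heappush(heap, (-d, -i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, -i))
    return sorted((-nd, -ni) for nd, ni in heap)
```

The batch size m trades pruning precision against parallelism: a larger batch gives the device more independent distance computations per round, but may verify candidates that a tighter threshold would have pruned.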

GPU-accelerated Top-k Similarity Query ▪ GPU-accelerated EDR computation ▪ EDR definition ▪ A dynamic programming approach ▪ Parallelization on the GPU ▪ Each SM handles one EDR computation ▪ Coalesced access pattern ▪ Linear execution time ▪ Top-k results ▪ Maintained in a priority queue during the iterations between candidate generation and EDR computation
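For reference, the dynamic programming formulation of EDR (Edit Distance on Real sequence, as defined in [1]) can be sketched sequentially as follows. Two points match when both coordinate differences are within a threshold; the parameter name `eps` and the function name `edr` are assumptions. This is a CPU reference version, not the authors' GPU kernel.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def edr(r: List[Point], s: List[Point], eps: float) -> int:
    """Edit Distance on Real sequence between trajectories r and s.

    A substitution costs 0 when the two points match (both coordinate
    differences are at most eps) and 1 otherwise; insertions and deletions
    cost 1. Only two rows of the DP table are kept in memory.
    """
    n, m = len(r), len(s)
    prev = list(range(m + 1))           # row 0: distance from empty r to s[:j]
    for i in range(1, n + 1):
        cur = [i] + [0] * m             # column 0: distance from r[:i] to empty s
        for j in range(1, m + 1):
            match = (
                abs(r[i - 1][0] - s[j - 1][0]) <= eps
                and abs(r[i - 1][1] - s[j - 1][1]) <= eps
            )
            subcost = 0 if match else 1
            cur[j] = min(
                prev[j - 1] + subcost,  # match or substitute
                prev[j] + 1,            # skip a point of r
                cur[j - 1] + 1,         # skip a point of s
            )
        prev = cur
    return prev[m]
```

Each cell depends only on its left, upper, and upper-left neighbors, so all cells on the same anti-diagonal are independent of each other. Processing anti-diagonals in parallel is one common way to reach running time linear in the trajectory lengths; whether the kernel in the slides is organized this way is not stated.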

Experimental Settings ▪ Datasets ▪ SHCAR ▪ GeoLife ▪ Environment ▪ Tesla K80 GPU with two chips, each with 2496 cores ▪ Two Intel 10-core CPUs, CentOS 7 ▪ Comparison ▪ Range Query ▪ GAT-CPU-Range: multi-threaded CPU version ▪ STIG [4]: recent work based on a kd-tree on GPU ▪ FSG [7]: recent work based on a single grid on GPU ▪ GAT-noG: no grouping strategy ▪ GAT-noM: no Morton-based encoding ▪ Similarity Query (the first work that uses a GPU to accelerate EDR computation) ▪ GAT-CPU-Sim: multi-threaded CPU version [4] H. Doraiswamy, H. T. Vo, C. T. Silva, and J. Freire, “A GPU-based index to support interactive spatio-temporal queries over historical data,” in ICDE, 2016, pp. 1086–1097. [7] J. Zhang, S. You, and L. Gruenwald, “High-performance spatial query processing on big taxi trip data using GPGPUs,” in Big Data Congress, 2014, pp. 72–79.

Performance Comparison ▪ [Figure: range query] ▪ [Figure: top-k similarity query]

Speedup & Scalability ▪ [Figure: speedup] ▪ [Figure: scalability]

Indexing Cost ▪ On the SHCAR dataset ▪ [Figure: memory cost] ▪ [Figure: index construction time] ▪ Acceptable extra cost in both memory and index construction time

Conclusion ▪ A GPU-accelerated framework for two types of trajectory queries ▪ Candidate generation on the CPU ▪ Verification accelerated by the GPU ▪ GTIDX addresses two technical challenges ▪ Uses a grouping strategy based on the quadtree-like index to balance the workload among SMs ▪ Identifies cells and nodes with Morton-based encoding to achieve a coalesced access pattern and make full use of global memory bandwidth ▪ GAT achieves at least 2x speedup compared with adaptations of state-of-the-art methods ▪ For similarity queries, the GPU version is nearly 5x faster than the 20-core CPU version ▪ For range queries, GAT is at least 2x faster than the other GPU-based methods

Q&A Thank you!