Extracting Mobility Statistics from Indexed Spatio-Temporal Datasets
Yoshiharu Ishikawa, Yuichi Tsukamoto, Hiroyuki Kitagawa
University of Tsukuba
August 30, 2004, STDBM 2004 at Toronto

Outline
- Background and objectives
- Markov transition probability
- Indexing method for moving trajectories
- Proposed methods
  - naïve algorithm
  - CSP-based algorithm
- Experimental results
- Conclusions

Background
- Moving object databases
  - store and manage information on a huge number of moving objects
  - support queries on moving trajectories and/or moving status
- Research issues
  - spatio-temporal indexes
  - extraction of statistics (e.g., selectivities)
- Statistics in spatio-temporal databases
  - used for query optimization
  - also useful in mobility analysis

Our Approach
- Objective: extracting mobility statistics from spatio-temporal databases
- Target: trajectory data indexed using R-trees
- Statistics to be extracted: Markov transition probability
  - target space is decomposed into cells
  - transition probabilities between cells are estimated using the indexed trajectory data
- Features
  - search problem is formalized as a constraint satisfaction problem (CSP)
  - efficient processing using R-trees

Markov Transition Probability (1)
- Assumption: target space is decomposed into cells
- Example 1: What is the estimated probability that an object currently in cell c_0 moves to cell c_1 one unit time later?
- [Figure: object A in cell c_0 at t = τ and in cell c_1 at t = τ + 1]
- First-order Markov transition probability Pr(c_1 | c_0)

Markov Transition Probability (2)
- Example 2: What is the probability that an object which moves from cell c_0 to cell c_1 in a unit time moves to cell c_2 in the next unit time?
- [Figure: object A in cell c_0 at t = τ, in cell c_1 at t = τ + 1, and in cell c_2 at t = τ + 2]
- Second-order transition probability Pr(c_2 | c_0, c_1)
- Extension to the order-n Markov transition probability Pr(c_n | c_0, …, c_{n-1}) is straightforward

Markov Transition Probability
- Conventional technique in traffic data analysis
  - Upton & Fingleton, 1989 [13]
- Special kind of association rule
  - probability corresponds to the confidence factor
  - difference: existence of order
- Usage
  - trajectory estimation
    - estimates where a moving object moves to in the next period
  - simulation of movement status
    - given the status of moving objects at t = τ, we can estimate the change of the status at t = τ + 1, τ + 2, …
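
The "simulation of movement status" usage can be sketched as follows: given the expected number of objects per cell at t = τ and a set of first-order transition probabilities, each cell's count is redistributed to its successor cells. This is only a minimal illustration; the function name `step`, the cell names, and all probability values are made up for the example and are not taken from the talk.

```python
def step(counts, trans):
    """Advance expected per-cell object counts by one unit time.

    counts: dict cell -> expected number of objects at time tau
    trans:  dict (src, dst) -> Pr(dst | src); hypothetical values
    """
    nxt = {}
    for (src, dst), p in trans.items():
        nxt[dst] = nxt.get(dst, 0.0) + counts.get(src, 0.0) * p
    return nxt

# Hypothetical first-order probabilities for two cells.
trans = {("c0", "c0"): 0.5, ("c0", "c1"): 0.5,
         ("c1", "c1"): 1.0}
status = {"c0": 8.0, "c1": 2.0}      # status at t = tau
status_1 = step(status, trans)       # estimate for t = tau + 1
status_2 = step(status_1, trans)     # estimate for t = tau + 2
print(status_1, status_2)
```

Because the movement is assumed stationary, the same probabilities are reused for every step.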

Assumptions
- Movement patterns obey a stationary process
  - movement tendency does not change as time passes
- Cell decomposition
  - each cell is a rectangle
  - cell size is arbitrary: non-uniform decomposition is allowed
  - cell decomposition can be specified dynamically
- Unit time length
  - unit time can be specified as an arbitrary length (e.g., one minute, 10 minutes, …)
  - but the unit time length should be a multiple of the sampling time length

Formalization of Probability (1)
- Target data: trajectory data from t = 0 to t = T
- Definition of first-order Markov transition probability
  - objs(c_i, t): set of objects which were in cell c_i at t
  - denominator: no. of objects which were in cell c_0 at arbitrary t (0 ≤ t ≤ T − 1)
  - numerator: no. of objects, each of which is counted in the denominator and moved to cell c_1 at t + 1
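
The verbal definition above can be written out as a formula. The formula itself does not survive in this text, so the following is a reconstruction from the stated numerator and denominator; the exact notation is an assumption.

```latex
\Pr(c_1 \mid c_0) =
  \frac{\sum_{t=0}^{T-1} \bigl|\, \mathrm{objs}(c_0, t) \cap \mathrm{objs}(c_1, t+1) \,\bigr|}
       {\sum_{t=0}^{T-1} \bigl|\, \mathrm{objs}(c_0, t) \,\bigr|}
```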

Formalization of Probability (2)
- Definition of order-n Markov transition probability
  - denominator: no. of objects, each of which was in cell c_0 at t (0 ≤ t ≤ T − 1), in cell c_1 at t + 1, …, and in cell c_{n-1} at t + n − 1
  - numerator: no. of objects, each of which is counted in the denominator and moved to cell c_n at t + n
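
Written as a formula, the order-n definition might look as follows. This is a reconstruction from the verbal description, not the original formula. One assumption is made explicit: the sums run over 0 ≤ t ≤ T − n, the range used in the naïve algorithm, so that t + n stays within the data. For n = 1 this is the range 0 ≤ t ≤ T − 1.

```latex
\Pr(c_n \mid c_0, \ldots, c_{n-1}) =
  \frac{\sum_{t=0}^{T-n} \Bigl|\, \bigcap_{i=0}^{n} \mathrm{objs}(c_i, t+i) \,\Bigr|}
       {\sum_{t=0}^{T-n} \Bigl|\, \bigcap_{i=0}^{n-1} \mathrm{objs}(c_i, t+i) \,\Bigr|}
```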

Generalized Transition Probability Estimation Problem (1)
- Given n + 1 cell sets C_0, …, C_n, output Pr(c_n | c_0, …, c_{n-1}) for every cell combination (c_0, …, c_n) with c_i taken from C_i
- Derives all transition probabilities for the specified cell sets at once

Generalized Transition Probability Estimation Problem (2)
- Example: Given C_0 = {c_0, c_1}, C_1 = {c_1, c_2}, C_2 = {c_1, c_2, c_3}, estimate second-order probabilities
- [Figure: cells c_0, c_1, c_2, c_3]
- Algorithm outputs 12 probabilities: Pr(c_1 | c_0, c_1), Pr(c_2 | c_0, c_1), …, Pr(c_3 | c_1, c_2)
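
The count of 12 in the example is the size of the Cartesian product of the three cell sets (2 × 2 × 3). A small sketch that enumerates the requested probabilities is given below; the helper name `enumerate_targets` is hypothetical.

```python
from itertools import product

def enumerate_targets(cell_sets):
    """Yield (history, target) pairs for every cell combination.

    cell_sets: list of n + 1 cell sets [C_0, ..., C_n]
    """
    for combo in product(*cell_sets):
        yield combo[:-1], combo[-1]

C0 = ["c0", "c1"]
C1 = ["c1", "c2"]
C2 = ["c1", "c2", "c3"]
targets = list(enumerate_targets([C0, C1, C2]))
for history, target in targets:
    print(f"Pr({target}|{', '.join(history)})")
```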

Indexing Methods for Trajectories
- R-tree-based approach is assumed
- Point-based representation: a trajectory is represented as a set of points
  - a (d+1)-dimensional R-tree is used (e.g., a 3-D R-tree), incorporating the temporal dimension

(d+1)-D R-tree-based Representation
- [Figure: sampling-based representation of trajectories plotted over x and t (t = 0, …, 8 = T); the sampled points are grouped into R-tree nodes a, b, c under the root]
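
To make the point-based representation concrete, the sketch below stores each sample as an (object, x, y, t) point and answers objs(c, t) with a range query whose time extent is the single instant t. A linear scan stands in for the 3-D R-tree here; the names `Sample`, `Cell`, and `objs`, the half-open cell boundaries, and the sample data are assumptions for illustration only.

```python
from typing import NamedTuple

class Sample(NamedTuple):
    obj_id: str
    x: float
    y: float
    t: int

class Cell(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float

def objs(samples, cell, t):
    """Objects located in `cell` at time `t`.

    Equivalent to a range query with the box cell x [t, t]; a real
    system would issue it against the (d+1)-dimensional R-tree.
    Cells are half-open so a boundary point belongs to one cell only.
    """
    return {s.obj_id for s in samples
            if s.t == t
            and cell.x_min <= s.x < cell.x_max
            and cell.y_min <= s.y < cell.y_max}

# Made-up samples for two objects at t = 0 and t = 1.
samples = [Sample("A", 0.5, 0.5, 0), Sample("A", 1.5, 0.5, 1),
           Sample("B", 0.2, 0.8, 0), Sample("B", 0.4, 0.9, 1)]
c0 = Cell(0.0, 0.0, 1.0, 1.0)
c1 = Cell(1.0, 0.0, 2.0, 1.0)
print(objs(samples, c0, 0), objs(samples, c1, 1))
```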

Naïve Algorithm (1)
- Based on the definition of the Markov transition probability
- Example: estimating Pr(c_2 | c_0, c_1)
  - Determine objs(c_0, τ) and objs(c_1, τ + 1) using the R-tree
    - objs(c_i, t): the set of objects which were in cell c_i at time t
  - Take the intersection of the two sets; the cardinality of the intersection is added to Scount
  - If the intersection is not empty, objs(c_2, τ + 2) is determined using the R-tree
  - Take the intersection of objs(c_0, τ), objs(c_1, τ + 1), and objs(c_2, τ + 2); the cardinality of the result is added to Qcount
  - This process is repeated for each τ (0 ≤ τ ≤ T − n)
  - Calculate Pr(c_2 | c_0, c_1) based on Scount and Qcount
- No. of searches on the R-tree is proportional to T

Naïve Algorithm (2)
- Example: estimation of Pr(c_2 | c_0, c_1)
- [Figure: trajectories over x and t (t = 0, …, 8 = T) passing through cells c_0, c_1, c_2; Scount is incremented twice and Qcount once]
- Output = Qcount / Scount
- No. of searches on the R-tree is proportional to T
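
The naïve procedure can be sketched as a loop over every start time τ. In this sketch the callable `objs(cell, t)` stands in for the R-tree query, and the single-object trajectory `path` is made-up data chosen so that Scount ends at 2 and Qcount at 1. The function name and data layout are assumptions, not the authors' implementation.

```python
def naive_probability(objs, cells, T):
    """Estimate Pr(c_n | c_0, ..., c_{n-1}) by scanning every start time.

    objs(cell, t) -> set of object ids (stand-in for an R-tree query)
    cells: [c_0, ..., c_n]; T: last time stamp in the data
    """
    n = len(cells) - 1
    scount = qcount = 0
    for tau in range(T - n + 1):          # 0 <= tau <= T - n
        # Objects that followed c_0, ..., c_{n-1} starting at tau.
        followed = set(objs(cells[0], tau))
        for i in range(1, n):
            followed &= objs(cells[i], tau + i)
        scount += len(followed)
        if followed:                      # query c_n only if needed
            qcount += len(followed & objs(cells[n], tau + n))
    return qcount / scount if scount else None

# Made-up trajectory of one object "A": its cell at t = 0, ..., 8.
path = ["c0", "c1", "c2", "c2", "c0", "c1", "c1", "c0", "c0"]

def objs(cell, t):
    return {"A"} if path[t] == cell else set()

prob = naive_probability(objs, ["c0", "c1", "c2"], T=8)
print(prob)
```

Each iteration issues up to n + 1 queries, which is why the number of index searches grows in proportion to T.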

Basic Idea (1)
- Estimation of Pr(c_n | c_0, …, c_{n-1}) based on three steps:
  1. Count the no. of objects which were in c_0, …, c_{n-1} at each unit time using an R-tree
  2. Count the no. of objects which were in c_0, …, c_n at each unit time using an R-tree
  3. Compute Pr(c_n | c_0, …, c_{n-1}) by [result of step 2] / [result of step 1]
- Benefits
  - steps 1 and 2 can be processed using the same algorithm
    - the algorithm for step 1 is given by setting n → n − 1
  - requires only two searches on the R-tree

Basic Idea (2)
- Example: estimation of Pr(c_2 | c_0, c_1)
  - Step 1: count objects which moved from c_0 to c_1 within a unit time (Scount = 2)
  - Step 2: count objects that moved as c_0, c_1, c_2 at each unit time (Qcount = 1)
  - Step 3: compute the probability Pr(c_2 | c_0, c_1) = Qcount / Scount = 1/2
- [Figure: trajectories over x and t (t = 0, …, 8 = T) passing through cells c_0, c_1, c_2]
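
The three steps can be sketched with one counting routine called twice. Here a scan over per-object cell sequences stands in for the R-tree search, and the names `count_sequences` and `transition_probability` are hypothetical. Restricting both counts to start times τ ≤ T − n is an assumption made so that the result agrees with the definition of the probability.

```python
def count_sequences(paths, cells, last_start):
    """Count (object, start time) pairs whose path visits `cells`
    in consecutive unit times, for start times 0..last_start."""
    k = len(cells)
    return sum(1
               for path in paths.values()
               for tau in range(last_start + 1)
               if path[tau:tau + k] == cells)

def transition_probability(paths, cells, T):
    n = len(cells) - 1
    last_start = T - n
    s = count_sequences(paths, cells[:-1], last_start)  # step 1
    q = count_sequences(paths, cells, last_start)       # step 2
    return q / s if s else None                         # step 3

# Made-up data: the cell of object "A" at t = 0, ..., 8.
paths = {"A": ["c0", "c1", "c2", "c2", "c0", "c1", "c1", "c0", "c0"]}
prob = transition_probability(paths, ["c0", "c1", "c2"], T=8)
print(prob)
```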

Counting Using R-tree (1)
- How can we compute the no. of objects which were in c_0, …, c_n at each unit time?
- Idea: the problem is formalized as a constraint satisfaction problem (CSP)
- A qualifying object fulfills the following constraints for some τ
  - it was in cell c_0 at t = τ
  - it was in cell c_1 at t = τ + 1
  - …
  - it was in cell c_n at t = τ + n
- Search for objects that satisfy all n + 1 constraints

Counting Using R-tree (2)
- Effective use of the R-tree is necessary
- We extend the CSP solution search method using R-trees (Papadias et al., VLDB '98) [7]
  - considers spatial constraints
    - Example: find all spatial objects x, y, z that satisfy overlap(x, y) and north(y, z)
  - searches CSP solutions from the root to leaves
    - use of pruning and backtracking
    - reduces the search space using constraints
  - enumerates all solutions with one R-tree access
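
The following is a simplified, one-dimensional illustration of a root-to-leaf CSP search with pruning and backtracking. It is written in the spirit of the approach described above and is not the authors' algorithm. One node is assigned to each of the n + 1 variables; an assignment is extended only if the node overlaps its target cell and its time range can directly follow the previous node's. All names (`Node`, `overlaps`, `can_follow`, `search`, `extend`) and the tiny hand-built tree are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    x: tuple                      # (x_min, x_max) of the MBR
    t: tuple                      # (t_min, t_max) of the MBR
    children: list = field(default_factory=list)
    obj_id: str = None            # set only on leaf points

def point(obj_id, x, t):
    return Node((x, x), (t, t), obj_id=obj_id)

def group(children):
    """Build a parent node whose MBR encloses all children."""
    return Node((min(c.x[0] for c in children), max(c.x[1] for c in children)),
                (min(c.t[0] for c in children), max(c.t[1] for c in children)),
                children)

def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

def can_follow(n1, n2):
    """True if some tau in n1's time range has tau + 1 in n2's."""
    return n1.t[0] + 1 <= n2.t[1] and n2.t[0] <= n1.t[1] + 1

def search(nodes, cells, out):
    """nodes[i] is the current candidate node for variable i."""
    if all(not nd.children for nd in nodes):       # all leaf points
        if len({nd.obj_id for nd in nodes}) == 1:  # same object
            out.append((nodes[0].obj_id, nodes[0].t[0]))
        return
    extend([], nodes, cells, out)

def extend(chosen, nodes, cells, out):
    i = len(chosen)
    if i == len(nodes):
        search(chosen, cells, out)                 # descend one level
        return
    for child in nodes[i].children or [nodes[i]]:
        if not overlaps(child.x, cells[i]):
            continue                               # prune: misses cell c_i
        if chosen and not can_follow(chosen[-1], child):
            continue                               # prune: not consecutive
        extend(chosen + [child], nodes, cells, out)
    # Returning here backtracks to the previous variable.

# Made-up data: objects A and B sampled at t = 0..3 in one dimension.
a = group([point("A", 5, 0), point("A", 6, 1)])
b = group([point("A", 15, 2), point("A", 16, 3)])
c = group([point("B", 15, 0), point("B", 14, 1)])
d = group([point("B", 4, 2), point("B", 5, 3)])
root = group([a, b, c, d])
c1, c2 = (0, 10), (10, 20)

solutions = []
search([root] * 3, [c1, c1, c2], solutions)
print(solutions)   # (object id, start time) pairs
```

A single traversal enumerates every (object, τ) pair that satisfies the constraints, so the count for a cell sequence is simply the number of solutions.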

Example of Counting (1)
- For C_0 = {c_1}, C_1 = {c_1, c_2}, C_2 = {c_2}, derive probabilities for (C_0, C_1, C_2)
  - Pr(c_2 | c_1, c_1): the probability that an object which has moved as c_1 → c_1 next moves to c_2
  - Pr(c_2 | c_1, c_2)
- Derive two probabilities at once
- [Figure: trajectories over x and t (t = 0, …, 8 = T) with cells c_1, c_2 and R-tree nodes a, b, c under the root]

Example of Counting (2)
- [Figure: the same trajectories and cells c_1, c_2 over x and t, shown next to the corresponding R-tree (root; nodes a, b, c)]

Pruning Method (1)
- Pruning condition 1: movement between two R-tree nodes which are not temporally consecutive is impossible
  - such candidates can be deleted
- Example:
  - movements such as a → b and b → c are allowed
  - movement a → c is impossible
- [Figure: R-tree nodes a, b, c laid out over x and t (t = 0, …, 8 = T)]

Pruning Method (2)
- Pruning condition 2: the trajectory is not contained in the target cell
- Example: when we are counting for c_1, we should consider only nodes that overlap with c_1
- [Figure: cell c_1 and R-tree nodes over x and t (t = 0, …, 8 = T)]

Pruning Method (3)
- Pruning condition 3: if [max distance an object can move] < [distance between MBRs], then an object cannot move from a node to the next node
- [Figure: distance between the MBRs of nodes 1 and 2 over x and t (t = 0, …, 8 = T)]
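
Pruning condition 3 reduces to a minimum-distance test between two rectangles. The sketch below assumes spatial MBRs given as (x_min, y_min, x_max, y_max), a known maximum speed, and exactly one unit time between the two nodes; the names `min_distance` and `can_reach` and the parameters are hypothetical.

```python
import math

def min_distance(a, b):
    """Minimum Euclidean distance between two axis-aligned rectangles
    (x_min, y_min, x_max, y_max); 0.0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def can_reach(a, b, max_speed, unit_time=1.0):
    """False when pruning condition 3 applies to the node pair."""
    return min_distance(a, b) <= max_speed * unit_time

mbr1 = (0.0, 0.0, 1.0, 1.0)
mbr2 = (4.0, 5.0, 6.0, 7.0)
print(min_distance(mbr1, mbr2), can_reach(mbr1, mbr2, max_speed=4.0))
```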

Query Processing Example
- Targets: c_1 → c_2
- [Figure: the search descends from the root (tree level 0) through tree levels 1 and 2 over nodes a, b, c, applying pruning and backtracking against cells c_1 and c_2]
- Outcomes shown in the figure:
  - a pruned branch: there are no objects that moved as c_1 → c_2
  - a surviving branch: an object that moved as c_1 → c_2 is found and counted

Dataset (1)
- Generated using the moving object simulator made by Brinkhoff [1]
- Simulates car movement on an actual city road network
  - Oldenburg city, Germany (about 2.5 km x 2.8 km)
  - no. of initial moving objects: 5
  - 5 objects are created per minute on average
  - 100 objects are moving in the map at a time
  - data is generated for T = 1000 minutes
- 120 K points are stored in a 3-D R-tree

Experimental Result (1)
- Map is decomposed into 30 x 30 cells
- First-order Markov transition probabilities
- 3 x 3 cells are selected randomly

Experimental Result (2)
- Estimation of second-order transition probabilities
- Other parameters are the same as in the former case

Experimental Result (3)
- Estimation of third-order transition probabilities
- Other parameters are similar to the former case

Experimental Result (4)
- The case when the CSP-based approach is not effective
  - Target space is decomposed into 20 x 20 cells
  - Estimation of second-order transition probabilities
- Since the cell decomposition is coarse, the pruning cannot reduce candidates

Conclusions and Future Work
- Conclusions
  - mobility statistics based on Markov transition probability
  - proposals of two algorithms
    - naïve approach
    - CSP-based approach: effectively utilizes the R-tree structure
- Future Work
  - adaptive cell decompositions
  - extension to non-stationary Markov transitions