
TOPTRAC: Topical Trajectory Pattern Mining
Source: KDD 2015
Advisor: Jia-Ling Koh
Speaker: Hsiu-Yi Chu
Date: 2018/1/22

Outline: Introduction, Method, Experiment, Conclusion

Introduction

Introduction Goal The topical trajectory mining problem: given a collection of geo-tagged message trajectories, find the topical transition patterns and the top-k transition snippets that best represent each pattern.

Introduction Transition pattern Transition snippet

Introduction Definition Trajectory s_t: a sequence of geo-tagged messages m_{t,i}. Geo-tag G_{t,i}: a 2-dimensional vector (G_{t,i,x}, G_{t,i,y}). Bag-of-words w_{t,i}: N words {w_{t,i,1}, …, w_{t,i,N}}.
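
As a concrete reading of these definitions, here is a minimal sketch of the input data types in Python; the class and field names are illustrative, not from the paper:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GeoMessage:
    """One geo-tagged message m_{t,i}."""
    geo: Tuple[float, float]   # G_{t,i} = (G_{t,i,x}, G_{t,i,y})
    words: List[str]           # bag-of-words w_{t,i} = {w_{t,i,1}, ..., w_{t,i,N}}

@dataclass
class Trajectory:
    """One trajectory s_t: the time-ordered geo-tagged messages of a single user."""
    messages: List[GeoMessage]
```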

Introduction Definition Latent semantic region: a geographical area where messages are posted with the same topic preference. Topical transition pattern: a frequent movement from one semantic region to another.

Outline: Introduction, Method, Experiment, Conclusion

Method Generative Model Assume there are M latent semantic regions and K hidden topics in the collection of geo-tagged messages. How is each sequence s_t = (m_{t,1}, m_{t,2}, …, m_{t,n}) generated?

Method Generative process θ_r = (topic 1, …, topic K): a topic distribution for each region, e.g., θ_1 = (0.3, 0.2, …, 0.6). Φ_k = (word 1, …, word V): a word distribution for each topic.

Method λ_t: a Bernoulli parameter in (0, 1). S_{t,i} ∈ {0, 1}: whether m_{t,i} is in a local context.

Method Case 1: S_{t,i} = 0 (a noise message). Select R_{t,i} ~ Uniform(1/M); generate G_{t,i} ~ Uniform(f_0). Case 2: a local context starts, i.e., i = 1 with S_{t,1} = 1, or i ≥ 2 with S_{t,i} = 1 and S_{t,i-1} = 0. Select R_{t,i} ~ Categorical(δ_0); generate G_{t,i} ~ N(μ_{R_{t,i}}, Σ_{R_{t,i}}).

Method Case 3: otherwise (the local context continues). Select R_{t,i} ~ Categorical(δ_{r_{t,i-1}, z_{t,i-1}}); generate G_{t,i} ~ N(μ_{R_{t,i}}, Σ_{R_{t,i}}).

Method Select a topic Z_{t,i} ~ Categorical(θ_{R_{t,i}}); generate the message's words w_{t,i} ~ Multinomial(Φ_{Z_{t,i}}). The full process is sketched below.
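
To make the three cases concrete, here is a minimal end-to-end sketch of the generative process in Python. All sizes, parameter values, the region bounds, and the isotropic Gaussian standing in for N(μ_r, Σ_r) are assumptions for illustration, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and parameters (assumptions, not the paper's values).
M, K, V = 3, 4, 10                               # regions, topics, vocabulary size
lam = 0.7                                        # lambda_t: P(S_{t,i} = 1)
delta0 = rng.dirichlet(np.ones(M))               # delta_0: region dist. at a context start
delta = rng.dirichlet(np.ones(M), size=(M, K))   # delta_{r,z}: next-region distribution
theta = rng.dirichlet(np.ones(K), size=M)        # theta_r: topic distribution per region
phi = rng.dirichlet(np.ones(V), size=K)          # Phi_k: word distribution per topic
mu = rng.uniform(0, 10, size=(M, 2))             # mu_r: region centers (Sigma_r isotropic here)

def generate_trajectory(n, n_words=5):
    msgs, prev_s, prev_r, prev_z = [], 0, None, None
    for i in range(n):
        s = rng.binomial(1, lam)                        # S_{t,i} ~ Bernoulli(lambda_t)
        if s == 0:                                      # Case 1: noise message
            r = rng.integers(M)                         # R_{t,i} ~ Uniform(1/M)
            g = rng.uniform(0, 10, size=2)              # G_{t,i} ~ Uniform(f_0)
        elif prev_s == 0:                               # Case 2: a local context starts
            r = rng.choice(M, p=delta0)                 # R_{t,i} ~ Categorical(delta_0)
            g = rng.normal(mu[r], 0.5)                  # G_{t,i} ~ N(mu_r, Sigma_r)
        else:                                           # Case 3: the context continues
            r = rng.choice(M, p=delta[prev_r, prev_z])  # R_{t,i} ~ Categorical(delta_{r',z'})
            g = rng.normal(mu[r], 0.5)
        z = rng.choice(K, p=theta[r])                   # Z_{t,i} ~ Categorical(theta_r)
        w = rng.choice(V, p=phi[z], size=n_words)       # w_{t,i} ~ Multinomial(Phi_z)
        msgs.append((s, r, z, g, w))
        prev_s, prev_r, prev_z = s, r, z
    return msgs

print(generate_trajectory(5))
```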

Method Likelihood
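
The likelihood appears only as an equation image on the original slide. As a hedged reconstruction, the per-trajectory likelihood implied by the generative steps above would take roughly this form (the paper's exact factorization may differ):

```latex
p(\mathbf{w}_t, \mathbf{G}_t)
  = \sum_{\mathbf{S}_t, \mathbf{R}_t, \mathbf{Z}_t} \prod_{i=1}^{n}
      p(S_{t,i} \mid \lambda_t)\,
      p(R_{t,i} \mid S_{t,i}, R_{t,i-1}, Z_{t,i-1})\,
      p(G_{t,i} \mid R_{t,i}, S_{t,i})\,
      p(Z_{t,i} \mid \theta_{R_{t,i}})
      \prod_{j=1}^{N} p(w_{t,i,j} \mid \Phi_{Z_{t,i}})
```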

Method Variational EM Algorithm Maximum-likelihood estimation of the parameters θ_R, Φ_k, λ_t, μ_r, Σ_r, together with inference of the latent variables S_{t,i}, R_{t,i}, Z_{t,i}.

Method Finding the Most Likely Sequence Notations: the maximum probability of generating the subsequence up to position i when S_{t,i} = 0, and the maximum probability when S_{t,i} = 1.

Method Compute the recurrence by cases: case 1: S_{t,i-1} = 0; case 2: S_{t,i-1} = 1.
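
The recurrence symbols are images on the original slides, so the following is only a Viterbi-style sketch consistent with the two cases above: A[i] tracks the best probability of the prefix when S_{t,i} = 0, and B[i][r, z] when S_{t,i} = 1 with region r and topic z. The background density f0 and the isotropic Gaussian are assumptions, and a real implementation would work in log space:

```python
import numpy as np

def gaussian_pdf(g, mu_r, var=0.25):
    # Isotropic 2-D Gaussian for p(G_{t,i} | R_{t,i}); the variance is illustrative.
    d = g - mu_r
    return np.exp(-(d @ d) / (2 * var)) / (2 * np.pi * var)

def most_likely_sequence(msgs, lam, delta0, delta, theta, phi, mu, f0=0.01):
    """msgs: list of (G_i, word_ids). Returns the DP tables A and B;
    backtracking pointers are omitted for brevity."""
    M, K = theta.shape
    n = len(msgs)
    A = np.zeros(n)            # best prob of prefix ending with S_{t,i} = 0
    B = np.zeros((n, M, K))    # best prob ending with S_{t,i} = 1, R = r, Z = z
    for i, (g, words) in enumerate(msgs):
        word_lik = np.array([phi[z][words].prod() for z in range(K)])   # p(w_i | z)
        geo_lik = np.array([gaussian_pdf(g, mu[r]) for r in range(M)])  # p(G_i | r)
        e0 = (1 - lam) * (1 / M) * f0 * (theta * word_lik).max()  # best noise emission
        e1 = lam * geo_lik[:, None] * theta * word_lik[None, :]   # emission per (r, z)
        if i == 0:
            A[i] = e0
            B[i] = e1 * delta0[:, None]                # case 2: a context starts at i = 1
        else:
            A[i] = e0 * max(A[i-1], B[i-1].max())      # noise can follow anything
            start = delta0[:, None] * A[i-1]           # context (re)starts after noise
            cont = (B[i-1][:, :, None] * delta).max(axis=(0, 1))[:, None]  # continuation
            B[i] = e1 * np.maximum(start, cont)
    return A, B
```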

Method Finding Frequent Transition Patterns s_t' = {(s_{t,1}, r_{t,1}, z_{t,1}), …, (s_{t,n}, r_{t,n}, z_{t,n})}. A transition pattern {(r_1, z_1) → (r_2, z_2)} starts with a message labeled (1, r_1, z_1) and ends with one labeled (1, r_2, z_2). τ: the minimum support.

Method Example s_1' = {(0, 1, 1), (1, 1, 2), (1, 2, 1)}, s_2' = {(1, 1, 2), (0, 2, 1), (1, 2, 1)}; with τ = 2, {(1, 2) → (2, 1)} is a transition pattern. Top-k transition snippets: the k snippets with the largest probabilities of generating the pattern, as sketched below.
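
A small runnable sketch of this counting step, under one plausible reading of the slide's example (consecutive in-context messages form a snippet, noise messages may lie between them, and support is counted once per trajectory); the function name is illustrative:

```python
from collections import Counter

def transition_patterns(labeled_trajs, tau):
    """labeled_trajs: sequences of (s, r, z) triples, as in s_t' above.
    A pattern {(r1, z1) -> (r2, z2)} is supported by a trajectory when a
    message labeled (1, r1, z1) is followed by the next in-context message
    labeled (1, r2, z2); noise messages (s = 0) may lie in between."""
    support = Counter()
    for traj in labeled_trajs:
        in_ctx = [(r, z) for (s, r, z) in traj if s == 1]
        support.update({(a, b) for a, b in zip(in_ctx, in_ctx[1:])})
    return {p for p, c in support.items() if c >= tau}

# The slide's example: with tau = 2, {(1, 2) -> (2, 1)} is frequent.
s1 = [(0, 1, 1), (1, 1, 2), (1, 2, 1)]
s2 = [(1, 1, 2), (0, 2, 1), (1, 2, 1)]
print(transition_patterns([s1, s2], tau=2))   # {((1, 2), (2, 1))}
```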

Outline: Introduction, Method, Experiment, Conclusion

Experiment Data sets NYC: 9,070 trajectories, 266,808 geo-tagged messages; M = 30, K = 30, τ = 100. SANF: 809 trajectories, 19,664 geo-tagged messages; M = 20, K = 20, τ = 10.

Experiment Baselines LGTA: run its inference algorithm, then find frequent trajectory patterns with the same procedure used for TOPTRAC above. NAÏVE: first group messages by location using EM clustering, then cluster the messages of each group with LDA (sketched below).
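
A hedged sketch of the NAÏVE baseline as described: EM clustering on message locations (here via a Gaussian mixture), then a separate LDA topic model within each geographic cluster. The library choices and parameter values are assumptions for illustration, not the paper's implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import LatentDirichletAllocation

def naive_baseline(geo, bow, n_regions=20, n_topics=20):
    """geo: (n, 2) message coordinates; bow: (n, V) bag-of-words count matrix."""
    # Step 1: EM clustering of messages by location.
    regions = GaussianMixture(n_components=n_regions, random_state=0).fit_predict(geo)
    # Step 2: fit an LDA topic model on the messages of each region.
    topic_models = {}
    for r in range(n_regions):
        docs = bow[regions == r]
        if len(docs) > 0:
            topic_models[r] = LatentDirichletAllocation(
                n_components=n_topics, random_state=0).fit(docs)
    return regions, topic_models
```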

Experiment Results (the original slides here contain result figures only)

Outline: Introduction, Method, Experiment, Conclusion

Conclusion Proposed a trajectory pattern mining algorithm, TOPTRAC, which uses a probabilistic model to capture the spatial and topical patterns of users. Developed an efficient inference algorithm for the model, and devised algorithms to find frequent transition patterns as well as the best representative snippets of each pattern.