ICDE 2008 Trajectory Outlier Detection: A Partition-and-Detect Framework

ICDE 2008
Trajectory Outlier Detection: A Partition-and-Detect Framework
April 8, 2008
Jae-Gil Lee, Jiawei Han, and Xiaolei Li
Department of Computer Science, University of Illinois at Urbana-Champaign
04/08/08 Trajectory Outlier Detection: A Partition-and-Detect Framework

Table of Contents
= Motivation
= Partition-and-Detect Framework
= Outlier Detection Algorithm: TRAOD
  • Partitioning Phase (Simple)
  • Detection Phase
  • Partitioning Phase (Enhanced)
= Performance Evaluation
= Related Work
= Conclusions

Outlier Detection
= Definition: the process of detecting a data object that is grossly different from or inconsistent with the remaining set of data
= Applications: the detection of credit card fraud, the monitoring of criminal activities in electronic commerce, etc.
= Algorithms: distribution-based, distance-based, density-based, and deviation-based
= Target data: previous research has mainly dealt with outlier detection of point data

Analysis on Trajectory Data
= Tremendous amounts of trajectory data of moving objects are being collected
  • Example: vehicle positioning data, hurricane tracking data, animal movement data, etc.
= Trajectory outlier detection has many important, real-world applications
  • Detection of suspicious persons in video surveillance
  • Analysis of unusual air-mass trajectories in meteorology
  • …
A powerful outlier detection algorithm for trajectories is urgently needed

Limitations of Existing Algorithms
= Knorr et al. [5] have presented one of the very few attempts
  • Define the distance between two whole trajectories using summary information (e.g., the coordinates of the starting and ending points)
  • Apply a distance-based approach to the detection of trajectory outliers
= Existing algorithms might not be able to detect outlying portions of trajectories
  • Example: TR3 is not detected as an outlier since its overall behavior is similar to that of the neighboring trajectories
[Figure: trajectories TR1 to TR5; TR3 contains an outlying sub-trajectory]

Discovery of Outlying Sub-Trajectories
= Discovery of outlying sub-trajectories is very useful in the real world
  • Example: sudden changes in a hurricane’s path [10]
We propose the partition-and-detect framework

The Partition-and-Detect Framework
= Consists of two phases: partitioning and detection
[Figure: (1) Partition: a set of trajectories (TR1 to TR5) becomes a set of trajectory partitions; (2) Detect: outlying trajectory partitions are identified and TR3 is reported as an outlier]
Note: A set of outlying trajectory partitions indicates an outlying sub-trajectory

The Problem Statement
= Given a set of trajectories $I = \{TR_1, \ldots, TR_n\}$, our algorithm generates a set of outliers $O = \{O_1, \ldots, O_m\}$ with outlying trajectory partitions for each $O_i$
= Necessary definitions:
  • A trajectory is a sequence of multi-dimensional points, denoted as $TR_i = p_1 p_2 p_3 \ldots p_j \ldots p_{len_i}$; a trajectory partition (t-partition for short) is a line segment $p_i p_j$ ($i < j$), where $p_i$ and $p_j$ are points chosen from the same trajectory
  • A t-partition is outlying if it does not have a sufficient number of similar neighbors
  • A trajectory is an outlier if it contains a non-negligible amount of outlying t-partitions

The Outlier Detection Algorithm: TRAOD
= Based on the partition-and-detect framework
Algorithm TRAOD (TRAjectory Outlier Detection)
Input: A set of trajectories I = {TR1, …, TRn}
Output: A set of outliers O = {O1, …, Om} with outlying t-partitions for each Oi
Algorithm:
    /* Partitioning Phase */
    for each TR ∈ I do
        Partition TR into a set L of line segments;
        Accumulate L into a set D;
    /* Detection Phase */
    for each P ∈ D do
        Mark P if it is an outlying t-partition;
    for each TR ∈ I do
        Output TR if it is an outlier;
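The two phases of the pseudocode can be sketched as a small Python skeleton. This is only an illustration of the control flow: the function name `traod_skeleton` and the three callables `partition`, `is_outlying`, and `is_outlier` are hypothetical placeholders supplied by the caller, not names taken from the slides.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def traod_skeleton(
    trajectories: Dict[str, Sequence[Point]],
    partition: Callable[[Sequence[Point]], List[Segment]],
    is_outlying: Callable[[str, Segment, List[Tuple[str, Segment]]], bool],
    is_outlier: Callable[[List[Segment], List[Segment]], bool],
) -> Dict[str, List[Segment]]:
    """Return {trajectory id: its outlying t-partitions} for each outlier."""
    # Partitioning phase: split every trajectory into line segments
    # and accumulate all of them into one collection.
    parts = {tid: partition(points) for tid, points in trajectories.items()}
    all_parts = [(tid, seg) for tid, segs in parts.items() for seg in segs]

    # Detection phase: mark outlying t-partitions, then report every
    # trajectory that the caller-supplied rule considers an outlier.
    outliers: Dict[str, List[Segment]] = {}
    for tid, segs in parts.items():
        outlying = [seg for seg in segs if is_outlying(tid, seg, all_parts)]
        if is_outlier(segs, outlying):
            outliers[tid] = outlying
    return outliers
```

Passing the three rules in as callables keeps the skeleton independent of how partitioning and the outlier tests are actually defined.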

Where We Are Now
    /* Partitioning Phase */
    for each TR ∈ I do
        Partition TR into a set L of line segments (by a simple strategy or by a two-level partitioning strategy);
        Accumulate L into a set D;
    /* Detection Phase */
    for each P ∈ D do
        Mark P if it is an outlying t-partition;
    for each TR ∈ I do
        Output TR if it is an outlier;

A Simple Partitioning Strategy (1/2)
= Careless partitioning (especially into long t-partitions) could miss possible outliers
  • Example: Even though TRout behaves differently from its neighboring trajectories, these differences are averaged out due to careless partitioning
[Figure: a trajectory TRout, one of its t-partitions, and the neighboring trajectories]

A Simple Partitioning Strategy (2/2)
= A trajectory is partitioned at a base unit: the smallest meaningful unit of a trajectory in a given application
  • Example: The base unit can be every single point
[Figure: a trajectory TRout partitioned at the base unit, an outlying t-partition, and the neighboring trajectories]
Pros: high detection quality in general
Cons: poor performance due to a large number of t-partitions (remedied by a two-level partitioning strategy)
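As a minimal sketch of the simple strategy, assuming the base unit is every single point, each pair of consecutive points becomes one t-partition. The function name `partition_at_every_point` is made up for this illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def partition_at_every_point(points: Sequence[Point]) -> List[Segment]:
    """Turn a trajectory of n points into n - 1 consecutive line segments."""
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]
```

A trajectory with n points yields n - 1 t-partitions, which is why the number of pairwise comparisons in the detection phase grows quickly under this strategy.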

Where We Are Now
    /* Partitioning Phase */
    for each TR ∈ I do
        Partition TR into a set L of line segments (by a simple strategy or by a two-level partitioning strategy);
        Accumulate L into a set D;
    /* Detection Phase */
    for each P ∈ D do
        Mark P if it is an outlying t-partition;
    for each TR ∈ I do
        Output TR if it is an outlier;

Distance between T-Partitions
= The weighted sum of three components: the perpendicular distance ($d_\perp$), the parallel distance ($d_\parallel$), and the angle distance ($d_\theta$)
  • Adapted from similarity measures used in the domain of pattern recognition [13]
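The slide names the three components but does not spell out their formulas, so the following is a sketch under stated assumptions rather than the authors' exact definition. It assumes 2-D segments, takes the longer segment as the reference, uses an order-2 Lehmer mean of the two point-to-line distances for the perpendicular part, the smallest distance from a projected endpoint to a reference endpoint for the parallel part, and the shorter segment's length times sin θ (or the full length when θ ≥ 90°) for the angle part. The weights default to 1, and all names are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def _norm(a):
    return math.hypot(a[0], a[1])


def _project(p, s, e):
    """Project point p onto the infinite line through s and e."""
    d = _sub(e, s)
    t = _dot(_sub(p, s), d) / _dot(d, d)
    return (s[0] + t * d[0], s[1] + t * d[1])


def segment_distance(seg_a, seg_b, w_perp=1.0, w_par=1.0, w_ang=1.0):
    """Weighted sum of perpendicular, parallel, and angle distances."""
    # Assumption: the longer segment is the reference (Li), the shorter is Lj.
    if _norm(_sub(seg_a[1], seg_a[0])) >= _norm(_sub(seg_b[1], seg_b[0])):
        (si, ei), (sj, ej) = seg_a, seg_b
    else:
        (si, ei), (sj, ej) = seg_b, seg_a
    vi, vj = _sub(ei, si), _sub(ej, sj)
    len_i, len_j = _norm(vi), _norm(vj)
    if len_i == 0.0:
        raise ValueError("both segments have zero length")

    # Projections of Lj's endpoints onto the line through Li.
    ps, pe = _project(sj, si, ei), _project(ej, si, ei)

    # Perpendicular distance: Lehmer mean (order 2) of the two offsets.
    l1, l2 = _norm(_sub(sj, ps)), _norm(_sub(ej, pe))
    d_perp = 0.0 if l1 + l2 == 0.0 else (l1 * l1 + l2 * l2) / (l1 + l2)

    # Parallel distance: closest projected endpoint to an endpoint of Li.
    d_par = min(_norm(_sub(ps, si)), _norm(_sub(ps, ei)),
                _norm(_sub(pe, si)), _norm(_sub(pe, ei)))

    # Angle distance: |Lj| * sin(theta) below 90 degrees, |Lj| otherwise.
    if len_j == 0.0:
        d_ang = 0.0
    else:
        cos_t = max(-1.0, min(1.0, _dot(vi, vj) / (len_i * len_j)))
        d_ang = len_j * math.sqrt(1.0 - cos_t * cos_t) if cos_t > 0.0 else len_j

    return w_perp * d_perp + w_par * d_par + w_ang * d_ang
```

For two parallel segments, `((0, 0), (10, 0))` and `((2, 1), (6, 1))`, the sketch gives a perpendicular part of 1, a parallel part of 2, and an angle part of 0.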

Trajectory Outliers Based on Distance (1/2)
= Def. (a close trajectory): [formula not preserved; the figure contrasts a trajectory TRj that is close to Li with one that is not close to Li]
= Def. (an outlying t-partition): Li is an outlying t-partition if the proportion of trajectories close to Li is ≤ 1 - p; otherwise (> 1 - p) Li is not an outlying t-partition
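The slide text gives no explicit formula for closeness, so the sketch below has to assume one: a trajectory counts as close to Li when the total length of its t-partitions lying within distance D of Li is at least the length of Li. That rule, the use of the total number of trajectories as the denominator, and every function name here are assumptions made for illustration. The segment distance is passed in as `dist`.

```python
import math


def seg_len(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def is_close(li, other_parts, dist, D):
    """Assumed rule: enough of the other trajectory lies within D of li."""
    close_len = sum(seg_len(lj) for lj in other_parts if dist(li, lj) <= D)
    return close_len >= seg_len(li)


def is_outlying(li, own_id, partitions, dist, D, p):
    """li is outlying if the fraction of close trajectories is <= 1 - p.

    partitions maps each trajectory id to its list of t-partitions.
    """
    n_close = sum(
        1
        for tid, parts in partitions.items()
        if tid != own_id and is_close(li, parts, dist, D)
    )
    return n_close / len(partitions) <= 1 - p
```

With this reading, p controls how many close neighbors count as sufficient: a larger p makes it easier for a t-partition to avoid being marked as outlying.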

Trajectory Outliers Based on Distance (2/2)
= Def. (an outlier):
  • A trajectory TRi is an outlier if
$$\frac{\text{the sum of the lengths of outlying t-partitions in } TR_i}{\text{the sum of the lengths of all t-partitions in } TR_i} \ge F$$
[Figure: TRi is an outlier; TRj is not an outlier]
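The length-ratio test can be written directly. In this sketch the names `outlying_fraction` and `is_outlier` are illustrative, and each t-partition carries a boolean flag saying whether it was marked as outlying.

```python
import math


def seg_len(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def outlying_fraction(parts, flags):
    """Length of outlying t-partitions divided by total trajectory length."""
    total = sum(seg_len(s) for s in parts)
    if total == 0.0:
        return 0.0  # assumption: a zero-length trajectory is never an outlier
    outlying = sum(seg_len(s) for s, flagged in zip(parts, flags) if flagged)
    return outlying / total


def is_outlier(parts, flags, F):
    return outlying_fraction(parts, flags) >= F
```

For a trajectory with segments of length 3 and 1 where only the second is outlying, the fraction is 0.25, so the trajectory is an outlier for F = 0.2 but not for F = 0.3.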

Incorporation of Density (1/2)
= The previous definition, as it is, has the local density problem
  • A t-partition in a dense region tends to have a relatively larger number of close trajectories than one in a sparse region
T-partitions in dense regions are favored!

Incorporation of Density (2/2)
= Def. (the density of a t-partition):
  • The density of a t-partition Li is the number of t-partitions within the distance σ from Li, where σ is the standard deviation of pairwise distances between t-partitions
= Def. (the adjusting coefficient of a t-partition):
$$adj(L_i) = \frac{\text{the average density of all t-partitions}}{\text{the density of the t-partition } L_i}$$
= Adjustment by the density
  • The number of close trajectories is multiplied by the adjusting coefficient
  • adj(Li) < 1.0 in a dense region; adj(Li) > 1.0 in a sparse region
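A small sketch of the density adjustment follows. Two details are not stated on the slide and are assumed here: the population standard deviation is used for σ, and a t-partition counts itself when computing its density, which keeps the density at least 1. The function name and the `dist` parameter are illustrative, and at least two t-partitions are required.

```python
import statistics
from itertools import combinations


def adjusting_coefficients(parts, dist):
    """Return adj(Li) for every t-partition in parts."""
    # sigma: standard deviation of all pairwise distances (assumed population).
    pairwise = [dist(a, b) for a, b in combinations(parts, 2)]
    sigma = statistics.pstdev(pairwise)

    # Density: number of t-partitions within sigma, counting the item itself.
    densities = [sum(1 for b in parts if dist(a, b) <= sigma) for a in parts]
    average = sum(densities) / len(densities)

    # adj < 1 where the density is above average, adj > 1 where it is below.
    return [average / d for d in densities]
```

Multiplying the number of close trajectories by these coefficients shrinks the count in dense regions and inflates it in sparse ones.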

Guidelines for Parameter Values
= Three parameters:
  • D corresponds to similar, p to sufficient, and F to non-negligible
= Remark: There is no universally correct parameter value even for the same data set and application
= Our guideline: rely on user feedback
  Parameter   Range               Guiding question
  D           smaller to larger   Want many outliers?
  p           0.90 to 0.99        Have many trajectories?
  F           0.10 to 0.20        Are trajectories short?

Where We Are Now
    /* Partitioning Phase */
    for each TR ∈ I do
        Partition TR into a set L of line segments (by a simple strategy or by a two-level partitioning strategy);
        Accumulate L into a set D;
    /* Detection Phase */
    for each P ∈ D do
        Mark P if it is an outlying t-partition;
    for each TR ∈ I do
        Output TR if it is an outlier;

Two-Level Trajectory Partitioning
= Objective
  • Achieves much higher performance than the simple strategy
  • Obtains the same result as the simple strategy; i.e., does not lose the quality of the result
= Basic idea
  1. Partition a trajectory in coarse granularity first
  2. Partition a coarse t-partition in fine granularity only when necessary
= Main benefit
  • Narrows the search space that needs to be inspected in fine granularity
Many portions of trajectories can be pruned early on

Intuition behind Two-Level Trajectory Partitioning
= If the distance between coarse t-partitions is very large (or small), the distances between their fine t-partitions are also very large (or small)
[Figure: coarse-granularity and fine-granularity partitioning of TRi and TRj]
Given two coarse t-partitions, can we know if the distance between any two fine t-partitions is greater than (or less than) D?

Coarse-Granularity Partitioning*
= Try to balance two competing measures
  • Preciseness: the difference between a trajectory and a set of its coarse t-partitions should be as small as possible
    − Required for making the bounds tight
  • Conciseness: the number of coarse t-partitions should be as small as possible
    − Required for reducing the number of comparisons
= Formulate this problem using the minimum description length (MDL) principle
  • A good tradeoff between the two measures is found based on information theory
* Coarse-granularity partitioning is identical to that in our earlier work on trajectory clustering [15]

Fine-Granularity Partitioning
= Identify outlying coarse t-partitions by deriving the distance bounds between two coarse t-partitions Li and Lj
  • Suppose li is a fine t-partition in Li and lj is a fine t-partition in Lj
    lb(Li, Lj, f): the lower bound of f(li, lj)
    ub(Li, Lj, f): the upper bound of f(li, lj)
  • Derive the above bounds separately for the three distance components (Lemmas 1~3) and combine them (Lemma 4)
[Figure: coarse t-partitions Li of TRi and Lj of TRj]

Derivation of the Distance Bounds
Lemmas 1~3. Bounds for the individual distance components
Lemma 4. Bounds for dist(Li, Lj), obtained by combining Lemmas 1~3

Pruning Rules for Fine-Granularity Partitioning
= Rule 1: If lb(Li, Lj, dist) > D, fine-granularity partitioning is not required when comparing Li and Lj
= Rule 2: If ub(Li, Lj, dist) ≤ D, fine-granularity partitioning is required, but the distances between the fine t-partitions in Li and Lj need not be computed
[Figure: Li and Lj with lb(Li, Lj, dist) > D for Rule 1 and ub(Li, Lj, dist) ≤ D for Rule 2]
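The two rules amount to a three-way decision on a pair of coarse t-partitions. The sketch below assumes the bounds `lb` and `ub` have already been computed elsewhere; the function name and the returned labels are invented for illustration.

```python
def compare_coarse_pair(lb: float, ub: float, D: float) -> str:
    """Decide how much work a pair of coarse t-partitions needs."""
    if lb > D:
        # Rule 1: every pair of fine t-partitions is farther apart than D,
        # so the pair can be skipped without fine-granularity partitioning.
        return "prune"
    if ub <= D:
        # Rule 2: every pair of fine t-partitions is within D, so they are
        # all counted as close without computing individual distances.
        return "all_within_D"
    # Neither rule applies: fine t-partition distances must be computed.
    return "compute_fine_distances"
```

Only pairs that fall into the third case pay the full cost of fine-grained distance computation.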

Performance Evaluation
= Use two real trajectory data sets
  • Hurricane track data set
    − Records the Atlantic hurricanes for the years 1950 through 2006
    − The entire set: 608 trajectories and 18,951 points; a small set (1990~2006): 221 trajectories and 7,270 points
  • Animal movement data set
    − Records the locations of elk, deer, and cattle for the years 1993 through 1996 (the Starkey Project)
    − Elk 1993: 33 trajectories and 15,422 points; Deer 1995: 32 trajectories and 20,065 points; Cattle 1993: 41 trajectories and 19,556 points
= Validate the quality of outlier detection
= Evaluate the effectiveness of the two-level partitioning strategy

Trajectory Outliers for Hurricane Data (Small)
D = 85, p = 0.95, F = 0.2 → # of outliers = 13

Trajectory Outliers for Elk 1993
D = 55, p = 0.95, F = 0.1 → # of outliers = 3

Trajectory Outliers for Deer 1995
D = 80, p = 0.95, F = 0.1 → # of outliers = 3

Effects of Parameter Values
(a) D = 83, p = 0.95, F = 0.2: 19 outliers
(b) D = 87, p = 0.95, F = 0.2: 10 outliers

Pruning Power of Two-Level Partitioning
2L-Total: the ratio of the number of pairs pruned by Rule 1 to the total number of pairs of coarse t-partitions
2L-False: the proportion of pairs pruned incorrectly
Optimal: the maximum ratio of pairs that can be pruned
Achieves high pruning power (64~88%)

Speedup Ratio of Two-Level Partitioning
$$\text{Speedup Ratio} = \frac{\text{the elapsed time of the algorithm using the simple partitioning strategy}}{\text{the elapsed time of the algorithm using the two-level partitioning strategy}}$$
Shows significant performance improvement

Related Work
= Outlier detection algorithms for points
  • Distribution-based [2], distance-based [3, 4, 5, 6], density-based [7, 8], deviation-based [9]
= Trajectory outlier detection technique using a distance-based approach [5]
  • Not clear whether this technique can detect outlying sub-trajectories from very complicated trajectories
= Trajectory outlier detection algorithms based on classification [12]
  • Require a good training set and depend on training

Conclusions
= Proposed a novel framework, the partition-and-detect framework, for detecting trajectory outliers
= For the 1st phase, proposed a two-level trajectory partitioning strategy
  • Ensures both high quality and high efficiency
= For the 2nd phase, proposed a hybrid of the distance-based and density-based approaches
  • Very intuitive, but does not have the local density problem
= Demonstrated the effectiveness of TRAOD using various real trajectory data

Thank You!