Universidade Federal de Santa Catarina, Florianopolis, Brazil
Informatics and Statistics Department

A Conceptual Data Model for Trajectory Data Mining

Prof. Vania Bogorny (INE/UFSC - Brazil), vania@inf.ufsc.br
Prof. Carlos Alberto Heuser (II/UFRGS - Brazil)
Prof. Luis Otavio Alvares (II/UFRGS - Brazil)

11/6/2020 GIScience 2010 – A conceptual data model for trajectory data mining. Vania Bogorny, Universidade Federal de Santa Catarina, Brazil, www.inf.ufsc.br/~vania
Outline
• Motivation
• Objective
• Basic concepts
• Proposed Model
• Evaluation
• Conclusion
Introduction and Motivation
• On the one side (database technology...)
  – Since its origin, database design has aimed to model data for operational purposes only
  – Database designers do not think about data mining during conceptual database design
Introduction and Motivation
• On the other side (artificial intelligence...)
  – Data mining (DM), or knowledge discovery in databases (KDD), has become very popular in recent years in many fields and application domains
  – Dozens of new data mining algorithms have been proposed in the last decade,
    • but very little has been done for automatic data preprocessing, which is the most time-consuming step
Introduction and Motivation
[Figure: database modelling (normalization) contrasted with data mining (denormalization into one single file)]
Introduction and Motivation
• Another problem for data mining:
  – data have to be preprocessed and transformed into different granularities
  – Example:
    • Louvre Museum (instance) and TouristicPlace (type), used at the granularities instance, instance + type, or type
Introduction and Motivation
• These problems increase when dealing with trajectories of moving objects, which are the focus of this paper
Objective
We propose a conceptual framework for trajectory database modeling that supports data mining
Basic Concepts
Trajectory Data
• Trajectories are a new kind of spatiotemporal data
• Trajectories have attracted intensive research in both the database and data mining communities
Trajectory Raw Data
• Trajectory data are:
  – Spatio-temporal data
  – Represented by a set of points located in space and time
  – Form: (tid, x, y, t), where tid is the trajectory identifier and (x, y) represents the spatial location at time t

Tid   position (x, y)         time (t)
1     48.890018  2.246100     08:25
...   ...                     08:26
...   ...                     ...
1     48.890020  2.246102     08:40
1     48.888880  2.248208     08:41
1     48.885732  2.255031     08:42
...   ...                     ...
1     48.858434  2.336105     09:04
1     48.853611  2.349190     09:05
...   ...                     ...
2     ...                     ...
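As a minimal sketch of the (tid, x, y, t) form, raw samples can be held as tuples and grouped by trajectory identifier. The names `TrajectoryPoint` and `group_by_trajectory` are hypothetical, the timestamps are kept as "HH:MM" text for brevity, and the sample values are only illustrative.

```python
from collections import defaultdict
from typing import NamedTuple


class TrajectoryPoint(NamedTuple):
    tid: int    # trajectory identifier
    x: float    # first spatial coordinate
    y: float    # second spatial coordinate
    t: str      # sample time as zero-padded "HH:MM" text (an assumption)


def group_by_trajectory(points):
    """Group raw samples by tid and order each trajectory by time."""
    trajectories = defaultdict(list)
    for p in points:
        trajectories[p.tid].append(p)
    for samples in trajectories.values():
        # Zero-padded "HH:MM" strings sort chronologically within one day.
        samples.sort(key=lambda p: p.t)
    return dict(trajectories)


# Illustrative samples in the style of the slide's table.
raw = [
    TrajectoryPoint(1, 48.888880, 2.248208, "08:41"),
    TrajectoryPoint(1, 48.890018, 2.246100, "08:25"),
    TrajectoryPoint(2, 48.858434, 2.336105, "09:04"),
]
trajectories = group_by_trajectory(raw)
```

A real trajectory table would use full timestamps rather than time-of-day text; the string form is used here only to keep the sketch short.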
The Model of Stops and Moves (Spaccapietra 2008)
STOPS
– Important parts of trajectories
– Where the moving object has stayed for a minimal amount of time
– Stops are application dependent
  • Tourism application: hotels, touristic places, airports, …
  • Traffic management application: traffic lights, roundabouts, big events, …
MOVES
– Are the parts that are not stops
Semantic Trajectories
• A semantic trajectory is a set of stops and moves
  – Stops are characterized by a place, a start time, and an end time
  – Moves are characterized by two consecutive stops
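The definition above can be sketched as two small record types, with moves derived from consecutive stops. This is an illustration only: the class and function names are hypothetical, and taking a move's time span to run from the end of one stop to the start of the next is an assumption, not something the slide states.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Stop:
    place: str        # name of the place where the object stayed
    start: datetime   # time the stop began
    end: datetime     # time the stop finished


@dataclass(frozen=True)
class Move:
    origin: Stop
    destination: Stop

    @property
    def start(self):
        # Assumption: the move begins when the origin stop ends.
        return self.origin.end

    @property
    def end(self):
        # Assumption: the move ends when the destination stop begins.
        return self.destination.start


def moves_between(stops):
    """Build one move for every pair of consecutive stops, ordered by start time."""
    ordered = sorted(stops, key=lambda s: s.start)
    return [Move(a, b) for a, b in zip(ordered, ordered[1:])]


# Illustrative stops for a single trajectory.
stops = [
    Stop("IbisHotel", datetime(2010, 9, 16, 11, 0), datetime(2010, 9, 16, 12, 0)),
    Stop("ZurichAirport", datetime(2010, 9, 16, 8, 0), datetime(2010, 9, 16, 10, 0)),
    Stop("LouvreMuseum", datetime(2010, 9, 16, 14, 0), datetime(2010, 9, 16, 17, 0)),
]
moves = moves_between(stops)
```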
STOPS at Multiple Granularities
Stop at Ibis Hotel from 6:04 PM to 7:42 PM, September 16, 2010
• Space: IbisHotel or Accommodation
• Time: Afternoon or Thursday or 6:00 PM – 8:00 PM or RUSH-HOUR
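The time side of this example can be sketched as a lookup over user-defined labeled intervals. The function names and the interval boundaries below are assumptions chosen so that 6:04 PM falls in "afternoon" and in "RUSH-HOUR" as on the slide; they are not the model's actual definitions.

```python
from datetime import datetime, time

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def generalize_time(ts, intervals):
    """Return the label of the first (label, start, end) interval containing ts, else None."""
    for label, start, end in intervals:
        if start <= ts.time() < end:
            return label
    return None


def day_name(ts):
    """Generalize a timestamp to the name of its weekday."""
    return DAY_NAMES[ts.weekday()]  # weekday(): Monday is 0


def day_type(ts):
    """Generalize a timestamp to 'weekday' or 'weekend'."""
    return "weekend" if ts.weekday() >= 5 else "weekday"


# Hypothetical interval definitions supplied by the user.
PARTS_OF_DAY = [("morning", time(6), time(12)), ("afternoon", time(12), time(19))]
RUSH_HOURS = [("RUSH-HOUR", time(18), time(20))]

stop_start = datetime(2010, 9, 16, 18, 4)  # start of the Ibis Hotel stop
labels = [
    generalize_time(stop_start, PARTS_OF_DAY),
    day_name(stop_start),
    generalize_time(stop_start, RUSH_HOURS),
]
```

Passing the intervals as data keeps the granularity a user choice, which matches the idea that the same stop can be described at several levels.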
ITEMS - the building blocks for semantic pattern discovery
• An item is generated either from a stop or a move
• An item is a set of complex information (space + time) that can be defined in many formats/types and at different granularities
Building an ITEM for Data Mining
• Formats/types for an item:
• NameOnly: the name of the stop/move
  – STOPS: name of the spatial feature instance
    • IbisHotel
  – MOVES: names of the two stops that define the move
    • ZurichAirport – IbisHotel
• NameStart: the name of the stop/move + start time
  – IbisHotel [morning] (stop)
  – LouvreMuseum [weekend] (stop)
  – IbisHotel-ZurichAirport [10:00 AM-11:00 AM] (move)
Building an ITEM for Data Mining
• NameEnd: name of a stop/move + end time
  – IbisHotel[morning] (stop)
  – IbisHotel-ZurichAirport[10:00 AM-11:00 AM] (move)
• NameStartEnd: name of a stop/move + start time + end time
  – IbisHotel[08:00 AM-11:00 AM][1:00 PM-6:00 PM] (stop)
  – LouvreMuseum[morning][afternoon] (stop)
  – ZurichAirport–IbisHotel [10:00 AM-11:00 PM] [10:00 AM-6:00 PM]
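The four item formats can be sketched as a single formatting function. This is a hypothetical helper, not the model's buildItem() method: it assumes the start and end times have already been generalized to labels, and the textual layout of the item simply imitates the examples on the slides.

```python
def move_name(origin, destination):
    """Name a move after the two stops that define it."""
    return f"{origin}-{destination}"


def build_item(name, start_label, end_label, item_type):
    """Format an item as NameOnly, NameStart, NameEnd, or NameStartEnd."""
    if item_type == "NameOnly":
        return name
    if item_type == "NameStart":
        return f"{name}[{start_label}]"
    if item_type == "NameEnd":
        return f"{name}[{end_label}]"
    if item_type == "NameStartEnd":
        return f"{name}[{start_label}][{end_label}]"
    raise ValueError(f"unknown item type: {item_type}")


# A stop that starts in the morning and ends in the afternoon.
stop_items = [
    build_item("LouvreMuseum", "morning", "afternoon", kind)
    for kind in ("NameOnly", "NameStart", "NameEnd", "NameStartEnd")
]

# A move between two stops, labeled with the time interval in which it ends.
move_item = build_item(
    move_name("IbisHotel", "ZurichAirport"), "09:00 AM-10:00 AM",
    "10:00 AM-11:00 AM", "NameEnd",
)
```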
Semantic Trajectory Patterns
Frequent Patterns, Sequential Patterns, and Association Rules
Trajectory Frequent Pattern
• A set of items that occurs at least a minimal number of times (support s)
• Examples:
  {LouvreMuseum [08:00-10:00]} (s=0.1)
  {Airport [morning], Hotel [morning]} (s=0.2)
  {Airport-Hotel, Hotel-Museum} (s=0.15)
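A minimal sketch of how support could be computed for a set of items, assuming that support is the fraction of trajectories containing every item of the set (the slide gives support values such as 0.1 but does not spell out the formula). The function names and the data are illustrative.

```python
def support(itemset, trajectories):
    """Fraction of trajectories that contain every item of the itemset."""
    wanted = set(itemset)
    if not trajectories:
        return 0.0
    hits = sum(1 for items in trajectories if wanted <= set(items))
    return hits / len(trajectories)


def is_frequent(itemset, trajectories, minsup):
    """An itemset is frequent when its support reaches the minimum support."""
    return support(itemset, trajectories) >= minsup


# Each trajectory is represented by the items generated from its stops.
trajectories = [
    ["Airport[morning]", "Hotel[morning]", "Museum[afternoon]"],
    ["Airport[morning]", "Hotel[morning]"],
    ["Museum[afternoon]"],
    ["Hotel[morning]"],
]
s = support({"Airport[morning]", "Hotel[morning]"}, trajectories)
```

Real frequent-pattern miners avoid testing every candidate set against every trajectory; this brute-force count is only meant to make the definition concrete.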
Trajectory Sequential Pattern
• An ordered list of items that occurs at least a minimal number of times (support s)
• Examples:
  <Airport[morning], Hotel[morning], Museum[afternoon]> (s=0.1)
  <Airport-Hotel, Hotel-Museum> (s=0.15)
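The difference from a frequent pattern is that order matters. A minimal sketch, assuming a trajectory supports a pattern when the pattern's items appear in the same order but not necessarily contiguously; that reading, the function names, and the data are assumptions for illustration.

```python
def contains_in_order(sequence, pattern):
    """True if the pattern's items occur in the sequence in the same order."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched item,
    # so later pattern items can only match later positions.
    return all(item in remaining for item in pattern)


def sequence_support(pattern, trajectories):
    """Fraction of trajectories that contain the pattern as an ordered subsequence."""
    if not trajectories:
        return 0.0
    hits = sum(1 for seq in trajectories if contains_in_order(seq, pattern))
    return hits / len(trajectories)


# Each trajectory is the time-ordered list of items generated from its moves.
trajectories = [
    ["Airport-Hotel", "Hotel-Museum"],
    ["Hotel-Museum", "Airport-Hotel"],
    ["Airport-Hotel", "Hotel-Restaurant", "Hotel-Museum"],
    ["Hotel-Museum"],
]
s = sequence_support(["Airport-Hotel", "Hotel-Museum"], trajectories)
```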
Trajectory Association Rule
• A rule whose items occur at least a minimal number of times (support s) and with a minimal confidence (c)
• Example:
  – Airport[morning], Hotel[morning] → Museum[afternoon] (s=0.1) (c=0.5)
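A minimal sketch of the two measures, using the standard association-rule definitions: the support of a rule is the support of antecedent and consequent together, and its confidence is that value divided by the support of the antecedent. The slide does not give the formulas, so treating them this way is an assumption, as are the function names and the data.

```python
def support(items, trajectories):
    """Fraction of trajectories that contain every given item."""
    wanted = set(items)
    if not trajectories:
        return 0.0
    return sum(1 for t in trajectories if wanted <= set(t)) / len(trajectories)


def rule_metrics(antecedent, consequent, trajectories):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    s_rule = support(set(antecedent) | set(consequent), trajectories)
    s_antecedent = support(antecedent, trajectories)
    # Confidence is undefined when the antecedent never occurs; report 0.0 then.
    confidence = s_rule / s_antecedent if s_antecedent > 0 else 0.0
    return s_rule, confidence


trajectories = [
    ["Airport[morning]", "Hotel[morning]", "Museum[afternoon]"],
    ["Airport[morning]", "Hotel[morning]"],
    ["Museum[afternoon]"],
    ["Hotel[morning]"],
]
s, c = rule_metrics(
    ["Airport[morning]", "Hotel[morning]"], ["Museum[afternoon]"], trajectories
)
```

In this toy data the antecedent occurs in two of four trajectories and the full rule in one, so half of the trajectories matching the antecedent also match the consequent.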
The Proposed Model
The Proposed Model
• We extend the model of stops and moves proposed by Spaccapietra with new attributes and methods
• We add new classes and relationships, with attributes and methods for automatic data preprocessing and multiple-level mining
The Conceptual Data Model of Stops and Moves (Spaccapietra 2008)
Proposed OO Model
[Figure: class diagram in three parts: Spaccapietra's model, data pre-processing, and computing and storing the patterns]
Proposed OO Model
• Stops and Moves are extended with new attributes (specific time, e.g. 07:10 – 08:05) and methods to instantiate stops and moves
• Concept hierarchy for the spatial feature type (e.g. AccommodationPlace → Hotel)
Proposed OO Model
• Pattern: generic class to represent the 3 kinds of patterns
• Sequential pattern
  – Attributes: support, listOfItems
  – Methods: countSupport(), sequentialPattern()
• Frequent pattern
  – Attributes: support, setOfItems
  – Methods: countSupport() and frequentPattern()
• Association pattern
  – Attributes: support, confidence, antecedent (set of items) and consequent (set of items)
  – Methods: associatePattern(), computeConfidence()
• Item
  – Attributes: startT, endT (generic time, e.g. Morning)
  – Methods:
    • getGenericSpatialFeature() – retrieves the hierarchy level
    • timeG() – generalizes time
    • spaceG() – generalizes space based on the hierarchy
    • buildItem() – creates generalized ITEM
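Space generalization over a concept hierarchy can be sketched as walking up a parent map. This is not the model's spaceG() method: the function signature is hypothetical, and the three-level hierarchy (IbisHotel, Hotel, AccommodationPlace) is assumed from the examples used elsewhere in the talk.

```python
# Hypothetical concept hierarchy: each entry maps a node to its more general parent.
PARENT = {
    "IbisHotel": "Hotel",
    "Hotel": "AccommodationPlace",
}


def space_g(name, levels_up, parent=PARENT):
    """Generalize a spatial feature by climbing the hierarchy a number of levels."""
    current = name
    for _ in range(levels_up):
        if current not in parent:
            break  # already at the most general level
        current = parent[current]
    return current


instance_level = space_g("IbisHotel", 0)   # keep the instance
type_level = space_g("IbisHotel", 1)       # generalize to the feature type
generic_level = space_g("IbisHotel", 2)    # generalize to the generic type
```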
Example of an Instantiated Model
Schema of Stops and Moves
• STOP (Tid integer, Sid integer, SFTname string, SFid integer, startT timestamp, endT timestamp)
  Ex.: stop (1, 1, Hotel, 3, 10 AM, 11 AM)
• MOVE (Tid integer, Mid integer, SFT1name string, SF1id integer, SFT2name string, SF2id integer, startT timestamp, endT timestamp, the_move geometry)
Schema of the Patterns
Nested relation
• FrequentPattern / SequentialPattern (Pid integer, pattern itemSetType, support real)
• itemSetType (SFT1name string, SF1id integer, SFT2name string, SF2id integer, startT string, endT string)
• AssociatePattern (Pid integer, antecedent itemSetType, consequent itemSetType, support real, confidence real)
Instantiating and Querying Patterns
To instantiate the patterns we can use the ST-DMQL proposed in (Bogorny 2009)
Instantiating Stops and Moves
SELECT generateS (method, candidateStops, buffer)
FROM trajectory

method: IB-SMOT, CB-SMOT, DB-SMOT, ...

SELECT generateM (method, candidateStops, buffer)
FROM trajectory
Instantiating Sequential Patterns
Q1 (tourism application): Which sequences of moves occur most frequently in the morning and in the evening?
Method in the ST-DMQL:
SELECT sequentialPattern (itemType = NameEnd, timeG = [8:00-12:00 AS morning, 18:00-23:00 AS evening], spaceG = instance, minsup = 0.03)
FROM move
Ans: {IbisHotel - NotreDame[morning], EiffelTower – IbisHotel [evening]} (s=0.04)
Example of Pattern Queries
Q: How many moves of sequential patterns cross the Pont Neuf bridge?
SELECT count(m.*)
FROM sequentialPattern s, bridge b, move m
WHERE s.pattern.SFT1name = m.SFT1name AND
      s.pattern.SF1id = m.SF1id AND
      s.pattern.SFT2name = m.SFT2name AND
      s.pattern.SF2id = m.SF2id AND
      b.name = 'Pont Neuf' AND
      intersects (m.the_geom, b.the_geom)
Conclusions
• Data pre-processing is the most time-consuming step for DM and KDD
• Thinking about data mining during the conceptual design of a database can significantly reduce this effort
Conclusions
• The proposed model:
  – Reduces the pre-processing tasks
  – Supports mining at multiple granularity levels
  – Automatically prepares the data for data mining
  – Stores the patterns for future queries
[Figure labels: multiple-granularity data, patterns, queries]
Geometric Patterns X Semantic Patterns (Bogorny 2009)
[Figure: a geometric pattern and a semantic trajectory pattern, each drawn over trajectories T1 to T4. Legend: H = Hotel, R = Restaurant, TP = Touristic Place.]
Semantic trajectory pattern: (a) Hotel to Restaurant, passing by CC; (b) go to Cinema, passing by CC
Thank You!
More examples for generating stops
SELECT generateS (CB-SMOT, [Hotel, 60, TouristicPlace, 15, ShoppingCenter, 30], 5)
FROM trajectory t, district d
WHERE d.name = 'Bela Vista' AND intersects (t.movingpoint.geometry, d.geometry)
Querying Rules
Suppose that the user is interested in association patterns that have weekend as the time dimension in the antecedent of the rule:
SELECT *
FROM associatePattern
WHERE antecedent.startT = 'weekend' OR antecedent.endT = 'weekend'
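The same selection can be sketched in memory over stored patterns. The classes below are a simplified, hypothetical stand-in for the nested relation (an item keeps only a name plus its generalized start and end times), and the stored patterns and their support and confidence values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Item:
    name: str
    start_t: Optional[str] = None   # generalized start time, e.g. "weekend"
    end_t: Optional[str] = None     # generalized end time


@dataclass(frozen=True)
class AssociatePattern:
    pid: int
    antecedent: Tuple[Item, ...]
    consequent: Tuple[Item, ...]
    support: float
    confidence: float


def with_time_in_antecedent(patterns, label):
    """Keep the rules whose antecedent has an item starting or ending at the given time label."""
    return [
        p for p in patterns
        if any(label in (item.start_t, item.end_t) for item in p.antecedent)
    ]


# Illustrative stored rules (values are made up).
patterns = [
    AssociatePattern(
        1,
        (Item("ReligiousPlace", start_t="weekend"),),
        (Item("Restaurant", start_t="weekend"),),
        0.07, 0.5,
    ),
    AssociatePattern(
        2,
        (Item("Work", start_t="morning"),),
        (Item("Gym", start_t="weekend"),),
        0.10, 0.5,
    ),
]
weekend_rules = with_time_in_antecedent(patterns, "weekend")
```

Rule 2 is excluded because "weekend" appears only in its consequent, which mirrors the restriction of the query to the antecedent.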
Basic Concepts: Support
Basic Concepts: Semantic Trajectory Patterns
Example: Work [morning], ShoppingCenter [afternoon], Gym [afternoon] (s=0.08%)
Basic Concepts: Semantic Trajectory Patterns
Example: Home [night], Work [afternoon] → Gym [afternoon] (s=0.10%) (c=0.50)
Basic Concepts: Semantic Trajectory Patterns
Example: ReligiousPlace [weekend], Restaurant [weekend] (s=0.07)
Related Works
Mining Trajectory Samples – Extract Geometric Patterns
• Laube 2002, 2005
• Giannotti 2007
• Lee 2007
• Cao 2006, 2007
• Li 2010
Mining Semantic Trajectories or Trajectory preprocessing for mining
• Alvares 2007
• Zhou 2007
• Palma 2008
• Bogorny 2009
• Manso 2010
Attempts to reduce the gap between databases and data mining
• Data mining query languages, but not for trajectories (Wang 2003, Malerba 2004, Han 1995)
Example of Frequent Pattern Instantiation
Q2: Which types of places are most frequently visited by tourists on weekdays and weekends?
Method in the ST-DMQL:
SELECT frequentPattern (itemType = NameStart, timeG = WEEKEND-WEEKDAY, spaceG = [type, GenericHotel = 1], minsup = 0.15)
FROM stop
Ans: {4StarsHotel[weekend], Museum[weekend], Restaurant[weekend]} (s=0.16)