CENTRE: Cellular Network's Positioning Data Generator
Fosca Giannotti, Andrea Mazzoni, Simone Puntoni, Chiara Renso
KDD-Lab

Why generate data?
- Trouble in finding real data
  - Due to the reticence of ITC companies
  - …and for legal and privacy reasons
- Need for ad-hoc datasets
  - To improve algorithm development
  - To have a tool for the validation and testing phases

CENTRE
- CEllular Network Trajectory Reconstruction Environment
- A positioning data (LOG) generation environment aimed at mobile technology
- Developed as a tool of the GeoPKDD project

GSM technology

GeoPKDD: Geographic Privacy-Aware Knowledge Discovery & Delivery

The Idea
- To generate positional mobile data (LOG) by simulating the events deriving from:
  - The trajectories of hypothetical mobile network users who travel across the territory
  - The resulting survey of these movements using a synthetic ad-hoc GSM coverage (the set of BTSs)
- So we can analyze the set of LOGs and reconstruct the trajectories of the mobile network users

Motivation
- With this model we want to achieve:
  - More rigorous and realistic semantics for the generated data
  - The possibility to compare synthetic trajectories with the reconstructed ones
  - The chance to validate the results of mining and knowledge discovery algorithms against the synthetic trajectories

CENTRE architecture

What CENTRE does…
- First of all we generate a sequence of spatio-temporal points that represents a trajectory. We can customize:
  - Starting point
  - Velocity
  - Agility
  - Direction
  - Groups of behavior
  - Infrastructures, etc.
- Then we overlay a set of antennas, represented by the circles of their coverage areas
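As a rough illustration of the first step, the sketch below generates a trajectory as a sequence of (x, y, t) points. The function name `generate_trajectory`, its parameters, and the random-heading model are assumptions made for this example; they are not CENTRE's actual generator and cover only the starting point, velocity, and direction settings.

```python
import math
import random

def generate_trajectory(start, speed, heading, steps, dt=1.0, turn=0.3, seed=0):
    """Return a list of (x, y, t) points for one hypothetical moving object.

    start   -- (x, y) starting point, in meters
    speed   -- constant speed, in meters/second
    heading -- initial direction, in radians
    turn    -- maximum random change of heading per step, in radians
    """
    rng = random.Random(seed)
    x, y = start
    points = [(x, y, 0.0)]
    for i in range(1, steps + 1):
        # Perturb the direction slightly, then advance by speed * dt.
        heading += rng.uniform(-turn, turn)
        x += speed * dt * math.cos(heading)
        y += speed * dt * math.sin(heading)
        points.append((x, y, i * dt))
    return points

trajectory = generate_trajectory(start=(0.0, 0.0), speed=10.0, heading=0.0, steps=5)
```

Each consecutive pair of points is exactly `speed * dt` meters apart; only the direction varies.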

LOG extraction
- So a LOG is represented by a tuple: (Obj_ID, BTS_ID, TimeStamp, d)
- Where:
  1. Obj_ID is the identifier of the observed object
  2. BTS_ID is the identifier of the antenna that made this survey
  3. TimeStamp is the time of the survey
  4. d is an estimate of the distance from the center of the BTS to the object
- Result of extraction:
  - LOG at time t2 (P2): {Cell1, BTS1, t2, d12}
  - LOGs at time t3 (P3): {Cell1, BTS1, t3, d13}, {Cell1, BTS2, t3, d23}, {Cell1, BTS3, t3, d33}
  - LOG at time t4 (P4): {Cell1, BTS2, t4, d24}
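A minimal sketch of how such tuples could be extracted, assuming each BTS covers a circle and that d is simply the Euclidean distance to the antenna center. The names `BTS`, `Log`, and `extract_logs`, and the sample coordinates, are hypothetical and only serve this example.

```python
import math
from typing import NamedTuple

class BTS(NamedTuple):
    bts_id: str
    x: float       # antenna center, meters
    y: float
    radius: float  # coverage radius, meters

class Log(NamedTuple):
    obj_id: str
    bts_id: str
    timestamp: float
    d: float       # distance from the BTS center to the object, meters

def extract_logs(obj_id, point, coverage):
    """Emit one LOG per antenna whose coverage circle contains the point."""
    x, y, t = point
    logs = []
    for bts in coverage:
        d = math.hypot(x - bts.x, y - bts.y)
        if d <= bts.radius:
            logs.append(Log(obj_id, bts.bts_id, t, d))
    return logs

# Two overlapping antennas; the point lies inside both coverage circles.
coverage = [BTS("BTS1", 0.0, 0.0, 100.0), BTS("BTS2", 150.0, 0.0, 100.0)]
logs = extract_logs("Cell1", (80.0, 0.0, 3.0), coverage)
```

Here the object is seen by both antennas at the same time, so two LOGs share one timestamp. A point outside every circle produces no LOG at all.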

Dataset

Trajectories reconstruction
- Once the LOGs are produced and stored, we forget about the synthetic trajectories and try to reconstruct them only from:
  - The LOG collection
  - The synthetic coverage

Information types
- Reconstruction is performed considering all LOGs produced at a single time instant for a single trajectory
- The number of LOGs with the same time and the same device identifier (id_cell) represents the number of simultaneous detections
Figure labels: 1 LOG, 2 LOGs, 3 LOGs
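The grouping described here can be sketched as follows. The helper `group_simultaneous` and the distance values are illustrative assumptions; only the (object, antenna, time, distance) tuple layout comes from the slides.

```python
from collections import defaultdict

def group_simultaneous(logs):
    """Group LOG tuples (obj_id, bts_id, timestamp, d) by (obj_id, timestamp)."""
    groups = defaultdict(list)
    for obj_id, bts_id, timestamp, d in logs:
        groups[(obj_id, timestamp)].append((bts_id, d))
    return dict(groups)

# Illustrative LOGs: one detection at t=2, three at t=3, one at t=4.
logs = [
    ("Cell1", "BTS1", 2, 40.0),
    ("Cell1", "BTS1", 3, 55.0),
    ("Cell1", "BTS2", 3, 60.0),
    ("Cell1", "BTS3", 3, 35.0),
    ("Cell1", "BTS2", 4, 20.0),
]
counts = {key: len(value) for key, value in group_simultaneous(logs).items()}
```

The size of each group is the number of simultaneous detections for that device at that instant.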

Reconstruction method
- When we have:
  - Only one detection: the point may be anywhere inside the antenna's coverage area, so we take the antenna center as the point position
  - Two or more detections: the point can only be inside the intersection of the coverage areas, so we take the centroid of this intersection as the point position
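The two cases can be sketched as below. The slides do not say how the centroid of the intersection is computed, so this example approximates it by averaging the grid points that fall inside every coverage circle. The function name `reconstruct_point`, the grid approach, and the `step` parameter are assumptions for illustration.

```python
import math

def reconstruct_point(detections, step=1.0):
    """Estimate a position from the antennas that detected the object.

    detections -- list of (cx, cy, radius) coverage circles, in meters
    step       -- grid spacing used to approximate the intersection area
    """
    if len(detections) == 1:
        cx, cy, _ = detections[0]
        return (cx, cy)  # single detection: use the antenna center
    # The intersection lies inside the first circle, so scan its bounding box.
    cx0, cy0, r0 = detections[0]
    n = int(math.ceil(r0 / step))
    sum_x = sum_y = 0.0
    count = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = cx0 + i * step, cy0 + j * step
            if all(math.hypot(x - cx, y - cy) <= r for cx, cy, r in detections):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # no overlap found at this grid resolution
    return (sum_x / count, sum_y / count)

single = reconstruct_point([(0.0, 0.0, 100.0)])
double = reconstruct_point([(0.0, 0.0, 100.0), (150.0, 0.0, 100.0)])
```

For two equal circles the estimate falls midway between the antenna centers, as symmetry suggests. A smaller `step` gives a finer approximation at a higher cost.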

Reconstructed trajectories dataset

And now …examples!

Now we work on…
- Making new extensions to the main generation engine
  - In order to test and validate spatial KD algorithms more efficiently and accurately
- Changing the old code (which was derived from the GSTD code)
  - Introducing improvements to the class structures
  - Introducing new data characterization, especially on spatial and temporal aspects

Multiple generation engines
- The idea is to develop extensions to the main engine every time we need new features to test and validate KD algorithms
- And to use, each time, the implementation of the synthetic trajectory production engine that best suits the type of data we need to obtain

Density based clustering
- We have seen that, for best results with this algorithm, it is useful to have a simple method to:
  - create clusters, and
  - identify the relation between objects and clusters

Attraction engine
- For this particular type of algorithm we are developing a new engine extension that uses an attraction-like mechanism
- Each object chooses and tries to reach its next attraction area
- When it reaches its destination area, it chooses another one, and so on…

Cluster construction
- A cluster is formed by a set of objects that are forced to pass through a sequence of areas
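One possible way to record which objects belong to which cluster is sketched below, assuming that each cluster is defined by one shared sequence of areas. The function `build_clusters`, the random assignment of objects, and the identifiers are hypothetical.

```python
import random

def build_clusters(object_ids, area_ids, n_clusters, route_length, seed=0):
    """Return (routes, membership).

    routes     -- cluster index -> ordered list of areas its objects must visit
    membership -- object id -> cluster index
    """
    rng = random.Random(seed)
    routes = {c: rng.sample(area_ids, route_length) for c in range(n_clusters)}
    membership = {obj: rng.randrange(n_clusters) for obj in object_ids}
    return routes, membership

object_ids = ["Cell1", "Cell2", "Cell3", "Cell4"]
area_ids = ["A", "B", "C", "D", "E"]
routes, membership = build_clusters(object_ids, area_ids, n_clusters=2, route_length=3)

def route_of(obj_id):
    """Sequence of areas the given object is forced to pass through."""
    return routes[membership[obj_id]]
```

Because the membership is kept explicitly, the object-to-cluster relation is known in advance and can be compared with what a clustering algorithm finds.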

…a simple example
- In this scenario we can see one object that each time chooses a region in a completely random order
- Once a region and a point in it are chosen, the object tries to reach this point
- …and so on
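This behavior can be sketched as a short simulation, assuming rectangular regions, a constant speed, and straight-line movement towards the chosen point. The function `attraction_walk` and all of its parameters are hypothetical names for this example, not the engine's actual interface.

```python
import math
import random

def attraction_walk(start, regions, speed, hops, dt=1.0, seed=0):
    """Move one object towards random points inside randomly chosen regions.

    regions -- list of rectangles (xmin, ymin, xmax, ymax), in meters
    speed   -- meters/second (must be positive)
    hops    -- number of targets to reach, one after another
    Returns (points, targets): the (x, y, t) samples and the chosen targets.
    """
    rng = random.Random(seed)
    x, y = start
    t = 0.0
    points = [(x, y, t)]
    targets = []
    for _ in range(hops):
        xmin, ymin, xmax, ymax = rng.choice(regions)
        tx, ty = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        targets.append((tx, ty))
        while (x, y) != (tx, ty):
            dist = math.hypot(tx - x, ty - y)
            if dist <= speed * dt:
                x, y = tx, ty  # the final step lands exactly on the target
            else:
                x += speed * dt * (tx - x) / dist
                y += speed * dt * (ty - y) / dist
            t += dt
            points.append((x, y, t))
    return points, targets

regions = [(0, 0, 100, 100), (400, 400, 500, 500), (800, 0, 900, 100)]
points, targets = attraction_walk((0.0, 0.0), regions, speed=15.0, hops=3)
```

The object reaches each target before choosing the next one, so every target appears among the generated points.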

Other improvements
- Formalization of some concepts (at code level):
  - Spatio-temporal data
  - Spatio-temporal object
  - Trajectory
- and real measures in data values:
  - Positions are expressed in meters
  - Velocities are expressed in meters/second
  - Times are expressed in seconds
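A minimal sketch of what such a formalization might look like, with the units stated above attached to each field. The class names `SpatioTemporalPoint` and `Trajectory` and the `mean_speed` helper are assumptions for illustration, not the classes used in CENTRE.

```python
import math
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SpatioTemporalPoint:
    x: float  # meters
    y: float  # meters
    t: float  # seconds

@dataclass
class Trajectory:
    obj_id: str
    points: list = field(default_factory=list)

    def mean_speed(self):
        """Average speed in meters/second over the whole trajectory."""
        if len(self.points) < 2:
            return 0.0
        length = sum(
            math.hypot(b.x - a.x, b.y - a.y)
            for a, b in zip(self.points, self.points[1:])
        )
        duration = self.points[-1].t - self.points[0].t
        return length / duration if duration > 0 else 0.0

# 50 meters travelled in 10 seconds.
traj = Trajectory("Cell1", [SpatioTemporalPoint(0, 0, 0), SpatioTemporalPoint(30, 40, 10)])
```

With positions in meters and times in seconds, derived quantities such as speed come out directly in meters/second.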

Conclusions
- Work is currently in progress, and we hope to test a density-based algorithm on this new generation engine as soon as possible
- At the same time we are also working on an engine for testing temporal and sequential frequent pattern algorithms
- And also on making the generator easier to use, by simplifying the number and form of its parameters, adding a graphical interface, etc.