VCI Velocity Constrained Indexing and QI Query Indexing

(VCI) Velocity Constrained Indexing and (QI) Query Indexing
Sunil Prabhakar, Purdue University
(Y. Xia, D. Kalashnikov, W. Aref, S. Hambrusch)

Moving Object Databases
• Pervasive Location-Aware Computing Environments (PLACE): http://www.cs.purdue.edu/place.html
• Proliferation of mobile, wireless, and GPS technologies.
• What if objects can determine their location and send it to a server along with other data and queries?
• Locator services, fleet management, groups, personalized navigation, personalized (localized) information, targeted advertising, tracking children, traffic routing, services for the blind, OZ…
12/29/2021 Sunil Prabhakar 2

[Figure: system architecture with satellites, satellite uplink, repository servers, data broadcast, regional servers, and mobile objects]

Issues
• Continuous queries over spatio-temporal data.
• How to scale to millions of objects and queries?
• Data imprecision
• Infrastructure
  - How to communicate? Similar to cell phones?
  - How to determine location?
• Security and privacy

Scalable Execution (Indexing)
• Query indexing (IEEE Trans. on Computers)
  - Index queries instead of data: 100-fold improvement for continuous queries
• Velocity-constrained indexing
  - Exploit a (pessimistic) limit on object speed
• Topographical Tree index (submitted)
  - Index the space (i.e., buildings, highways, etc.)
  - Reduces the need for updates to the index
• Main-memory execution (DEXA 02)
  - Grid-based solutions for continuous range queries
  - Spatial join

Imprecise Data
• Data is inherently imprecise
• How does this affect queries?
• Nearest-neighbor queries (ICDE 03)
  - Limit uncertainty
  - Probabilistic answer
  - Efficient computation: VCI, quantization
• Extend to a more general sensor setting (submitted)
  - Classification of queries
  - Quantification of query quality
  - Which sensors to update if resources are limited? Based upon impact on the quality of current queries.

Data Broadcast
• Broadcasting spatial data with an index to minimize
  - tuning time
  - latency
• Single channel (SSTD 01)
  - Optimal query latency
• Multiple channels (in submission)

PLACE Prototype
• Location-based services
• Currently using MySQL and SQL Server
• GPS-enabled iPAQs, laptops, etc. update their location and run (continuous) queries
• Studying
  - various execution policies
  - scalable location updating
  - call-back of mobile devices
• Move to Predator

Goals for this talk
• Efficient evaluation of continuous range queries over moving objects.
• Monitoring queries
  - Set of region queries
  - This set rarely changes
  - Evaluation over a period of time (not once)
• Example: “Tracking aircraft”
  - Zones where an aircraft can be tracked by enemy radars are specified as continuous region queries
  - An alert is given when a friendly aircraft is in such a zone

Model
• Objects are points, queries are rectangles
• #queries < #objects
• Objects report locations periodically or when they move significantly
• Locations are stored in a file on disk
• Evaluation of queries is periodic, with a fixed time step
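The following is a minimal sketch of this model under a few assumptions: points are 2-D, query rectangles are axis-aligned, and an in-memory dictionary stands in for the location file. `Rect` and `evaluate_all` are illustrative names and do not come from the talk. Each periodic step checks every object against every query. This is the brute-force baseline that the later schemes try to beat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned query rectangle [x1, x2] x [y1, y2]."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def evaluate_all(locations: dict[int, tuple[float, float]],
                 queries: dict[str, Rect]) -> dict[str, set[int]]:
    """One periodic evaluation step: test every object against every query."""
    answers: dict[str, set[int]] = {qid: set() for qid in queries}
    for oid, (x, y) in locations.items():
        for qid, rect in queries.items():
            if rect.contains(x, y):
                answers[qid].add(oid)
    return answers


# Example time step: three reported locations, two monitoring zones.
locations = {1: (0.2, 0.2), 2: (0.5, 0.5), 3: (0.9, 0.1)}
zones = {"zone_a": Rect(0.0, 0.0, 0.3, 0.3), "zone_b": Rect(0.4, 0.0, 1.0, 0.6)}
answers = evaluate_all(locations, zones)
```

The cost of each step is #objects x #queries rectangle tests, however few objects actually moved.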

Organization of talk
• Introduction and motivation
• Related work
• Query indexing
• VCI indexing
• Conclusions

Related Work (1)
Trajectories are mapped to points in a higher-dimensional space:
1. Map trajectories to points in a higher-dimensional space.
2. Index the higher-dimensional space.
3. Transform queries to counter the data transformation.
• G. Kollios, D. Gunopulos, and V. J. Tsotras. On indexing mobile objects. PODS 1999.
• J. Tayeb, O. Ulusoy, and O. Wolfson. A Quadtree-based dynamic attribute indexing method. Computer Journal, 1998.
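Here is a concrete, hedged illustration of the three steps in one dimension. It assumes linear motion and uses made-up function names. An object moving as x(t) = v·t + a maps to the point (v, a). A range query "x1 <= x(t_q) <= x2" becomes the condition x1 <= v·t_q + a <= x2, which is a strip between two parallel lines in the (v, a) plane. The sketch scans the dual points instead of indexing them.

```python
def to_dual(x_ref: float, v: float, t_ref: float) -> tuple[float, float]:
    """Map the trajectory x(t) = x_ref + v * (t - t_ref) to the dual point (v, a),
    where a is the position at t = 0, so that x(t) = v * t + a."""
    return (v, x_ref - v * t_ref)


def query_in_dual(points: dict[int, tuple[float, float]],
                  x1: float, x2: float, t_q: float) -> set[int]:
    """Transformed range query: dual points (v, a) with x1 <= v * t_q + a <= x2.
    This is a strip between two parallel lines in the (v, a) plane; a real system
    would answer it with a spatial index, whereas this sketch simply scans."""
    return {oid for oid, (v, a) in points.items() if x1 <= v * t_q + a <= x2}


# Objects 1 and 2 reported at t = 0, object 3 at t = 2.
points = {1: to_dual(0, 1, 0), 2: to_dual(10, -1, 0), 3: to_dual(5, 0, 2)}
hits = query_in_dual(points, 3, 5, 4)  # who is in [3, 5] at t = 4?
```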

Related Work (2)
Index the past trajectories of moving objects as line segments:
• STR-tree (spatio-temporal R-tree)
• TB-tree (trajectory-bundle tree)
D. Pfoser, C. S. Jensen, and Y. Theodoridis. Novel approaches to the indexing of moving object trajectories. VLDB 2000.

Related Work (3)
Indexing the current and anticipated future positions:
• TPR-tree (time-parameterized R-tree): the index structure is parameterized by the velocity vector.
S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez. Indexing the positions of continuously moving objects. SIGMOD 2000.
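The following is a rough sketch of what "parameterized by the velocity vector" means for a single bounding rectangle. It is our own simplified reading, not the TPR-tree code: 2-D, with conservative bounds. The rectangle stores its extent at a reference time plus the slowest and fastest velocities per axis, so its extent at any later time can be computed without updating the index. `MovingPoint` and `TPBR` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovingPoint:
    x: float   # position at the reference time
    y: float
    vx: float  # velocity
    vy: float


@dataclass(frozen=True)
class TPBR:
    """Time-parameterized bounding rectangle: extent at t_ref plus per-edge velocities."""
    t_ref: float
    lo: tuple[float, float]
    hi: tuple[float, float]
    v_lo: tuple[float, float]  # slowest velocity per axis moves the lower edges
    v_hi: tuple[float, float]  # fastest velocity per axis moves the upper edges

    @staticmethod
    def bound(points: list[MovingPoint], t_ref: float) -> "TPBR":
        return TPBR(
            t_ref,
            (min(p.x for p in points), min(p.y for p in points)),
            (max(p.x for p in points), max(p.y for p in points)),
            (min(p.vx for p in points), min(p.vy for p in points)),
            (max(p.vx for p in points), max(p.vy for p in points)),
        )

    def at(self, t: float) -> tuple[tuple[float, float], tuple[float, float]]:
        """Conservative rectangle at time t >= t_ref; no index update is needed."""
        dt = t - self.t_ref
        lo = (self.lo[0] + self.v_lo[0] * dt, self.lo[1] + self.v_lo[1] * dt)
        hi = (self.hi[0] + self.v_hi[0] * dt, self.hi[1] + self.v_hi[1] * dt)
        return lo, hi
```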

Related Work (4)
• Precision/uncertainty control
• Spatio-temporal data models / data types
• …
None of these addresses the timely execution of multiple concurrent queries on a collection of moving objects.

“Traditional” Approaches
An index is built on the data to improve query performance.
Drawback: constant updates to the index.
• Insert/delete
• Reconstruct
• Modify
• Brute force

Traditional Approaches
Number of I/O operations (m = objects moved per time step, q = number of queries):

m        q        Reconstruct   Ins/Del   Modify   Brute Force
1,000    1,000    211,817       5,865     3,806    1,010
1,000    10,000   228,308       22,356    20,298   5,100
10,000   1,000    211,317       13,413    22,581   1,010
10,000   10,000   228,508       59,904    39,072   5,100

Brute force has the lowest I/O cost, but a high CPU cost (we will see that later).

Organization of talk
• Introduction and motivation
• Related work
• Query indexing
• VCI indexing
• Conclusions

Query Indexing
Observation: continuous queries are stable, but the data (moving objects) is constantly evolving. Proposed solution: Query Indexing
• Build an index on the queries instead of the data
• Incremental evaluation for continuous queries
• Optimization: exploit safe regions
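A hedged sketch of how a query index might look. The stable queries are indexed, and each location update is probed against that index, so only objects that reported a move are re-evaluated and their answers updated incrementally. A uniform grid over the unit square is our assumption here. It stands in for whatever spatial index the Q-index actually uses. `QueryIndex` is a hypothetical name, and safe regions are left out.

```python
from collections import defaultdict

Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2), coordinates assumed in [0, 1]


class QueryIndex:
    """Index on the (stable) queries: a uniform grid whose cells list overlapping queries."""

    def __init__(self, queries: dict[str, Rect], cells: int = 10):
        self.queries = queries
        self.cells = cells
        self.grid: dict[tuple[int, int], list[str]] = defaultdict(list)
        for qid, (x1, y1, x2, y2) in queries.items():
            for i in range(self._cell(x1), self._cell(x2) + 1):
                for j in range(self._cell(y1), self._cell(y2) + 1):
                    self.grid[(i, j)].append(qid)
        self.answers: dict[str, set[int]] = {qid: set() for qid in queries}
        self.current: dict[int, set[str]] = {}  # object id -> queries it satisfies now

    def _cell(self, coord: float) -> int:
        return min(int(coord * self.cells), self.cells - 1)

    def matching(self, x: float, y: float) -> set[str]:
        """Queries containing (x, y); only queries registered in the point's cell are tested."""
        result = set()
        for qid in self.grid.get((self._cell(x), self._cell(y)), []):
            x1, y1, x2, y2 = self.queries[qid]
            if x1 <= x <= x2 and y1 <= y <= y2:
                result.add(qid)
        return result

    def update(self, oid: int, x: float, y: float) -> None:
        """Incremental evaluation: re-check only an object that reported a new location."""
        new = self.matching(x, y)
        old = self.current.get(oid, set())
        for qid in old - new:
            self.answers[qid].discard(oid)
        for qid in new - old:
            self.answers[qid].add(oid)
        self.current[oid] = new
```

Queries change rarely, so the grid is built once. What changes at every step is the `answers` map.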

Safe Regions: possible ranges of movement of an object without affecting its relevance to any query
• Reduce the number of objects that need to be processed
• Reduce communication cost
• Three types considered:
  - MaxDist
  - MaxSphere
  - MaxRect
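A sketch of the computation behind a circular safe region. This is our own reading of the idea, since the slide only names MaxDist, MaxSphere, and MaxRect. The distance from an object to the nearest query boundary bounds how far the object can move without entering or leaving any query. Function names are hypothetical, and query rectangles are assumed to be closed.

```python
import math

Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def boundary_distance(x: float, y: float, r: Rect) -> float:
    """Distance from (x, y) to the boundary of rectangle r.
    Inside: distance to the nearest edge. Outside: Euclidean distance to the rectangle."""
    x1, y1, x2, y2 = r
    if x1 <= x <= x2 and y1 <= y <= y2:
        return min(x - x1, x2 - x, y - y1, y2 - y)
    dx = max(x1 - x, 0.0, x - x2)
    dy = max(y1 - y, 0.0, y - y2)
    return math.hypot(dx, dy)


def safe_radius(x: float, y: float, queries: list[Rect]) -> float:
    """Radius of a circle around (x, y) that no query boundary crosses: while the
    object stays strictly inside it, its membership in every query is unchanged."""
    return min((boundary_distance(x, y, q) for q in queries), default=math.inf)


def still_safe(center: tuple[float, float], radius: float, x: float, y: float) -> bool:
    """Sphere-style check: skip re-evaluation while the object is strictly inside the circle."""
    return math.dist(center, (x, y)) < radius
```

An object whose new position passes `still_safe` needs no query lookup at all. If it reports only when it leaves the circle, this also saves messages.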

Examples of Safe Regions
[Figure: MaxDist, MaxSphere, MaxRect]

Experimental Evaluation
• 100,000 objects as a collection of 5 normal distributions, each with 20,000 objects
• Centers of clusters are uniformly distributed; deviation 0.05
• Queries follow the same distribution but with deviations of 0.1 and 1.0 (1 <= number of queries <= 10,000)
• Query size is 0.01 x 0.01
• Max velocities: Zipf, with overall Vmax = 250 mph (the space is a 1000-mile square)

Reduction Rate vs. Query Density
[Figure panels: q = 10,000 and q = 1,000; N = 100,000, m = 10,000]
Safe regions not recomputed!

Performance of Q-Index
[Figure: N = 100,000, m = 10,000, q = 1,000]
Performance is almost two orders of magnitude better than traditional approaches.

Performance vs. number of objects moved
[Figure panels: m = 1,000 and m = 10,000; N = 100,000, q = 1,000]
As the number of objects that move at each time step increases, the I/O cost increases, degrading gracefully to a sequential scan.

Q-index not main-memory resident
[Figure: N = 100,000, q = 10,000, m = 1,000]
Brute force outperforms Q-index in terms of I/O cost, but it pays a HIGH computation cost that offsets the reduced I/O.

Impact of CPU on Brute Force

m        q        Incremental Brute Force   Q-Index   Max Sphere   Max Rect
1,000    10,000   3.6 s                     1.7 s     0.9 s        0.5 s
10,000   10,000   37 s                      3.1 s     1.3 s        1.1 s

Impact of Velocity
[Figure panels: v = 125 mph and v = 250 mph; N = 100,000, m = 10,000, q = 1,000]
The effectiveness of the safe-region optimizations changes only slightly when velocity increases.

Impact of Density
[Figure: N = 100,000, m = 1,000, q = 10,000]
Density is increased by reducing the region to 10 x 10 miles and the speed to 50 mph.
The safe-region optimizations are less effective when density is increased. The Q-Index approach is still an order of magnitude better than traditional approaches.

QIndex - performance
• Advantages
  - Scales to large numbers of continuous queries
  - Scales linearly with the number of moving objects
  - Relatively insensitive to the rate of movement
  - Safe-region optimizations are very effective
• Disadvantage
  - Sensitive to the arrival of new queries!

Organization of talk
• Introduction and motivation
• Related work
• Query indexing
• Velocity Constrained Indexing
• Conclusions

Velocity Constrained Indexing (VCI)
• Maintain an index on moving objects
• Problem: an ordinary (stationary) index built on the objects must be updated excessively as the objects move.
• Main idea of VCI: reduce the number of updates to the index by exploiting limits on object speed.

VCI and Query Expansion
• Each object has a known maximum speed
• VCI is an R-tree-based index with max-velocity information
• Each node stores the max velocity over all objects that it covers.

MBR Expansion
• VCI built at time t0
• At time t > t0 it cannot be used unless updated
• The vmax fields allow us to use it!
• No point moves farther than R = vmax(t - t0)
• Solution: expand all MBRs by R
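A hedged sketch of MBR expansion on a toy two-level tree. Each node keeps the largest maximum speed of any object below it. At query time, every MBR and every stored point is grown by its own R = vmax(t - t0). `Node`, `Entry`, and `candidates` are illustrative names, not the VCI implementation. Growing by R along both axes is conservative, because an object that travels at most R changes each coordinate by at most R. The result is a superset of the true answer and still needs post-processing.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def expand(r: Rect, R: float) -> Rect:
    """Grow a rectangle by R on every side."""
    x1, y1, x2, y2 = r
    return (x1 - R, y1 - R, x2 + R, y2 + R)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Entry:
    oid: int
    x: float  # position recorded when the index was built (time t0)
    y: float
    vmax: float


@dataclass
class Node:
    mbr: Rect
    vmax: float  # max speed over all objects below this node
    children: list[Node] = field(default_factory=list)
    entries: list[Entry] = field(default_factory=list)  # non-empty only in leaves


def candidates(node: Node, query: Rect, t: float, t0: float) -> set[int]:
    """Objects that may be inside `query` at time t (a superset; needs post-processing)."""
    R = node.vmax * (t - t0)  # no object below this node moved farther than R since t0
    if not intersects(expand(node.mbr, R), query):
        return set()
    found = set()
    for e in node.entries:
        if intersects(expand((e.x, e.y, e.x, e.y), e.vmax * (t - t0)), query):
            found.add(e.oid)
    for child in node.children:
        found |= candidates(child, query, t, t0)
    return found
```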

Example expansion
[Figure: query Q, expanded query Qexp, and an MBR, with expansion distance R between times t0 and t]
Optimization: QE (query expansion) vs. MBRE (MBR expansion)
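Why growing the query can stand in for growing the MBRs: for axis-aligned rectangles grown by the same R on every side, "Q grown by R intersects M" holds exactly when "Q intersects M grown by R". The randomized check below illustrates that identity. It uses integer coordinates so that no rounding is involved, and it is an illustration, not code from the talk. If each node stores its own vmax, as the VCI slide describes, the query would be grown per node by that node's R.

```python
import random

Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def expand(r: Rect, R: float) -> Rect:
    x1, y1, x2, y2 = r
    return (x1 - R, y1 - R, x2 + R, y2 + R)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def random_rect(rng: random.Random) -> Rect:
    x1, x2 = sorted(rng.randint(0, 10) for _ in range(2))
    y1, y2 = sorted(rng.randint(0, 10) for _ in range(2))
    return (x1, y1, x2, y2)


def check_equivalence(trials: int = 10_000, seed: int = 0) -> bool:
    """QE (grow the query) and MBRE (grow the MBR) must agree on every pair."""
    rng = random.Random(seed)
    for _ in range(trials):
        q, m, R = random_rect(rng), random_rect(rng), rng.randint(0, 3)
        if intersects(expand(q, R), m) != intersects(q, expand(m, R)):
            return False
    return True
```

The practical gain from QE is that the query is grown once per search, and the stored MBRs are never touched.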

Post-processing
• False positives are possible
• A post-processing step eliminates them
• Post-processing obtains the current positions of objects retrieved by Qexp
• This can be expensive!
• We propose several optimizations.

1. Post-processing optimization
• PP is needed for all objects that fall within the expanded queries.
• Note: no object moves farther than R = vmax(t - t0)
• If the circle C of radius R around an object X lies completely inside query Q, then it is not necessary to post-process X (for Q).
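For an axis-aligned query, the test on this slide comes down to four comparisons. Below is a minimal sketch with hypothetical helper names. It assumes the positions passed in are the ones recorded at t0 and that a single vmax applies to all of them.

```python
Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def circle_inside(x: float, y: float, R: float, q: Rect) -> bool:
    """True if the circle of radius R centred at (x, y) lies entirely inside rectangle q."""
    x1, y1, x2, y2 = q
    return x1 <= x - R and x + R <= x2 and y1 <= y - R and y + R <= y2


def split_for_postprocessing(objs: dict[int, tuple[float, float]], vmax: float,
                             q: Rect, t: float, t0: float) -> tuple[set[int], set[int]]:
    """Split objects retrieved by the expanded query into (certain answers, needs PP).
    Positions in `objs` are those recorded at t0; no object moved farther than R."""
    R = vmax * (t - t0)
    certain, needs_pp = set(), set()
    for oid, (x, y) in objs.items():
        if circle_inside(x, y, R, q):
            certain.add(oid)   # wherever it moved, it is still inside q
        else:
            needs_pp.add(oid)  # must fetch the current position to decide
    return certain, needs_pp
```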

2. Post-processing optimization
If not careful, PP can incur an I/O per matching object:
1. Identify all objects that need PP, then retrieve each object once and check it against all queries
2. Sort objects on page number to avoid multiple retrievals of the same page
3. Cluster the index to reduce the total number of pages to be retrieved
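Steps 1 and 2 can be sketched as follows. `page_of` and `read_page` are hypothetical stand-ins for the object file's layout and its page reads, and queries are assumed to be closed rectangles. Every page that holds at least one needed object is read exactly once, regardless of how many queries need it.

```python
from collections import defaultdict
from typing import Callable

Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def batch_postprocess(needs_pp: dict[str, set[int]],
                      page_of: dict[int, int],
                      read_page: Callable[[int], dict[int, tuple[float, float]]],
                      queries: dict[str, Rect]) -> dict[str, set[int]]:
    """needs_pp maps a query id to the objects whose current position it must check."""
    # Step 1: collect, per page, every object that any query needs (duplicates merged).
    by_page: dict[int, set[int]] = defaultdict(set)
    for oids in needs_pp.values():
        for oid in oids:
            by_page[page_of[oid]].add(oid)

    # Step 2: visit pages in sorted order so each page is fetched exactly once.
    current: dict[int, tuple[float, float]] = {}
    for page in sorted(by_page):
        contents = read_page(page)
        for oid in by_page[page]:
            current[oid] = contents[oid]

    # Check the fetched current positions against each query.
    results: dict[str, set[int]] = {}
    for qid, oids in needs_pp.items():
        x1, y1, x2, y2 = queries[qid]
        results[qid] = {o for o in oids
                        if x1 <= current[o][0] <= x2 and y1 <= current[o][1] <= y2}
    return results
```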

3. Clustered VCI
• Clustering is done efficiently after VCI creation
• Depth-first traversal
• Each object is copied to its corresponding place in a new file
• Pointers in leaf nodes are adjusted
• Improves performance by a factor of 3
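A sketch of the clustering pass. Two assumptions: the object file is modelled as a Python list, and leaves hold slot numbers into it. A depth-first walk copies each leaf's objects into consecutive slots of a new file and rewrites the leaf pointers, so objects that are post-processed together tend to share pages. `Node`, `cluster`, and `pages_touched` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    children: list[Node] = field(default_factory=list)
    pointers: list[int] = field(default_factory=list)  # leaf only: slots in the object file


def cluster(root: Node, old_file: list[tuple]) -> list[tuple]:
    """Copy objects into a new file in depth-first leaf order and repoint the leaves."""
    new_file: list[tuple] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.pointers:
            new_pointers = []
            for slot in node.pointers:
                new_pointers.append(len(new_file))  # object's place in the new file
                new_file.append(old_file[slot])     # copy the record
            node.pointers = new_pointers            # adjust the leaf's pointers
        stack.extend(reversed(node.children))       # left-to-right depth-first order
    return new_file


def pages_touched(leaf: Node, records_per_page: int) -> int:
    """Number of distinct file pages a leaf's objects occupy."""
    return len({slot // records_per_page for slot in leaf.pointers})
```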

Problem: large expansions
• Quality of the index degrades with time
• Rebuild after some time (expensive)
• Refresh: update leaf nodes with current locations
  - Depth-first traversal, updating MBRs
  - Retains the old index structure
  - Experimentally: works very well
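Refresh can be sketched as a single post-order, depth-first pass. The pass recomputes every MBR from current locations and leaves the tree shape alone. Names are hypothetical, and every leaf is assumed to hold at least one object. After the pass, the caller would reset t0 to the refresh time, since expansions are measured from the last refresh.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Node:
    mbr: Rect
    children: list[Node] = field(default_factory=list)
    oids: list[int] = field(default_factory=list)  # leaf only


def refresh(node: Node, current: dict[int, tuple[float, float]]) -> Rect:
    """Recompute MBRs bottom-up from current locations; the tree shape is unchanged."""
    if node.oids:
        xs = [current[o][0] for o in node.oids]
        ys = [current[o][1] for o in node.oids]
        node.mbr = (min(xs), min(ys), max(xs), max(ys))
    else:
        boxes = [refresh(child, current) for child in node.children]
        node.mbr = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                    max(b[2] for b in boxes), max(b[3] for b in boxes))
    return node.mbr
```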

Performance of VCI, No Clustering

Performance of VCI, With Clustering

Analysis
• Pre-processing cost increases since the queries are getting larger
• Post-processing cost increases since more objects need to be processed
• Total cost approaches that of a sequential scan after 150 time steps; after that, it is better to do a sequential scan
• For clustered VCI this number is 400!

Impact of Refresh on VCI

Refresh analysis
• Pre-processing is reduced since the MBRs better fit the underlying data and the clock is reset
• Post-processing is reduced since the index is “tighter” and fewer objects need to be processed

Sensitivity to Query Density

VCI - Sensitivity to parameters
• Insensitive to
  - changes in the set of queries (i.e., 100 queries replaced by another 100 queries)
  - actual movement of objects (QE only; refresh is sensitive)
  - portion of objects that move (QE only, not refresh)
• Sensitive to
  - number of queries (good for up to roughly 100 queries)
  - coverage of objects by queries
• Time scales linearly as a function of Vmax (expansions are proportional: R = vmax(t - t0))

Combined Scheme
• Both indexes are created and maintained
• Qindex is used to process existing queries very efficiently
• New queries are processed with VCI
• When enough new queries arrive, bulk-load them into Qindex to amortize the cost.

Combined Index Schemes
• Initially, N = 100,000, m = 1,000, q = 10,000.
• Queries arrive at a rate of 10 queries per 3 minutes.
• New queries are handled by VCI; when the number of new queries reaches a threshold (100), they are ingested into Q-Index.
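The bookkeeping for the combined scheme is small. In this hedged sketch, `qindex_bulk_load` and `vci_search` are hypothetical callbacks standing in for the two indexes. The default threshold of 100 follows the setting above.

```python
from typing import Callable

Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)


class CombinedScheme:
    """Existing queries live in the query index; new ones wait in a buffer served by VCI."""

    def __init__(self,
                 qindex_bulk_load: Callable[[dict[str, Rect]], None],
                 vci_search: Callable[[Rect], set[int]],
                 threshold: int = 100):
        self.qindex_bulk_load = qindex_bulk_load
        self.vci_search = vci_search
        self.threshold = threshold
        self.pending: dict[str, Rect] = {}

    def add_query(self, qid: str, rect: Rect) -> None:
        self.pending[qid] = rect
        if len(self.pending) >= self.threshold:
            # Amortize the cost: ingest the whole batch into the query index at once.
            self.qindex_bulk_load(dict(self.pending))
            self.pending.clear()

    def evaluate_new_queries(self) -> dict[str, set[int]]:
        """Queries not yet in the query index are answered through VCI."""
        return {qid: self.vci_search(rect) for qid, rect in self.pending.items()}
```

The batch size trades VCI query cost, which grows with the number of pending queries, against how often the query index is rebuilt.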

Conclusion
• Qindex and VCI together offer a robust and scalable solution for efficient continuous query evaluation
• The performance is significantly better than traditional approaches

Research Interests
• Efficient I/O Management
  - Application-specific (Multidimensional, Multimedia)
  - Large-scale storage (Tertiary Storage)
• Moving Object/Sensor DB
  - Scalability, Indexing, Querying, …
• Multimedia Databases
  - Quality-of-Service issues
• Security
  - Intrusion Detection, Watermarking
• Smart Searching for Tooling

Reduction Rate vs. Number of Moving Objects
[Figure panels: m = 10,000 and m = 1,000; N = 100,000, q = 1,000]