On Computing Top-t Most Influential Spatial Sites
Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du
Northeastern University, Boston, USA
9/2/2005, VLDB 2005, Trondheim, Norway

Outline
- Problem Definition
- Related Work
- The New Metric: minExistDNN
- Data Structures and Algorithm
- Experimental Results
- Conclusions

Problem Definition
- Given:
  - a set of sites S
  - a set of weighted objects O
  - a spatial region Q
  - an integer t
- Top-t most influential sites query:
  - find the t sites in Q with the largest influences
  - influence of a site s = total weight of the objects that consider s as their nearest site
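As a minimal sketch of the definition above, the following brute-force Python routine computes influences directly. The function name, the dictionary and list layouts, and the `in_region` predicate are assumptions made for illustration; this is a baseline for checking results, not the algorithm presented in the talk.

```python
from collections import defaultdict
from math import dist


def top_t_influential(sites, objects, in_region, t):
    """Brute-force top-t query (illustrative baseline only).

    sites:     {name: (x, y)}              (hypothetical layout)
    objects:   [((x, y), weight), ...]     (hypothetical layout)
    in_region: predicate telling whether a point lies in Q
    """
    influence = defaultdict(float)
    for location, weight in objects:
        # An object's nearest site is chosen among ALL sites, not only those in Q.
        nearest = min(sites, key=lambda s: dist(sites[s], location))
        influence[nearest] += weight
    # Only sites inside Q are eligible answers.
    candidates = [s for s in sites if in_region(sites[s])]
    candidates.sort(key=lambda s: influence[s], reverse=True)
    return [(s, influence[s]) for s in candidates[:t]]
```

Every object is compared against every site, so the cost is proportional to |S| times |O| per query.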

Motivation
- Which supermarket in Boston is the most influential among residential buildings?
  - Sites: supermarkets
  - Objects: residential buildings
  - Weight: number of people in a building
  - Query region: Boston
- Which wireless station in Boston is the most influential among mobile users?

Example
[Figure: objects o1 to o6 and sites s1 to s4 in the plane.]
- Suppose all objects have weight = 1, Q is the whole space, and t = 1.
- The most influential site is s1, with influence = 3.

Example
[Figure: the same objects and sites, with Q drawn as a shaded rectangle.]
- Now suppose Q is the shaded rectangle and t = 2.
- The top-2 most influential sites are s4 and s2.

Related Work
- Bi-chromatic RNN query: considers two datasets, sites and objects.
- The RNNs of a site s ∈ S are the objects that consider s as their nearest site.
[Figure: objects o1 to o6 and sites s1 to s4.]

Related Work
- Solutions to the RNN query based on precomputation [KM00, YL01].
[Figure: objects o1 to o6 and sites s1 to s4.]

Related Work
- Solution to the RNN query based on the Voronoi diagram [SRAE01]:
  - Compute the Voronoi cell of s: the region enclosing the locations closer to s than to any other site.
  - Query the object R-tree using the Voronoi cell.

Related Work [SRAE01]
[Figure: objects o1 to o6 and sites s1 to s4.]

Our Problem vs. RNN Query
- RNN query:
  - A single site as input.
  - Interested in the actual set of RNNs.
- Top-t most influential sites query:
  - A spatial region as input.
  - Interested in the aggregate weight of the RNNs.

Straightforward Solution 1
- For each site, pre-compute its influence.
- At query time, find the sites in Q and return the t sites with the largest influences.
- Drawback 1: costly maintenance upon updates.
- Drawback 2: binds a set of sites closely to a set of objects.

Straightforward Solution 2
- An extension of the Voronoi-diagram-based solution to the RNN query:
  1. Find all sites in Q.
  2. For each such site, find its RNNs by using the Voronoi cell, and compute its influence.
  3. Return the t sites with the largest influences.

Straightforward Solution 2
- Drawback 1: all sites in Q need to be retrieved from the leaf nodes.
- Drawback 2: the object R-tree and the site R-tree are browsed multiple times.
  - For each site in Q, browse the site R-tree to compute the Voronoi cell.
  - For each such Voronoi cell, browse the object R-tree to compute the influence.

Features of Our Solution
- Systematically browse both trees once.
- Pruning techniques are provided, based on a new metric, minExistDNN.
- No need to compute the influences of all sites in Q, or even to locate all sites in Q.

Motivation
- Intuitively, if some object in Oi may consider some site in Sj as its NN, then Oi affects Sj.
- To estimate the influences of all sites in a site MBR Sj, we need to know whether an object MBR Oi affects Sj.
[Figure: object MBRs O1 and O2, site MBRs S1 and S2.]
- O1 only affects S1, while O2 affects both S1 and S2.

maxDist – A Loose Estimation
- If maxDist(O1, S1) < minDist(O1, S2), O1 does not affect S2.
- Why is this not good enough?
[Figure: O1, S1 and S2, with minDist(O1, S2) = 8 and maxDist(O1, S1) = 10.]
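The loose test can be sketched as follows, assuming axis-aligned MBRs stored as `(x1, y1, x2, y2)` tuples. The function names and the tuple layout are assumptions for illustration and do not come from the slides.

```python
from math import hypot


def min_dist(a, b):
    """Smallest distance between rectangles a and b, each (x1, y1, x2, y2)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)  # horizontal gap, 0 if they overlap in x
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)  # vertical gap, 0 if they overlap in y
    return hypot(dx, dy)


def max_dist(a, b):
    """Largest distance between a point of a and a point of b."""
    dx = max(a[2] - b[0], b[2] - a[0])
    dy = max(a[3] - b[1], b[3] - a[1])
    return hypot(dx, dy)


def loosely_pruned(o, s_near, s_far):
    """True if the maxDist test proves that o does not affect s_far."""
    return max_dist(o, s_near) < min_dist(o, s_far)
```

With the values shown on the slide, maxDist(O1, S1) = 10 is not below minDist(O1, S2) = 8, so this test cannot decide that case.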

minMaxDist – A Tight Estimation?
[Figure: object o1 and site MBRs S1 and S2, with minDist(o1, S2) = 6 and minMaxDist(o1, S1) = 5.]
- An object o1 does not affect S2 if there exists an S1 such that minMaxDist(o1, S1) < minDist(o1, S2).
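For a single object location, the test above can be sketched like this. The code uses the usual 2D formulation of minMaxDist (nearer side in one dimension, farther side in the other, minimized over the two dimensions); the function names and the `(x1, y1, x2, y2)` rectangle layout are assumptions for illustration.

```python
from math import hypot


def min_dist_point(p, r):
    """Smallest distance from point p to rectangle r = (x1, y1, x2, y2)."""
    dx = max(r[0] - p[0], p[0] - r[2], 0.0)
    dy = max(r[1] - p[1], p[1] - r[3], 0.0)
    return hypot(dx, dy)


def min_max_dist(p, r):
    """Upper bound on the distance from p to its nearest site inside MBR r.

    An MBR has at least one site on each side, so the nearest site is no
    farther than the far end of the better of the two nearer sides.
    """
    px, py = p
    x1, y1, x2, y2 = r
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    near_x, far_x = (x1, x2) if px <= cx else (x2, x1)
    near_y, far_y = (y1, y2) if py <= cy else (y2, y1)
    return min(hypot(px - near_x, py - far_y),
               hypot(px - far_x, py - near_y))


def object_pruned(o, s1, s2):
    """True if point object o provably does not affect site MBR s2."""
    return min_max_dist(o, s1) < min_dist_point(o, s2)
```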

minMaxDist – A Tight Estimation?
[Figure: object MBR O1 containing object o1, site MBRs S1 and S2 with sites s1 and s2; minDist(O1, S2) = 6, minMaxDist(O1, S1) = 5, and two distances from o1 labeled 7 and 6.]
- Not true for an object MBR O1.

A Tight Estimation?
- A metric m(O1, S1) should:
  1) guarantee that each location in O1 is within m(O1, S1) of a site in S1, and
  2) be the smallest distance with this property.

New Metric – minExistDNN_S1(O1)
- Definition: minExistDNN_S1(O1) = max { minMaxDist(l, S1) | location l ∈ O1 }
- O1 does not affect S2 if there exists an S1 such that minExistDNN_S1(O1) < minDist(O1, S2).

Examples of minExistDNN_S1(O1)
[Figure: example placements of O1 relative to S1.]
- How to calculate it?

Calculating minExistDNN_S1(O1)
- Step 1: Space partitioning.
[Figure: S1 with corners a, b, c, d; the space around it is split into eight partitions, each labeled with one corner: P1: b, P2: c, P3: a, P4: d, P5: c, P6: b, P7: d, P8: a.]
- Every location l in the same partition is associated with the second closest corner of S1; the distance to that corner is minMaxDist(l, S1).

Space Partitioning
- O1 is divided into multiple sub-regions, one in each partition.
[Figure: O1 spanning partitions P1: b and P2: c of S1, whose corners are a, b, c, d.]

Calculating minExistDNN_S1(O1)
- Step 2: Choose up to 8 locations on the border of O1 and compute their minMaxDist values to S1.
- minExistDNN_S1(O1) is the largest of these values.
[Figure: O1 spanning partitions P1: b and P2: c of S1, with minExistDNN_S1(O1) marked.]
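The two steps can be sketched in Python as below. This is an illustration under stated assumptions, not the selection rule from the talk: instead of picking exactly the (up to) 8 border locations, it evaluates a superset of candidates, namely the corners of O1, the points where the partition boundaries cross the border of O1, and the center of S1 if it lies in O1. The partition boundaries are taken to be the two midlines of S1 and the two perpendicular bisectors of its diagonals, which is where the second closest corner changes. Within one partition the value is the distance to a fixed corner, a convex function, so its maximum over each sub-region is reached at a vertex. Rectangles are `(x1, y1, x2, y2)` tuples; all names are hypothetical.

```python
from math import hypot


def min_max_dist(p, r):
    """Distance from p to the second closest corner of rectangle r (2D minMaxDist)."""
    px, py = p
    x1, y1, x2, y2 = r
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    near_x, far_x = (x1, x2) if px <= cx else (x2, x1)
    near_y, far_y = (y1, y2) if py <= cy else (y2, y1)
    return min(hypot(px - near_x, py - far_y),
               hypot(px - far_x, py - near_y))


def min_exist_dnn(o, s):
    """max over locations l in o of min_max_dist(l, s), via candidate points."""
    ox1, oy1, ox2, oy2 = o
    sx1, sy1, sx2, sy2 = s
    cx, cy = (sx1 + sx2) / 2, (sy1 + sy2) / 2
    w, h = sx2 - sx1, sy2 - sy1
    candidates = [(ox1, oy1), (ox1, oy2), (ox2, oy1), (ox2, oy2)]
    if ox1 <= cx <= ox2 and oy1 <= cy <= oy2:
        candidates.append((cx, cy))  # apex shared by all eight partitions
    # Partition boundaries, written as a*(x - cx) + b*(y - cy) = 0.
    for a, b in [(1, 0), (0, 1), (w, -h), (w, h)]:
        if b != 0:  # crossings with the vertical edges of o
            for x in (ox1, ox2):
                y = cy - a * (x - cx) / b
                if oy1 <= y <= oy2:
                    candidates.append((x, y))
        if a != 0:  # crossings with the horizontal edges of o
            for y in (oy1, oy2):
                x = cx - b * (y - cy) / a
                if ox1 <= x <= ox2:
                    candidates.append((x, y))
    return max(min_max_dist(p, s) for p in candidates)
```

The pruning rule from the definition slide then reads: O1 does not affect S2 when `min_exist_dnn(O1, S1)` is smaller than the minimum distance between O1 and S2.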

Data Structure
- Two R-trees: S of sites, O of objects.
- Three queues:
  - queue_SIN: entries of S inside Q.
  - queue_SOUT: entries of S outside Q.
  - queue_O: entries of O.

Data Structure
[Figure: query region Q, site entries S1 to S4 and object entries O1 to O4.]
- queue_SIN: S1, S2
- queue_O: O1
- queue_SOUT: S3

maxInfluence and minInfluence
- For each entry Sj in queue_SIN:
  - maxInfluence: total weight of the entries in queue_O that affect Sj.
  - minInfluence: total weight of the entries in queue_O that ONLY affect Sj, divided by the number of objects in Sj.
- queue_SIN is sorted in decreasing order of maxInfluence.
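A minimal sketch of the two bounds, assuming the "affects" relation has already been computed. The input layouts are hypothetical: each site entry maps to the count used as the divisor in the definition above, and each object entry is a pair of its total weight and the set of site entries it affects.

```python
def influence_bounds(site_entries, object_entries):
    """Return {site_entry: (max_influence, min_influence)}.

    site_entries:   {entry_id: count}                   (hypothetical layout)
    object_entries: [(weight, {affected_entry_ids})]    (hypothetical layout)
    """
    bounds = {}
    for sj, count in site_entries.items():
        # Upper bound: every object entry that may affect sj.
        max_inf = sum(w for w, affected in object_entries if sj in affected)
        # Lower bound: only entries that affect sj and nothing else.
        only_sj = sum(w for w, affected in object_entries if affected == {sj})
        bounds[sj] = (max_inf, only_sj / count)
    return bounds


def sorted_by_max_influence(bounds):
    """Entry ids in decreasing order of maxInfluence."""
    return sorted(bounds, key=lambda sj: bounds[sj][0], reverse=True)
```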

Algorithm Overview
- Expand an entry from one of the three queues:
  - Remove the entry from the queue.
  - Retrieve the referenced node, and insert its (unpruned) entries into the same queue.
  - Update maxInfluence and minInfluence if necessary.
- If the top-t entries in queue_SIN are sites whose minInfluences are ≥ the maxInfluences of all remaining entries, return.
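The stopping condition in the last bullet can be sketched as a small check. The tuple layout `(is_site, min_influence, max_influence)` and the function name are assumptions for illustration; the list is assumed to be kept in decreasing order of maxInfluence.

```python
def can_stop(queue_sin, t):
    """True if the first t entries already form the answer.

    queue_sin: list of (is_site, min_influence, max_influence) tuples,
               sorted by decreasing max_influence (hypothetical layout).
    """
    top, rest = queue_sin[:t], queue_sin[t:]
    # The top-t entries must be actual sites, not unexpanded index entries.
    if not all(is_site for is_site, _, _ in top):
        return False
    # No remaining entry may be able to beat any of the top-t sites.
    threshold = max((mx for _, _, mx in rest), default=0.0)
    return all(mn >= threshold for _, mn, _ in top)
```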

Example
[Figure: query region Q with site entries S1, S3, S5 to S9 and object entries O1, O5, O6.]
- Before expansion: queue_SIN: S1; queue_O: O1; queue_SOUT: S3
- After expansion: queue_SIN: S5, S7; queue_O: O6; queue_SOUT: S9
- S6 is not affected by O1, so S6 is pruned.
- O5 does not affect S5 or S7, so O5 is pruned.

A Pruning Case
[Figure: O1 with site entries S1 to S4. Before expanding S1: minExistDNN_S1(O1) = 6 and minDist(S2, O1) = 5. After expanding S1: minExistDNN_S3(O1) = 4.]
- S2 is pruned because minExistDNN_S3(O1) < minDist(S2, O1).

Choosing an Entry to Expand
- Expand the top entries in queue_SIN.
- Expand the most important Oi.
  - Importance: |Oi| * #affected entries * area(Oi)
- Expand the Sj that contains the most important Oi.
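The importance heuristic can be sketched as below. Reading |Oi| as the number of objects under the entry is an assumption, as are the tuple layout and the function names.

```python
def importance(count, num_affected, rect):
    """|Oi| * #affected entries * area(Oi), with rect = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return count * num_affected * (x2 - x1) * (y2 - y1)


def most_important(object_entries):
    """Pick the entry with the largest importance.

    object_entries: list of (entry_id, count, num_affected, rect) tuples
                    (hypothetical layout).
    """
    return max(object_entries, key=lambda e: importance(e[1], e[2], e[3]))[0]
```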

Choosing an Entry to Expand
- Estimate the probability of pruning Oi using some Sj in queue_SOUT.
[Figure: two panels showing Q, S1, O1 and S2, with minDist(S1, O1) = 5 and minExistDNN_S2(O1) = 6; the second panel also shows S'2.]
- After expanding S2, O1 is likely not to affect S1.

Experimental Setup
- Data sets:
  - 24,493 populated places in North America
  - 9,203 cultural landmarks in North America
- R-tree page size: 1 KB.
- LRU buffer: 128 disk pages.
- t = 4.
- Compared against the solution using the Voronoi diagram.

Selected Experimental Results
- #sites : #objects = 1 : 2.5

Selected Experimental Results
- #sites : #objects = 2.5 : 1

Conclusions
- We addressed a new problem: the top-t most influential sites query.
- We proposed a new metric, minExistDNN. It can be used to prune the search space in NN/RNN-related problems.
- We carefully designed an algorithm that systematically browses both R-trees once.
- Experiments showed more than an order of magnitude improvement.

Thank you! Q&A