Finding Top k Most Influential Spatial Facilities over Uncertain Objects
Liming Zhan, Ying Zhang, Wenjie Zhang, Xuemin Lin
School of Computer Science and Engineering, The University of New South Wales, Australia

Outline
• Motivation
• Problem Definition
• Our Approach
• Experiments
• Conclusion

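Motivation example: NN, RNN, influential sites
$I(F_1) = 1$, $I(F_2) = 2$, $I(F_3) = 0$
$I(F)$: the influence score of facility $F$, i.e., the number of objects influenced by $F$: the objects that have $F$ as their nearest neighbor (NN), which are exactly the reverse nearest neighbors (RNN) of $F$.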
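For certain (non-uncertain) objects, the influence score can be computed straight from this definition. The Python sketch below assumes 2-D point locations and Euclidean distance; the function name `influence_scores` and all coordinates are hypothetical, picked only so that the scores match the example values above.

```python
import math
from collections import Counter

Point = tuple[float, float]


def influence_scores(objects: list[Point], facilities: dict[str, Point]) -> Counter:
    """I(F) for every facility F: how many objects have F as their nearest facility."""
    scores = Counter({name: 0 for name in facilities})  # keep zero scores such as I(F3)
    for obj in objects:
        # The nearest facility influences obj (obj is in that facility's RNN set).
        # Ties go to the facility listed first in `facilities`.
        nearest = min(facilities, key=lambda name: math.dist(obj, facilities[name]))
        scores[nearest] += 1
    return scores


facilities = {"F1": (0.0, 0.0), "F2": (10.0, 0.0), "F3": (5.0, 10.0)}
objects = [(1.0, 0.0), (9.0, 0.0), (11.0, 1.0)]
print(dict(influence_scores(objects, facilities)))  # {'F1': 1, 'F2': 2, 'F3': 0}
```

A facility that is nobody's nearest neighbor, like F3 here, still appears with score 0, which matters later when facilities are ranked.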

Motivation
• Warehouse management systems
  – RFID tags are attached to items, whose locations can be obtained by RFID readers.
  – Find the top k most popular dispatching points.
• Location-based services (LBS)
  – Mobile phones are used to identify users' locations.
  – Find the top k supermarkets that influence the largest number of users.

Influential Sites
• Influence sets based on reverse nearest neighbor queries [SIGMOD 2000, Korn et al.]
• On computing top-t most influential spatial sites (TkIS) [VLDB 2005, Xia et al.]

Uncertainty exists
• Uncertainty
  – RFID readers: noisy
  – Locations of mobile users: imprecise
• Uncertain objects
  – Continuous: PDF
  – Discrete: multiple instances

Motivating example

Challenge
• Uncertain model
  – Instances of one uncertain object may be influenced by several facilities: how should the query be modeled?
• Efficiency of the algorithm
  – Processing is more complicated than for traditional (certain) objects.

Example [TKDE 2011, Zheng et al.]

Problem Statement
Given a set of uncertain objects O and a set of facilities F, find the k facilities with the highest expected influence scores.
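One way to make "expected influence score" precise, assuming discrete uncertain objects whose instances $u$ carry probabilities $p(u)$, exactly one instance of each object occurs, and $\mathrm{NN}(u)$ denotes the facility in $F$ nearest to $u$, is the formula below. It is the quantity the naïve method accumulates; this is a hedged reading, not a definition quoted from the paper.

```latex
\mathbb{E}[I(f)] \;=\; \sum_{U \in O} \Pr\big[\mathrm{NN}(U) = f\big]
\;=\; \sum_{U \in O} \; \sum_{u \in U,\ \mathrm{NN}(u) = f} p(u), \qquad f \in F
```

Since an object's instance probabilities sum to 1, each object contributes at most 1 to any single facility's score.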

Naïve method
• For each instance of an object, find its nearest facility f and increase the influence score of f by the probability of the instance.
• Return the k facilities with the highest scores.
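The naïve method can be sketched in a few lines of Python. This is a minimal illustration assuming 2-D instances stored as `(location, probability)` pairs and Euclidean distance; `naive_top_k` and the sample data are hypothetical.

```python
import math

Point = tuple[float, float]
Instance = tuple[Point, float]  # (location, probability of this instance)


def naive_top_k(objects: dict[str, list[Instance]],
                facilities: dict[str, Point], k: int) -> list[tuple[str, float]]:
    """Expected influence scores by brute force, returning the k best facilities."""
    scores = {name: 0.0 for name in facilities}
    for instances in objects.values():
        for loc, prob in instances:
            # Each instance adds its probability to the score of its nearest facility.
            nearest = min(facilities, key=lambda name: math.dist(loc, facilities[name]))
            scores[nearest] += prob
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


facilities = {"F1": (0.0, 0.0), "F2": (10.0, 0.0)}
objects = {
    "U1": [((1.0, 0.0), 0.5), ((9.0, 0.0), 0.5)],
    "U2": [((2.0, 0.0), 0.75), ((8.0, 0.0), 0.25)],
}
print(naive_top_k(objects, facilities, k=1))  # [('F1', 1.25)]
```

Every instance is compared with every facility, so the cost grows with (#instances × #facilities) and all objects must be loaded; this is the cost the filtering-and-refinement framework is designed to cut.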

Data Structure: Global R-tree
• The global R-tree indexes the MBBs of all uncertain objects.
• The MBB of an object is the minimum bounding box containing all its instances.
• Each leaf entry of the global R-tree is the MBB of one object.
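The boxes stored in the global R-tree can be computed as below. This sketch assumes 2-D instances; `MBB`, `mbb_of` and `enclose` are hypothetical names, and R-tree insertion and node splitting are deliberately left out.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class MBB:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def mbb_of(instances: list[Point]) -> MBB:
    """Minimum bounding box containing all instances of one uncertain object."""
    if not instances:
        raise ValueError("an uncertain object needs at least one instance")
    xs = [x for x, _ in instances]
    ys = [y for _, y in instances]
    return MBB(min(xs), min(ys), max(xs), max(ys))


def enclose(boxes: list[MBB]) -> MBB:
    """Box of an intermediate global R-tree entry: it must cover all child MBBs."""
    return MBB(min(b.xmin for b in boxes), min(b.ymin for b in boxes),
               max(b.xmax for b in boxes), max(b.ymax for b in boxes))


u1 = mbb_of([(1.0, 2.0), (3.0, 0.0), (2.0, 5.0)])
u2 = mbb_of([(6.0, 1.0), (7.0, 3.0)])
print(u1)                 # MBB(xmin=1.0, ymin=0.0, xmax=3.0, ymax=5.0)
print(enclose([u1, u2]))  # MBB(xmin=1.0, ymin=0.0, xmax=7.0, ymax=5.0)
```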

Data Structure: Local aR-tree (Aggregate R-tree)
• For each uncertain object, a local aR-tree is built to organize its multiple instances.
• For every intermediate entry E in the local aR-tree, the probability of E is the sum of the probabilities of the instances that have E as an ancestor; for an entry E with children E1 and E2, $P(E) = P(E_1) + P(E_2)$.
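The aggregate probabilities can be filled in bottom-up, as in this sketch. It models only the tree shape and probabilities, not the bounding rectangles; `ARNode` and `aggregate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ARNode:
    """Hypothetical local aR-tree node; a node without children is one instance."""
    prob: float = 0.0
    children: list[ARNode] = field(default_factory=list)


def aggregate(node: ARNode) -> float:
    """Fill in P(E) bottom-up: an intermediate entry sums its children's probabilities."""
    if node.children:
        node.prob = sum(aggregate(child) for child in node.children)
    return node.prob


e1 = ARNode(children=[ARNode(0.25), ARNode(0.25)])
e2 = ARNode(children=[ARNode(0.5)])
root = ARNode(children=[e1, e2])
print(aggregate(root), e1.prob, e2.prob)  # 1.0 0.5 0.5
```

Storing P(E) lets a search add the probability mass of a whole subtree at once, without visiting its instances.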

Framework
• Filtering
  – Obtain tight lower and upper bounds for each facility and prune unpromising facilities.
  – Process on the global index only; no object is loaded.
• Refinement
  – For each candidate facility, compute its influence score based on the local aR-trees.

Filtering: level by level
• RU: objects R-tree
• RF: facility R-tree
[Figure: RU and RF joined level by level (RU ⋈ RF)]

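Filtering: upper bound of facility score
[Figure: min distance between the facilities and entry E1]
• If $\mathrm{maxdist}(F_1, E_1) < \mathrm{mindist}(F_i, E_1)$, then $F_1$ is closer than $F_i$ to every point in $E_1$, so $F_i$ cannot influence any object in $E_1$.
• $I^+(F_1), I^+(F_2) \leftarrow$ number of objects in $E_1$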
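One way to apply this pruning condition to a whole set of facilities is sketched below, assuming 2-D points, Euclidean distance and rectangular entries: take the smallest maxdist from any facility to the entry; a facility whose mindist exceeds it is strictly farther from every point of the entry than that facility and is skipped, while every other facility has $I^+$ increased by the number of objects in the entry (each object contributes at most 1, since its instance probabilities sum to 1). `mindist`, `maxdist` and `add_upper_bounds` are hypothetical helpers, and the coordinates are made up so that F1 and F2 are kept and F3 is pruned.

```python
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mindist(p: Point, r: Rect) -> float:
    """Smallest distance from p to any point of rectangle r (0 if p is inside)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def maxdist(p: Point, r: Rect) -> float:
    """Largest distance from p to any point of rectangle r (attained at a corner)."""
    dx = max(abs(p[0] - r[0]), abs(p[0] - r[2]))
    dy = max(abs(p[1] - r[1]), abs(p[1] - r[3]))
    return math.hypot(dx, dy)


def add_upper_bounds(entry: Rect, n_objects: int,
                     facilities: dict[str, Point], upper: dict[str, int]) -> None:
    """Add n_objects to I+(F) for every facility that might influence an object in entry."""
    # Every point of entry lies within best_max of at least one facility.
    best_max = min(maxdist(f, entry) for f in facilities.values())
    for name, f in facilities.items():
        # If mindist(f, entry) > best_max, some facility is strictly closer than f
        # to every point of entry, so f influences no object there and is skipped.
        if mindist(f, entry) <= best_max:
            upper[name] = upper.get(name, 0) + n_objects


facilities = {"F1": (5.0, 3.0), "F2": (5.0, 8.0), "F3": (20.0, 20.0)}
upper: dict[str, int] = {}
add_upper_bounds((4.0, 4.0, 6.0, 6.0), 10, facilities, upper)
print(upper)  # {'F1': 10, 'F2': 10}
```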

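Filtering: lower bound of facility score
[Figure: max distance and min distance between the facilities and entry E1]
• $\mathrm{maxdist}(F_1, E_1) < \mathrm{mindist}(F_2, E_1)$ and $\mathrm{maxdist}(F_1, E_1) < \mathrm{mindist}(F_3, E_1)$: $F_1$ is the nearest facility for every point in $E_1$.
• $I^-(F_1) \leftarrow$ number of objects in $E_1$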
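The lower-bound rule can be sketched the same way, again assuming 2-D points, Euclidean distance and objects whose instance probabilities sum to 1, so that every object in a dominated entry contributes exactly 1. `add_lower_bound` and the helpers are hypothetical names. At most one facility can pass the test: if F dominated G and G dominated F, then mindist(F) ≤ maxdist(F) < mindist(G) ≤ maxdist(G) < mindist(F), a contradiction.

```python
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mindist(p: Point, r: Rect) -> float:
    """Smallest distance from p to any point of rectangle r (0 if p is inside)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def maxdist(p: Point, r: Rect) -> float:
    """Largest distance from p to any point of rectangle r (attained at a corner)."""
    dx = max(abs(p[0] - r[0]), abs(p[0] - r[2]))
    dy = max(abs(p[1] - r[1]), abs(p[1] - r[3]))
    return math.hypot(dx, dy)


def add_lower_bound(entry: Rect, n_objects: int,
                    facilities: dict[str, Point], lower: dict[str, int]) -> None:
    """Add n_objects to I-(F) if a single facility F is nearest for every point of entry."""
    for name, f in facilities.items():
        reach = maxdist(f, entry)
        if all(reach < mindist(g, entry)
               for other, g in facilities.items() if other != name):
            # Every instance of every object in entry has f as its NN.
            lower[name] = lower.get(name, 0) + n_objects
            return  # at most one facility can satisfy this condition


facilities = {"F1": (5.0, 5.0), "F2": (5.0, 10.0), "F3": (20.0, 20.0)}
lower: dict[str, int] = {}
add_lower_bound((4.0, 4.0, 6.0, 6.0), 10, facilities, lower)
print(lower)  # {'F1': 10}
```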

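Filtering: get candidates
• Sort facilities by lower bound in descending order.
• For a top-k query, compare the lower bound of the k-th facility with the upper bounds of the facilities that follow it; a facility whose upper bound is below that lower bound cannot be in the top k and is pruned.
• The remaining facilities form the candidate set.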
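The candidate step is short enough to sketch directly. This assumes every facility has an entry in both bound dictionaries (0 when no entry contributed); `select_candidates` and the bound values are hypothetical.

```python
def select_candidates(lower: dict[str, float], upper: dict[str, float], k: int) -> list[str]:
    """Facilities that can still be in the top k after filtering."""
    if k < 1:
        raise ValueError("k must be positive")
    ranked = sorted(lower, key=lambda name: lower[name], reverse=True)
    if len(ranked) <= k:
        return ranked
    threshold = lower[ranked[k - 1]]  # lower bound of the k-th facility
    # A later facility whose upper bound is below this threshold is beaten by at least
    # k facilities, so it is pruned; the rest go on to refinement.
    return ranked[:k] + [name for name in ranked[k:] if upper[name] >= threshold]


lower = {"F1": 5, "F2": 3, "F3": 1, "F4": 0}
upper = {"F1": 9, "F2": 6, "F3": 4, "F4": 2}
print(select_candidates(lower, upper, k=2))  # ['F1', 'F2', 'F3']
```

Facilities whose upper bound ties the threshold are kept, which is the conservative choice when exact scores can be equal.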

Refinement
• For each candidate facility, traverse the aR-trees of all objects it may influence to get its exact score.
• Return the k facilities with the highest influence scores.
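A plausible sketch of refinement, assuming each object's local aR-tree stores its aggregate probabilities P(E) and distances are Euclidean: it reuses the maxdist/mindist tests from the filtering slides on aR-tree entries, adding P(E) for a whole subtree when the candidate is nearest everywhere in it, skipping the subtree when another facility always wins, and descending otherwise. `Entry`, `exact_score` and `refine` are hypothetical names; for simplicity the sketch scans every given object instead of only those the global index reports as possibly influenced, and it is an illustration, not the paper's exact procedure.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    """Hypothetical aR-tree entry; an instance is a leaf with a degenerate rect."""
    rect: Rect
    prob: float  # aggregate probability P(E)
    children: list[Entry] = field(default_factory=list)


def mindist(p: Point, r: Rect) -> float:
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def maxdist(p: Point, r: Rect) -> float:
    dx = max(abs(p[0] - r[0]), abs(p[0] - r[2]))
    dy = max(abs(p[1] - r[1]), abs(p[1] - r[3]))
    return math.hypot(dx, dy)


def exact_score(name: str, facilities: dict[str, Point], roots: list[Entry]) -> float:
    """Expected influence score of one candidate facility over the given objects."""
    f = facilities[name]
    others = [g for other, g in facilities.items() if other != name]
    score, stack = 0.0, list(roots)
    while stack:
        e = stack.pop()
        if all(maxdist(f, e.rect) < mindist(g, e.rect) for g in others):
            score += e.prob  # f is the NN of every instance below e: take P(E) at once
        elif any(maxdist(g, e.rect) < mindist(f, e.rect) for g in others):
            continue  # another facility is always closer: prune the whole subtree
        elif e.children:
            stack.extend(e.children)  # undecided: descend one level
        # else: an instance equidistant to f and another facility (ties not counted)
    return score


def refine(candidates: list[str], facilities: dict[str, Point],
           roots: list[Entry], k: int) -> list[tuple[str, float]]:
    scores = {name: exact_score(name, facilities, roots) for name in candidates}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def leaf(x: float, y: float, p: float) -> Entry:
    return Entry((x, y, x, y), p)


facilities = {"F1": (0.0, 0.0), "F2": (10.0, 0.0)}
u1 = Entry((1.0, 0.0, 9.0, 0.0), 1.0, [leaf(1.0, 0.0, 0.5), leaf(9.0, 0.0, 0.5)])
u2 = Entry((1.0, 1.0, 2.0, 2.0), 1.0, [leaf(1.0, 1.0, 0.6), leaf(2.0, 2.0, 0.4)])
print(refine(["F1", "F2"], facilities, [u1, u2], k=1))  # [('F1', 1.5)]
```

Here object U2 is settled at its root entry for both facilities, so its instances are never visited.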

U-Quadtree as global index [EDBT 2012, Zhang et al.]

Improvement by U-Quadtree
• Filtering
  – U-Quadtree builds summaries of objects based on a Quadtree, which gives tighter upper and lower bounds and prunes more objects.
• Refinement
  – Intersect the leaf cells of the U-Quadtree with the entries of the aR-tree to reduce the search space.

Experiments
• Algorithms:
  – Naïve: the naïve implementation
  – RTKIS: the technique based on the R-tree
  – UQuadTKIS: the technique based on the U-Quadtree
  – UTKIS: the technique presented in [TKDE 2011, Zheng et al.]
• Environment:
  – PC with dual Intel Xeon 2.40 GHz CPUs
  – 4 GB memory
  – Debian Linux
  – Disk page size: 4096 bytes

Experiments (Cont.)
• Real datasets
  – Center distribution: CA (62K), US (200K), R-tree-portal (21K)
  – Normalized to [0, 10000]
• Parameters

Experiments (Cont.): Expected Score vs. Expected Rank, result comparison

Experiments (Cont.): impact of data distribution

Experiments (Cont.)
[Figure panels: varying m; varying #facilities; varying ru; varying #objects]

Conclusion
• We propose a new model to evaluate the influence of facilities over a set of uncertain objects.
• Efficient R-tree and U-Quadtree based algorithms are presented, following the filtering-and-refinement paradigm.
• Novel pruning techniques significantly improve the performance of the algorithms by reducing the number of uncertain objects and facilities involved in the computation.
• Comprehensive experiments demonstrate the effectiveness and efficiency of our techniques.

Thank you! Questions?
