Collective Spatial Keyword Queries: A Distance Owner-Driven Approach
- Slides: 36
Collective Spatial Keyword Queries: A Distance Owner-Driven Approach. Cheng Long and Raymond Chi-Wing Wong (The Hong Kong University of Science and Technology); Ke Wang (Simon Fraser University); Ada Wai-Chee Fu (The Chinese University of Hong Kong). Prepared and presented by Cheng Long, 1 July 2013.
Outline: Introduction; Contribution; Problem Definition; MaxSum-CoSKQ; Dia-CoSKQ; Experimental Results; Conclusion
Introduction: Spatial-textual data. Spatial + textual. Some examples: spatial points of interest (e.g., restaurants, shopping malls and hotels); geo-tagged web objects (e.g., geo-tagged photos on Flickr and geo-tagged documents); geo-social networking data (e.g., in Foursquare, each user has a check-in history and a profile); …
Introduction: Spatial Keyword Queries. Data: a set of spatial-textual objects. Input: a query location and several query keywords. Query goal: objects that are spatially close and textually similar. Examples: spatial keyword top-k query / reverse top-k query (ranked by a score function); spatial keyword kNN query (keyword covering constraint); spatial keyword range query (keyword covering constraint); …
Introduction: Collective Spatial Keyword Queries. Spatial keyword kNN and range queries need a single object to cover all the query keywords, which is not always possible: the query keywords may be diverse, and the number of query keywords may be large. The Collective Spatial Keyword Query (CoSKQ), proposed by Cao et al. [SIGMOD'11], finds a set of objects that covers the query keywords collectively and has the smallest cost. Cost functions: LinearSum, MaxSum, … The problem is NP-hard. LinearSum-CoSKQ is adequately solved, but for MaxSum-CoSKQ, Cao-Exact has scalability issues.
Introduction: Motivation. Cao-Exact is a best-first search algorithm based on the IR-tree. It enumerates candidate combinations of tree nodes and objects, from $\{N_1\}$ down to object sets such as $\{o_5, o_{10}\}$, $\{o_6, o_9\}$ and $\{o_6, o_{10}\}$. It is not scalable: with 8M objects and 6 query keywords, it runs for more than 10 days! [Figure: an IR-tree with nodes $N_1$ to $N_8$ and inverted files IF1 to IF8 (e.g., t1: N2, N3; t2: N3); the query keywords are $t_1, t_2$, and each inner node covers both.]
Contributions
Problem Definition (1). $q$: the query, with a location and a set of keywords. $O$: the set of spatial objects, each with a location and a set of keywords. Relevant object: an object that contains at least one query keyword. Collective Spatial Keyword Query (CoSKQ): find a set $S$ of objects such that $S$ covers the set of query keywords (i.e., $S$ is feasible) and the cost of $S$, denoted by $\mathrm{cost}(S)$ (defined later), is the smallest.
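Problem Definition (2). MaxSum cost: a linear combination of two max components, $\mathrm{cost}_{\mathrm{MaxSum}}(S) = \alpha \cdot \max(S, q) + (1-\alpha) \cdot \max(S, S)$, where $\max(S, q) = \max_{o \in S} d(o, q)$ and $\max(S, S) = \max_{o_1, o_2 \in S} d(o_1, o_2)$. Following the convention, we set $\alpha = 0.5$ by default; dropping the constant factor 0.5, this gives $\mathrm{cost}_{\mathrm{MaxSum}}(S) = \max(S, q) + \max(S, S)$. Diameter cost: rather than adding $\max(S, q)$ and $\max(S, S)$, combine them with a "max" operation: $\mathrm{cost}_{\mathrm{Dia}}(S) = \max\{\max(S, q), \max(S, S)\}$, which is the diameter of $S \cup \{q\}$.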
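The two cost functions can be written down directly. The sketch below is illustrative only: it assumes Euclidean distance and models each object as a hypothetical `(name, (x, y), keywords)` tuple. `cost_maxsum` keeps α explicit as `alpha`, so with the default 0.5 it returns half of the unscaled form max(S, q) + max(S, S).

```python
from itertools import combinations
from math import dist

# A spatial-textual object is modelled as (name, (x, y), frozenset_of_keywords).


def max_query_dist(S, q):
    """max(S, q): the largest distance from an object of S to the query location q."""
    return max(dist(o[1], q) for o in S)


def max_pairwise_dist(S):
    """max(S, S): the largest distance between two objects of S (0 for one object)."""
    return max((dist(a[1], b[1]) for a, b in combinations(S, 2)), default=0.0)


def is_feasible(S, query_keywords):
    """S is feasible if its objects collectively cover all query keywords."""
    return set(query_keywords) <= set().union(*(o[2] for o in S))


def cost_maxsum(S, q, alpha=0.5):
    """alpha * max(S, q) + (1 - alpha) * max(S, S)."""
    return alpha * max_query_dist(S, q) + (1 - alpha) * max_pairwise_dist(S)


def cost_dia(S, q):
    """max{max(S, q), max(S, S)}, i.e. the diameter of S together with q."""
    return max(max_query_dist(S, q), max_pairwise_dist(S))
```

For S containing o1 at (3, 0) with keyword t1 and o2 at (0, 4) with keyword t2, and q at the origin, max(S, q) = 4 and max(S, S) = 5, so `cost_maxsum` returns 4.5 (9 in the unscaled form) and `cost_dia` returns 5.0.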
Outline: Introduction; Contribution; Problem Definition; MaxSum-CoSKQ (finding the optimal solution: MaxSum-Exact; finding an approximate solution: MaxSum-Appro); Dia-CoSKQ; Experimental Results; Conclusion
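MaxSum-CoSKQ: Finding Optimal Solutions (1). Cost function: $\mathrm{cost}(S) = \max(S, q) + \max(S, S)$. Some basic concepts: query distance owner: an object $o \in S$ with $d(o, q) = \max(S, q)$; pairwise distance owners: two objects $o_1, o_2 \in S$ with $d(o_1, o_2) = \max(S, S)$; distance owner group: the triplet $(o, o_1, o_2)$; $(o, o_1, o_2)$-consistency: $S$ is $(o, o_1, o_2)$-consistent if $(o, o_1, o_2)$ is a distance owner group of $S$. Key observation: one distance owner group usually corresponds to many feasible sets, and all of them have the same cost, $d(o, q) + d(o_1, o_2)$. [Figure: several feasible sets sharing the same distance owner group $(o, o_1, o_2)$.]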
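A small sketch of the distance owner concepts, assuming Euclidean distance and objects modelled as hypothetical `(name, (x, y), keywords)` tuples with unique names. `distance_owner_group`, `is_consistent` and `cost_from_owners` are illustrative helpers, not code from the paper; ties between equally distant objects are broken by taking the first maximum found.

```python
from itertools import combinations
from math import dist


def distance_owner_group(S, q):
    """Return (o, o1, o2): o maximises d(o, q); o1, o2 maximise d(o1, o2) within S.

    For a single-object set, o = o1 = o2 = that object.
    """
    o = max(S, key=lambda x: dist(x[1], q))
    if len(S) == 1:
        return o, o, o
    o1, o2 = max(combinations(S, 2), key=lambda p: dist(p[0][1], p[1][1]))
    return o, o1, o2


def is_consistent(S, q, o, o1, o2):
    """(o, o1, o2)-consistency: S contains the triplet and it is a distance owner group of S."""
    members = {x[0] for x in S}
    if not {o[0], o1[0], o2[0]} <= members:
        return False
    return (max(dist(x[1], q) for x in S) <= dist(o[1], q)
            and all(dist(a[1], b[1]) <= dist(o1[1], o2[1])
                    for a, b in combinations(S, 2)))


def cost_from_owners(o, o1, o2, q):
    """MaxSum cost shared by every (o, o1, o2)-consistent set: d(o, q) + d(o1, o2)."""
    return dist(o[1], q) + dist(o1[1], o2[1])
```

Adding an object that is no farther from q than o and no farther from any member than d(o1, o2) leaves the owners, and therefore the cost, unchanged; this is why many feasible sets collapse onto one triplet.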
MaxSum-CoSKQ: Finding Optimal Solutions (2). Recall CoSKQ: find a set $S$ of objects such that $S$ is feasible and $\mathrm{cost}(S)$ is minimized. Cao-Exact searches the feasible set space $\{S_1, S_2, \ldots, S_n\}$, whose size is exponential in the number of relevant objects. The distance owner-driven approach instead searches the distance owner group space $\{(\cdot,\cdot,\cdot)_1, \ldots, (\cdot,\cdot,\cdot)_m\}$ directly; its size is cubic in the number of relevant objects.
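To make the size gap concrete, here is a hedged brute-force baseline that enumerates the feasible set space directly. It is not Cao-Exact (which runs a best-first search on an IR-tree); it simply tries every subset of relevant objects, using the MaxSum cost with α = 0.5 and the constant factor dropped. `search_space_sizes` compares the 2^n − 1 candidate sets with the n^3 upper bound on triplets. All names and the `(name, (x, y), keywords)` tuple layout are hypothetical.

```python
from itertools import combinations
from math import dist


def brute_force_coskq(objects, q, query_keywords):
    """Exact MaxSum-CoSKQ by enumerating every subset of relevant objects (O(2^n))."""
    qk = set(query_keywords)
    relevant = [o for o in objects if o[2] & qk]  # objects with >= 1 query keyword
    best, best_cost = None, float("inf")
    for size in range(1, len(relevant) + 1):
        for S in combinations(relevant, size):
            if not qk <= set().union(*(o[2] for o in S)):
                continue  # S does not cover all query keywords
            c = max(dist(o[1], q) for o in S) + max(
                (dist(a[1], b[1]) for a, b in combinations(S, 2)), default=0.0)
            if c < best_cost:
                best, best_cost = S, c
    return best, best_cost


def search_space_sizes(n_relevant):
    """Non-empty candidate sets vs. an upper bound on (o, o1, o2) triplets."""
    return 2 ** n_relevant - 1, n_relevant ** 3
```

For 20 relevant objects that is 1,048,575 candidate sets against at most 8,000 triplets.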
MaxSum-CoSKQ: Finding Optimal Solutions (2). A distance owner-driven approach:
Maintain a best-known feasible set $S$
For each triplet $(o, o_1, o_2)$   [Issue 1]
    If there exists a feasible set $S'$ which is $(o, o_1, o_2)$-consistent   [Issue 2] then
        $S \leftarrow S'$ if $\mathrm{cost}(S') < \mathrm{cost}(S)$
Return $S$
Issue 1: how to search over the "triplet" space? A straightforward approach checks a cubic number of candidate triplets, so pruning is needed and only a subset of the triplet space is searched. Issue 2: how to check, for a triplet $(o, o_1, o_2)$, whether there exists a feasible set $S'$ which is $(o, o_1, o_2)$-consistent? The check should be efficient.
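The skeleton above can be sketched with both issues solved in the most naive way: every triplet is enumerated (Issue 1 without pruning) and consistency is checked by exhaustive search over all relevant objects (Issue 2 without the disk restrictions). This is an illustrative, exponential-time sketch under assumed conventions (Euclidean distance, hypothetical `(name, (x, y), keywords)` tuples with unique names), meant only to show the control flow.

```python
from itertools import combinations, product
from math import dist


def pairwise_max(S):
    return max((dist(a[1], b[1]) for a, b in combinations(S, 2)), default=0.0)


def consistent_feasible_set(o, o1, o2, relevant, q, qk):
    """Naive Issue 2: find a feasible S' containing o, o1, o2 with
    max(S', q) = d(o, q) and max(S', S') = d(o1, o2), or return None."""
    rq, rp = dist(o[1], q), dist(o1[1], o2[1])
    core = list({x[0]: x for x in (o, o1, o2)}.values())  # drop repeated objects
    names = {x[0] for x in core}
    others = [x for x in relevant if x[0] not in names]
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            S = core + list(extra)
            if (qk <= set().union(*(x[2] for x in S))
                    and max(dist(x[1], q) for x in S) <= rq
                    and pairwise_max(S) <= rp):
                return S
    return None


def distance_owner_driven(objects, q, query_keywords):
    qk = set(query_keywords)
    relevant = [o for o in objects if o[2] & qk]
    best, best_cost = None, float("inf")  # best-known feasible set S
    for o, o1, o2 in product(relevant, repeat=3):  # Issue 1: all triplets, no pruning
        c = dist(o[1], q) + dist(o1[1], o2[1])  # cost of any consistent set
        if c >= best_cost:
            continue
        S = consistent_feasible_set(o, o1, o2, relevant, q, qk)  # Issue 2
        if S is not None:
            best, best_cost = S, c
    return best, best_cost
```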
Issue 1: how to search over the "triplet" space? Not all relevant objects need to be considered as candidates for the query distance owner $o$. First, $o$ cannot be too close to $q$: every feasible set contains, for each query keyword $t$, an object containing $t$, which is no closer to $q$ than the keyword $t$-NN of $q$; hence $d(o, q) \ge r_{min} = d(o_f, q)$, where $o_f$ is the farthest keyword NN from $q$. Second, objects that are too far from $q$ can be ignored: a set with query distance owner $o$ costs at least $d(o, q)$, so only candidates with $d(o, q) \le r_{max} = \mathrm{cost}(S)$ can improve on the best-known set $S$. Together, the candidates lie in a "ring" region $R(S)$.
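The ring region can be computed as follows. This sketch assumes Euclidean distance and hypothetical `(name, (x, y), keywords)` tuples; `best_cost` stands for cost(S) of whatever best-known feasible set the search currently holds, and every query keyword is assumed to occur in at least one relevant object.

```python
from math import dist


def ring_region(relevant, q, query_keywords, best_cost):
    """Candidates for the query distance owner o: r_min <= d(o, q) <= r_max.

    r_min = d(o_f, q), where o_f is the farthest of the keyword nearest
    neighbours of q (one per query keyword); r_max = cost(S).
    Returns the candidates sorted by ascending distance from q.
    """
    r_min = max(
        min(dist(o[1], q) for o in relevant if t in o[2])  # keyword t-NN distance
        for t in query_keywords
    )
    r_max = best_cost
    ring = [o for o in relevant if r_min <= dist(o[1], q) <= r_max]
    return r_min, r_max, sorted(ring, key=lambda o: dist(o[1], q))
```

For example, with a at (1, 0) {t1}, b at (-1.2, 0) {t2}, c at (2, 0) {t2}, e at (5, 0) {t1}, q at the origin and cost(S) = 3.4, r_min = 1.2 and the ring holds b and c: a is too close and e is too far.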
Issue 1 (continued): once the candidate query distance owner $o$ is fixed, the pairwise distance owners $o_1$ and $o_2$ are constrained. They are restricted to $\mathrm{Disk}(q, d(o, q))$, since no object of $S'$ is farther from $q$ than $o$. $d(o_1, o_2)$ cannot be too small: because $o \in S'$, $d(o_1, o_2) \ge d(o, o_i) \ge d(o, q) - d(o_i, q)$ for $i = 1, 2$ (triangle inequality), so $d(o_1, o_2) \ge d_{min} = d(o, q) - \min\{d(o_1, q), d(o_2, q)\}$. Pairs with a large $d(o_1, o_2)$ can be pruned: to beat the best-known solution $S$, we need $d(o, q) + d(o_1, o_2) < \mathrm{cost}(S)$, so $d(o_1, o_2) \le d_{max} = \mathrm{cost}(S) - d(o, q)$.
Issue 1: how to search over the "triplet" space? Candidates of $o$: relevant objects in the ring region $R(S)$, processed in ascending order of their distances from $q$; the ring shrinks progressively as the best-known cost, and hence $r_{max} = \mathrm{cost}(S)$, decreases. For each candidate $o$, the candidates of $o_1$ and $o_2$ lie in $\mathrm{Disk}(q, d(o, q))$ and satisfy $d_{min} \le d(o_1, o_2) \le d_{max}$, where $d_{min} = d(o, q) - \min\{d(o_1, q), d(o_2, q)\}$ and $d_{max} = \mathrm{cost}(S) - d(o, q)$.
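The candidate generation for Issue 1 can be sketched as a generator that walks the ring in ascending distance from q and keeps only pairs that pass both bounds. For simplicity `best_cost` is held fixed here, whereas in the full search it shrinks whenever S improves; the upper bound is applied strictly because only strictly cheaper sets replace S. The tuple representation and function names are hypothetical. Pairs with o1 = o2 are included so that a single-object (or co-located) set still has a triplet.

```python
from itertools import combinations_with_replacement
from math import dist


def pair_bounds(o, o1, o2, q, best_cost):
    """d_min and d_max for d(o1, o2), given the query distance owner o (MaxSum)."""
    d_oq = dist(o[1], q)
    d_min = d_oq - min(dist(o1[1], q), dist(o2[1], q))  # triangle inequality
    d_max = best_cost - d_oq  # a larger d(o1, o2) cannot beat cost(S)
    return d_min, d_max


def candidate_triplets(ring, relevant, q, best_cost):
    """Yield (o, o1, o2) in ascending d(o, q), keeping only pairs inside
    Disk(q, d(o, q)) with d_min <= d(o1, o2) < d_max."""
    for o in ring:  # ring is assumed to be sorted by distance to q
        d_oq = dist(o[1], q)
        if d_oq >= best_cost:
            break
        inner = [x for x in relevant if dist(x[1], q) <= d_oq]
        for o1, o2 in combinations_with_replacement(inner, 2):
            d_min, d_max = pair_bounds(o, o1, o2, q, best_cost)
            if d_min <= dist(o1[1], o2[1]) < d_max:
                yield o, o1, o2
```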
Issue 2: how to check, for a triplet $(o, o_1, o_2)$, whether there exists a feasible set $S'$ which is $(o, o_1, o_2)$-consistent? Restrictions on $S'$ (if it exists): $d(o_1, o_2) \ge d(o, o_1)$ and $d(o_1, o_2) \ge d(o, o_2)$; $S'$ is inside $\mathrm{Disk}(q, d(o, q))$; $S'$ is inside $\mathrm{Disk}(o_1, d(o_1, o_2))$; $S'$ is inside $\mathrm{Disk}(o_2, d(o_1, o_2))$; $S'$ covers the query keywords. Exhaustively search the intersection of the three disks for a set $S'$ that satisfies these restrictions and is $(o, o_1, o_2)$-consistent; inverted files could be utilized here. If the search succeeds, return $S'$; otherwise, we know that $S'$ does not exist. With the two issues fixed, MaxSum-Exact is complete.
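Putting the ring, the two bounds and the Issue 2 check together gives a compact, illustrative version of MaxSum-Exact. It is a sketch under stated assumptions, not the paper's implementation: the initial best-known set is taken to be the set of keyword nearest neighbours of q, the Issue 2 check is a plain exhaustive search over the intersection of the three disks (no inverted files), distances are Euclidean, objects are hypothetical `(name, (x, y), keywords)` tuples, and `EPS` is an assumed floating-point tolerance.

```python
from itertools import combinations, combinations_with_replacement
from math import dist

EPS = 1e-9  # tolerance for floating-point distance comparisons (assumption)


def covers(S, qk):
    return qk <= set().union(*(o[2] for o in S))


def max_pairwise(S):
    return max((dist(a[1], b[1]) for a, b in combinations(S, 2)), default=0.0)


def cost(S, q):
    """MaxSum cost with alpha = 0.5 and the constant factor dropped."""
    return max(dist(o[1], q) for o in S) + max_pairwise(S)


def keyword_nn_set(relevant, q, qk):
    """Nearest object to q for each query keyword (ValueError if a keyword is uncovered)."""
    nns = (min((o for o in relevant if t in o[2]), key=lambda o: dist(o[1], q)) for t in qk)
    return list({o[0]: o for o in nns}.values())


def check_triplet(o, o1, o2, relevant, q, qk):
    """Issue 2: exhaustive search inside D(q, d(o,q)), D(o1, d(o1,o2)), D(o2, d(o1,o2))."""
    rq, rp = dist(o[1], q), dist(o1[1], o2[1])
    if dist(o[1], o1[1]) > rp + EPS or dist(o[1], o2[1]) > rp + EPS:
        return None  # o1, o2 cannot be the pairwise distance owners
    core = list({x[0]: x for x in (o, o1, o2)}.values())
    core_names = {x[0] for x in core}
    region = [x for x in relevant if x[0] not in core_names
              and dist(x[1], q) <= rq + EPS
              and dist(x[1], o1[1]) <= rp + EPS and dist(x[1], o2[1]) <= rp + EPS]
    for k in range(len(region) + 1):  # smallest extensions first
        for extra in combinations(region, k):
            S = core + list(extra)
            if covers(S, qk) and max_pairwise(S) <= rp + EPS:
                return S
    return None


def maxsum_exact(objects, q, query_keywords):
    qk = set(query_keywords)
    relevant = [o for o in objects if o[2] & qk]
    best = keyword_nn_set(relevant, q, qk)  # initial best-known S (an assumption)
    best_cost = cost(best, q)
    r_min = max(dist(o[1], q) for o in best)  # d(o_f, q)
    for o in sorted(relevant, key=lambda x: dist(x[1], q)):
        d_oq = dist(o[1], q)
        if d_oq < r_min - EPS:
            continue  # inside the inner circle of the ring R(S)
        if d_oq >= best_cost:
            break  # beyond r_max = cost(S); later objects are even farther
        inner = [x for x in relevant if dist(x[1], q) <= d_oq + EPS]
        for o1, o2 in combinations_with_replacement(inner, 2):
            d12 = dist(o1[1], o2[1])
            d_min = d_oq - min(dist(o1[1], q), dist(o2[1], q))
            d_max = best_cost - d_oq  # re-read each time: cost(S) may have shrunk
            if d12 < d_min - EPS or d12 >= d_max:
                continue  # pruned by the bounds on d(o1, o2)
            S = check_triplet(o, o1, o2, relevant, q, qk)
            if S is not None:
                best, best_cost = S, d_oq + d12
    return best, best_cost
```

With a at (1, 0) {t1}, b at (-1.2, 0) {t2} and c at (2, 0) {t2}, the keyword-NN start {a, b} costs 3.4, and the search improves it to {a, c} with cost 3.0.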
MaxSum-CoSKQ: Finding Approximate Solution (1). Constrained NN: the region-constrained keyword $t$-NN of an object $o$ is the object nearest to $o$ among the objects that contain $t$ and lie in the region. $o$-neighborhood feasible set: the set containing the $\mathrm{Disk}(q, d(o, q))$-constrained keyword $t$-NN of $o$ for each query keyword $t$. E.g., for the $o_3$-neighborhood feasible set: for $t_1$, the $\mathrm{Disk}(q, d(o_3, q))$-constrained keyword $t_1$-NN is $o_2$; for $t_2$, it is $o_5$; for $t_3$, it is $o_3$ itself. Hence the $o_3$-neighborhood feasible set is $\{o_2, o_3, o_5\}$.
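The o-neighborhood feasible set can be sketched as below. As an assumption of this sketch, the constraint region is the disk centered at q with radius d(o, q), so that o stays the member farthest from q; objects are hypothetical `(name, (x, y), keywords)` tuples and distances are Euclidean.

```python
from math import dist


def constrained_keyword_nn(o, t, objects, q):
    """Among objects that contain t and lie within distance d(o, q) of q,
    return the one nearest to o (None if there is none)."""
    radius = dist(o[1], q)
    candidates = [x for x in objects if t in x[2] and dist(x[1], q) <= radius]
    return min(candidates, key=lambda x: dist(x[1], o[1]), default=None)


def o_neighborhood_feasible_set(o, objects, q, query_keywords):
    """One constrained keyword NN per query keyword; None if a keyword cannot be covered."""
    members = {}
    for t in query_keywords:
        nn = constrained_keyword_nn(o, t, objects, q)
        if nn is None:
            return None
        members[nn[0]] = nn  # an object covering several keywords appears once
    return list(members.values())
```

With hypothetical objects o1 (-1.9, 0) {t1}, o2 (1.5, 0.5) {t1}, o3 (2, 0) {t3}, o4 (3, 0) {t2} and o5 (1, 1) {t2}, and q at the origin, calling it for o3 returns {o2, o3, o5}: o4 is closer to o3 than o5 is, but it lies outside the disk of radius d(o3, q) = 2.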
MaxSum-CoSKQ: Finding Approximate Solution (1). Algorithm MaxSum-Appro replaces the costly part of the distance owner-driven approach (enumerating triplets $(o, o_1, o_2)$ and searching for an $(o, o_1, o_2)$-consistent feasible set) with a single candidate set per object:
Maintain a best-known feasible set $S$
For each relevant object $o$ in $R(S)$
    $S' \leftarrow$ the $o$-neighborhood feasible set
    $S \leftarrow S'$ if $\mathrm{cost}(S') < \mathrm{cost}(S)$
Return $S$
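A minimal sketch of MaxSum-Appro follows. It assumes Euclidean distance, hypothetical `(name, (x, y), keywords)` tuples, the disk centered at q with radius d(o, q) as the constraint region of the o-neighborhood feasible set, and (as an assumption) the keyword nearest neighbours of q as the initial best-known set. It uses linear scans instead of index-based nearest-neighbour queries, so it only shows the control flow, not the stated time complexity.

```python
from itertools import combinations
from math import dist


def maxsum_cost(S, q):
    return max(dist(o[1], q) for o in S) + max(
        (dist(a[1], b[1]) for a, b in combinations(S, 2)), default=0.0)


def o_neighborhood(o, relevant, q, qk):
    r = dist(o[1], q)
    members = {}
    for t in qk:
        cands = [x for x in relevant if t in x[2] and dist(x[1], q) <= r]
        if not cands:
            return None
        nn = min(cands, key=lambda x: dist(x[1], o[1]))  # constrained keyword t-NN of o
        members[nn[0]] = nn
    return list(members.values())


def maxsum_appro(objects, q, query_keywords):
    qk = set(query_keywords)
    relevant = [o for o in objects if o[2] & qk]
    # Initial best-known S: the keyword nearest neighbours of q (an assumption).
    nns = (min((o for o in relevant if t in o[2]), key=lambda o: dist(o[1], q)) for t in qk)
    best = list({o[0]: o for o in nns}.values())
    best_cost = maxsum_cost(best, q)
    r_min = max(dist(o[1], q) for o in best)
    for o in sorted(relevant, key=lambda x: dist(x[1], q)):
        d_oq = dist(o[1], q)
        if d_oq < r_min:
            continue  # inside the inner circle of R(S)
        if d_oq >= best_cost:
            break  # left the ring R(S)
        S = o_neighborhood(o, relevant, q, qk)
        if S is not None and maxsum_cost(S, q) < best_cost:
            best, best_cost = S, maxsum_cost(S, q)
    return best, best_cost
```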
MaxSum-CoSKQ: Finding Approximate Solution (2). Approximation bound: MaxSum-Appro is a 1.375-factor approximation (refer to our paper for the proof if you are interested). Time complexity: $O(n_r \cdot |q| \cdot \log |O|)$, where $n_r$ is the number of relevant objects. This is the same worst-case time complexity as Cao-Appro2, but with a smaller approximation factor (1.375 vs. 2).
Dia-CoSKQ (1): Finding Exact Solutions. $\mathrm{cost}_{\mathrm{Dia}}(S) = \max\{\max(S, q), \max(S, S)\}$: $\max(S, q)$ is determined by the query distance owner and $\max(S, S)$ by the pairwise distance owners, so $\mathrm{cost}_{\mathrm{Dia}}(S)$ is determined by the distance owner group of $S$. We can therefore apply the distance owner-driven approach to Dia-CoSKQ, with several updates. For the pairwise distance owners $o_1, o_2$: the lower bound $d_{min} = d(o, q) - \min\{d(o_1, q), d(o_2, q)\}$ is updated to $d(o, q)$, and the upper bound $d_{max} = \mathrm{cost}(S) - d(o, q)$ is updated to $\mathrm{cost}(S)$.
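The Dia-CoSKQ adaptation can be sketched by swapping in cost_Dia and the updated upper bound cost(S) on d(o1, o2). This illustrative version is not the paper's Dia-Exact: it omits lower-bound pruning, uses exhaustive search inside the disk intersection, and assumes Euclidean distance, hypothetical `(name, (x, y), keywords)` tuples, an assumed tolerance `EPS`, and the keyword nearest neighbours of q as the initial best-known set.

```python
from itertools import combinations, combinations_with_replacement
from math import dist

EPS = 1e-9  # tolerance for floating-point comparisons (assumption)


def diameter(S):
    return max((dist(a[1], b[1]) for a, b in combinations(S, 2)), default=0.0)


def cost_dia(S, q):
    return max(max(dist(o[1], q) for o in S), diameter(S))


def search(core, region, qk, rp):
    """Exhaustively extend core with objects from region until all keywords are covered."""
    for k in range(len(region) + 1):
        for extra in combinations(region, k):
            S = core + list(extra)
            if qk <= set().union(*(x[2] for x in S)) and diameter(S) <= rp + EPS:
                return S
    return None


def dia_exact(objects, q, query_keywords):
    qk = set(query_keywords)
    relevant = [o for o in objects if o[2] & qk]
    nns = {}
    for t in qk:  # initial best-known S: the keyword NNs of q (an assumption)
        nn = min((o for o in relevant if t in o[2]), key=lambda o: dist(o[1], q))
        nns[nn[0]] = nn
    best = list(nns.values())
    best_cost = cost_dia(best, q)
    r_min = max(dist(o[1], q) for o in best)
    for o in sorted(relevant, key=lambda x: dist(x[1], q)):
        rq = dist(o[1], q)
        if rq < r_min - EPS:
            continue
        if rq >= best_cost:
            break  # cost_Dia(S') >= d(o, q) >= cost(S)
        inner = [x for x in relevant if dist(x[1], q) <= rq + EPS]
        for o1, o2 in combinations_with_replacement(inner, 2):
            rp = dist(o1[1], o2[1])
            if max(rq, rp) >= best_cost:
                continue  # updated upper bound: d(o1, o2) < cost(S)
            core = list({x[0]: x for x in (o, o1, o2)}.values())
            if diameter(core) > rp + EPS:
                continue  # o is farther than d(o1, o2) from o1 or o2
            names = {x[0] for x in core}
            region = [x for x in inner if x[0] not in names
                      and dist(x[1], o1[1]) <= rp + EPS and dist(x[1], o2[1]) <= rp + EPS]
            S = search(core, region, qk, rp)
            if S is not None:
                best, best_cost = S, max(rq, rp)
    return best, best_cost
```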
Dia-CoSKQ (2): Finding Approximate Solution
Dia-CoSKQ (3): Adaptations of Existing Solutions
Experimental Results: Set-Up
Datasets: GN, Web and Hotel (the same datasets as used by Cao et al.):
GN: 1,868,821 objects; 222,409 unique words; 18,374,228 words.
Web: 579,727 objects; 2,899,175 unique words; 249,132,88… words.
Hotel: 20,790 objects; 602 unique words; 80,845 words.
Query generation: location and query keywords.
Algorithms: for MaxSum-CoSKQ, Cao-Exact, Cao-Appro1, Cao-Appro2, MaxSum-Exact and MaxSum-Appro; for Dia-CoSKQ, Cao-Exact, Cao-Appro1, Cao-Appro2, Dia-Exact and Dia-Appro.
Factors & measures: number of query keywords and average number of keywords contained in an object.
Experimental Results: Performance Study (1). Problem: MaxSum-CoSKQ; dataset: Web; factor: $|q.\psi|$ (the number of query keywords). MaxSum-Exact runs faster than Cao-Exact by up to 3 orders of magnitude. Our MaxSum-Appro runs fast, is comparable with Cao-Appro2, and returns near-optimal solutions.
Experimental Results: Performance Study (2). Problem: MaxSum-CoSKQ; dataset: Web; factor: $|o.\psi|$ (the average number of keywords per object). Cao-Exact is not scalable with respect to $|o.\psi|$, whereas our MaxSum-Exact is.
Experimental Results: Performance Study (3). Problem: MaxSum-CoSKQ and Dia-CoSKQ; dataset: GN; scalability test. Cao-Exact runs for more than 10 days when the data size is about 8 million! MaxSum-Exact is still fast (≤ 100 s) when the data size is in the millions, and MaxSum-Appro runs in real time (≤ 1 s).
Conclusion: the Collective Spatial Keyword Query problem; a distance owner-driven approach; exact and approximate algorithms; extensive experiments.
My research interests: databases, queries and/or data mining on spatial data (e.g., spatial matching [SIGMOD'13]); spatial-textual data (e.g., spatial keyword query [SIGMOD'13]); trajectory data (e.g., trajectory compression [VLDB'13]); social network data (e.g., viral marketing [ICDM'11]); graphs (e.g., shortest path queries); etc.
Q & A