CSE 636 Data Integration Answering Queries Using Views

CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm

The Bucket Algorithm
• Each subgoal g of Q must be “covered” by some view
• Make a list of candidates (buckets) per query subgoal
• Consider combinations of candidates from different buckets
• Not all combos are “compatible”
• Keep the compatible ones and minimize them
• Discard the ones contained in another
• Take their union

The Bucket Algorithm
  q(X, Y, R) :- ForSale(X, Y, C, "auto"), Review(X, R, "auto"), Y > 1985
Step 1: For each subgoal, put the relevant sources into a bucket:
  V1(name, year) :- ForSale(name, year, "France", "auto"), year > 1990 would be relevant
  V3(name, year) :- ForSale(name, year, "France", "cheese") would be irrelevant
Step 2: Take the Cartesian product of the buckets
Algorithm produces maximally contained rewriting
Ignores interactions between subgoals in Step 1
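The relevance test in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tuple representation of atoms, the convention that constants carry double quotes, and the helper names `is_const` and `relevant` are all assumptions made here. The sketch only checks predicate names and clashing constants; it ignores the comparison predicates (year > 1990, Y > 1985).

```python
# Assumed representation: an atom is (predicate, args). Variables are plain
# names; constants are strings that start with a double quote.

def is_const(term):
    return term.startswith('"')

def relevant(query_atom, view_atom):
    """A view atom is a candidate for a query subgoal if the predicates agree
    and no position holds two different constants."""
    q_pred, q_args = query_atom
    v_pred, v_args = view_atom
    if q_pred != v_pred or len(q_args) != len(v_args):
        return False
    return all(
        not (is_const(q) and is_const(v) and q != v)
        for q, v in zip(q_args, v_args)
    )

subgoal = ("ForSale", ("X", "Y", "C", '"auto"'))
v1_atom = ("ForSale", ("name", "year", '"France"', '"auto"'))
v3_atom = ("ForSale", ("name", "year", '"France"', '"cheese"'))

print(relevant(subgoal, v1_atom))  # V1 goes into the ForSale bucket
print(relevant(subgoal, v3_atom))  # V3 does not: "cheese" clashes with "auto"
```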

The Bucket Algorithm: Example
  V1(Std, Crs, Qtr, Title) :- reg(Std, Crs, Qtr), course(Crs, Title), Crs ≥ 500, Qtr ≥ Aut98
  V2(Std, Prof, Crs, Qtr) :- reg(Std, Crs, Qtr), teaches(Prof, Crs, Qtr)
  V3(Std, Crs) :- reg(Std, Crs, Qtr), Qtr ≤ Aut94
  V4(Prof, Crs, Title, Qtr) :- reg(Std, Crs, Qtr), course(Crs, Title), teaches(Prof, Crs, Qtr), Qtr ≤ Aut97
  q(S, C, P) :- teaches(P, C, Q), reg(S, C, Q), course(C, T), C ≥ 300, Q ≥ Aut95
Step 1: For each query subgoal, put the relevant sources into a bucket

The Bucket Algorithm: Example
Bucket for teaches(P, C, Q), with mapping P → Prof, C → Crs, Q → Qtr:
  teaches: V2, V4
  reg: (not yet filled)
  course: (not yet filled)
Note: Arithmetic predicates don’t pose a problem: V2 has none, and Qtr ≤ Aut97 in V4 is consistent with Q ≥ Aut95

The Bucket Algorithm: Example
Bucket for reg(S, C, Q), with mapping S → Std, C → Crs, Q → Qtr:
  teaches: V2, V4
  reg: V1, V2
  course: (not yet filled)
Note:
• V3 doesn’t work: arithmetic predicates not consistent (Qtr ≤ Aut94 in V3 contradicts Q ≥ Aut95)
• V4 doesn’t work: S not in the output of V4

The Bucket Algorithm: Example
Bucket for course(C, T), with mapping C → Crs, T → Title:
  teaches: V2, V4
  reg: V1, V2
  course: V1, V4
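The bucket construction for this example can be reproduced with a short sketch. Several things here are assumptions for illustration only: quarters are encoded as two-digit integers (Aut94 becomes 94), comparisons are limited to `>=` and `<=` against constants, no subgoal repeats a variable, and the names `VIEWS`, `bounds`, and `covers` are hypothetical. A view enters a bucket when it has an atom with the subgoal's predicate, it exports every distinguished query variable of that subgoal, and its comparisons do not contradict the query's.

```python
# Each view: (head variables, body atoms, comparison predicates)
VIEWS = {
    "V1": (("Std", "Crs", "Qtr", "Title"),
           [("reg", ("Std", "Crs", "Qtr")), ("course", ("Crs", "Title"))],
           [("Crs", ">=", 500), ("Qtr", ">=", 98)]),
    "V2": (("Std", "Prof", "Crs", "Qtr"),
           [("reg", ("Std", "Crs", "Qtr")), ("teaches", ("Prof", "Crs", "Qtr"))],
           []),
    "V3": (("Std", "Crs"),
           [("reg", ("Std", "Crs", "Qtr"))],
           [("Qtr", "<=", 94)]),
    "V4": (("Prof", "Crs", "Title", "Qtr"),
           [("reg", ("Std", "Crs", "Qtr")), ("course", ("Crs", "Title")),
            ("teaches", ("Prof", "Crs", "Qtr"))],
           [("Qtr", "<=", 97)]),
}
Q_HEAD = ("S", "C", "P")
Q_BODY = [("teaches", ("P", "C", "Q")), ("reg", ("S", "C", "Q")), ("course", ("C", "T"))]
Q_COMPS = [("C", ">=", 300), ("Q", ">=", 95)]

def bounds(comps, var):
    """Tightest [lo, hi] interval that the comparisons impose on var."""
    lo = max([v for x, op, v in comps if x == var and op == ">="], default=float("-inf"))
    hi = min([v for x, op, v in comps if x == var and op == "<="], default=float("inf"))
    return lo, hi

def covers(subgoal, view):
    head, body, comps = view
    pred, args = subgoal
    for v_pred, v_args in body:
        if v_pred != pred:
            continue
        mapping = dict(zip(args, v_args))  # query variable -> view variable
        # Distinguished query variables must appear in the view's head.
        if any(q in Q_HEAD and v not in head for q, v in mapping.items()):
            continue
        # The query's and the view's intervals must overlap for every variable.
        consistent = True
        for q, v in mapping.items():
            q_lo, q_hi = bounds(Q_COMPS, q)
            v_lo, v_hi = bounds(comps, v)
            if max(q_lo, v_lo) > min(q_hi, v_hi):
                consistent = False
        if consistent:
            return True
    return False

buckets = {pred: [name for name, view in VIEWS.items() if covers((pred, args), view)]
           for pred, args in Q_BODY}
print(buckets)
```

Under these assumptions the sketch excludes V3 from the reg bucket because of its quarter bound and excludes V4 because it does not export the student variable, matching the notes on the slides.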

The Bucket Algorithm: Example
Step 2:
• Try all combos of views, one each from a bucket
• Test satisfaction of arithmetic predicates in each case
  – e.g., two views may not overlap, i.e., they may be inconsistent
• Desired rewriting = union of surviving ones
Query rewriting 1 (V2 from the teaches bucket, V1 from reg, V1 from course):
  q1(S, C, P) :- V2(S', P, C, Q), V1(S, C, Q, T'), V1(S'', C, Q', T)
  – No problem from arithmetic predicates (none in V2)
  – May or may not be minimal (why?)
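Enumerating the combinations in Step 2 is a plain Cartesian product over the buckets. The sketch below assumes the buckets are held in a dictionary keyed by subgoal (a representation chosen here for illustration); each resulting tuple still has to pass the consistency and containment checks before it counts as a rewriting.

```python
from itertools import product

# Buckets from Step 1, in the order teaches, reg, course.
buckets = {"teaches": ["V2", "V4"], "reg": ["V1", "V2"], "course": ["V1", "V4"]}

# One view per bucket: every tuple is a candidate rewriting.
candidates = list(product(*buckets.values()))

for combo in candidates:
    print(dict(zip(buckets, combo)))
```

With three buckets of two views each, this yields 2 × 2 × 2 = 8 candidates, which is why the product can become large as the number of subgoals and views grows.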

The Bucket Algorithm: Example
Unfolding of rewriting 1 (r, t, c abbreviate reg, teaches, course):
  q1'(S, C, P) :- r(S', C, Q), t(P, C, Q), r(S, C, Q), c(C, T'), r(S'', C, Q'), c(C, T), C ≥ 500, Q ≥ Aut98, C ≥ 500, Q' ≥ Aut98
• r(S', C, Q) and r(S'', C, Q') can be mapped to r(S, C, Q): S' → S, S'' → S, Q' → Q
• c(C, T) can be mapped to c(C, T'): just extend the above mapping with T → T'
Minimized unfolding of rewriting 1:
  q1m'(S, C, P) :- t(P, C, Q), r(S, C, Q), c(C, T'), C ≥ 500, Q ≥ Aut98
Minimized rewriting 1:
  q1m(S, C, P) :- V2(S', P, C, Q), V1(S, C, Q, T')
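The minimization of the unfolding can be checked mechanically. The following is a sketch under stated assumptions: atoms are (predicate, args) tuples, there are no constants, comparison predicates are ignored, and the function names `homomorphism` and `minimize` are hypothetical. An atom is dropped when the whole body can be mapped into the remaining atoms without moving the head variables.

```python
def homomorphism(source, target, fixed):
    """Find a variable mapping sending every atom of `source` onto an atom of
    `target` while leaving the variables in `fixed` unchanged; None if none exists."""
    def extend(i, mapping):
        if i == len(source):
            return mapping
        pred, args = source[i]
        for t_pred, t_args in target:
            if t_pred != pred or len(t_args) != len(args):
                continue
            trial = dict(mapping)
            # setdefault binds a fresh variable or returns its earlier binding.
            if all(trial.setdefault(a, t) == t for a, t in zip(args, t_args)):
                found = extend(i + 1, trial)
                if found is not None:
                    return found
        return None
    return extend(0, {v: v for v in fixed})

def minimize(head_vars, body):
    """Repeatedly drop one redundant atom, scanning from the end of the body."""
    body = list(body)
    shrunk = True
    while shrunk:
        shrunk = False
        for i in reversed(range(len(body))):
            rest = body[:i] + body[i + 1:]
            if homomorphism(body, rest, head_vars) is not None:
                body, shrunk = rest, True
                break
    return body

# Relational part of the unfolding q1' (comparisons left out).
unfolding = [
    ("r", ("S'", "C", "Q")), ("t", ("P", "C", "Q")), ("r", ("S", "C", "Q")),
    ("c", ("C", "T'")), ("r", ("S''", "C", "Q'")), ("c", ("C", "T")),
]
minimized = minimize(("S", "C", "P"), unfolding)
print(minimized)
```

Scanning from the end is only a tie-breaking choice: it keeps c(C, T') rather than c(C, T), which matches the minimized unfolding on the slide. Either choice gives an equivalent query.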

The Bucket Algorithm: Example
Query rewriting 2 (V2 from teaches, V1 from reg, V4 from course):
  q2(S, C, P) :- V2(S', P, C, Q), V1(S, C, Q, T'), V4(P', C, T, Q')
  q2'(S, C, P) :- r(S', C, Q), t(P, C, Q), r(S, C, Q), c(C, T'), C ≥ 500, Q ≥ Aut98, r(S'', C, Q'), c(C, T), t(P', C, Q'), Q' ≤ Aut97
• This combo is infeasible: consider the conjunction of arithmetic predicates in V1 and V4
Query rewriting 3 (V2 from teaches, V2 from reg, V4 from course):
  q3(S, C, P) :- V2(S', P, C, Q), V2(S, P', C, Q), V4(P'', C, T, Q')

The Bucket Algorithm: Example
Unfolding of rewriting 3:
  q3'(S, C, P) :- r(S', C, Q), t(P, C, Q), r(S, C, Q), t(P', C, Q), r(S'', C, Q'), c(C, T), t(P'', C, Q'), Q' ≤ Aut97
• r(S, C, Q) and t(P, C, Q) can cover the subgoals over primed variables under the mapping: S' → S, S'' → S, P' → P, P'' → P, Q' → Q
Minimized rewriting 3:
  q3m(S, C, P) :- V2(S, P, C, Q), V4(P, C, T, Q)
Verify that there are only two rewritings that are not covered by others
Maximally Contained Rewriting: q' = q1m ∪ q3m

The Bucket Algorithm: Example 2
Query:
  q(X) :- cites(X, Y), cites(Y, X), sameTopic(X, Y)
Views:
  V4(A) :- cites(A, B), cites(B, A)
  V5(C, D) :- sameTopic(C, D)
  V6(F, H) :- cites(F, G), cites(G, H), sameTopic(F, G)
Buckets:
  cites: V4, V6
  sameTopic: V5, V6
Note: Should we list V4(X) twice in the buckets?

The Bucket Algorithm: Example 2
• Consider all combos & check for containment of the unfolded rewriting in Q
• V4(X) cannot be combined with anything (why?)
  Try q1(X) :- V4(X), V5(X, Y)
  Try q2(X) :- V4(X), V6(X, Y), V5(X, Y)
• Do any of these work?
• When can we discard a view from consideration?

The Bucket Algorithm: Example 2
Here is a successful rewriting:
  q3(X) :- V6(X, Y), V6(X, Y)
• By itself, it is not contained in Q
• But with the subgoal X = Y added, it is!
By minimizing the rewriting, we get:
  q3m(X) :- V6(X, X)
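The containment claims for V6 can be verified with a containment-mapping search from the query body into the unfolded rewriting. This is a sketch with assumed names (`containment_mapping`, `unfold_v6`) and an assumed tuple representation of atoms; the existential variable of V6 is named G in the unfolding.

```python
def containment_mapping(query_body, target_body, head_vars):
    """Map every query atom onto some target atom, keeping head variables fixed.
    A mapping exists exactly when the target query is contained in the query."""
    def extend(i, mapping):
        if i == len(query_body):
            return mapping
        pred, args = query_body[i]
        for t_pred, t_args in target_body:
            if t_pred != pred or len(t_args) != len(args):
                continue
            trial = dict(mapping)
            if all(trial.setdefault(a, t) == t for a, t in zip(args, t_args)):
                found = extend(i + 1, trial)
                if found is not None:
                    return found
        return None
    return extend(0, {v: v for v in head_vars})

Q_BODY = [("cites", ("X", "Y")), ("cites", ("Y", "X")), ("sameTopic", ("X", "Y"))]

def unfold_v6(f, h, g):
    """Body of V6(F, H) :- cites(F, G), cites(G, H), sameTopic(F, G)."""
    return [("cites", (f, g)), ("cites", (g, h)), ("sameTopic", (f, g))]

# V6(X, Y) alone: no mapping exists, so it is not contained in Q.
print(containment_mapping(Q_BODY, unfold_v6("X", "Y", "G"), ("X",)))

# V6(X, X), i.e. with X = Y added: the query variable Y maps to G.
print(containment_mapping(Q_BODY, unfold_v6("X", "X", "G"), ("X",)))
```

The second call succeeds because equating the two head arguments of V6 turns cites(G, H) into cites(G, X), which is the atom that cites(Y, X) needs as its image.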

The Bucket Algorithm: Example 2
Remarks:
• V4 didn’t contribute to any rewriting, but the bucket algorithm doesn’t recognize this ahead of time
• Consider: q2(X, Y) :- cites(X, Y), cites(Y, X)
• Then both cites predicates can be folded into V4
  – Not recognized by the bucket algorithm

The State of Affairs
• Bucket algorithm:
  – deals well with predicates
  – Cartesian product can be large (containment check required for every candidate rewriting)
• Inverse rules:
  – modular (extensible to binding patterns, FDs)
  – no treatment of predicates
  – resulting rewritings need significant further optimization
Neither scales up
• The MiniCon algorithm:
  – change perspective: look at query variables

References
• Querying Heterogeneous Information Sources Using Source Descriptions
  – By Alon Y. Levy, Anand Rajaraman and Joann J. Ordille
  – VLDB, 1996
• Laks V.S. Lakshmanan
  – Lecture Slides
• Alon Halevy
  – Answering Queries Using Views: A Survey
  – VLDB Journal, 2000
  – http://citeseer.ist.psu.edu/halevy00answering.html