
Conjunctions of Queries

Conjunctive Queries
• A conjunctive query is a single Datalog rule with only non-negated atoms in the body. (Note: no negated atoms and no comparisons.)
• A conjunctive query has only EDB predicates in its body.
• We say that a query Q1 is contained in a query Q2 if, for all databases D, the result of computing Q1 on D is contained in the result of computing Q2 on D.
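To make the containment definition concrete, here is a minimal Python sketch. It assumes a hypothetical encoding that is not from the slides: an atom is a (predicate, arguments) pair, variables are strings starting with an uppercase letter, and a database maps each predicate name to a set of tuples. `evaluate` and `contained_on` are illustrative names. Note that `contained_on` checks a single database D, whereas containment quantifies over all databases, so it can refute containment but never prove it.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple]          # (predicate, argument tuple)
Query = Tuple[Atom, List[Atom]]   # (head, body)
Database = Dict[str, Set[Tuple]]  # predicate name -> set of facts


def is_var(term) -> bool:
    # Assumed convention: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def extend(args: Tuple, fact: Tuple, binding: Dict) -> Optional[Dict]:
    """Extend `binding` so that the atom's arguments match `fact`; None on a clash."""
    if len(args) != len(fact):
        return None
    new = dict(binding)
    for term, value in zip(args, fact):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:  # constants must match exactly
            return None
    return new


def evaluate(query: Query, db: Database) -> Set[Tuple]:
    """Naive evaluation: try every way of matching the body atoms, left to right."""
    (_, head_args), body = query
    results: Set[Tuple] = set()

    def search(i: int, binding: Dict) -> None:
        if i == len(body):
            # Assumes a safe rule: every head variable also occurs in the body.
            results.add(tuple(binding[t] if is_var(t) else t for t in head_args))
            return
        pred, args = body[i]
        for fact in db.get(pred, ()):
            new = extend(args, fact, binding)
            if new is not None:
                search(i + 1, new)

    search(0, {})
    return results


def contained_on(q1: Query, q2: Query, db: Database) -> bool:
    """Q1's answers on this one database are a subset of Q2's answers."""
    return evaluate(q1, db) <= evaluate(q2, db)


db = {"e": {(1, 2), (2, 3), (3, 4)}}
two_hop = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "Y"))])
print(sorted(evaluate(two_hop, db)))  # [(1, 3), (2, 4)]
```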

Example
• Consider the queries
  Q1: p(X, Y) :- e(X, Z), e(Z, Y)
  Q2: p(X, Y) :- e(X, Z), e(Z, Y), e(X, W)
• It is easy to see that Q2 is contained in Q1, since any assignment that satisfies the body of Q2 also satisfies the body of Q1 and produces the same head.
• Can you prove that Q1 is contained in Q2?
• Which of the queries is faster to compute?
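Before proving anything, one can probe the two queries empirically. The Python sketch below assumes a hypothetical encoding (atoms as (predicate, arguments) pairs, variables as uppercase-initial strings, a database as a dict from predicate to a set of tuples). It evaluates Q1 and Q2 on random small graphs and looks for a graph where they differ. A random search like this can only find counterexamples; agreement on every sample is evidence, not a proof. It also hints at the cost question: Q2 does an extra join with e for every match of its first two atoms.

```python
import random


def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def evaluate(query, db):
    """Naive evaluation of a conjunctive query; db maps predicate -> set of tuples."""
    (_, head_args), body = query
    out = set()

    def search(i, binding):
        if i == len(body):
            out.add(tuple(binding[t] if is_var(t) else t for t in head_args))
            return
        pred, args = body[i]
        for fact in db.get(pred, ()):
            new = dict(binding)
            # Every variable must take one consistent value; constants must match.
            if all(new.setdefault(t, v) == v if is_var(t) else t == v
                   for t, v in zip(args, fact)):
                search(i + 1, new)

    search(0, {})
    return out


Q1 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "Y"))])
Q2 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "Y")), ("e", ("X", "W"))])

random.seed(0)
for _ in range(200):
    # Random directed graph on nodes 0..4 with at most 7 edges.
    db = {"e": {(random.randrange(5), random.randrange(5))
                for _ in range(random.randrange(8))}}
    r1, r2 = evaluate(Q1, db), evaluate(Q2, db)
    assert r2 <= r1, "Q2 should be contained in Q1"
    if r1 != r2:
        print("Q1 is not contained in Q2 on", db)
        break
else:
    print("Q1 and Q2 agreed on all 200 random graphs")
```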

Homomorphisms
• A symbol mapping is a mapping of variables to other variables or to constants, and of constants to constants.
• Consider the queries Q1, defined as H1 :- B1, and Q2, defined as H2 :- B2.
• A symbol mapping h of the variables and constants in Q1 to those in Q2 is a homomorphism if:
  – h maps every constant to itself
  – h(H1) = H2
  – h(B1) ⊆ B2, i.e., every atom of B1 is mapped to an atom of B2
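The definition translates directly into a check. This is a minimal Python sketch, assuming a hypothetical encoding (a query is a (head, body) pair of (predicate, arguments) atoms, variables are uppercase-initial strings, and h is a dict on variables); `is_homomorphism` is an illustrative name. Constants are left untouched, which enforces the condition that h maps every constant to itself. The usage lines test the mapping from the example with the queries Q1 and Q2 used in these slides.

```python
def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def apply_map(h, atom):
    """Apply a symbol mapping to an atom; constants are left unchanged."""
    pred, args = atom
    return (pred, tuple(h[t] if is_var(t) else t for t in args))


def is_homomorphism(h, q_from, q_to):
    """Check h(H_from) = H_to and that h maps every atom of B_from into B_to."""
    (head_from, body_from), (head_to, body_to) = q_from, q_to
    variables = {t for _, args in [head_from, *body_from] for t in args if is_var(t)}
    if not variables <= set(h):
        return False  # h must be defined on every variable of q_from
    targets = set(body_to)
    return apply_map(h, head_from) == head_to and all(
        apply_map(h, a) in targets for a in body_from)


Q1 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "Y"))])
Q2 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "Y")), ("e", ("X", "W"))])
print(is_homomorphism({"X": "X", "Y": "Y", "Z": "Z"}, Q1, Q2))  # True
print(is_homomorphism({"X": "Y", "Y": "X", "Z": "Z"}, Q1, Q2))  # False: head becomes p(Y, X)
```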

Example
• Consider the queries
  Q1: p(X, Y) :- e(X, Z), e(Z, Y)
  Q2: p(X, Y) :- e(X, Z), e(Z, Y), e(X, W)
• A homomorphism from Q1 to Q2:
  – h(X) = X, h(Y) = Y, h(Z) = Z
• Can you find a homomorphism from Q2 to Q1?
  – h(X) = __, h(Y) = __, h(Z) = __, h(W) = __
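Finding a homomorphism, rather than checking a given one, can be done by backtracking: fix the head mapping first, then try to send each body atom to some atom of the target body with the same predicate. A minimal Python sketch under a hypothetical encoding (atoms as (predicate, arguments) pairs, uppercase-initial strings as variables); `find_homomorphism` is an illustrative name and the search is exponential in the worst case. Calling it with (Q2, Q1) is one way to check your answer to the exercise.

```python
def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def unify(args, target, h):
    """Extend mapping h so that args map position-wise onto target; None on a clash."""
    if len(args) != len(target):
        return None
    h = dict(h)
    for s, t in zip(args, target):
        if is_var(s):
            if h.setdefault(s, t) != t:
                return None
        elif s != t:  # constants must map to themselves
            return None
    return h


def find_homomorphism(q_from, q_to):
    """Backtracking search for h with h(head_from) = head_to and h(body_from) ⊆ body_to."""
    (pred_from, head_from), body_from = q_from
    (pred_to, head_to), body_to = q_to
    if pred_from != pred_to:
        return None

    def search(i, h):
        if h is None:
            return None
        if i == len(body_from):
            return h
        pred, args = body_from[i]
        for p, target in body_to:
            if p == pred:
                found = search(i + 1, unify(args, target, h))
                if found is not None:
                    return found
        return None

    return search(0, unify(head_from, head_to, {}))


Q1 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "Y"))])
Q2 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "Y")), ("e", ("X", "W"))])
print(find_homomorphism(Q1, Q2))  # {'X': 'X', 'Y': 'Y', 'Z': 'Z'}
```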

Containment
• Theorem: Q1 is contained in Q2 if and only if there is a homomorphism from Q2 to Q1.
• Proof:
• (If) Suppose that there is a homomorphism h from Q2 to Q1.
  – Let D be a database.
  – Let f be an assignment that satisfies the body of Q1 on D.

If (continued)
  – Then, Q1 returns f(H1).
  – We show that f ∘ h is a satisfying assignment of the body of Q2 that returns f(H1):
    • (f ∘ h)(B2) = f(h(B2)) ⊆ f(B1) ⊆ D, since h(B2) ⊆ B1; so f ∘ h satisfies B2.
    • (f ∘ h)(H2) = f(h(H2)) = f(H1), since h(H2) = H1.
  – Therefore, Q2 also returns f(H1), as required.

Only If
• (Only if) Suppose that Q1 is contained in Q2. We show that there is a homomorphism from Q2 to Q1.
  – Let f be a symbol mapping that sends each variable of Q1 to a distinct new constant (one that occurs in neither query) and each constant to itself.
  – Let D be the database defined by f(B1).
  – Then, if we compute Q1 on D, f(H1) will be returned.
  – Since Q1 is contained in Q2, when we compute Q2 on D, f(H1) will also be returned.

Only If (continued)
  – Let g be an assignment of the variables of Q2 that satisfies B2 on D and returns f(H1).
  – We show that f⁻¹ ∘ g is a homomorphism from Q2 to Q1.
  – Note that f⁻¹ is well defined since f is injective.
  – (f⁻¹ ∘ g)(B2) = f⁻¹(g(B2)) ⊆ f⁻¹(D) = B1.
  – (f⁻¹ ∘ g)(H2) = f⁻¹(g(H2)) = f⁻¹(f(H1)) = H1.
  – Therefore, there is a homomorphism from Q2 to Q1.
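The "only if" argument doubles as an algorithm. Freeze Q1 into its canonical database D = f(B1), evaluate Q2 on D, and check whether f(H1) comes back; by the theorem this succeeds exactly when a homomorphism from Q2 to Q1 exists. A minimal Python sketch, assuming a hypothetical encoding (atoms as (predicate, arguments) pairs, uppercase-initial strings as variables) and a hypothetical "c_" prefix for the frozen constants, which must not clash with constants already in the queries.

```python
def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def evaluate(query, db):
    """Naive evaluation of a conjunctive query; db maps predicate -> set of tuples."""
    (_, head_args), body = query
    out = set()

    def search(i, binding):
        if i == len(body):
            out.add(tuple(binding[t] if is_var(t) else t for t in head_args))
            return
        pred, args = body[i]
        for fact in db.get(pred, ()):
            new = dict(binding)
            if all(new.setdefault(t, v) == v if is_var(t) else t == v
                   for t, v in zip(args, fact)):
                search(i + 1, new)

    search(0, {})
    return out


def contained(q1, q2):
    """Q1 ⊆ Q2 iff Q2, evaluated on the canonical database f(B1), returns f(H1)."""
    (pred1, head1), body1 = q1
    pred2 = q2[0][0]
    # f sends each variable of Q1 to a distinct new constant and each constant to itself.
    # Assumes no constant in either query starts with the hypothetical prefix "c_".
    f = {t: "c_" + t for _, args in body1 for t in args if is_var(t)}
    db = {}
    for pred, args in body1:
        db.setdefault(pred, set()).add(tuple(f.get(t, t) for t in args))
    frozen_head = tuple(f.get(t, t) for t in head1)
    return pred1 == pred2 and frozen_head in evaluate(q2, db)


Q1 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "Y"))])
Q2 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "Y")), ("e", ("X", "W"))])
print(contained(Q1, Q2), contained(Q2, Q1))  # True True
```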

An Optimization
• We can optimize the query p2(X, Y) :- e(X, Z), e(Z, Y), e(X, W) by removing the last atom.
• In general, given a query Q:
  For each atom a in the body of Q:
    Let Q’ be Q without a.
    If Q’ is equivalent to Q, then remove a from Q.
• Note that it is sufficient to check whether Q’ is contained in Q, since Q is always contained in Q’ (every assignment that satisfies the body of Q also satisfies the smaller body of Q’).
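The removal loop above can be sketched in Python as follows, assuming a hypothetical encoding (atoms as (predicate, arguments) pairs, uppercase-initial strings as variables); `minimize` and `has_homomorphism` are illustrative names. It tests Q’ ⊆ Q by searching for a homomorphism from Q to Q’, following the containment theorem. It also skips any atom whose removal would leave a head variable out of the body, a safety condition the slide leaves implicit.

```python
def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def unify(args, target, h):
    """Extend mapping h so that args map position-wise onto target; None on a clash."""
    if len(args) != len(target):
        return None
    h = dict(h)
    for s, t in zip(args, target):
        if is_var(s):
            if h.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return h


def has_homomorphism(q_from, q_to):
    (p_from, head_from), body_from = q_from
    (p_to, head_to), body_to = q_to

    def search(i, h):
        if h is None:
            return False
        if i == len(body_from):
            return True
        pred, args = body_from[i]
        return any(search(i + 1, unify(args, t_args, h))
                   for p, t_args in body_to if p == pred)

    return p_from == p_to and search(0, unify(head_from, head_to, {}))


def minimize(query):
    """Remove body atoms one at a time while the query stays equivalent."""
    head, body = query
    head_vars = {t for t in head[1] if is_var(t)}
    i = 0
    while i < len(body):
        shorter = body[:i] + body[i + 1:]
        still_safe = head_vars <= {t for _, args in shorter for t in args if is_var(t)}
        # Q is always contained in Q' (fewer atoms), so only Q' ⊆ Q needs checking,
        # i.e., a homomorphism from Q to Q'.
        if still_safe and has_homomorphism((head, body), (head, shorter)):
            body = shorter
        else:
            i += 1
    return (head, body)


Q2 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "Y")), ("e", ("X", "W"))])
print(minimize(Q2))  # (('p', ('X', 'Y')), [('e', ('X', 'Z')), ('e', ('Z', 'Y'))])
```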

Containment with FDs
• Consider the queries
  Q1: p(X, Y) :- e(X, X), e(X, Y)
  Q2: p(X, X) :- e(X, X)
• Then Q2 is contained in Q1.
• However, Q1 is not contained in Q2.
• What if we know that the first column of e functionally determines the second column?
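A small database makes the non-containment concrete. The Python sketch below (hypothetical encoding: atoms as (predicate, arguments) pairs, uppercase-initial strings as variables, a database as a dict from predicate to a set of tuples) evaluates both queries on e = {(1, 1), (1, 2)}. Q1 returns (1, 2) but Q2 does not. The same database violates the FD from the first column of e to the second, because 1 is paired with both 1 and 2, which is why the question at the end of the slide matters. `satisfies_fd` is an illustrative helper.

```python
def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def evaluate(query, db):
    """Naive evaluation of a conjunctive query; db maps predicate -> set of tuples."""
    (_, head_args), body = query
    out = set()

    def search(i, binding):
        if i == len(body):
            out.add(tuple(binding[t] if is_var(t) else t for t in head_args))
            return
        pred, args = body[i]
        for fact in db.get(pred, ()):
            new = dict(binding)
            if all(new.setdefault(t, v) == v if is_var(t) else t == v
                   for t, v in zip(args, fact)):
                search(i + 1, new)

    search(0, {})
    return out


def satisfies_fd(rows, lhs, rhs):
    """True if the 0-based columns lhs functionally determine column rhs."""
    seen = {}
    return all(seen.setdefault(tuple(r[c] for c in lhs), r[rhs]) == r[rhs] for r in rows)


Q1 = (("p", ("X", "Y")), [("e", ("X", "X")), ("e", ("X", "Y"))])
Q2 = (("p", ("X", "X")), [("e", ("X", "X"))])
db = {"e": {(1, 1), (1, 2)}}

print(sorted(evaluate(Q1, db)))       # [(1, 1), (1, 2)]
print(sorted(evaluate(Q2, db)))       # [(1, 1)]
print(satisfies_fd(db["e"], [0], 1))  # False
```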

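The Chase
• Given a query and a set of FDs, we apply a chase of the FDs to the query: we find any violation of an FD (two atoms that agree on the FD's left-hand side but not on its right-hand side) and “fix it” by equating the right-hand-side (tail) terms, repeating until no violation remains.
• For example, chasing with e: $1 → $2 turns
  p(X, Y) :- e(X, X), e(X, Y)
  into
  p(X, X) :- e(X, X), e(X, X)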
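The chase step can be sketched in Python under assumptions that are not on the slide. An FD is encoded as a hypothetical triple (predicate, left-hand-side columns, right-hand-side column) with 0-based columns, so e: $1 → $2 becomes ("e", (0,), 1). When two terms are equated, the sketch keeps a constant over a variable and a head variable over a body-only variable. If two distinct constants would have to be equated, it returns None, meaning the query has no answers on databases satisfying the FDs. Running it on the slide's example reproduces the chased rule.

```python
def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def find_violation(body, fds):
    """Return the two right-hand-side terms of some FD violation, or None."""
    for pred, lhs, rhs in fds:
        rows = [args for p, args in body if p == pred]
        for a in rows:
            for b in rows:
                if all(a[i] == b[i] for i in lhs) and a[rhs] != b[rhs]:
                    return a[rhs], b[rhs]
    return None


def rank(t, head):
    # Prefer to keep constants, then head variables, then body-only variables.
    return 2 if not is_var(t) else (1 if t in head else 0)


def chase(query, fds):
    """Chase a conjunctive query with FDs; None if two distinct constants clash."""
    (pred, head), body = query
    while True:
        pair = find_violation(body, fds)
        if pair is None:
            return ((pred, head), body)
        u, v = pair
        if not is_var(u) and not is_var(v):
            return None
        old, new = (v, u) if rank(u, head) >= rank(v, head) else (u, v)
        head = tuple(new if t == old else t for t in head)
        body = [(p, tuple(new if t == old else t for t in args)) for p, args in body]


# The slide's example: chase p(X, Y) :- e(X, X), e(X, Y) with e: $1 -> $2.
FDS = [("e", (0,), 1)]
Q1 = (("p", ("X", "Y")), [("e", ("X", "X")), ("e", ("X", "Y"))])
print(chase(Q1, FDS))  # (('p', ('X', 'X')), [('e', ('X', 'X')), ('e', ('X', 'X'))])
```

Each step replaces one variable by another term, so the number of distinct variables strictly decreases and the loop terminates.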

Checking Containment
• To check containment, we first apply the chase to both queries and then check for a homomorphism (to test whether Q1 is contained in Q2, look for a homomorphism from the chased Q2 to the chased Q1).
• Can you prove that this process is correct?
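Putting the two steps together gives the following Python sketch of containment under FDs. It uses a hypothetical encoding: atoms as (predicate, arguments) pairs, uppercase-initial strings as variables, and FDs as (predicate, left-hand-side columns, right-hand-side column) with 0-based columns. It chases both queries and then searches for a homomorphism from the chased Q2 to the chased Q1. It does not answer the slide's correctness question; it only shows the procedure on the earlier FD example, where the answer flips once the FD is taken into account.

```python
def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def find_violation(body, fds):
    for pred, lhs, rhs in fds:
        rows = [args for p, args in body if p == pred]
        for a in rows:
            for b in rows:
                if all(a[i] == b[i] for i in lhs) and a[rhs] != b[rhs]:
                    return a[rhs], b[rhs]
    return None


def chase(query, fds):
    """Equate terms until no FD is violated; None if two distinct constants clash."""
    (pred, head), body = query
    while True:
        pair = find_violation(body, fds)
        if pair is None:
            return ((pred, head), body)
        u, v = pair
        if not is_var(u) and not is_var(v):
            return None

        def rank(t):
            # Keep constants first, then head variables, then body-only variables.
            return 2 if not is_var(t) else (1 if t in head else 0)

        old, new = (v, u) if rank(u) >= rank(v) else (u, v)
        head = tuple(new if t == old else t for t in head)
        body = [(p, tuple(new if t == old else t for t in args)) for p, args in body]


def unify(args, target, h):
    if len(args) != len(target):
        return None
    h = dict(h)
    for s, t in zip(args, target):
        if (is_var(s) and h.setdefault(s, t) != t) or (not is_var(s) and s != t):
            return None
    return h


def has_homomorphism(q_from, q_to):
    (p_from, head_from), body_from = q_from
    (p_to, head_to), body_to = q_to

    def search(i, h):
        if h is None:
            return False
        if i == len(body_from):
            return True
        pred, args = body_from[i]
        return any(search(i + 1, unify(args, t_args, h))
                   for p, t_args in body_to if p == pred)

    return p_from == p_to and search(0, unify(head_from, head_to, {}))


def contained_under_fds(q1, q2, fds):
    """Q1 ⊆ Q2 on databases satisfying fds: chase both, then seek a homomorphism Q2 -> Q1."""
    c1, c2 = chase(q1, fds), chase(q2, fds)
    if c1 is None:
        return True   # Q1 returns nothing on any database satisfying the FDs
    if c2 is None:
        return False  # Q2 returns nothing, but the chased Q1 can return an answer
    return has_homomorphism(c2, c1)


FDS = [("e", (0,), 1)]
Q1 = (("p", ("X", "Y")), [("e", ("X", "X")), ("e", ("X", "Y"))])
Q2 = (("p", ("X", "X")), [("e", ("X", "X"))])
print(contained_under_fds(Q1, Q2, FDS))  # True
print(contained_under_fds(Q1, Q2, []))   # False: without the FD, Q1 is not contained in Q2
```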

Checking for a Lossless Join
• Suppose R = (C, D, E), R1 = (C, D), R2 = (D, E) and F = {C → D, D → E}.
• We want to check whether r = π_R1(r) ⋈ π_R2(r) for all instances r of R that satisfy F.
• A row (A1, A2, A3) is in π_R1(r) ⋈ π_R2(r) if there is some row (A1, A2) in π_R1(r) and some row (A2, A3) in π_R2(r).
• Equivalently, a row (A1, A2, A3) is in π_R1(r) ⋈ π_R2(r) if there is some row (A1, A2, B1) in r and some row (B2, A2, A3) in r.
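To see the property on one concrete instance, the Python sketch below builds a small hand-picked instance of R = (C, D, E) that satisfies F. It computes π_R1(r) ⋈ π_R2(r) and also checks the row-by-row characterization from the last bullet. The helper names are illustrative. A single instance cannot establish losslessness for all instances; that is what the query-based test is for.

```python
from itertools import product


def project(r, cols):
    """π over the given 0-based column positions."""
    return {tuple(row[c] for c in cols) for row in r}


def join_on_d(cd, de):
    """π_R1(r) ⋈ π_R2(r) for R1 = (C, D), R2 = (D, E): combine rows that agree on D."""
    return {(c, d, e) for (c, d) in cd for (d2, e) in de if d == d2}


def in_join(row, r):
    """The slide's row test: some (A1, A2, B1) in r and some (B2, A2, A3) in r."""
    a1, a2, a3 = row
    return any(t[:2] == (a1, a2) for t in r) and any(t[1:] == (a2, a3) for t in r)


def satisfies_fd(r, lhs, rhs):
    """True if the 0-based columns lhs functionally determine column rhs in r."""
    seen = {}
    return all(seen.setdefault(tuple(t[c] for c in lhs), t[rhs]) == t[rhs] for t in r)


# A hand-picked instance of R = (C, D, E) satisfying C -> D and D -> E.
r = {(1, 10, 100), (2, 10, 100), (3, 20, 200)}
assert satisfies_fd(r, [0], 1) and satisfies_fd(r, [1], 2)

joined = join_on_d(project(r, [0, 1]), project(r, [1, 2]))
# The join agrees with the row-by-row characterization over all candidate rows.
values = [sorted({t[i] for t in r}) for i in range(3)]
assert joined == {t for t in product(*values) if in_join(t, r)}
print(joined == r)  # True on this instance
```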

Checking for a Lossless Join
• We can express the rule stated above by
  Q1: p(A1, A2, A3) :- r(A1, A2, B1), r(B2, A2, A3)
• Q1 expresses the set of rows in π_R1(r) ⋈ π_R2(r).
• We can express the set of rows in r by
  Q2: p(A1, A2, A3) :- r(A1, A2, A3)
• So r = π_R1(r) ⋈ π_R2(r) for every instance r that satisfies F if and only if Q1 is equivalent to Q2 on such instances.
• Note that Q2 is always contained in Q1.
• To check containment of Q1 in Q2, apply the chase and then look for a homomorphism from Q2 to Q1.
• There will be a homomorphism iff, after the chase, one of the atoms in the body of Q1 contains only A variables.

Checking for a Lossless Join
• Suppose R = (C, D, E), R1 = (C, E), R2 = (D, E) and F = {C → D, D → E}.
• Create the queries:
  Q1: p(A1, A2, A3) :- r(A1, B1, A3), r(B2, A2, A3)
  Q2: p(A1, A2, A3) :- r(A1, A2, A3)
• Note that Q2 is contained in Q1.
• Q1 is not contained in Q2, even after applying the chase: the two atoms agree only on E, which is not the left-hand side of any FD in F, so the chase changes nothing and no atom contains only A variables.
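A concrete instance shows the lossy join directly. The Python sketch below uses a small hand-picked instance of R = (C, D, E) that satisfies both FDs in F. Projecting onto (C, E) and (D, E) and joining on E produces rows that were not in r. The helper `satisfies_fd` is illustrative.

```python
def satisfies_fd(r, lhs, rhs):
    """True if the 0-based columns lhs functionally determine column rhs in r."""
    seen = {}
    return all(seen.setdefault(tuple(t[c] for c in lhs), t[rhs]) == t[rhs] for t in r)


# Hand-picked instance of R = (C, D, E); it satisfies C -> D and D -> E.
r = {(1, 1, 1), (2, 2, 1)}
assert satisfies_fd(r, [0], 1) and satisfies_fd(r, [1], 2)

ce = {(c, e) for (c, d, e) in r}  # π_R1(r) with R1 = (C, E)
de = {(d, e) for (c, d, e) in r}  # π_R2(r) with R2 = (D, E)
joined = {(c, d, e) for (c, e) in ce for (d, e2) in de if e == e2}
print(sorted(joined))  # [(1, 1, 1), (1, 2, 1), (2, 1, 1), (2, 2, 1)]
print(joined == r)     # False: the join contains spurious rows
```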

General Algorithm
• Given a relation R with m attributes, a set of FDs F, and a decomposition R1, . . . , Rn:
  – For each Ri, create an atom r(...) with Aj in place j if the j-th attribute of R is in Ri. Put fresh, distinct variables in the remaining places.
  – Create a rule with head p(A1, . . . , Am) and all the atoms from the previous step in its body.
  – Create the rule p(A1, . . . , Am) :- r(A1, . . . , Am).
  – There is a lossless join iff the two queries are equivalent under F, which we check by applying the chase and then looking for a homomorphism.
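The general algorithm, with the chase built in, fits in a short Python sketch. Assumptions not on the slides: attributes are given by name, FDs are (left-hand-side attribute list, right-hand-side attribute) pairs, and since there is only one predicate r, the body of the first rule is kept as a list of rows (one per Ri) over the symbols A1..Am and fresh B variables. This is the classical tableau form of the same test. Because Aj can only appear in column j, "some atom contains only A variables" after the chase is exactly the existence of a homomorphism from p(A1, ..., Am) :- r(A1, ..., Am) into the chased query. `lossless` is an illustrative name.

```python
from itertools import count


def lossless(attrs, decomposition, fds):
    """Chase-based lossless-join test.

    attrs: attribute names of R in order; decomposition: list of attribute lists R_i;
    fds: list of (lhs_attribute_list, rhs_attribute).
    """
    col = {a: j for j, a in enumerate(attrs)}
    fresh = count(1)
    # One atom per R_i: A_j in place j if the j-th attribute is in R_i, else a fresh B variable.
    rows = [[f"A{j + 1}" if a in ri else f"B{next(fresh)}" for j, a in enumerate(attrs)]
            for ri in decomposition]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            lcols, rc = [col[a] for a in lhs], col[rhs]
            for x in rows:
                for y in rows:
                    if all(x[c] == y[c] for c in lcols) and x[rc] != y[rc]:
                        # Equate the two symbols, keeping the A variable if one is present
                        # (a column never holds two different A variables).
                        keep, drop = (x[rc], y[rc]) if x[rc].startswith("A") else (y[rc], x[rc])
                        for row in rows:
                            for c, s in enumerate(row):
                                if s == drop:
                                    row[c] = keep
                        changed = True
    # Q2 maps into the chased Q1 iff some atom consists only of A variables.
    return any(all(s.startswith("A") for s in row) for row in rows)


F = [(["C"], "D"), (["D"], "E")]
print(lossless(["C", "D", "E"], [["C", "D"], ["D", "E"]], F))  # True
print(lossless(["C", "D", "E"], [["C", "E"], ["D", "E"]], F))  # False
```

Each equation removes one symbol from the rows, so the chase loop terminates.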