Logical Query Languages

Motivation:
• Logical rules extend more naturally to recursive queries than does relational algebra.
    ◦ Used in SQL recursion.
• Logical rules form the basis for many information-integration systems and applications.

Datalog Example
Likes(drinker, beer)
Sells(bar, beer, price)
Frequents(drinker, bar)

Happy(d) <- Frequents(d, bar) AND Likes(d, beer) AND Sells(bar, beer, p)

• Above is a rule.
• Left side = head.
• Right side = body = AND of subgoals.
• Head and subgoals are atoms.
    ◦ Atom = predicate and arguments.
    ◦ Predicate = relation name or arithmetic predicate, e.g. <.
    ◦ Arguments are variables or constants.
• Subgoals (not the head) may optionally be negated by NOT.

Meaning of Rules
The head is true of its arguments if there exist values for the local variables (those in the body, not in the head) that make all of the subgoals true.
• If there is no negation or arithmetic comparison, just take the natural join of the subgoals and project onto the head variables.

Example
The rule above is equivalent to Happy = $\pi_{\text{drinker}}(\text{Frequents} \bowtie \text{Likes} \bowtie \text{Sells})$.
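As a minimal sketch of the join-and-project reading, the Happy rule can be evaluated in Python with each relation held as a set of tuples. The sample drinkers, bars and beers are hypothetical.

```python
# EDB relations as sets of tuples (hypothetical sample data).
likes = {("Ann", "Bud"), ("Bob", "Pete's")}
sells = {("Joe's", "Bud", 2.50), ("Sue's", "Miller", 3.00)}
frequents = {("Ann", "Joe's"), ("Bob", "Sue's")}

# Natural join on the shared variables (d, bar, beer), then project onto d.
happy = {
    d
    for (d, bar) in frequents
    for (d2, beer) in likes
    if d2 == d
    for (bar2, beer2, p) in sells
    if bar2 == bar and beer2 == beer
}
print(happy)  # {'Ann'}
```

Bob drops out because Sue's does not sell the beer he likes.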

Evaluation of Rules
Two dual approaches:
1. Variable-based: Consider all possible assignments of values to variables. If all subgoals are true, add the head to the result relation.
2. Tuple-based: Consider all assignments of tuples to subgoals that make each subgoal true. If the variables are assigned consistent values, add the head to the result.

Example: Variable-Based Assignment
S(x, y) <- R(x, z) AND R(z, y) AND NOT R(x, y)

R =  A | B
    ---+---
     1 | 2
     2 | 3

• Only assignments that make the first subgoal true:
    1. x → 1, z → 2.
    2. x → 2, z → 3.
• In case (1), y → 3 makes the second subgoal true. Since (1, 3) is not in R, the third subgoal is also true.
    ◦ Thus, add (x, y) = (1, 3) to relation S.
• In case (2), no value of y makes the second subgoal true.
• Thus,

S =  A | B
    ---+---
     1 | 3
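The variable-based approach can be sketched in Python as a brute-force loop over assignments. Since every variable of this rule occurs in a positive subgoal, the sketch assumes it is enough to try only the values that occur in R (its active domain).

```python
from itertools import product

R = {(1, 2), (2, 3)}
# Only values appearing in R can make the positive subgoals true.
domain = {v for t in R for v in t}

S = set()
for x, y, z in product(domain, repeat=3):
    # R(x, z) AND R(z, y) AND NOT R(x, y)
    if (x, z) in R and (z, y) in R and (x, y) not in R:
        S.add((x, y))
print(S)  # {(1, 3)}
```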

Example: Tuple-Based Assignment
Trick: start with the positive (not negated), relational (not arithmetic) subgoals only.
S(x, y) <- R(x, z) AND R(z, y) AND NOT R(x, y)

R =  A | B
    ---+---
     1 | 2
     2 | 3

• Four assignments of tuples to subgoals:

    R(x, z) | R(z, y)
    --------+--------
    (1, 2)  | (1, 2)
    (1, 2)  | (2, 3)
    (2, 3)  | (1, 2)
    (2, 3)  | (2, 3)

• Only the second gives a consistent value to z.
• That assignment also makes NOT R(x, y) true.
• Thus, (1, 3) is the only tuple for the head.
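A sketch of the tuple-based approach for the same rule: each positive subgoal is assigned a tuple of R, assignments that give z two different values are discarded, and the negated subgoal is checked last.

```python
from itertools import product

R = {(1, 2), (2, 3)}

S = set()
# Assign one tuple of R to R(x, z) and one to R(z, y): four assignments.
for (x, z1), (z2, y) in product(R, repeat=2):
    if z1 != z2:
        continue  # z got inconsistent values; discard this assignment
    if (x, y) not in R:  # the negated subgoal NOT R(x, y)
        S.add((x, y))
print(S)  # {(1, 3)}
```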

Safety
A rule can make no sense if its variables appear in the wrong places. Examples:
• S(x) <- R(y)
• S(x) <- NOT R(x)
• S(x) <- R(y) AND x < y
In each of these cases, the result is infinite, even if the relation R is finite. (In the first rule, for instance, nothing constrains x, so whenever R is nonempty every possible value of x is in S.)
• To make sense as a database operation, we need to require three things of a variable x (= definition of safety). If x appears in
    1. the head,
    2. a negated subgoal, or
    3. an arithmetic comparison,
  then x must also appear in a nonnegated, “ordinary” (relational) subgoal of the body.
• Henceforth, we insist that rules be safe.
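The safety condition can be checked mechanically. The sketch below assumes a hypothetical encoding in which each subgoal records only its set of variables and whether it is negated or arithmetic; constants are simply left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subgoal:
    variables: frozenset       # variables occurring in the subgoal
    negated: bool = False      # preceded by NOT
    arithmetic: bool = False   # comparison such as x < y

def is_safe(head_vars, body):
    # Variables that occur in nonnegated, relational ("ordinary") subgoals.
    bound = set()
    for g in body:
        if not g.negated and not g.arithmetic:
            bound |= g.variables
    # Variables that must be bound: head, negated subgoals, comparisons.
    needed = set(head_vars)
    for g in body:
        if g.negated or g.arithmetic:
            needed |= g.variables
    return needed <= bound

def sg(*vs, **flags):
    """Hypothetical helper: build a Subgoal from variable names."""
    return Subgoal(frozenset(vs), **flags)

unsafe = [
    ({"x"}, [sg("y")]),                                 # S(x) <- R(y)
    ({"x"}, [sg("x", negated=True)]),                   # S(x) <- NOT R(x)
    ({"x"}, [sg("y"), sg("x", "y", arithmetic=True)]),  # S(x) <- R(y) AND x < y
]
# S(x, y) <- R(x, z) AND R(z, y) AND NOT R(x, y)
safe = ({"x", "y"}, [sg("x", "z"), sg("z", "y"), sg("x", "y", negated=True)])

print([is_safe(h, b) for h, b in unsafe], is_safe(*safe))  # [False, False, False] True
```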

Datalog Programs
• A collection of rules is a Datalog program.
• Predicates/relations divide into two classes:
    ◦ EDB = extensional database = relation stored in the DB.
    ◦ IDB = intensional database = relation defined by one or more rules.
• A predicate must be IDB or EDB, not both.
    ◦ Thus, an IDB predicate can appear in the body or head of a rule; an EDB predicate only in the body.
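A sketch of the EDB/IDB split, assuming a hypothetical encoding of each rule as (head predicate, body predicates) and using the Joe's Bar program from the next slide: heads are IDB, the remaining body predicates are EDB, and no stored relation may appear in a head.

```python
stored = {"Beers", "Sells"}  # relations stored in the database

# Hypothetical encoding: (head predicate, [body predicates]).
program = [
    ("JoeSells", ["Sells"]),
    ("Answer", ["JoeSells", "Beers"]),
]

idb = {head for head, _ in program}
assert not (idb & stored), "an EDB relation may not appear in a rule head"
edb = {p for _, body in program for p in body} - idb
print(sorted(idb), sorted(edb))  # ['Answer', 'JoeSells'] ['Beers', 'Sells']
```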

Example
Convert the following SQL (find the manufacturers of the beers Joe sells):
Beers(name, manf)
Sells(bar, beer, price)

SELECT manf
FROM Beers
WHERE name IN (SELECT beer
               FROM Sells
               WHERE bar = 'Joe''s Bar');

to a Datalog program:

JoeSells(b) <- Sells('Joe''s Bar', b, p)
Answer(m) <- JoeSells(b) AND Beers(b, m)

• Note: Beers, Sells = EDB; JoeSells, Answer = IDB.
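To sanity-check the translation, the sketch below runs the SQL in an in-memory SQLite database (Python's standard sqlite3 module) and evaluates the two Datalog rules with set comprehensions. The beer and bar data are made up.

```python
import sqlite3

# Hypothetical sample data.
beers = [("Bud", "Anheuser-Busch"), ("Miller", "Miller Brewing"), ("Guinness", "Diageo")]
sells = [("Joe's Bar", "Bud", 2.50), ("Joe's Bar", "Miller", 2.75),
         ("Sue's Bar", "Guinness", 4.00)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers(name TEXT, manf TEXT)")
conn.execute("CREATE TABLE Sells(bar TEXT, beer TEXT, price REAL)")
conn.executemany("INSERT INTO Beers VALUES (?, ?)", beers)
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", sells)

sql_answer = {row[0] for row in conn.execute(
    "SELECT manf FROM Beers WHERE name IN "
    "(SELECT beer FROM Sells WHERE bar = 'Joe''s Bar')")}

# JoeSells(b) <- Sells('Joe''s Bar', b, p)
joe_sells = {b for (bar, b, p) in sells if bar == "Joe's Bar"}
# Answer(m) <- JoeSells(b) AND Beers(b, m)
datalog_answer = {m for (b, m) in beers if b in joe_sells}

print(sql_answer == datalog_answer)  # True
```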

Expressive Power of Datalog
• Nonrecursive Datalog = (classical) relational algebra.
    ◦ See the discussion in the text.
• Datalog simulates SQL select-from-where without aggregation and grouping.
• Recursive Datalog expresses queries that cannot be expressed in SQL without recursion.
• But none of these languages has full expressive power (Turing completeness).

Recursion
• IDB predicate P depends on predicate Q if there is a rule with P in the head and Q in a subgoal.
• Draw a graph: nodes = IDB predicates, arc P → Q means P depends on Q.
• The program is recursive if and only if this graph has a cycle.

Recursive Example
Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y
Cousin(x, y) <- Sib(x, y)
Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)
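A sketch of the dependency graph and the cycle test, assuming a hypothetical encoding of rules as (head, body predicates) with arithmetic subgoals such as x <> y left out. Because only IDB predicates are nodes, EDB predicates such as Par drop out.

```python
def dependency_graph(rules):
    """rules: list of (head, [body predicates]). Nodes are IDB predicates."""
    idb = {h for h, _ in rules}
    graph = {p: set() for p in idb}
    for head, body in rules:
        graph[head] |= {q for q in body if q in idb}
    return graph

def is_recursive(graph):
    """True iff the graph has a cycle (depth-first search with colors)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {p: WHITE for p in graph}

    def visit(p):
        color[p] = GRAY  # p is on the current DFS path
        for q in graph[p]:
            if color[q] == GRAY or (color[q] == WHITE and visit(q)):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in graph)

cousin_rules = [
    ("Sib", ["Par", "Par"]),
    ("Cousin", ["Sib"]),
    ("Cousin", ["Par", "Par", "Cousin"]),
]
g = dependency_graph(cousin_rules)
print(is_recursive(g))  # True: Cousin depends on itself
```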

Iterative Fixed-Point Evaluates Recursive Rules
1. Start with IDB = ∅.
2. Apply the rules to the IDB and EDB.
3. Did the IDB change? If yes, go back to step 2; if no, done.
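A minimal Python sketch of this naive fixed-point loop for the Sib/Cousin program. Each round recomputes every IDB relation from the previous round's values, so Cousin picks up the Sib facts one round after they appear. The Par data in the usage example are hypothetical.

```python
def sib_cousin(par):
    """Naive iterative fixed point. par: set of (child, parent) tuples."""
    sib, cousin = set(), set()  # start with IDB = empty
    while True:
        # Apply every rule to the previous round's IDB and the EDB.
        new_sib = {(x, y) for (x, p) in par for (y, q) in par
                   if p == q and x != y}
        new_cousin = set(sib) | {
            (x, y)
            for (x, xp) in par for (y, yp) in par
            if (xp, yp) in cousin
        }
        if new_sib == sib and new_cousin == cousin:
            return sib, cousin  # no change to the IDB: done
        sib, cousin = new_sib, new_cousin

par = {("Ann", "Gran"), ("Bob", "Gran"), ("Sally", "Ann"), ("Tom", "Bob")}
sib, cousin = sib_cousin(par)
print(sorted(cousin))
# [('Ann', 'Bob'), ('Bob', 'Ann'), ('Sally', 'Tom'), ('Tom', 'Sally')]
```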

Example
EDB Par = [figure: a parent/child tree over the nodes a through k]
• Note: because of symmetry, Sib and Cousin facts appear in pairs, so we mention only (x, y) when both (x, y) and (y, x) are meant.

           Sib added                         Cousin added
Initial    (IDB = ∅)                         (IDB = ∅)
Round 1    (b, c), (c, e), (g, h), (j, k)
Round 2                                      (b, c), (c, e), (g, h), (j, k)
Round 3                                      (f, g), (f, h), (g, i), (h, i), (i, k)
Round 4                                      (k, k), (i, j)

Stratified Negation
• Negation wrapped inside a recursion makes no sense.
• Even when negation and recursion are separated, there can be ambiguity about what the rules mean, and some one meaning must be selected.
• Stratified negation is an additional restriction on recursive rules (like safety) that solves both problems:
    1. It rules out negation wrapped in recursion.
    2. When negation is separate from recursion, it yields the intuitively correct meaning of rules (the stratified model).

Problem with Recursive Negation
Consider: P(x) <- Q(x) AND NOT P(x)
• Q = EDB = {1, 2}.
• Compute IDB P iteratively?
    ◦ Initially, P = ∅.
    ◦ Round 1: P = {1, 2}.
    ◦ Round 2: P = ∅, etc.
• P oscillates forever and never reaches a fixed point.
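Simulating the iteration in Python shows the oscillation; the loop is cut off after four rounds because it would never reach a fixed point.

```python
Q = {1, 2}
P = set()  # initially P is empty
history = []
for _ in range(4):  # stop early: this iteration never converges
    # P(x) <- Q(x) AND NOT P(x), evaluated against the previous round's P
    P = {x for x in Q if x not in P}
    history.append(sorted(P))
print(history)  # [[1, 2], [], [1, 2], []]
```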

Strata
Intuitively: stratum of an IDB predicate = maximum number of negations you can pass through on the way to an EDB predicate.
• Must not be ∞ in “stratified” rules.
• Define the stratum graph:
    ◦ Nodes = IDB predicates.
    ◦ Arc P → Q if Q appears in the body of a rule with head P.
    ◦ Label that arc “–” if Q is in a negated subgoal.

Example
P(x) <- Q(x) AND NOT P(x)
Stratum graph: the single node P, with an arc P → P labeled “–”.

Example
Which target nodes cannot be reached from any source node?
Reach(x) <- Source(x)
Reach(x) <- Reach(y) AND Arc(y, x)
NoReach(x) <- Target(x) AND NOT Reach(x)

Stratum graph: NoReach → Reach labeled “–”; Reach → Reach (unlabeled).

Computing Strata
Stratum of an IDB predicate A = maximum number of “–” arcs on any path from A in the stratum graph.
Examples
• For the first example, the stratum of P is ∞.
• For the second example, the stratum of Reach is 0; the stratum of NoReach is 1.

Stratified Negation
A Datalog program is stratified if every IDB predicate has a finite stratum.

Stratified Model
If a Datalog program is stratified, we can compute the relations for the IDB predicates lowest-stratum-first.
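One way to compute strata, sketched in Python under the assumption that the stratum graph is given as (P, Q, negated) arcs: repeatedly raise stratum(P) to stratum(Q) + 1 across a “–” arc, or to stratum(Q) across an unlabeled arc. A finite stratum can never exceed the total number of “–” arcs, so any value above that bound is reported as ∞; capping values there also guarantees the loop terminates.

```python
import math

def strata(arcs):
    """arcs: list of (p, q, negated) meaning p depends on q."""
    nodes = {p for p, _, _ in arcs} | {q for _, q, _ in arcs}
    cap = sum(1 for _, _, neg in arcs if neg)  # bound on any finite stratum
    s = {n: 0 for n in nodes}
    changed = True
    while changed:
        changed = False
        for p, q, neg in arcs:
            cand = min(s[q] + (1 if neg else 0), cap + 1)
            if cand > s[p]:
                s[p] = cand
                changed = True
    return {n: (math.inf if v > cap else v) for n, v in s.items()}

print(strata([("P", "P", True)]))  # {'P': inf}
print(strata([("Reach", "Reach", False), ("NoReach", "Reach", True)]))
```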

Example
Reach(x) <- Source(x)
Reach(x) <- Reach(y) AND Arc(y, x)
NoReach(x) <- Target(x) AND NOT Reach(x)
• EDB:
    ◦ Source = {1}.
    ◦ Arc = {(1, 2), (3, 4), (4, 3)}.
    ◦ Target = {2, 3}.
[Figure: nodes 1 to 4 with arcs 1 → 2, 3 → 4, 4 → 3; node 1 marked source, nodes 2 and 3 marked target]
• First compute Reach = {1, 2} (stratum 0).
• Next compute NoReach = {3} (stratum 1).
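A sketch of lowest-stratum-first evaluation on this EDB: Reach (stratum 0) is iterated to its own fixed point before NoReach (stratum 1) reads NOT Reach.

```python
source = {1}
arc = {(1, 2), (3, 4), (4, 3)}
target = {2, 3}

# Stratum 0: iterate the Reach rules until nothing changes.
reach = set()
while True:
    new = source | {x for (y, x) in arc if y in reach}
    if new == reach:
        break
    reach = new

# Stratum 1: NOT Reach is safe to evaluate now that Reach is complete.
no_reach = {x for x in target if x not in reach}
print(sorted(reach), sorted(no_reach))  # [1, 2] [3]
```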

Is the Stratified Solution “Obvious”?
Not really.
• There is another model that makes the rules true no matter what values we substitute for the variables.
    ◦ Reach = {1, 2, 3, 4}.
    ◦ NoReach = ∅.
• Remember: the only way to make a Datalog rule false is to find values for the variables that make the body true and the head false.
    ◦ For this model, the heads of the rules for Reach are true for all values, and in the rule for NoReach the subgoal NOT Reach(x) ensures that the body cannot be true.
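The claim can be checked by brute force. The sketch below tests every rule against every assignment of the values 1 to 4 (assumed here to be the only relevant domain) and accepts both candidate models, while rejecting one that leaves target 2 unaccounted for.

```python
from itertools import product

source, target = {1}, {2, 3}
arc = {(1, 2), (3, 4), (4, 3)}
domain = {1, 2, 3, 4}

def is_model(reach, no_reach):
    """True if no assignment makes some rule's body true and its head false."""
    for x in domain:
        # Reach(x) <- Source(x)
        if x in source and x not in reach:
            return False
        # NoReach(x) <- Target(x) AND NOT Reach(x)
        if x in target and x not in reach and x not in no_reach:
            return False
    for y, x in product(domain, repeat=2):
        # Reach(x) <- Reach(y) AND Arc(y, x)
        if y in reach and (y, x) in arc and x not in reach:
            return False
    return True

print(is_model({1, 2}, {3}))          # True: the stratified model
print(is_model({1, 2, 3, 4}, set()))  # True: the other model
print(is_model({1}, set()))           # False
```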

SQL Recursion
WITH
    stuff that looks like Datalog rules
an SQL query about EDB, IDB
• Rule = [RECURSIVE] R(<arguments>) AS SQL query

Example
• Find Sally’s cousins, using EDB Par(child, parent).

WITH Sib(x, y) AS
        SELECT p1.child, p2.child
        FROM Par p1, Par p2
        WHERE p1.parent = p2.parent AND
              p1.child <> p2.child,
     RECURSIVE Cousin(x, y) AS
        Sib
        UNION
        (SELECT p1.child, p2.child
         FROM Par p1, Par p2, Cousin
         WHERE p1.parent = Cousin.x AND
               p2.parent = Cousin.y)
SELECT y FROM Cousin WHERE x = 'Sally';
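SQLite, for example, writes this with a single WITH RECURSIVE clause and parenthesized CTE bodies rather than the per-rule RECURSIVE keyword shown above. As a runnable approximation in that dialect, the sketch below uses Python's sqlite3 module with a made-up family in which Sally's parent Ann and Tom's parent Bob are siblings, and Sue is Sally's sibling (this definition counts siblings as cousins too).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Par(child TEXT, parent TEXT)")
conn.executemany("INSERT INTO Par VALUES (?, ?)", [  # hypothetical data
    ("Ann", "Gran"), ("Bob", "Gran"),
    ("Sally", "Ann"), ("Sue", "Ann"), ("Tom", "Bob"),
])

query = """
WITH RECURSIVE
  Sib(x, y) AS (
    SELECT p1.child, p2.child
    FROM Par p1, Par p2
    WHERE p1.parent = p2.parent AND p1.child <> p2.child
  ),
  Cousin(x, y) AS (
    SELECT x, y FROM Sib
    UNION
    SELECT p1.child, p2.child
    FROM Par p1, Par p2, Cousin
    WHERE p1.parent = Cousin.x AND p2.parent = Cousin.y
  )
SELECT y FROM Cousin WHERE x = 'Sally'
"""
sally_cousins = sorted(row[0] for row in conn.execute(query))
print(sally_cousins)  # ['Sue', 'Tom']
```

UNION (rather than UNION ALL) discards duplicates, which is what lets the recursion stop once no new Cousin pairs appear.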

Plan for Describing Legal SQL Recursion
• Define “monotonicity,” a property that generalizes “stratification.”
• Generalize the stratum graph to apply to SQL queries instead of Datalog rules.
    ◦ (Non)monotonicity replaces NOT in subgoals.
• Define semantically correct SQL recursions in terms of the stratum graph.

Monotonicity
If relation P is a function of relation Q (and perhaps other things), we say P is monotone in Q if adding tuples to Q cannot cause any tuple of P to be deleted.

Monotonicity Example
In addition to certain negations, an aggregation can cause nonmonotonicity.
Sells(bar, beer, price)

SELECT AVG(price)
FROM Sells
WHERE bar = 'Joe''s Bar';

• Adding to Sells a tuple that gives a new beer Joe sells will usually change the average price of beer at Joe’s.
• Thus, the former result, which might be a single tuple like (2.78), becomes another single tuple like (2.81), and the old tuple is lost.
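A small Python sketch of the AVG example, treating the query result as a set of one-element tuples; the prices are hypothetical. After one Joe's Bar tuple is added, the old result tuple is gone, so the old result is not a subset of the new one.

```python
def avg_joe(sells):
    """SELECT AVG(price) FROM Sells WHERE bar = 'Joe''s Bar', as a set of 1-tuples."""
    prices = [p for (bar, beer, p) in sells if bar == "Joe's Bar"]
    # SQL's AVG over no rows yields NULL, modeled here as None.
    return {(sum(prices) / len(prices),)} if prices else {(None,)}

sells = {("Joe's Bar", "Bud", 2.50), ("Joe's Bar", "Miller", 3.00)}
before = avg_joe(sells)                                      # {(2.75,)}
after = avg_joe(sells | {("Joe's Bar", "Guinness", 3.25)})  # add one tuple
print(before <= after)  # False: the old tuple was deleted
```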

Generalizing Stratum Graph to SQL
• Node for each relation defined by a “rule.”
• Node for each subquery in the “body” of a rule.
• Arc P → Q if
    (a) P is the “head” of a rule, and Q is a relation appearing in the FROM list of the rule (not in the FROM list of a subquery), as an argument of a UNION, etc.
    (b) P is the head of a rule, and Q is a subquery directly used in that rule (not nested within some larger subquery).
    (c) P is a subquery, and Q is a relation or subquery used directly within P [analogous to (a) and (b) for rule heads].
• Label the arc – if P is not monotone in Q.
• Requirement for legal SQL recursion: finite strata only.

Example
For the Sib/Cousin example, there are three nodes: Sib, Cousin, and SQ (the second term of the union in the rule for Cousin).
Stratum graph: Cousin → Sib, Cousin → SQ, SQ → Cousin.
• No nonmonotonicity, hence legal.

A Nonmonotonic Example
Change the UNION to EXCEPT in the rule for Cousin.

RECURSIVE Cousin(x, y) AS
    Sib
    EXCEPT
    (SELECT p1.child, p2.child
     FROM Par p1, Par p2, Cousin
     WHERE p1.parent = Cousin.x AND
           p2.parent = Cousin.y)

Stratum graph: Cousin → Sib, Cousin → SQ labeled –, SQ → Cousin.
• Now, adding to the result of the subquery can delete Cousin facts; i.e., Cousin is nonmonotone in SQ.
• The cycle Cousin → SQ → Cousin contains a – arc, so a path can pass through infinitely many –’s; the stratum of Cousin is infinite, and the recursion is illegal in SQL.
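Legality can also be checked on the stratum graph directly: all strata are finite exactly when no – arc lies on a cycle, that is, when for every – arc P → Q, Q cannot reach P. A sketch, assuming the arcs are given as (P, Q, nonmonotone) triples:

```python
def legal(arcs):
    """True iff no nonmonotone arc lies on a cycle (all strata finite)."""
    succ = {}
    for p, q, _ in arcs:
        succ.setdefault(p, set()).add(q)

    def reaches(a, b):
        # Iterative depth-first search from a looking for b.
        stack, seen = [a], set()
        while stack:
            n = stack.pop()
            if n == b:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ.get(n, ()))
        return False

    return not any(neg and reaches(q, p) for p, q, neg in arcs)

union_version = [("Cousin", "Sib", False), ("Cousin", "SQ", False), ("SQ", "Cousin", False)]
except_version = [("Cousin", "Sib", False), ("Cousin", "SQ", True), ("SQ", "Cousin", False)]
print(legal(union_version), legal(except_version))  # True False
```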

Another Example: NOT Doesn’t Mean Nonmonotone
Leave Cousin as it was, but negate one of the conditions in the WHERE clause.

RECURSIVE Cousin(x, y) AS
    Sib
    UNION
    (SELECT p1.child, p2.child
     FROM Par p1, Par p2, Cousin
     WHERE p1.parent = Cousin.x AND
           NOT (p2.parent = Cousin.y))

• You might think that SQ depends negatively on Cousin, but it doesn’t.
    ◦ If I add a new tuple to Cousin, all the old tuples still exist and yield whatever tuples in SQ they used to yield.
    ◦ In addition, the new Cousin tuple might combine with old p1 and p2 tuples to yield something new.
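A quick experiment in Python illustrates (but does not prove) this argument: computing SQ before and after adding a Cousin tuple shows every old SQ tuple surviving. The Par and Cousin contents are hypothetical.

```python
def sq(par, cousin):
    """SELECT p1.child, p2.child FROM Par p1, Par p2, Cousin
       WHERE p1.parent = Cousin.x AND NOT (p2.parent = Cousin.y)"""
    return {
        (c1, c2)
        for (c1, par1) in par
        for (c2, par2) in par
        for (x, y) in cousin
        if par1 == x and not (par2 == y)
    }

par = {("Sally", "Ann"), ("Tom", "Bob"), ("Joe", "Carl")}
cousin = {("Ann", "Bob")}
old = sq(par, cousin)
new = sq(par, cousin | {("Bob", "Ann")})  # add one Cousin tuple
print(old <= new)  # True: every old SQ tuple survives
```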