Datalog: Logical Rules, Recursion

Logic As a Query Language
• If-then logical rules have been used in many systems.
  - Most important today: EII (Enterprise Information Integration).
• Nonrecursive rules are equivalent to the core relational algebra.
• Recursive rules extend relational algebra and have been used to add recursion to SQL-99.

A Logical Rule
• Our first example of a rule uses the relations Frequents(drinker, bar), Likes(drinker, beer), and Sells(bar, beer, price).
• The rule is a query asking for “happy” drinkers: those that frequent a bar that serves a beer that they like.

Anatomy of a Rule
    Happy(d) <- Frequents(d, bar) AND Likes(d, beer) AND Sells(bar, beer, p)
• Head = “consequent,” a single subgoal: Happy(d).
• Body = “antecedent” = AND of subgoals: everything to the right of <-.
• Read the symbol <- as “if.”

Subgoals Are Atoms
• An atom is a predicate, or relation name, with variables or constants as arguments.
• The head is an atom; the body is the AND of one or more atoms.
• Convention: predicates begin with a capital letter, variables begin with a lower-case letter.

Example: Atom
    Sells(bar, beer, p)
• The predicate (Sells) = name of a relation.
• The arguments (bar, beer, p) are variables.

Interpreting Rules
• A variable appearing in the head is called distinguished; otherwise it is nondistinguished.
• Rule meaning: the head is true of the distinguished variables if there exist values of the nondistinguished variables that make all subgoals of the body true.

Example: Interpretation
    Happy(d) <- Frequents(d, bar) AND Likes(d, beer) AND Sells(bar, beer, p)
• Distinguished variable: d.
• Nondistinguished variables: bar, beer, p.
• Interpretation: drinker d is happy if there exist a bar, a beer, and a price p such that d frequents the bar, likes the beer, and the bar sells the beer at price p.
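The interpretation of the Happy rule can be sketched in Python as a join over sets of tuples. This is only an illustration: the facts below are invented, and the function name happy is a hypothetical helper, not part of any Datalog system.

```python
# Hypothetical EDB facts, invented for illustration.
frequents = {("alice", "Joe's Bar"), ("bob", "Sue's Bar")}
likes = {("alice", "Bud"), ("bob", "Miller")}
sells = {("Joe's Bar", "Bud", 2.50), ("Sue's Bar", "Bud", 3.00)}

def happy(frequents, likes, sells):
    """Happy(d) <- Frequents(d, bar) AND Likes(d, beer) AND Sells(bar, beer, p)."""
    return {
        d                                  # distinguished variable
        for (d, bar) in frequents
        for (d2, beer) in likes
        if d2 == d                         # same drinker in both subgoals
        for (bar2, beer2, p) in sells      # p only needs to exist
        if bar2 == bar and beer2 == beer
    }

print(happy(frequents, likes, sells))  # {'alice'}
```

Only d is returned; bar, beer, and p are existentially quantified, which is why they are matched but never projected into the result.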

Arithmetic Subgoals
• In addition to relations as predicates, a predicate for a subgoal of the body can be an arithmetic comparison.
  - We write such subgoals in the usual way, e.g.: x < y.

Example: Arithmetic
• A beer is “cheap” if there are at least two bars that sell it for under $2.
    Cheap(beer) <- Sells(bar1, beer, p1) AND Sells(bar2, beer, p2) AND p1 < 2.00 AND p2 < 2.00 AND bar1 <> bar2
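A minimal Python sketch of the Cheap rule, assuming Sells is held as a set of (bar, beer, price) tuples. The facts and the function name cheap are invented for illustration.

```python
# Hypothetical Sells facts, invented for illustration.
sells = {
    ("Joe's Bar", "Bud", 1.50),
    ("Sue's Bar", "Bud", 1.75),
    ("Joe's Bar", "Miller", 1.90),
    ("Sue's Bar", "Miller", 2.50),
}

def cheap(sells):
    """Cheap(beer) <- Sells(bar1, beer, p1) AND Sells(bar2, beer, p2)
                      AND p1 < 2.00 AND p2 < 2.00 AND bar1 <> bar2."""
    return {
        beer1
        for (bar1, beer1, p1) in sells
        for (bar2, beer2, p2) in sells     # self-join of Sells
        if beer1 == beer2 and p1 < 2.00 and p2 < 2.00 and bar1 != bar2
    }

print(cheap(sells))  # {'Bud'}
```

Miller is not cheap here because only one bar sells it for under $2.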

Negated Subgoals
• We may put NOT in front of a subgoal, to negate its meaning.
• Example: think of Arc(a, b) as arcs in a graph.
  - S(x, y) says the graph is not transitive from x to y; i.e., there is a path of length 2 from x to y, but no arc from x to y.
    S(x, y) <- Arc(x, z) AND Arc(z, y) AND NOT Arc(x, y)
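The rule for S can be sketched in Python, treating the negated subgoal as a membership test that must fail. The graph below is an invented example, and not_transitive is a hypothetical name.

```python
# Hypothetical Arc facts, invented for illustration.
arc = {(1, 2), (2, 3), (1, 3), (3, 4)}

def not_transitive(arc):
    """S(x, y) <- Arc(x, z) AND Arc(z, y) AND NOT Arc(x, y)."""
    return {
        (x, y)
        for (x, z) in arc
        for (z2, y) in arc
        if z == z2 and (x, y) not in arc   # NOT Arc(x, y)
    }

print(sorted(not_transitive(arc)))  # [(1, 4), (2, 4)]
```

The pair (1, 3) is absent from the result: there is a path 1 -> 2 -> 3, but Arc(1, 3) also holds.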

Safe Rules
• A rule is safe if:
  1. Each distinguished variable,
  2. Each variable in an arithmetic subgoal,
  3. Each variable in a negated subgoal,
  also appears in a nonnegated, relational subgoal.
• We allow only safe rules.

Example: Unsafe Rules
• Each of the following is unsafe and not allowed:
  1. S(x) <- R(y)
  2. S(x) <- R(y) AND NOT R(x)
  3. S(x) <- R(y) AND x < y
• In each case, an infinity of x’s can satisfy the rule, even if R is a finite relation.
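The safety condition can be checked mechanically. The sketch below assumes a made-up rule encoding: the head is a list of variable names, and each subgoal is a pair (kind, variables) with kind one of "rel", "neg", or "arith". Constants are simply left out of the variable lists. Both the encoding and the name is_safe are assumptions for illustration.

```python
def is_safe(head_vars, subgoals):
    """Return True if every variable that must be covered appears in
    some nonnegated relational subgoal."""
    covered = set()
    for kind, variables in subgoals:
        if kind == "rel":                  # nonnegated, relational
            covered |= set(variables)
    needed = set(head_vars)                # 1. distinguished variables
    for kind, variables in subgoals:
        if kind in ("arith", "neg"):       # 2. arithmetic, 3. negated
            needed |= set(variables)
    return needed <= covered

# The three unsafe rules from the slide.
print(is_safe(["x"], [("rel", ["y"])]))                         # False
print(is_safe(["x"], [("rel", ["y"]), ("neg", ["x"])]))         # False
print(is_safe(["x"], [("rel", ["y"]), ("arith", ["x", "y"])]))  # False
```

By contrast, the Happy rule passes, since d appears in the relational subgoals Frequents and Likes.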

Datalog Programs
• A Datalog program is a collection of rules.
• In a program, predicates can be either:
  1. EDB = Extensional Database = stored table.
  2. IDB = Intensional Database = relation defined by rules.
  - Never both! No EDB in heads.

Evaluating Datalog Programs
• As long as there is no recursion, we can pick an order to evaluate the IDB predicates so that, when a predicate is evaluated, all the predicates in the bodies of its rules have already been evaluated.
• If an IDB predicate has more than one rule, each rule contributes tuples to its relation.

Example: Datalog Program
• Using EDB Sells(bar, beer, price) and Beers(name, manf), find the manufacturers of beers Joe doesn’t sell.
    JoeSells(b) <- Sells('Joe''s Bar', b, p)
    Answer(m) <- Beers(b, m) AND NOT JoeSells(b)
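A sketch of this two-rule program in Python, evaluating JoeSells first and then Answer, since Answer depends on JoeSells. The beer and manufacturer names are invented, and the function names are hypothetical.

```python
# Hypothetical EDB facts, invented for illustration.
sells = {("Joe's Bar", "Bud", 2.50), ("Sue's Bar", "Miller", 3.00)}
beers = {("Bud", "ManfA"), ("Bud Lite", "ManfA"), ("Miller", "ManfB")}

def joe_sells(sells):
    """JoeSells(b) <- Sells('Joe''s Bar', b, p)."""
    return {b for (bar, b, p) in sells if bar == "Joe's Bar"}

def answer(beers, joe):
    """Answer(m) <- Beers(b, m) AND NOT JoeSells(b)."""
    return {m for (b, m) in beers if b not in joe}

joe = joe_sells(sells)               # evaluated first: Answer depends on it
print(sorted(answer(beers, joe)))    # ['ManfA', 'ManfB']
```

ManfA appears in the answer even though Joe sells one of its beers, because Joe does not sell Bud Lite: the query asks for manufacturers of some beer Joe doesn't sell.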

Expressive Power of Datalog
• Without recursion, Datalog can express all and only the queries of core relational algebra.
  - The same as SQL select-from-where, without aggregation and grouping.
• But with recursion, Datalog can express more than these languages.
• Yet it is still not Turing-complete.

Recursive Example
• EDB: Par(c, p) = p is a parent of c.
• Generalized cousins: people with common ancestors one or more generations back:
    Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y
    Cousin(x, y) <- Sib(x, y)
    Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)

Definition of Recursion
• Form a dependency graph whose nodes = IDB predicates.
• Arc X -> Y if and only if there is a rule with X in the head and Y in the body.
• Cycle = recursion; no cycle = no recursion.
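The definition can be sketched as a small cycle check. The encoding is an assumption made for illustration: each rule is a pair (head predicate, list of body predicates), and a predicate counts as IDB exactly when it appears in some head. The function names are hypothetical.

```python
def dependency_graph(rules):
    """Nodes = IDB predicates; arc X -> Y if some rule has X in the
    head and IDB predicate Y in the body."""
    idb = {head for head, _ in rules}
    graph = {p: set() for p in idb}
    for head, body in rules:
        graph[head] |= {q for q in body if q in idb}
    return graph

def is_recursive(rules):
    """Depth-first search for a cycle in the dependency graph."""
    graph = dependency_graph(rules)
    WHITE, GRAY, BLACK = 0, 1, 2           # unvisited, on stack, finished
    color = {p: WHITE for p in graph}

    def visit(p):
        color[p] = GRAY
        for q in graph[p]:
            if color[q] == GRAY:           # back edge: cycle found
                return True
            if color[q] == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in graph)

cousin_rules = [("Sib", ["Par", "Par"]),
                ("Cousin", ["Sib"]),
                ("Cousin", ["Par", "Par", "Cousin"])]
answer_rules = [("JoeSells", ["Sells"]),
                ("Answer", ["Beers", "JoeSells"])]
print(is_recursive(cousin_rules), is_recursive(answer_rules))  # True False
```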

Example: Dependency Graphs
• Recursive: Cousin -> Cousin (a self-loop) and Cousin -> Sib.
• Nonrecursive: Answer -> JoeSells.

Evaluating Recursive Rules
• The following works when there is no negation:
  1. Start by assuming all IDB relations are empty.
  2. Repeatedly evaluate the rules using the EDB and the previous IDB, to get a new IDB.
  3. End when no change to IDB.

The “Naïve” Evaluation Algorithm
1. Start: IDB = ∅.
2. Apply rules to IDB, EDB.
3. Change to IDB? If yes, go back to step 2; if no, done.
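A minimal sketch of this loop in Python, specialized to the Sib and Cousin rules from the recursive example. The Par facts are invented for illustration (a is a parent of b and c, b of d, c of e), and the function names are hypothetical.

```python
# Hypothetical Par(child, parent) facts, invented for illustration.
par = {("b", "a"), ("c", "a"), ("d", "b"), ("e", "c")}

def apply_rules(par, sib, cousin):
    """One round: evaluate every rule using the EDB and the previous IDB."""
    new_sib = {(x, y) for (x, p) in par for (y, q) in par
               if p == q and x != y}
    new_cousin = set(sib)                  # Cousin(x, y) <- Sib(x, y)
    new_cousin |= {(x, y) for (x, xp) in par for (y, yp) in par
                   if (xp, yp) in cousin}  # recursive Cousin rule
    return new_sib, new_cousin

def naive(par):
    sib, cousin = set(), set()             # start: IDB is empty
    while True:
        new_sib, new_cousin = apply_rules(par, sib, cousin)
        if (new_sib, new_cousin) == (sib, cousin):
            return sib, cousin             # no change: done
        sib, cousin = new_sib, new_cousin

sib, cousin = naive(par)
print(sorted(sib))     # [('b', 'c'), ('c', 'b')]
print(sorted(cousin))  # [('b', 'c'), ('c', 'b'), ('d', 'e'), ('e', 'd')]
```

Every round recomputes all facts from scratch, including ones already known, which is the inefficiency that makes this version “naïve.”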

Example: Evaluation of Cousin
• We’ll proceed in rounds to infer Sib facts (red) and Cousin facts (green).
• Remember the rules:
    Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y
    Cousin(x, y) <- Sib(x, y)
    Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)

Seminaive Evaluation
• Since the EDB never changes, on each round we only get new IDB tuples if we use at least one IDB tuple that was obtained on the previous round.
• Saves work; lets us avoid rediscovering most known facts.
  - A fact could still be derived in a second way.
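A sketch of the seminaive idea in Python for the Sib and Cousin rules. It assumes the same kind of invented Par data as any toy example would; seminaive and delta are hypothetical names. Sib depends only on the EDB, so it is computed once. The recursive Cousin rule has a single IDB subgoal, so joining against only the previous round's new Cousin tuples is enough.

```python
# Hypothetical Par(child, parent) facts, invented for illustration.
par = {("b", "a"), ("c", "a"), ("d", "b"), ("e", "c")}

def seminaive(par):
    # Sib uses only EDB, so one evaluation suffices.
    sib = {(x, y) for (x, p) in par for (y, q) in par
           if p == q and x != y}
    cousin = set(sib)                      # Cousin(x, y) <- Sib(x, y)
    delta = set(sib)                       # tuples new in the last round
    while delta:
        derived = {(x, y) for (x, xp) in par for (y, yp) in par
                   if (xp, yp) in delta}   # use only last round's tuples
        delta = derived - cousin           # drop facts derived a second way
        cousin |= delta
    return sib, cousin

sib, cousin = seminaive(par)
print(sorted(cousin))  # [('b', 'c'), ('c', 'b'), ('d', 'e'), ('e', 'd')]
```

The subtraction derived - cousin reflects the slide's caveat: a fact can still be rederived by another route, so known tuples must be filtered out before the next round.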

Par Data: Parent Above Child
[Figure: diagram of the Par data over individuals a through k, each parent drawn above its children, with panels labeled Round 1 through Round 4.]

Recursion Plus Negation
• “Naïve” evaluation doesn’t work when there are negated subgoals.
• In fact, negation wrapped in a recursion makes no sense in general.
• Even when recursion and negation are separate, we can have ambiguity about the correct IDB relations.

Stratified Negation
• Stratification is a constraint usually placed on Datalog with recursion and negation.
• It rules out negation wrapped inside recursion.
• Gives the sensible IDB relations when negation and recursion are separate.

Problematic Recursive Negation
    P(x) <- Q(x) AND NOT P(x)
• EDB: Q(1), Q(2)
• Initial: P = {}
• Round 1: P = {(1), (2)}
• Round 2: P = {}
• Round 3: P = {(1), (2)}, etc. …
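The oscillation is easy to reproduce. A minimal sketch, with step as a hypothetical helper that applies the rule once using the previous round's P:

```python
def step(q, p):
    """One round of P(x) <- Q(x) AND NOT P(x), using the previous P."""
    return {x for x in q if x not in p}

q = {1, 2}                 # EDB: Q(1), Q(2)
p = set()                  # initial: P is empty
history = [p]
for _ in range(4):
    p = step(q, p)
    history.append(p)

print(history)  # [set(), {1, 2}, set(), {1, 2}, set()]
```

The sequence never reaches a round with no change, so the naïve stopping condition is never met.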

Strata
• Intuitively, the stratum of an IDB predicate P is the maximum number of negations that can be applied to an IDB predicate used in evaluating P.
• Stratified negation = “finite strata.”
• Notice that in P(x) <- Q(x) AND NOT P(x), we can negate P an infinite number of times while deriving P(x).

Stratum Graph
• To formalize strata, use the stratum graph:
  - Nodes = IDB predicates.
  - Arc A -> B if predicate A depends on B.
  - Label this arc “–” if the B subgoal is negated.

Stratified Negation Definition
• The stratum of a node (predicate) is the maximum number of “–” arcs on a path leading from that node.
• A Datalog program is stratified if all its IDB predicates have finite strata.
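One way to compute strata from a stratum graph is sketched below. The encoding is an assumption: arcs are triples (a, b, negated) meaning a depends on b. The sketch relaxes arcs until nothing changes and relies on the observation that, with n nodes, a finite stratum can be at most n - 1, so reaching n signals a cycle through a “–” arc. The name strata is hypothetical.

```python
def strata(nodes, arcs):
    """Return {node: stratum}, or None if some stratum is infinite
    (that is, the program is not stratified)."""
    stratum = {n: 0 for n in nodes}
    changed = True
    while changed:
        changed = False
        for a, b, negated in arcs:
            candidate = stratum[b] + (1 if negated else 0)
            if candidate > stratum[a]:
                if candidate >= len(stratum):   # exceeds any finite stratum
                    return None
                stratum[a] = candidate
                changed = True
    return stratum

# P(x) <- Q(x) AND NOT P(x): a self-loop labeled "-".
print(strata(["P"], [("P", "P", True)]))        # None

# Reach / NoReach program: only NoReach -> Reach is negated.
print(strata(["Reach", "NoReach"],
             [("Reach", "Reach", False), ("NoReach", "Reach", True)]))
# {'Reach': 0, 'NoReach': 1}
```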

Example
    P(x) <- Q(x) AND NOT P(x)
• Stratum graph: the single node P with a self-loop P -> P labeled “–”.

Another Example
• EDB = Source(x), Target(x), Arc(x, y).
• Rules for “targets not reached from any source”:
    Reach(x) <- Source(x)
    Reach(x) <- Reach(y) AND Arc(y, x)
    NoReach(x) <- Target(x) AND NOT Reach(x)

The Stratum Graph
• Arcs: NoReach -> Reach, labeled “–”; Reach -> Reach, unlabeled.
• Stratum 1 (NoReach): <= 1 “–” arc on any path out.
• Stratum 0 (Reach): no “–” arcs on any path out.

Models
• A model is a choice of IDB relations that, with the given EDB relations, makes all rules true regardless of what values are substituted for the variables.
  - Remember: a rule is true whenever its body is false.
  - But if the body is true, then the head must be true as well.

Minimal Models
• When there is no negation, a Datalog program has a unique minimal model (one that does not contain any other model).
• But with negation, there can be several minimal models.
• The stratified model is the one that “makes sense.”

The Stratified Model
• When the Datalog program is stratified, we can evaluate IDB predicates lowest-stratum-first.
• Once a predicate has been evaluated, treat it as EDB for higher strata.
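Lowest-stratum-first evaluation can be sketched for the Reach and NoReach rules. Reach (stratum 0) is computed to a fixed point first; NoReach (stratum 1) then treats the finished Reach as if it were EDB. The Source, Target, and Arc facts below are an assumed instance invented for illustration, and the function names are hypothetical.

```python
# Assumed EDB instance, invented for illustration.
source = {1}
target = {2, 3}
arc = {(1, 2), (3, 4), (4, 3)}

def reach(source, arc):
    """Stratum 0: Reach(x) <- Source(x); Reach(x) <- Reach(y) AND Arc(y, x)."""
    reached = set(source)
    while True:
        new = reached | {x for (y, x) in arc if y in reached}
        if new == reached:                 # fixed point
            return reached
        reached = new

def no_reach(target, reached):
    """Stratum 1: NoReach(x) <- Target(x) AND NOT Reach(x).
    Reach is complete here, so the negation is unambiguous."""
    return {x for x in target if x not in reached}

reached = reach(source, arc)
print(sorted(reached))                     # [1, 2]
print(sorted(no_reach(target, reached)))   # [3]
```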

Example: Multiple Models (1)
    Reach(x) <- Source(x)
    Reach(x) <- Reach(y) AND Arc(y, x)
    NoReach(x) <- Target(x) AND NOT Reach(x)
[Figure: a graph on nodes 1, 2, 3, 4 with its Source, Target, and Arc facts marked.]
• Stratum 0: Reach(1), Reach(2).
• Stratum 1: NoReach(3).

Example: Multiple Models (2)
    Reach(x) <- Source(x)
    Reach(x) <- Reach(y) AND Arc(y, x)
    NoReach(x) <- Target(x) AND NOT Reach(x)
[Figure: a graph on nodes 1, 2, 3, 4 with its Source, Target, and Arc facts marked.]
• Another model! Reach(1), Reach(2), Reach(3), Reach(4); NoReach is empty.
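That both choices of IDB relations are models can be checked directly against the three rules. The exact arcs of the figure are not recoverable from this transcript, so the EDB below is an assumed instance chosen to be consistent with the facts the slides state; is_model is a hypothetical helper.

```python
# Assumed EDB instance (the figure's arcs are not recoverable here).
source = {1}
target = {2, 3}
arc = {(1, 2), (3, 4), (4, 3)}

def is_model(source, target, arc, reach, no_reach):
    """True if every rule holds: whenever a body is true, so is the head."""
    rule1 = all(x in reach for x in source)
    rule2 = all(x in reach for (y, x) in arc if y in reach)
    rule3 = all(x in no_reach for x in target if x not in reach)
    return rule1 and rule2 and rule3

# The stratified model from the first slide of this example.
print(is_model(source, target, arc, {1, 2}, {3}))            # True
# The other model: everything reached, NoReach empty.
print(is_model(source, target, arc, {1, 2, 3, 4}, set()))    # True
# Not a model: target 3 is unreached but missing from NoReach.
print(is_model(source, target, arc, {1, 2}, set()))          # False
```

Under this assumed EDB, neither model contains the other: the first has NoReach(3) while the second does not, and the second has Reach(3) and Reach(4) while the first does not.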

Assumption
• When the logic is stratified, the stratified model is the one that “makes sense.”
• This principle is used in SQL-99 recursion: the stratified model is defined to be the correct query result.

Th-th-th-That’s All Folks
• See you at the review Friday 5:15 PM.
• And at the final, Monday, 3:30 PM.