Introduction to Relational Calculus Predicate Calculus over Relations

Introduction to Relational Calculus: Predicate Calculus over Relations and the Safety of Relational Calculus Expressions
CS 319 Theory of Databases, 6/9/2021

Relational Calculus 1
Predicate Calculus query languages ... a query = finding values satisfying predicate
Two kinds of predicate calculus language ... terms (primitive objects of discourse):
  • "tuples": tuple relational calculus
  • "domain values": domain relational calculus

Relational Calculus 2
Expressions in the tuple relational calculus
Basic form of such an expression: {t | ψ(t)} where t is a tuple variable
Here t [or t^(r)] denotes a tuple of some fixed arity (NB not denoting a tuple of fixed type) and ψ is a formula built according to the conventional first order predicate calculus ("FOPC") rules

Recall the logical expression for a QUEL query
Archetypal query in QUEL constructs new relation from relations R_1, R_2, ..., R_k:
  range of t_1 is R_1
  range of t_2 is R_2
  ...
  range of t_k is R_k
  retrieve (t_i(1).A_j(1), ..., t_i(r).A_j(r))
  where Ψ(t_1, t_2, ..., t_k)
Here Ψ(t_1, t_2, ..., t_k) is a (quantifier-free) logical constraint on the tuples selected by the range variables t_1, t_2, ..., t_k in the construction process

Recall the logical expression for a QUEL query
A suitable logical expression for the required relation is
  { u^(r) | (∃t_1)...(∃t_k)( R_1(t_1) ∧ R_2(t_2) ∧ ... ∧ R_k(t_k)
            ∧ u[1] = t_i(1)[j(1)] ∧ u[2] = t_i(2)[j(2)] ∧ ... ∧ u[r] = t_i(r)[j(r)]
            ∧ Ψ(t_1, t_2, ..., t_k) ) }
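
The QUEL pattern above can be sketched in Python as a search over every assignment of tuples to the range variables. This is only an illustrative sketch: the function name `quel_retrieve`, the 0-based `(i, j)` target pairs and the sample relations `EMP` and `DEPT` are assumptions made for the example, not part of QUEL or of the slides.

```python
from itertools import product


def quel_retrieve(relations, targets, constraint):
    """Evaluate a QUEL-style retrieve over finite relations.

    relations  -- list of sets of tuples R_1, ..., R_k (one per range variable)
    targets    -- list of 0-based (i, j) pairs: component j of range variable i
    constraint -- the quantifier-free condition Psi, given as a Python predicate
    """
    return {
        tuple(ts[i][j] for i, j in targets)
        for ts in product(*relations)  # every assignment of tuples to t_1..t_k
        if constraint(*ts)
    }


# Hypothetical sample data: employees with their department, departments with a city.
EMP = {("ann", "sales"), ("bob", "it")}
DEPT = {("sales", "london"), ("it", "leeds")}

# retrieve (e.name, d.city) where e.dept = d.name
result = quel_retrieve([EMP, DEPT], [(0, 0), (1, 1)], lambda e, d: e[1] == d[0])
print(sorted(result))
```

The comprehension mirrors the logical expression: the loop over `product(*relations)` plays the role of the existential quantifiers restricted by R_i(t_i), and the output tuple is built component by component as in u[m] = t_i(m)[j(m)].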

Relational Calculus 3
Form of relational calculus expressions
Formula is built up using operators of FOPC from atomic clauses of three types:
1. R(s), where R is relation name, s is tuple variable
2. s[i] θ u[j], where s and u are tuple variables, and θ is an arithmetic comparison operator (such as <, = etc)
3. s[i] θ c, where s is tuple variable, c is a constant

Relational Calculus 4
Semantics of the atomic clauses:
1. R(s), where R is relation name, s is tuple variable:
   "s represents a tuple in R"
2. s[i] θ u[j], where s and u are tuple variables, and θ is an arithmetic comparison operator:
   "ith component of tuple represented by s is in relation θ to the jth component of tuple represented by u"
3. s[i] θ c, where s is tuple variable, c is a constant:
   "ith component of tuple represented by s is in relation θ to the constant c"
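
As a rough illustration of these semantics, the three kinds of atomic clause can be evaluated in Python once concrete tuples have been substituted for the tuple variables. The function names and the textual spellings of θ used as dictionary keys are assumptions for this sketch; relations are modelled as sets of Python tuples.

```python
import operator

# Comparison operators theta, keyed by an assumed textual spelling.
THETA = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
         "!=": operator.ne, ">=": operator.ge, ">": operator.gt}


def holds_membership(R, s):
    """Clause type 1, R(s): s is a tuple in relation R."""
    return s in R


def holds_component_comparison(s, i, theta, u, j):
    """Clause type 2, s[i] theta u[j], with 1-based indices as on the slide."""
    return THETA[theta](s[i - 1], u[j - 1])


def holds_constant_comparison(s, i, theta, c):
    """Clause type 3, s[i] theta c, with a 1-based index."""
    return THETA[theta](s[i - 1], c)


R = {(1, 2), (3, 4)}
s, u = (1, 2), (3, 4)
checks = (
    holds_membership(R, s),                       # R(s)
    holds_component_comparison(s, 2, "<", u, 1),  # s[2] < u[1], i.e. 2 < 3
    holds_constant_comparison(u, 1, "=", 3),      # u[1] = 3
)
print(checks)
```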

General Relational Calculus Formulae 1
Combine clauses in standard predicate calculus fashion
A well-formed formula (wff) is defined recursively:
1. Every atomic clause is a wff.
2. If ψ and ψ' are wffs, then ψ ∧ ψ', ψ ∨ ψ' and ¬ψ are wffs
3. If ψ is a wff, (∃s)(ψ) is a wff
4. If ψ is a wff, (∀s)(ψ) is a wff
5. ∃ and ∀ are disambiguated by brackets as indicated in 3. and 4. (NB need extra syntactic conventions to declare precedence in arithmetic expressions etc)
6. The class of wffs is the smallest class closed wrt rules 1, 2, 3, 4, 5, and containing all atomic clauses.

General Relational Calculus Formulae 2
Basic vocabulary
1. and 2. define propositional logic: need more
3. and 4. define predicate logic
  • (∃s)(ψ) is existential quantification: "there exists s, such that ..."
  • (∀s)(ψ) is universal quantification: "for all s, ..."
∃ and ∀ are quantifiers.
To interpret wffs need to define free and bound variables ... informally, quantifiers introduce bound variables (cf. global / local variables in a programming language)

General Relational Calculus Formulae 3
Free and bound variables in wffs
1. Every atomic clause is a wff.
   All tuple variables mentioned in an atomic wff are said to be free in this wff
2. If ψ and ψ' are wffs, then ψ ∧ ψ', ψ ∨ ψ' and ¬ψ are wffs
   A variable in a wff is free iff it is not within the scope of a quantification, so x appears free/bound in ψ ∧ ψ' iff it appears free/bound in ψ or ψ' (perhaps both) etc.
Confusing when x denotes both bound and free vars ...

General Relational Calculus Formulae 4
Renaming bound vars doesn't affect meaning of wff
Can rename bound vars in both ψ and ψ' to ensure that
  • x occurs in both ψ and ψ' iff x occurs freely in both ψ and ψ'
When such renaming has been done, have
  • x occurs free in ψ ∧ ψ' (or ψ ∨ ψ') iff x occurs free in ψ or ψ' (perhaps both)
  • x occurs bound in ψ ∧ ψ' (or ψ ∨ ψ') iff x occurs bound in ψ or ψ' (but not both)

General Relational Calculus Formulae 4
Free & bound variables in wffs (cont.)
3. If ψ is a wff, (∃s)(ψ) is a wff
   Occurrences of s which are bound in ψ can be eliminated by renaming as necessary. When this has been done, occurrences of s which are free in ψ are bound in (∃s)(ψ).
4. If ψ is a wff, (∀s)(ψ) is a wff
   Similar conventions operate for bound and free variables to those in case of 3. above.
... use concepts of free and bound variables to give formal interpretation to wffs ...
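
The recursive definition of wffs and the rules for free variables can be sketched as a small syntax tree with a recursive `free_vars` function. The class names (`Atom`, `Not`, `And`, `Or`, `Exists`, `ForAll`) and the choice to store each atomic clause as text plus its set of variable names are assumptions for this illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    text: str             # e.g. "R(u)" or "w[1] = u[1]" (kept only for display)
    variables: frozenset  # tuple variables mentioned in the clause


@dataclass(frozen=True)
class Not:
    body: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Exists:
    var: str
    body: object


@dataclass(frozen=True)
class ForAll:
    var: str
    body: object


def free_vars(wff):
    """Return the set of tuple variables occurring free in wff."""
    if isinstance(wff, Atom):
        return set(wff.variables)          # rule 1: all variables are free
    if isinstance(wff, Not):
        return free_vars(wff.body)
    if isinstance(wff, (And, Or)):         # rule 2: free in either side
        return free_vars(wff.left) | free_vars(wff.right)
    if isinstance(wff, (Exists, ForAll)):  # rules 3 and 4: the quantifier binds var
        return free_vars(wff.body) - {wff.var}
    raise TypeError(f"not a wff: {wff!r}")


body = And(Atom("R(u)", frozenset({"u"})),
           Atom("w[1] = u[1]", frozenset({"w", "u"})))
quantified = Exists("u", body)
print(free_vars(body), free_vars(quantified))
```

In the example, u and w are both free in the unquantified body, while only w remains free once (∃u) is applied.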

General Relational Calculus Formulae 5
Semantics of wffs
A tuple relational calculus expression ψ with free tuple variables t_1, t_2, ..., t_n is satisfied by assigning specific tuples to t_1, t_2, ..., t_n so that formula ψ evaluates to true.
Denote such a ψ by ψ(t_1, t_2, ..., t_n) when we want to emphasise the free variables in it.
NB in the process of defining a set of tuples {t | ψ(t)}, where t is a single tuple variable, may need to define "tuples of tuples" by logical formulae with more than one free tuple variable in them.

General Relational Calculus Formulae 6
Semantics of wffs (cont.)
1. Every atomic clause is a wff.
   Semantics explained above. For example: { (s, u) | s[1] = u[2] } defines all pairs of tuples (s_0, u_0) such that first component of s_0 = second component of u_0
2. If ψ and ψ' are wffs, then ψ ∧ ψ', ψ ∨ ψ' and ¬ψ are wffs
   Semantics: (ψ ∧ ψ')(t) means ψ(t) and ψ'(t) etc

General Relational Calculus Formulae 7
Semantics of quantified expressions
3. If ψ is a wff, (∃s)(ψ) is a wff
   ((∃s)(ψ))(t_1, t_2, ..., t_n) ≡ (∃s)(ψ(t_1, t_2, ..., t_n))
Take n=1: obvious extension of semantics to cases where there are more free variables.
Semantics: The predicate (∃s)(ψ(t)) is true for the tuple t = t_0 if, on substituting t_0 for each free occurrence of t in ψ, there exists a tuple s_0 of the appropriate arity such that substituting s_0 for all free occurrences of s in ψ ensures that the formula ψ evaluates to true.

General Relational Calculus Formulae 8
Semantics of quantified expressions (cont.)
4. If ψ is a wff, (∀s)(ψ) is a wff
   ((∀s)(ψ))(t_1, t_2, ..., t_n) ≡ (∀s)(ψ(t_1, t_2, ..., t_n))
Take n=1: obvious extension of semantics to cases where there are more free variables.
Semantics: The predicate (∀s)(ψ(t)) is true for the tuple t = t_0 if, on substituting t_0 for each free occurrence of t in ψ, substituting any tuple s_0 of the appropriate arity for all free occurrences of s in ψ ensures that the formula ψ evaluates to true.
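
Over a finite set of candidate tuples, the two quantifier semantics correspond directly to Python's `any` and `all`. The sketch below assumes that the search has already been restricted to a finite candidate set (the helper `tuples_over` and the sample domain are hypothetical); whether such a restriction is legitimate is exactly the safety question taken up later in the deck.

```python
from itertools import product


def tuples_over(domain, arity):
    """All tuples of the given arity with components drawn from a finite domain."""
    return set(product(domain, repeat=arity))


def exists(candidates, psi):
    """(exists s)(psi): true if some candidate tuple s_0 makes psi true."""
    return any(psi(s) for s in candidates)


def forall(candidates, psi):
    """(forall s)(psi): true if every candidate tuple s_0 makes psi true."""
    return all(psi(s) for s in candidates)


pairs = tuples_over({1, 2, 3}, 2)


def has_larger_partner(t0):
    # (exists s)(s[1] = t[1] and s[2] > s[1]) with t_0 substituted for t
    return exists(pairs, lambda s: s[0] == t0[0] and s[1] > s[0])


print(has_larger_partner((2,)), has_larger_partner((3,)))
print(forall(pairs, lambda s: s[0] >= 1), forall(pairs, lambda s: s[0] < s[1]))
```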

General Relational Calculus Formulae 9
An illustrative example
Express composition of binary relations R & S in ISBL:
  RCS = (R * S) : B=C % A, D
or, using the syntax of relational algebra:
  RCS = π_{1,4} σ_{2=3} (R × S)
Express RCS in tuple relational calculus as:
  {w | (∃u)(∃v)(R(u) ∧ S(v) ∧ F(u, v))}
  where F(u, v) ≡ (u[2] = v[1] ∧ w[1] = u[1] ∧ w[2] = v[2])
Note: evaluating the predicate requires a finite search; need only inspect tuples from the finite relations R & S
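
The finite search mentioned in the note can be sketched in Python: u ranges over R, v ranges over S, and w is built from the components fixed by F(u, v). The function name `compose` and the sample relations are assumptions for the example.

```python
def compose(R, S):
    """Composition of binary relations: {w | (exists u)(exists v)(R(u) and S(v) and F(u, v))}."""
    return {
        (u[0], v[1])     # w[1] = u[1] and w[2] = v[2]
        for u in R       # R(u)
        for v in S       # S(v)
        if u[1] == v[0]  # u[2] = v[1]
    }


# Hypothetical sample relations.
R = {(1, 2), (2, 3)}
S = {(2, 5), (3, 6), (4, 7)}
RCS = compose(R, S)
print(sorted(RCS))
```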

Safety of relational expressions 1
Without any restriction on a logical expression can define an infinite collection of tuples.
Need to restrict to sets of tuples that are finite to take account of storage and computation.
For example:
  • what is { t | ψ(t) } ? ... very ill-defined collection of tuples
  • how do we compute { t | (∃s)(ψ(s, t)) } ? ... when have we considered every possible s?

Safety of relational expressions 2
Essential to know when it is safe to evaluate expression ... can't have non-terminating behaviour in a database
Solution: need to set limits on the values for tuples under consideration to eliminate endless searches ... motivates safety rules for expressions

Safety of relational expressions 3
Safe relational calculus expressions
When can we evaluate { t | ψ(t) } ?
  • In computational terms, want to be able to evaluate truth or falsehood of expression after making a finite set of substitutions
  • Logical expression means context-independent interpretation
For context-independence: basis for restricting a search is what can be inferred about domain of values of interest from the expression to be evaluated.

Safety of relational expressions 4
For context-independence: basis for restricting a search is what can be inferred about domain of values of interest from the expression to be evaluated.
Motivates definition of Dom(ψ), viz: the set of components of tuples in relations mentioned in ψ, together with all constants referenced by ψ
Note that Dom(ψ) is always a finite set
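
Computing Dom(ψ) from this definition is a single pass over the relations and constants that ψ mentions. In the sketch below the caller is assumed to supply those relations and constants explicitly; the function name `dom` and the sample data are hypothetical.

```python
def dom(relations, constants=()):
    """Components of all tuples in the mentioned relations, plus the referenced constants."""
    values = set(constants)
    for relation in relations:
        for tup in relation:
            values.update(tup)
    return values


# Hypothetical formula mentioning relations R and S and the constant 0.
R = {(1, 2), (2, 3)}
S = {(3, 4)}
d = dom([R, S], constants=[0])
print(sorted(d))
```

Because every relation is a finite set of finite tuples and a formula references finitely many constants, the loop always terminates with a finite set, which is the point of the final remark on the slide.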

Safety of relational expressions 5
An illustrative example:
  RCS = { w | (∃u)(∃v)(ψ(u, v)) }
  where ψ(u, v) ≡ R(u) ∧ S(v) ∧ F(u, v) and F(u, v) ≡ (u[2]=v[1] ∧ ...)
The relations mentioned in the formula RCS are R and S, and there are no constants. Hence
  Dom(ψ) = set of components of tuples in R and S
         = a subset of the elements of X
where R and S are binary relations on X

Safety of relational expressions 6
An illustrative example (cont.)
  RCS = { w | (∃u)(∃v)(ψ(u, v)) }
  where ψ(u, v) ≡ R(u) ∧ S(v) ∧ F(u, v) and F(u, v) ≡ (u[2]=v[1] ∧ ...)
  Dom(ψ) = set of components of tuples in R and S
         = a subset of the elements of X
where R and S are binary relations on X
To evaluate RCS, enough to consider possible values of u and v from the finite set Dom(ψ) × Dom(ψ) ⊆ X × X.
Thus RCS is a safe relational calculus expression

Safety of relational expressions 7
Relational calculus expression {t | ψ(t)} is safe if
1. t satisfies ψ(t) ⇒ each component of t is in Dom(ψ)
2. each well-defined subformula of ψ of the form (∃u)(ψ'(u)) satisfies the condition: if u is a tuple with a component not in Dom(ψ'), then ψ'(u, t_1, t_2, ..., t_n) is false under every assignment of values to the free variables t_1, t_2, ..., t_n in ψ'
3. each well-defined subformula of ψ of the form (∀u)(ψ'(u)) satisfies the (dual) condition: if u is a tuple with a component not in Dom(ψ'), then ψ'(u, t_1, t_2, ..., t_n) is true under every assignment of values to the free variables t_1, t_2, ..., t_n in ψ'

Safety of relational expressions 8
More about the definition of safety
Safety is a condition that is strong enough to ensure that we can decide in finitely many steps whether a quantified subformula of the predicate ψ is true.
We wish to restrict our search procedure to range over tuples of arity k from the finite set Dom(ψ)^k
How can we give a valid yes / no answer to the question
  "Does there exist a value to satisfy a logical condition?"
when we only inspect a finite set of values?

Safety of relational expressions 9
Let F denote the finite set Dom(ψ)^k to be inspected. Let U denote the 'universe' of all possible k-tuples.
Want to ensure that (∃u ∈ F)(ψ(u)) ⇔ (∃u ∈ U)(ψ(u))
  This will be true provided ψ(u) is false for all u in U \ F.
Want to ensure that (∀u ∈ F)(ψ(u)) ⇔ (∀u ∈ U)(ψ(u))
  This will be true provided ψ(u) is true for all u in U \ F.
NB These conditions are actually stronger than we need, but are the sort of conditions that are most useful in practice, when we need to interpret our predicates in real-world terms
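
A small experiment can illustrate the existential condition. Since a genuinely infinite universe cannot be enumerated, the sketch below uses a larger finite set as a stand-in for U; that stand-in, the sample relation R and the two predicates are assumptions chosen for the illustration.

```python
from itertools import product

dom_values = {1, 2}
F = set(product(dom_values, repeat=2))  # Dom^k with k = 2
U = set(product(range(5), repeat=2))    # finite stand-in for the universe U

R = {(1, 1), (1, 2), (2, 1), (2, 2)}    # hypothetical relation; its components are exactly {1, 2}


def safe_psi(u):
    """psi(u) = R(u): false for every u outside F."""
    return u in R


def unsafe_psi(u):
    """psi(u) = not R(u): true for tuples outside F."""
    return u not in R


safe_agrees = any(map(safe_psi, F)) == any(map(safe_psi, U))
unsafe_agrees = any(map(unsafe_psi, F)) == any(map(unsafe_psi, U))
print(safe_agrees, unsafe_agrees)
```

For R(u) the restricted search and the full search give the same answer, because no tuple outside F can satisfy the predicate. For ¬R(u) the restricted search finds no witness while the full search does, so inspecting only F would give the wrong answer. The universal case is dual.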

From Relation Algebra to Calculus 1
Overall aim of the next section: prove that algebraic and logical characterisations of relational queries have equivalent expressive power
First show that safe relational expressions can mimic the expressive power of relational algebra ...
Theorem 1: If E is a relational algebra expression, then there is a safe expression in tuple relational calculus equivalent to E.

From Relation Algebra to Calculus 2
Proof of Theorem 1
Use induction on the number N of operators in E.
Base of induction (N=0)
  • E is a constant relation { t_1, t_2, ..., t_n }:
    E = { t | ψ(t) } where ψ(t) ≡ (t=t_1 ∨ t=t_2 ∨ ... ∨ t=t_n)
  • E is specified by a relational variable R:
    E = { t | ψ(t) } where ψ(t) ≡ R(t)
Both of these are safe expressions:
  • values of t that satisfy ψ(t) are within dom(ψ)* = set of tuples with components in dom(ψ)

From Relation Algebra to Calculus 3
Induction Step
Assume that number of operators in E is N>0:
Can assume that E is derived from relational algebra expressions F and G via one of the five basic relational algebra operators, and that
  F = { t | φ(t) }, G = { t | ρ(t) }
are safe tuple relational calculus expressions for F & G
There are 5 cases to be considered, corresponding to the 5 basic operators of relational algebra ...

From Relation Algebra to Calculus 4
Induction Step (cont.)
One case for each basic operator of relational algebra:
1. Union: E = F ∪ G
2. Set Difference: E = F − G
3. Cartesian Product: E = F × G
4. Projection: E = π_{i(1), i(2), ..., i(k)}(F)
5. Selection: E = σ_C(F), where C is a condition on the tuples in F
Cases 1, 2 and 3: dom(ψ) = dom(φ) ∪ dom(ρ).
Cases 4 and 5: dom(ψ) = dom(φ).

From Relation Algebra to Calculus 5
Induction Step (cont.)
Cases 1, 2 and 3: dom(ψ) = dom(φ) ∪ dom(ρ).
Cases 4 and 5: dom(ψ) = dom(φ).
Generally straightforward to specify the predicate ψ
NB In each case, have 3 safety conditions to check:
  • satisfying tuples are within dom(ψ)*
  • condition on quantified subformulae (∃ / ∀)

From Relation Algebra to Calculus 6
Representing the 5 basic operations: union (∪)
1. E = F ∪ G
   E = { t | ψ(t) } where ψ(t) ≡ (φ ∨ ρ)(t) = φ(t) ∨ ρ(t)
Safe relational calculus expression:
  t satisfies (φ ∨ ρ)(t)
  ⇒ t satisfies φ(t) or t satisfies ρ(t)
  ⇒ t ∈ dom(φ)* or t ∈ dom(ρ)*
  ⇒ t ∈ dom(φ)* ∪ dom(ρ)* ⊆ dom(ψ)*
Also: because, by inductive hypothesis, φ and ρ are both safe, any ∃ / ∀ subformula within φ or ρ is safe.

From Relation Algebra to Calculus 7
Representing the 5 basic operations: difference (−)
2. E = F − G
   E = { t | ψ(t) } where ψ(t) ≡ (φ ∧ ¬ρ)(t) = φ(t) ∧ ¬ρ(t)
Safe expression:
  t satisfies (φ ∧ ¬ρ)(t)
  ⇒ t satisfies φ(t)
  ⇒ t ∈ dom(φ)* ∪ dom(ρ)* ⊆ dom(ψ)*
Other part of safety check similar to the previous case
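
The union and difference cases can be checked on small data by evaluating the calculus predicates over the candidate set dom(ψ)*. In this sketch φ and ρ are taken to be plain membership tests for hypothetical binary relations F and G; the helper `dom_star` is likewise an assumption for the example.

```python
from itertools import product

# Hypothetical binary relations standing in for the sub-expressions F and G.
F = {(1, "a"), (2, "b")}
G = {(2, "b"), (3, "c")}


def phi(t):
    return t in F


def rho(t):
    return t in G


def dom_star(relations, arity):
    """All tuples of the given arity whose components occur in the given relations."""
    values = {c for relation in relations for tup in relation for c in tup}
    return set(product(values, repeat=arity))


candidates = dom_star([F, G], 2)  # dom(psi)* with dom(psi) = dom(phi) ∪ dom(rho)
union = {t for t in candidates if phi(t) or rho(t)}
difference = {t for t in candidates if phi(t) and not rho(t)}
print(sorted(union, key=repr), sorted(difference, key=repr))
```

Restricting t to `candidates` loses nothing here because, as the safety argument shows, every tuple satisfying φ(t) already lies in dom(ψ)*.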

From Relation Algebra to Calculus 8
Representing the 5 basic operations: product (×)
3. E = F × G
Assume F and G have arity f and g respectively.
  E = { t^(f+g) | ψ(t) } where
  ψ(t) ≡ (∃u)(∃v)(φ(u) ∧ ρ(v)
          ∧ t[1]=u[1] ∧ t[2]=u[2] ∧ ... ∧ t[f]=u[f]
          ∧ t[f+1]=v[1] ∧ t[f+2]=v[2] ∧ ... ∧ t[f+g]=v[g])
Must now check the safety conditions ...
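
The product formula can be evaluated literally on small data: enumerate candidate tuples t of arity f+g over the domain and keep those for which witnesses u and v exist. The function name `product_via_calculus` and the sample relations are assumptions for this sketch, and φ and ρ are taken to be membership in F and G.

```python
from itertools import product as cartesian


def product_via_calculus(F, G, f, g):
    """{t | (exists u)(exists v)(F(u) and G(v) and t[1..f] = u and t[f+1..f+g] = v)}."""
    values = {c for relation in (F, G) for tup in relation for c in tup}
    return {
        t
        for t in cartesian(values, repeat=f + g)  # candidates from dom(psi)*
        if any(t[:f] == u and t[f:] == v for u in F for v in G)
    }


# Hypothetical relations of arity 2 and 1.
F = {(1, 2), (3, 4)}
G = {(5,)}
E = product_via_calculus(F, G, 2, 1)
print(sorted(E))
```

A practical implementation would simply concatenate each u with each v; the exhaustive version is shown only because it follows the shape of the calculus expression.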

From Relation Algebra to Calculus 9
Representing the 5 basic operations: product (×)
ψ ≡ (∃u)(∃v)(φ(u) ∧ ρ(v) ∧ ...) is safe RC expression:
First safety condition: values are within dom(ψ)*
  t satisfies ψ(t)
  ⇒ t[1]=u[1] ∧ t[2]=u[2] ∧ ... ∧ t[f]=u[f] where u satisfies φ(u)
  ⇒ (t[1], t[2], ..., t[f]) ∈ dom(φ)* ⊆ dom(φ)* ∪ dom(ρ)* ⊆ dom(ψ)*
  (t[f+1], t[f+2], ..., t[f+g]) ∈ dom(ψ)* similarly

From Relation Algebra to Calculus 10
Second safety condition for product (×)
Consider the existential subformula:
  (∃u)((∃v)(φ(u) ∧ ρ(v)
        ∧ t[1]=u[1] ∧ t[2]=u[2] ∧ ... ∧ t[f]=u[f]
        ∧ t[f+1]=v[1] ∧ t[f+2]=v[2] ∧ ... ∧ t[f+g]=v[g]))
(one of two introduced in ψ(t), not in φ(u) and ρ(v)).
Has form: (∃u)(F(u)) where dom(F) = dom(φ) ∪ dom(ρ)
  u[i] ∉ dom(F) for some i
  ⇒ u[i] ∉ dom(φ) ∪ dom(ρ) for some i
  ⇒ u[i] ∉ dom(φ)
  ⇒ u does not satisfy φ(u), since φ is safe
  ⇒ u does not satisfy F(u) ≡ φ(u) ∧ ...

From Relation Algebra to Calculus 11
Representing the 5 basic operations: projection (π)
4. E = π_{i(1), i(2), ..., i(k)}(F)
   E = { t^(k) | ψ(t) } where
   ψ(t) ≡ (∃u)(φ(u) ∧ t[1]=u[i(1)] ∧ t[2]=u[i(2)] ∧ ... ∧ t[k]=u[i(k)])
Safe relational calculus expression (1st condition):
  t satisfies ψ(t)
  ⇒ t[1]=u[i(1)] ∧ t[2]=u[i(2)] ∧ ... ∧ t[k]=u[i(k)] where u satisfies φ(u) and φ is safe
  ⇒ (t[1], t[2], ..., t[k]) ∈ dom(φ)* = dom(ψ)*
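
On concrete data the projection formula amounts to picking the listed components out of each tuple u of F. The sketch below assumes φ is membership in a hypothetical relation F; the function name `project` and the 1-based index list are choices made to match the slide's i(1), ..., i(k).

```python
def project(F, indices):
    """{t | (exists u)(F(u) and t[1] = u[i(1)] and ... and t[k] = u[i(k)])}, 1-based indices."""
    return {tuple(u[i - 1] for i in indices) for u in F}


# Hypothetical ternary relation.
F = {(1, "a", "x"), (2, "b", "x"), (3, "a", "y")}
E = project(F, [2, 3])
print(sorted(E))
```

Every component of a result tuple is copied from some u in F, which is the observation behind the first safety condition.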

From Relation Algebra to Calculus 12
Second safety condition for projection (π)
Consider the existential subformula:
  (∃u)(φ(u) ∧ t[1]=u[i(1)] ∧ t[2]=u[i(2)] ∧ ... ∧ t[k]=u[i(k)])
introduced in constructing ψ(t) from φ(u)
Has form: (∃u)(F(u)) where dom(F) = dom(φ).
  u[i] ∉ dom(F) for some i
  ⇒ u[i] ∉ dom(φ) for some i
  ⇒ u does not satisfy φ(u), since φ is safe
  ⇒ u does not satisfy F(u) ≡ φ(u) ∧ ...

From Relation Algebra to Calculus 13
Representing the 5 basic operations: selection (σ_C)
5. E = σ_C(F), where C is a condition on tuples in F.
C is expressed in terms of primitive relations of the form i θ j combined using propositional logical connectives, where i and j are indices of components of tuples in F and θ is a relational operator (<, ≤, =, ≥, >)
Transform C into C' in tuple relational calculus syntax by substituting t[i] for i throughout: then
  E = { t | ψ(t) } where ψ(t) ≡ φ(t) ∧ C'.
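
The translation of C into C' can be sketched by turning each primitive comparison i θ j into a predicate on t that compares t[i] with t[j]. The helper names `atom` and `select`, the textual spellings of θ and the sample relation are assumptions for this illustration.

```python
import operator

# Relational operators theta, keyed by an assumed textual spelling.
THETA = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
         ">=": operator.ge, ">": operator.gt}


def atom(i, theta, j):
    """Primitive condition i theta j, rewritten as t[i] theta t[j] (1-based indices)."""
    return lambda t: THETA[theta](t[i - 1], t[j - 1])


def select(F, condition):
    """{t | F(t) and C'(t)}."""
    return {t for t in F if condition(t)}


# Hypothetical binary relation.
F = {(1, 2), (3, 3), (5, 4)}
less = select(F, atom(1, "<", 2))
# Propositional connectives in C become Python's and / or / not.
not_less = select(F, lambda t: atom(1, "=", 2)(t) or atom(1, ">", 2)(t))
print(sorted(less), sorted(not_less))
```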

From Relation Algebra to Calculus 14
Safety condition for selection (σ_C)
Selection is represented using ψ(t) ≡ φ(t) ∧ C'.
  t satisfies ψ(t)
  ⇒ t satisfies φ(t)
  ⇒ t ∈ dom(φ)*
  ⇒ t ∈ dom(ψ)*
since dom(ψ) = dom(φ) + any constants in C'.
There is no need to check the other safety conditions as no new quantifiers are introduced in constructing ψ.

From Relation Algebra to Calculus 15
This completes the proof of Theorem 1:
Theorem 1: If E is a relational algebra expression, then there is a safe expression in tuple relational calculus equivalent to E.
This shows that tuple relational calculus is at least as expressive as relational algebra.
It remains to prove the converse: that relational algebra is as expressive as relational calculus ...

To follow ...
From (domain) relational calculus to algebra