Theorem Proving for FOL: Satisfiability Procedures
CS 294-8 Lecture 11, Prof. Necula

A Simple and Complete Prover
• Define the following symbolic “prove” algorithm
– prove(H, G) proves the goal “$H \Rightarrow G$”
  $\mathrm{prove}(H, \mathit{true}) = \mathit{true}$
  $\mathrm{prove}(H, G_1 \land G_2) = \mathrm{prove}(H, G_1)\ \&\&\ \mathrm{prove}(H, G_2)$
  $\mathrm{prove}(H, H_1 \Rightarrow G_2) = \mathrm{prove}(H \land H_1, G_2)$
  $\mathrm{prove}(H, \forall x.\, G) = \mathrm{prove}(H, G[a/x])$ (a is “fresh”)
  $\mathrm{prove}(H, L) = \mathrm{unsat}(H \land \lnot L)$
• We have a simple, sound and complete prover
– Provided that we have a way to check the unsatisfiability of sets of literals
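The recursion above can be sketched in C. This toy version makes several simplifying assumptions:
– Literals are propositional atoms with a polarity.
– The hypothesis of an implication is a single literal. H1 ∧ H2 ⇒ G can be curried into H1 ⇒ H2 ⇒ G.
– The ∀ case is left out because it needs term substitution.
– The stand-in unsat oracle only detects a complementary pair.
The names Goal, Lit, prove and unsat are illustrative and do not come from the lecture.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HYPS 64 /* capacity of the hypothesis array (sketch assumption) */

typedef struct { int atom; bool neg; } Lit;

typedef enum { G_TRUE, G_LIT, G_AND, G_IMPL } GoalKind;

typedef struct Goal {
    GoalKind kind;
    Lit lit;                  /* G_LIT: the literal L; G_IMPL: the hypothesis H1 */
    const struct Goal *left;  /* G_AND: G1 */
    const struct Goal *right; /* G_AND: G2; G_IMPL: the conclusion */
} Goal;

/* Toy oracle: a set of propositional literals is unsatisfiable iff it
   contains some atom together with its negation. */
static bool unsat(const Lit *h, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (h[i].atom == h[j].atom && h[i].neg != h[j].neg)
                return true;
    return false;
}

/* prove(H, G): H is h[0..n); slots from h[n] on are scratch space. */
static bool prove(Lit *h, size_t n, const Goal *g) {
    switch (g->kind) {
    case G_TRUE:
        return true;
    case G_AND:
        return prove(h, n, g->left) && prove(h, n, g->right);
    case G_IMPL:                       /* prove(H ∧ H1, G2) */
        if (n >= MAX_HYPS) return false;
        h[n] = g->lit;
        return prove(h, n + 1, g->right);
    case G_LIT:                        /* unsat(H ∧ ¬L) */
        if (n >= MAX_HYPS) return false;
        h[n] = (Lit){ g->lit.atom, !g->lit.neg };
        return unsat(h, n + 1);
    }
    return false;
}
```

Each recursive call writes only past its own prefix of the hypothesis array. As a result, the two sibling calls in the ∧ case never see each other's assumptions.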

How Powerful is Our Prover?
• With VCGen in mind we must restrict invariants to $H ::= L \mid \mathit{true} \mid H_1 \land H_2$
• No disjunction, implication or quantification!
– Is that bad?
• Consider the function:
  void insert(LIST *a, LIST *b) {
    LIST *t = a->next;
    a->next = b;
    b->next = t;
  }
• And the problem is to verify that
– It preserves linearity: every list cell is pointed to by at most one other list cell
– Provided that b is non-NULL and not pointed to by any cell

Lists and Linearity
• A bit of formal notation (remember sel/upd):
– We write sel(n, a) to denote the value of “a->next” when the state of the “next” field is n
– We write upd(n, a, b) to denote the new state of the “next” field after “a->next = b”
• Code is
  void insert(LIST *a, LIST *b) {
    LIST *t = a->next;
    a->next = b;
    b->next = t;
  }
• Pre is $(\forall q.\, q \neq 0 \Rightarrow \forall p_1.\, \forall p_2.\, \mathrm{sel}(n, p_1) = \mathrm{sel}(n, p_2) = q \Rightarrow p_1 = p_2) \land b \neq 0 \land (\forall p.\, \mathrm{sel}(n, p) \neq b) \land a \neq 0$
• Post is $\forall q.\, q \neq 0 \Rightarrow \forall p_1.\, \forall p_2.\, \mathrm{sel}(n, p_1) = \mathrm{sel}(n, p_2) = q \Rightarrow p_1 = p_2$
• VC is $\mathrm{Pre} \Rightarrow \mathrm{Post}[\mathrm{upd}(\mathrm{upd}(n, a, b), b, \mathrm{sel}(n, a)) / n]$
– Not a G!

Two Solutions
• So we can easily end up wanting invariants that fall outside H
• So what can we do?
1. Extend the language of H
– And then extend the prover
2. Push the complexity of invariants into literals
– And then extend the unsatisfiability procedure

Goal Directed Theorem Proving (1)
• Finally we extend the use of quantifiers:
  $G ::= L \mid \mathit{true} \mid G_1 \land G_2 \mid H \Rightarrow G \mid \forall x.\, G \mid \exists x.\, G$
  $H ::= L \mid \mathit{true} \mid H_1 \land H_2 \mid \forall x.\, H$
• We have now introduced an existential choice
– Both in “$H \Rightarrow \exists x.\, G$” and “$\forall x.\, H \Rightarrow G$”
• Existential choices are postponed
– Introduce unification variables + unification
  $\mathrm{prove}(H, \exists x.\, G) = \mathrm{prove}(H, G[u/x])$ (u is a unification variable)
  $\mathrm{prove}(H, u = t)$ = instantiate u with t if $u \notin FV(t)$
• Still a sound and complete goal-directed proof search!
– Provided that unsat can handle unification variables!
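The instantiation rule with its $u \notin FV(t)$ side condition is the occurs check of first-order unification. Below is a minimal destructive unifier in C. It assumes terms with at most two arguments, and it keeps no trail for undoing bindings on backtracking, which a real prover would need. The Term layout and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* A unification variable, or a function symbol applied to at most two
   arguments (constants are 0-ary applications). */
typedef struct Term {
    bool is_var;
    int sym;                /* variable id or function symbol id */
    int arity;              /* 0..2, ignored for variables */
    struct Term *args[2];
    struct Term *binding;   /* set once a variable is instantiated */
} Term;

/* Follow the bindings of instantiated variables. */
static Term *resolve(Term *t) {
    while (t->is_var && t->binding) t = t->binding;
    return t;
}

/* Occurs check: does variable v appear in t? (u ∈ FV(t)) */
static bool occurs(Term *v, Term *t) {
    t = resolve(t);
    if (t == v) return true;
    if (t->is_var) return false;
    for (int i = 0; i < t->arity; i++)
        if (occurs(v, t->args[i])) return true;
    return false;
}

/* Destructive unification. Bindings made before a failure are NOT undone
   in this sketch. */
static bool unify(Term *a, Term *b) {
    a = resolve(a);
    b = resolve(b);
    if (a == b) return true;
    if (a->is_var) {
        if (occurs(a, b)) return false;
        a->binding = b;     /* instantiate u with t */
        return true;
    }
    if (b->is_var) {
        if (occurs(b, a)) return false;
        b->binding = a;
        return true;
    }
    if (a->sym != b->sym || a->arity != b->arity) return false;
    for (int i = 0; i < a->arity; i++)
        if (!unify(a->args[i], b->args[i])) return false;
    return true;
}
```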

Goal Directed Theorem Proving (2)
• We can add disjunction (but only to goals):
  $G ::= \mathit{true} \mid L \mid G_1 \land G_2 \mid H \Rightarrow G \mid \forall x.\, G \mid \exists x.\, G \mid G_1 \lor G_2$
• Extend the prover as follows:
  $\mathrm{prove}(H, G_1 \lor G_2) = \mathrm{prove}(H, G_1)\ \|\ \mathrm{prove}(H, G_2)$
• This introduces a choice point in proof search
– Called a “disjunctive choice”
– Backtracking is complete for this choice selection
• But only in intuitionistic logic!

Goal Directed Theorem Proving (3)
• Now we extend the language of hypotheses a bit
– Important since this adds flexibility for invariants and specs
  $H ::= L \mid \mathit{true} \mid H_1 \land H_2 \mid G \Rightarrow H$
• We extend the prover as follows:
  $\mathrm{prove}(H, (G_1 \Rightarrow H_1) \Rightarrow G) = \mathrm{prove}(H, G)\ \|\ (\mathrm{prove}(H \land H_1, G)\ \&\&\ \mathrm{prove}(H, G_1))$
– This adds another choice (clause choice in Prolog), expressed here also as a disjunctive choice
– Still complete with backtracking

Goal Directed Theorem Proving (4)
• The VC for linear lists can be proved in this logic!
– This logic is called Hereditary Harrop Formulas
• But the prover is not complete in a classical sense
– And thus complications might arise with certain theories
• Still no way to have disjunctive hypotheses
– The prover becomes incomplete even in intuitionistic logic
– E.g., it cannot even prove that $P \lor Q \Rightarrow Q \lor P$
• Let’s try the other method instead…

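A Theory of Linear Lists
• Push the complexity into literals
• Define new literals:
  $\mathit{linear}(n) \stackrel{\text{def}}{=} \forall q.\, q \neq 0 \Rightarrow \forall p_1.\, \forall p_2.\, \mathrm{sel}(n, p_1) = \mathrm{sel}(n, p_2) = q \Rightarrow p_1 = p_2$
  $\mathit{rc}_0(n, b) \stackrel{\text{def}}{=} b \neq 0 \land \forall p.\, \mathrm{sel}(n, p) \neq b$
• Now the predicates become:
  Pre is $\mathit{linear}(n) \land \mathit{rc}_0(n, b) \land a \neq 0 \land b \neq 0$
  Post is $\mathit{linear}(n)$
  VC is $\mathit{linear}(n) \land \mathit{rc}_0(n, b) \land a \neq 0 \land b \neq 0 \Rightarrow \mathit{linear}(\mathrm{upd}(\mathrm{upd}(n, a, b), b, \mathrm{sel}(n, a)))$
– This is a G!
• The hard work is now in the satisfiability procedure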
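One way to sanity-check this VC before handing it to a prover is to test it exhaustively on a tiny heap. The sketch makes three modeling assumptions: addresses run from 0 to 3, address 0 stands for NULL, and the `next` field is stored as an array. It performs a bounded check, not a proof, and all names are illustrative.

```c
#include <stdbool.h>

#define NCELLS 4 /* addresses 0..3; address 0 plays the role of NULL */

/* linear(n): every non-NULL q is the `next` of at most one cell. */
static bool linear(const int next[NCELLS]) {
    for (int q = 1; q < NCELLS; q++) {
        int count = 0;
        for (int p = 1; p < NCELLS; p++)
            if (next[p] == q) count++;
        if (count > 1) return false;
    }
    return true;
}

/* rc0(n, b): b is non-NULL and no cell's `next` points to b. */
static bool rc0(const int next[NCELLS], int b) {
    if (b == 0) return false;
    for (int p = 1; p < NCELLS; p++)
        if (next[p] == b) return false;
    return true;
}

/* The body of insert(a, b), acting on a copy of the `next` field. */
static void insert(int next[NCELLS], int a, int b) {
    int t = next[a]; /* t = sel(n, a) */
    next[a] = b;     /* upd(n, a, b) */
    next[b] = t;     /* upd(upd(n, a, b), b, t) */
}

/* Enumerate every heap over cells 1..NCELLS-1 and every non-NULL a, b.
   Returns true iff the VC holds on all of them. With check_rc0 == false
   the rc0 precondition is dropped. */
static bool insert_preserves_linearity(bool check_rc0) {
    int heaps = 1;
    for (int i = 1; i < NCELLS; i++) heaps *= NCELLS;
    for (int code = 0; code < heaps; code++) {
        int n[NCELLS] = {0};
        for (int p = 1, c = code; p < NCELLS; p++, c /= NCELLS)
            n[p] = c % NCELLS;
        if (!linear(n)) continue;
        for (int a = 1; a < NCELLS; a++)
            for (int b = 1; b < NCELLS; b++) {
                if (check_rc0 && !rc0(n, b)) continue;
                int m[NCELLS];
                for (int p = 0; p < NCELLS; p++) m[p] = n[p];
                insert(m, a, b);
                if (!linear(m)) return false;
            }
    }
    return true;
}
```

Dropping rc0 makes the check fail. For example, with 1→2 and a = 3, b = 2, the insertion leaves two cells pointing to 2. This shows why the precondition on b is needed.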

A Theory of Linear Lists
• In order to allow the prover to work with “linear” and “rc0” we must define their meaning:
– Semantically (by giving the definitions from before)
– Axiomatically (by giving a set of axioms that define them):
  $\mathit{linear}(n) \land a \neq 0 \land \mathit{rc}_0(n, b) \Rightarrow \mathit{linear}(\mathrm{upd}(n, a, b))$
  $\mathit{linear}(n) \land a \neq 0 \land \mathit{rc}_0(n, b) \Rightarrow \mathit{rc}_0(\mathrm{upd}(n, a, b), \mathrm{sel}(n, a))$
• Now we can prove the VC with just three uses of these axioms
• Is this set of axioms complete?

Discussion
• It makes sense to push the hard work into literals:
– They can be handled in a customized way within the sat procedures
– The hand-crafted inference rules guide the prover
– The inference rules are useful lemmas
– Important technique #3
• Just like in type inference, or data flow analysis:
  Theorem Proving    Type Inference         Data Flow Analysis
  Literals           Type system            Lattice
  Inference rules    Typing rules           Transfer functions
  Sat. procedure     Inference algorithm    Iterative D.F.A.

Theories
• Now we turn to $\mathrm{unsat}(L_1, \ldots, L_k)$
• A theory consists of:
– A set of function and predicate symbols (syntax)
– Definitions for the meaning of these symbols (semantics)
  • Semantic or axiomatic definitions
• Example:
– Symbols: 0, 1, -1, 2, -2, …, +, -, =, < (with the usual meaning)
  • Theory of integers with arithmetic (Presburger arithmetic)

Decision Procedures for Theories
• The Decision Problem:
– Decide whether a formula in a theory + FOL is true
• Example:
– Decide whether $\forall x.\, x > 0 \Rightarrow (\exists y.\, x = y + 1)$ in $\{\mathbb{N}, +, =, >\}$
• A theory is decidable when there is an algorithm that solves the decision problem for the theory
– This algorithm is the decision procedure for the theory

Satisfiability Procedures for Theories
• The Satisfiability Problem
– Decide whether a conjunction of literals in the theory is satisfiable
– Factors out the FOL part of the decision problem
• This is what we need to solve in our simple prover
• We will explore a few useful theories and satisfiability procedures for them…

Examples of Theories. Equality.
• The theory of equality with uninterpreted functions
• Symbols: =, ≠, f, g, …
• Axiomatically defined:
  $E = E$
  $E_1 = E_2 \Rightarrow E_2 = E_1$
  $E_1 = E_2 \land E_2 = E_3 \Rightarrow E_1 = E_3$
  $E_1 = E_2 \Rightarrow f(E_1) = f(E_2)$
• Example of a satisfiability problem: $g(g(g(x))) = x \land g(g(g(g(g(x))))) = x \land g(x) \neq x$

A Satisfiability Procedure for Equality
• Definitions:
– Let R be a relation on terms
– The equivalence closure of R is the smallest relation that contains R and is closed under reflexivity, symmetry and transitivity
  • An equivalence relation
  • Equivalence classes
– Given a term t we say that t* is its representative
– Two terms t1 and t2 are equivalent iff t1* = t2*
– Computable in near-linear time (union-find)
• The congruence closure of a relation is the smallest relation that contains it and is closed under equivalence and congruence
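Below is a minimal union-find in C. It assumes the terms have already been numbered 0..n-1. uf_find returns the representative t*, and path halving combined with union by rank gives the near-linear bound. The function names are hypothetical.

```c
#include <stdbool.h>

#define MAXN 64 /* assumed upper bound on the number of terms */

static int parent[MAXN], rnk[MAXN];

/* Each term starts in its own equivalence class. */
static void uf_init(int n) {
    for (int i = 0; i < n; i++) { parent[i] = i; rnk[i] = 0; }
}

/* find: return the representative t*, halving the path on the way up. */
static int uf_find(int t) {
    while (parent[t] != t) {
        parent[t] = parent[parent[t]];
        t = parent[t];
    }
    return t;
}

/* union: merge two classes; the root with the higher rank becomes the representative. */
static void uf_union(int a, int b) {
    int ra = uf_find(a), rb = uf_find(b);
    if (ra == rb) return;
    if (rnk[ra] < rnk[rb]) { int tmp = ra; ra = rb; rb = tmp; }
    parent[rb] = ra;
    if (rnk[ra] == rnk[rb]) rnk[ra]++;
}

/* t1 and t2 are equivalent iff t1* = t2*. */
static bool uf_equiv(int a, int b) { return uf_find(a) == uf_find(b); }
```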

A Representation for Symbolic Terms
• We represent terms as DAGs
– Share common subexpressions
– E.g. f(f(a, b), b)
  (Figure: the DAG for f(f(a, b), b), with the leaf b shared.)
• Equalities are represented as dotted edges
– E.g. f(f(a, b), b) = a
– We consider the transitive closure of dotted edges

Computing Congruence Closure
• We pick arbitrary representatives for all equivalence classes (nodes connected by dotted edges)
• For all nodes t = f(t1, …, tn) and s = f(s1, …, sn)
– If ti* = si* for all i = 1..n (find)
– We add an edge between t* and s* and pick one of them as the representative for the entire class (union)
(Figure: a worked example of congruence closure on a term DAG.)

Computing Congruence Closure (Cont.)
• Congruence closure is an inference procedure for the theory of equality
– It always terminates because it does not add nodes
• The hard part is to detect the congruent pairs of terms
– There are tricks to do this in O(n log n)
• We say that f(t1, …, tn) is represented in the DAG if there is a node f(s1, …, sn) such that si* = ti* for all i

Satisfiability Procedure for Equality
1. Given $F = \bigwedge_i t_i = t_i' \land \bigwedge_j u_j \neq u_j'$
2. Represent all terms in the same DAG
3. Add dotted edges for $t_i = t_i'$
4. Construct the congruence closure of those edges
5. Check that $\forall j.\, u_j^* \neq u_j'^*$
Theorem: F is satisfiable iff $\forall j.\, u_j^* \neq u_j'^*$
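The five steps can be sketched in C under a few assumptions:
– Terms have at most two arguments.
– Nodes live in a fixed-capacity table.
– Step 4 uses a naive fixpoint loop rather than the O(n log n) method mentioned earlier.
mk builds shared DAG nodes. All names are illustrative.

```c
#include <stdbool.h>

#define MAXN 64 /* capacity of the term DAG; no overflow check in this sketch */

/* Node i is the term sym[i](arg[i][0], arg[i][1]) with arity[i] in 0..2;
   variables and constants are 0-ary symbols. */
static int sym[MAXN], arity[MAXN], arg[MAXN][2], parent[MAXN], nnodes;

static void dag_reset(void) { nnodes = 0; }

/* Step 2: build a node, reusing an identical existing one so that common
   subterms are shared (hash-consing by linear search). */
static int mk(int s, int ar, int a0, int a1) {
    for (int i = 0; i < nnodes; i++)
        if (sym[i] == s && arity[i] == ar &&
            (ar < 1 || arg[i][0] == a0) && (ar < 2 || arg[i][1] == a1))
            return i;
    int i = nnodes++;
    sym[i] = s; arity[i] = ar; arg[i][0] = a0; arg[i][1] = a1;
    parent[i] = i;
    return i;
}

static int cc_find(int t) {
    while (parent[t] != t) t = parent[t];
    return t;
}

static void cc_union(int a, int b) { parent[cc_find(a)] = cc_find(b); }

/* Step 4: merge nodes with the same symbol whose arguments have the same
   representatives, until nothing changes. */
static void congruence_closure(void) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < nnodes; i++)
            for (int j = i + 1; j < nnodes; j++) {
                if (sym[i] != sym[j] || arity[i] != arity[j]) continue;
                if (cc_find(i) == cc_find(j)) continue;
                bool congruent = true;
                for (int k = 0; k < arity[i]; k++)
                    if (cc_find(arg[i][k]) != cc_find(arg[j][k])) congruent = false;
                if (congruent) { cc_union(i, j); changed = true; }
            }
    }
}

/* Steps 3-5 for F = AND_k eq[k][0] = eq[k][1]  AND  AND_k ne[k][0] != ne[k][1]. */
static bool eq_sat(int eq[][2], int neq, int ne[][2], int nne) {
    for (int k = 0; k < neq; k++) cc_union(eq[k][0], eq[k][1]); /* dotted edges */
    congruence_closure();
    for (int k = 0; k < nne; k++)
        if (cc_find(ne[k][0]) == cc_find(ne[k][1])) return false;
    return true;
}
```

Consider $g(g(g(x))) = x \land g(g(g(g(g(x))))) = x \land g(x) \neq x$. Here the closure merges g(x) with x, so eq_sat reports the conjunction unsatisfiable.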

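Example with Congruence Closure
• Consider: $g(g(g(x))) = x \land g(g(g(g(g(x))))) = x \land g(x) \neq x$
(Figure: the DAG for g(g(g(g(g(x))))) with dotted edges for the two equalities.)
• Contradiction: the closure puts g(x) and x in the same class, yet $g(x) \neq x$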

Congruence Closure. Discussion.
• The example from before has little to do with program verification
• But equality is still very useful
• The congruence closure algorithm is the basis for many unification-based satisfiability procedures
– We add the additional axiom: $f(E_1) = f(E_2) \Rightarrow E_1 = E_2$
– Or equivalently: $E_1 = E_2 \Rightarrow f^{-1}(E_1) = f^{-1}(E_2)$

Presburger Arithmetic
• The theory of integers with +, -, =, >
• The most useful in program verification after equality
– And quite useful for program analysis also
• Example of a satisfiability problem: $y > 2x + 1 \land y + x > 1 \land y < 0$
• Satisfiability of a system of linear inequalities
– Known to be in P (with rational solutions)
– Some of the algorithms are quite simple
– If we add the requirement that solutions are in Z then the problem is NP-complete
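Fourier-Motzkin elimination over Q is a simple procedure, though not a polynomial-time one, and the slides do not spell it out. The sketch makes these assumptions:
– At most four variables.
– A fixed row budget.
– Floating-point coefficients.
– Strict integer inequalities are tightened first: y > 2x + 1 becomes y ≥ 2x + 2, y + x > 1 becomes y + x ≥ 2, and y < 0 becomes y ≤ -1.
Unsatisfiability over Q implies unsatisfiability over Z, so an "unsat" answer is sound for Z. The names are hypothetical.

```c
#include <stdbool.h>

#define NV 4     /* maximum number of variables (sketch assumption) */
#define MAXR 256 /* maximum number of rows kept at any time */

/* One inequality: sum_k coef[k] * x_k <= rhs. */
typedef struct { double coef[NV]; double rhs; } Row;

/* Returns 1 if satisfiable over Q, 0 if unsatisfiable, -1 on row overflow. */
static int fm_sat(const Row *in, int m, int nv) {
    static Row cur[MAXR], nxt[MAXR];
    if (m > MAXR) return -1;
    for (int r = 0; r < m; r++) cur[r] = in[r];
    for (int v = 0; v < nv; v++) {            /* eliminate x_v */
        int k = 0;
        for (int i = 0; i < m; i++) {
            if (cur[i].coef[v] == 0) {         /* x_v absent: keep the row */
                if (k == MAXR) return -1;
                nxt[k++] = cur[i];
            } else if (cur[i].coef[v] > 0) {   /* upper bound on x_v ... */
                for (int j = 0; j < m; j++) {
                    if (cur[j].coef[v] >= 0) continue; /* ... paired with each lower bound */
                    if (k == MAXR) return -1;
                    double p = -cur[j].coef[v], q = cur[i].coef[v]; /* both > 0 */
                    for (int t = 0; t < NV; t++)
                        nxt[k].coef[t] = p * cur[i].coef[t] + q * cur[j].coef[t];
                    nxt[k].coef[v] = 0;        /* x_v cancels */
                    nxt[k].rhs = p * cur[i].rhs + q * cur[j].rhs;
                    k++;
                }
            }
        }
        for (int i = 0; i < k; i++) cur[i] = nxt[i];
        m = k;
    }
    for (int i = 0; i < m; i++)                /* every row is now 0 <= rhs */
        if (cur[i].rhs < 0) return 0;
    return 1;
}
```

Take the tightened example 2x - y ≤ -2, -x - y ≤ -2, y ≤ -1. Eliminating x and then y yields 0 ≤ -9, so fm_sat returns 0. Each elimination can square the number of rows, which is why the sketch reports overflow instead of growing without bound.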

Difference Constraints
• A special case of linear arithmetic
• All constraints of the form: $x_i - x_j \le c$ or $x_i - 0 \le c$ or $0 - x_j \le c$
• The most common form of constraint
• Construct a directed graph with:
– A node for 0
– A node for each variable $x_i$
– An edge from $x_i$ to $x_j$ of weight c for each $x_i - x_j \le c$

Difference Constraints
Theorem: A set of difference constraints is satisfiable iff there is no negative-weight cycle in the graph
• Can be solved with Bellman-Ford in O(n²), where n is the number of constraints
– In practice n is typically quite small
– In practice we use incremental algorithms (to account for assumptions being pushed and popped)
• The algorithm is complete!
• Was used successfully in array-bounds-check elimination and induction variable discovery
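Here is a Bellman-Ford sketch of the theorem in C. It assumes at most 63 variables, long integer constants, and node 0 for the constant 0. Each constraint $x_i - x_j \le c$ is relaxed along the edge from $x_j$ to $x_i$, which is the reverse of the orientation drawn on the slide. Reversing every edge preserves cycle weights, so the negative-cycle test is unaffected. The names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAXV 64 /* assumed bound on nvars + 1 */

/* The difference constraint x[i] - x[j] <= c; index 0 stands for the constant 0. */
typedef struct { int i, j; long c; } DiffConstraint;

/* Returns true iff satisfiable. If so and sol != NULL, sol[0..nvars]
   receives a solution with sol[0] == 0. */
static bool diff_sat(const DiffConstraint *cs, int m, int nvars, long *sol) {
    long d[MAXV];
    /* Starting every distance at 0 acts like a virtual source with a
       0-weight edge to each node, so every cycle is reachable. */
    for (int v = 0; v <= nvars; v++) d[v] = 0;
    for (int round = 0; round <= nvars; round++) {
        bool changed = false;
        for (int k = 0; k < m; k++) {
            /* x_i - x_j <= c means x_i <= x_j + c: relax the edge j -> i. */
            long via = d[cs[k].j] + cs[k].c;
            if (via < d[cs[k].i]) { d[cs[k].i] = via; changed = true; }
        }
        if (!changed) {
            if (sol)
                for (int v = 0; v <= nvars; v++) sol[v] = d[v] - d[0];
            return true;
        }
    }
    return false; /* still relaxing after nvars + 1 rounds: negative cycle */
}
```

Initializing all distances to 0, instead of running from node 0 alone, catches negative cycles among variables that no constraint links to 0. Subtracting d[0] sends the constant node back to 0. The returned values are sums of the constants, so they are integers whenever the constants are, which is why the procedure is complete in Z.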

Extensions of Difference Constraints
• Shostak extended the algorithm to $ax + by \le c$
• Construct a graph as before
– One node for each variable
– One undirected edge for each constraint
• An admissible loop in this graph is a loop in which any two adjacent edges “$ax + by \le c$” and “$dy + ez \le f$” have $\mathrm{sgn}(b) \neq \mathrm{sgn}(d)$
– The residue of such adjacent edges is a constraint on x and z: $a|d|\,x + e|b|\,z \le c|d| + f|b|$
– The residue for a loop is an inequality without variables
Theorem: The inequalities are satisfiable iff all residues for simple loops are satisfiable
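The residue formula can be checked mechanically. The small C sketch below assumes integer coefficients stored in a hypothetical Edge struct. The second constraint $dy + ez \le f$ is stored as {d, e, f}.

```c
#include <stdbool.h>
#include <stdlib.h> /* labs */

/* An edge a*x + b*y <= c, stored as {a, b, c}. */
typedef struct { long a, b, c; } Edge;

/* Residue of e1: a x + b y <= c and e2: d y + e z <= f (stored as {d, e, f}).
   Requires sgn(b) != sgn(d) so that y cancels; writes
   (a|d|) x + (e|b|) z <= c|d| + f|b| into *out. */
static bool residue(Edge e1, Edge e2, Edge *out) {
    long b = e1.b, d = e2.a;
    if (!((b > 0 && d < 0) || (b < 0 && d > 0))) return false; /* not admissible */
    out->a = e1.a * labs(d);                   /* coefficient of x */
    out->b = e2.b * labs(b);                   /* coefficient of z */
    out->c = e1.c * labs(d) + e2.c * labs(b);
    return true;
}
```

Chaining residue along a loop gives the variable-free residue. For the loop in the next example (3x ≥ 2y, 3y ≥ 4, 3 ≥ 2x), the first step yields -9x ≤ -8. Combining that with 2x ≤ 3 yields 0 ≤ 11, which is the slide's 13.5 ≥ 8 up to scaling.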

How Complete are These Procedures?
• Consider: $3x \ge 2y \land 3y \ge 4 \land 3 \ge 2x$
(Figure: the loop through x, y and 0; scaling its edges by 3, 2 and 4.5 gives $13.5 \ge 9x \ge 6y \ge 8$.)
• Residue is $13.5 \ge 8$: satisfiable
– But only in Q, not in Z
• The unsat procedure is sound: $\mathrm{unsat}_\mathbb{Q} \Rightarrow \mathrm{unsat}_\mathbb{Z}$
• But it is incomplete!
• Not a problem in practice
• Or the problem goes away with tricks like this: transform “$ax \ge b$” into “$x \ge \lceil b/a \rceil$” (for $a > 0$)
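Here is a worked check of the example. It uses the $\lceil b/a \rceil$ rounding above for lower bounds and its mirror image $\lfloor b/a \rfloor$ for upper bounds, both valid for integer x and $a > 0$:

$$
\begin{aligned}
&\text{In } \mathbb{Q}:\ x = 1,\ y = \tfrac{3}{2} \text{ satisfies } 3x \ge 2y\ (3 \ge 3),\ 3y \ge 4\ (4.5 \ge 4),\ 3 \ge 2x\ (3 \ge 2)\\
&\text{In } \mathbb{Z}:\ 3y \ge 4 \ \leadsto\ y \ge \lceil 4/3 \rceil = 2, \qquad 2x \le 3 \ \leadsto\ x \le \lfloor 3/2 \rfloor = 1\\
&\qquad \text{hence } 3 \ge 3x \ge 2y \ge 4, \text{ a contradiction}
\end{aligned}
$$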

Arithmetic. Discussion
• There are many satisfiability algorithms
– Even for the general case (e.g. Simplex)
– Except for difference constraints, all are incomplete in Z
– But Z can be handled well with heuristics
• There are no practical satisfiability procedures for (Q, ×), and the satisfiability of (Z, ×) is only semidecidable

Combining Satisfiability Procedures
• We have developed sat. procedures for several theories
– We considered each theory separately
– Can we combine several sat. procedures?
• Consider equality and arithmetic:
  $f(f(x) - f(y)) \neq f(z) \land x \le y \land y + z \le x \land 0 \le z$
(Figure: arithmetic derives x = y and 0 = z; equality then derives f(x) = f(y); arithmetic derives f(x) - f(y) = z; equality derives f(f(x) - f(y)) = f(z), hence false.)

Combining Satisfiability Procedures
• Combining satisfiability procedures is nontrivial
• And that was to be expected:
– (Z, +) and (Q, ×) are decidable, but (Z, ×) is not
– Equality was solved by Ackermann in 1924, arithmetic by Fourier even before, but E + A only in 1979!
• Yet in any single verification problem we will have literals from several theories:
– Equality, arithmetic, lists, …
• When and how can we combine separate satisfiability procedures?

Nelson-Oppen Method (1)
1. Represent all conjuncts in the same DAG
  $f(f(x) - f(y)) \neq f(z) \land y \ge x \land x \ge y + z \land z \ge 0$
(Figure: a single DAG with nodes for f, -, +, x, y, z and 0, shared by both theories.)

Nelson-Oppen Method (2)
2. Run each sat. procedure
• Require it to report all contradictions (as usual)
• Also require it to report all equalities between nodes

Nelson-Oppen Method (3)
3. Broadcast all discovered equalities and re-run the sat. procedures
• Until no more equalities are discovered or a contradiction arises
(Figure: the shared DAG after the broadcast equalities, ending in a contradiction.)
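The derivation below traces the three steps on $f(f(x) - f(y)) \neq f(z) \land y \ge x \land x \ge y + z \land z \ge 0$. It is only a sketch: a real implementation may discover the equalities in a different order. Arithmetic treats f(x) and f(y) as opaque variables.

$$
\begin{aligned}
&\text{Arithmetic: } y + z \le x \le y \Rightarrow z \le 0;\ \text{with } z \ge 0:\ z = 0 \text{ and } x = y &&\text{broadcast } x = y,\ z = 0\\
&\text{Equality: } x = y \Rightarrow f(x) = f(y) &&\text{broadcast } f(x) = f(y)\\
&\text{Arithmetic: } f(x) = f(y) \Rightarrow f(x) - f(y) = 0 = z &&\text{broadcast } f(x) - f(y) = z\\
&\text{Equality: } f(x) - f(y) = z \Rightarrow f(f(x) - f(y)) = f(z) &&\text{contradicts } f(f(x) - f(y)) \neq f(z)
\end{aligned}
$$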

Puzzle: Constructive vs. Classical Proofs
• Prove the following fact:
• Hint: Try