Introduction to Satisfiability Modulo Theories (SMT)
Clark Barrett, NYU
Sanjit A. Seshia, UC Berkeley
ICCAD Tutorial, November 2, 2009

Boolean Satisfiability (SAT)
[Figure: a Boolean circuit φ built from ∧, ∨, and ¬ gates over the inputs p1, p2, …, pn]
Is there an assignment to the variables p1, p2, …, pn such that φ evaluates to 1?
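To make the question concrete, here is a minimal sketch that answers it by trying every assignment. The formula `phi` and the helper `brute_force_sat` are hypothetical illustrations, not part of the tutorial, and real SAT solvers avoid this exponential enumeration.

```python
from itertools import product

def brute_force_sat(formula, num_vars):
    """Return a satisfying assignment (tuple of bools) or None if none exists."""
    for assignment in product([False, True], repeat=num_vars):
        if formula(*assignment):
            return assignment
    return None

# Hypothetical example: (p1 or p2) and (not p1 or p3) and (not p2 or not p3)
def phi(p1, p2, p3):
    return (p1 or p2) and ((not p1) or p3) and ((not p2) or (not p3))

model = brute_force_sat(phi, 3)
print("satisfiable" if model is not None else "unsatisfiable", model)
```

The search space has 2^n assignments for n variables, which is why the encoding techniques later in the tutorial try to keep the number of Boolean variables small.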

Satisfiability Modulo Theories
[Figure: a circuit φ built from ∧, ∨, and ¬ gates whose inputs include the Boolean variables p1, p2, …, pn and the theory atoms x = y, x + 2z ≥ 1, w & 0xFFFF = x, x % 26 = v]
Is there an assignment to the variables x, y, z, w such that φ evaluates to 1?

Satisfiability Modulo Theories
• Given a formula in first-order logic, with associated background theories, is the formula satisfiable?
  – Yes: return a satisfying solution
  – No: [generate a proof of unsatisfiability]

Applications of SMT
• Hardware verification at higher levels of abstraction (RTL and above)
• Verification of analog/mixed-signal circuits
• Verification of hybrid systems
• Software model checking
• Software testing
• Security: finding vulnerabilities, verifying electronic voting machines, …
• Program synthesis
• …

References
• Satisfiability Modulo Theories. Clark Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. Chapter 8 in the Handbook of Satisfiability, Armin Biere, Hans van Maaren, and Toby Walsh, editors, IOS Press, 2009. (available from our webpages)
• SMT-LIB: a repository for SMT formulas (common format) and tools
• SMT-COMP: an annual competition of SMT solvers

Roadmap for this Tutorial
• Background and Notation
• Survey of Theories
• Theory Solvers
• Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
• Conclusion

Roadmap for this Tutorial
➢ Background and Notation
• Survey of Theories
• Theory Solvers
• Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
• Conclusion

First-Order Logic
• A formal notation for mathematics, with expressions involving
  – Propositional symbols
  – Predicates
  – Functions and constant symbols
  – Quantifiers
• In contrast, propositional (Boolean) logic only involves propositional symbols and operators

First-Order Logic: Syntax
• As with propositional logic, expressions in first-order logic are made up of sequences of symbols.
• Symbols are divided into logical symbols and non-logical symbols or parameters.
• Example: (x = y) ∧ (y = z) ∧ (f(z) ≥ f(x) + 1)

First-Order Logic: Syntax
• Logical symbols
  – Propositional connectives: ∨, ∧, ¬, →, ↔
  – Variables: v1, v2, …
  – Quantifiers: ∀, ∃
• Non-logical symbols / parameters
  – Equality: =
  – Functions: +, −, %, bit-wise &, f(), concat, …
  – Predicates: ≤, is_substring, …
  – Constant symbols: 0, 1.0, null, …

Quantifier-free Subset
• We will largely restrict ourselves to formulas without quantifiers (∀, ∃)
• This is called the quantifier-free subset/fragment of first-order logic with the relevant theory

Logical Theory
• Defines a set of parameters (non-logical symbols) and their meanings
• This definition is called a signature.
• Example of a signature: theory of linear arithmetic over integers
  – Signature is (0, 1, +, −, ≤) interpreted over ℤ

Roadmap for this Tutorial
➢ Background and Notation
➢ Survey of Theories
• Theory Solvers
• Two Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
• Conclusion

Some Useful Theories
• Equality (with uninterpreted functions)
• Linear arithmetic (over ℚ or ℤ)
• Difference logic (over ℚ or ℤ)
• Finite-precision bit-vectors
  – integer or floating-point
• Arrays / memories
• Misc.: non-linear arithmetic, strings, inductive datatypes (e.g. lists), sets, …

Theory of Equality and Uninterpreted Functions (EUF)
• Also called the “free theory”
  – Because function symbols can take any meaning
  – Only property required is congruence: that these symbols map identical arguments to identical values, i.e., x = y ⇒ f(x) = f(y)
• SMT-LIB name: QF_UF

Data and Function Abstraction with EUF
[Figure, three parts]
• Bit-vectors to abstract domain (e.g. ℤ): the bits x0, x1, x2, …, xn-1 become a single term x
• Common operations: if-then-else ITE(p, x, y); test for equality x = y
• Functional units to uninterpreted functions: an ALU becomes f, with a = x ∧ b = y ⇒ f(a, b) = f(x, y)

Hardware Abstraction with EUF
[Figure: pipelined processor datapath (PC, instruction memory, register file, ALU, control, pipeline registers IF/ID, ID/EX, EX/WB) with data-transforming blocks replaced by uninterpreted functions F1, F2, F3]
• For any block that transforms or evaluates data:
  – Replace with generic, unspecified function
  – Also view instruction memory as function

Example QF_UF (EUF) Formula
(x = y) ∧ (y = z) ∧ (f(x) ≠ f(z))
• Transitivity: (x = y) ∧ (y = z) ⇒ (x = z)
• Congruence: (x = z) ⇒ (f(x) = f(z))
• Together these contradict f(x) ≠ f(z), so the formula is unsatisfiable
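The transitivity and congruence steps can be mechanized with a union-find structure. The following is a minimal congruence-closure sketch for conjunctions of equalities and disequalities over unary function applications; the term representation (strings for variables, `(function, argument)` tuples for applications) and the name `euf_satisfiable` are assumptions made for this illustration, and production solvers use far more efficient algorithms.

```python
def find(parent, t):
    """Return the representative of t's equivalence class (with path halving)."""
    while parent[t] != t:
        parent[t] = parent[parent[t]]
        t = parent[t]
    return t

def euf_satisfiable(equalities, disequalities):
    """Decide a conjunction of EUF literals given as lists of (term, term) pairs."""
    parent = {}

    def register(t):
        if t not in parent:
            parent[t] = t
            if isinstance(t, tuple):      # application: also register its argument
                register(t[1])

    def union(a, b):
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb

    for a, b in equalities + disequalities:
        register(a)
        register(b)
    for a, b in equalities:               # equalities merge classes (transitivity)
        union(a, b)

    apps = [t for t in parent if isinstance(t, tuple)]
    changed = True
    while changed:                        # congruence: equal arguments, equal results
        changed = False
        for s in apps:
            for t in apps:
                same_func = s[0] == t[0]
                same_arg = find(parent, s[1]) == find(parent, t[1])
                if same_func and same_arg and find(parent, s) != find(parent, t):
                    union(s, t)
                    changed = True

    # Satisfiable iff no disequality relates two terms in the same class.
    return all(find(parent, a) != find(parent, b) for a, b in disequalities)

# (x = y) and (y = z) and f(x) != f(z)
result = euf_satisfiable([("x", "y"), ("y", "z")], [(("f", "x"), ("f", "z"))])
print(result)
```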

Equivalence Checking of Program Fragments
  int fun1(int y) {
    int x, z;
    z = y;
    y = x;
    x = z;
    return x*x;
  }
  int fun2(int y) {
    return y*y;
  }
SMT formula φ, satisfiable iff the programs are non-equivalent:
  (z = y ∧ y1 = x ∧ x1 = z ∧ ret1 = x1*x1) ∧ (ret2 = y*y) ∧ (ret1 ≠ ret2)
What if we use SAT to check equivalence?

Equivalence Checking of Program Fragments
  int fun1(int y) {
    int x, z;
    z = y;
    y = x;
    x = z;
    return x*x;
  }
  int fun2(int y) {
    return y*y;
  }
SMT formula φ, satisfiable iff the programs are non-equivalent:
  (z = y ∧ y1 = x ∧ x1 = z ∧ ret1 = x1*x1) ∧ (ret2 = y*y) ∧ (ret1 ≠ ret2)
Using SAT to check equivalence (w/ MiniSat):
  32 bits for y: did not finish in over 5 hours
  16 bits for y: 37 sec.
  8 bits for y: 0.5 sec.

Equivalence Checking of Program Fragments
  int fun1(int y) {
    int x, z;
    z = y;
    y = x;
    x = z;
    return x*x;
  }
  int fun2(int y) {
    return y*y;
  }
SMT formula φ′, with multiplication abstracted by the uninterpreted function sq:
  (z = y ∧ y1 = x ∧ x1 = z ∧ ret1 = sq(x1)) ∧ (ret2 = sq(y)) ∧ (ret1 ≠ ret2)
Using EUF solver: 0.01 sec

Equivalence Checking of Program Fragments
  int fun1(int y) {
    int x;
    x = x ^ y;
    y = x ^ y;
    x = x ^ y;
    return x*x;
  }
  int fun2(int y) {
    return y*y;
  }
Does EUF still work? No! Must reason about bit-wise XOR.
Need a solver for bit-vector arithmetic.
Solvable in less than a second with a current bit-vector solver.
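As a solver-independent sanity check, the XOR-swap fragment can be compared against fun2 by exhaustive enumeration at a reduced width. This is a minimal sketch under stated assumptions: 8-bit unsigned words (the constant `MASK` is my choice), and the uninitialized `x` is treated as an arbitrary extra input. It illustrates what equivalence means here, not how a bit-vector solver works internally.

```python
MASK = 0xFF  # assumed 8-bit words; the C fragments use int

def fun1(x, y):
    """Transliteration of fun1; x models the uninitialized local variable."""
    x = (x ^ y) & MASK
    y = (x ^ y) & MASK
    x = (x ^ y) & MASK   # after the three XORs, x holds the original y
    return (x * x) & MASK

def fun2(y):
    return (y * y) & MASK

# A counterexample is any input pair on which the two fragments disagree.
counterexamples = [(x, y)
                   for x in range(MASK + 1)
                   for y in range(MASK + 1)
                   if fun1(x, y) != fun2(y)]
print("equivalent" if not counterexamples else counterexamples[:5])
```

Enumeration visits 2^16 input pairs at this width; at 32 bits the same loop would need 2^64 iterations, which is the gap that word-level reasoning is meant to close.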

Finite-Precision Bit-Vector Arithmetic (QF_BV)
– Fixed-width data words
  • Can model int, short, long, etc.
– Arithmetic operations
  • E.g., add/subtract/multiply/divide & comparisons
  • Two's complement and unsigned operations
– Bit-wise logical operations
  • E.g., and/or/xor, shift/extract and equality
– Boolean connectives

Linear Arithmetic (QF_LRA, QF_LIA)
• Boolean combination of linear constraints of the form (a1 x1 + a2 x2 + … + an xn ⋈ b)
• The xi could be in ℚ or ℤ, ⋈ ∈ {≥, >, ≤, <, =}
• Many applications, including:
  – Verification of analog circuits
  – Software verification, e.g., of array bounds

Difference Logic (QF_IDL, QF_RDL)
• Boolean combination of linear constraints of the form xi − xj ⋈ cij or xi ⋈ ci
  – ⋈ ∈ {≥, >, ≤, <, =}, the xi in ℚ or ℤ
• Applications:
  – Software verification (most linear constraints are of this form)
  – Processor datapath verification
  – Job shop scheduling / real-time systems
  – Timing verification for circuits

Arrays/Memories
• SMT solvers can also be very effective in modeling data structures in software and hardware
  – Arrays in programs
  – Memories in hardware designs: e.g. instruction and data memories, CAMs, etc.

Theory of Arrays (QF_AX): Select and Store
• Two interpreted functions: select and store
  – select(A, i): read from A at index i
  – store(A, i, d): write d to A at index i
• Two main axioms:
  – select(store(A, i, d), i) = d
  – select(store(A, i, d), j) = select(A, j) for i ≠ j
• One other axiom (extensionality):
  – (∀ i. select(A, i) = select(B, i)) ⇒ A = B

Equivalence Checking of Program Fragments
  int fun1(int y) {
    int x[2];
    x[0] = y;
    y = x[1];
    x[1] = x[0];
    return x[1]*x[1];
  }
  int fun2(int y) {
    return y*y;
  }
SMT formula φ″:
  [ x1 = store(x, 0, y) ∧ y1 = select(x1, 1) ∧ x2 = store(x1, 1, select(x1, 0)) ∧ ret1 = sq(select(x2, 1)) ] ∧ (ret2 = sq(y)) ∧ (ret1 ≠ ret2)
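The select/store semantics and the array version of the equivalence check can be exercised on a concrete model. This is a minimal sketch, assuming arrays are modeled as Python dictionaries whose unwritten cells read as 0, and with `sq` interpreted as ordinary squaring (in the formula it is uninterpreted). It tests the two read-over-write axioms and the program fragment on sample values only; it does not prove them.

```python
from itertools import product

def select(a, i):
    """Read from array a at index i (unwritten cells default to 0, an assumption)."""
    return a.get(i, 0)

def store(a, i, d):
    """Return a new array equal to a except that index i holds d."""
    b = dict(a)
    b[i] = d
    return b

def sq(v):
    return v * v  # concrete stand-in for the uninterpreted function sq

def ret1(x, y):
    """Follows the conjuncts of the array formula step by step."""
    x1 = store(x, 0, y)
    y1 = select(x1, 1)                  # mirrors y1 in the formula; unused afterwards
    x2 = store(x1, 1, select(x1, 0))
    return sq(select(x2, 1))

def ret2(y):
    return sq(y)

INDICES, VALUES = range(3), range(4)
A = {0: 7, 2: 9}                        # arbitrary sample array

axiom1 = all(select(store(A, i, d), i) == d
             for i, d in product(INDICES, VALUES))
axiom2 = all(select(store(A, i, d), j) == select(A, j)
             for i, j, d in product(INDICES, INDICES, VALUES) if i != j)
equivalent = all(ret1(A, y) == ret2(y) for y in range(-8, 8))
print(axiom1, axiom2, equivalent)
```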

Roadmap for this Tutorial
➢ Background and Notation
➢ Survey of Theories
➢ Theory Solvers
• Two Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
• Conclusion

Over to Clark…

Roadmap for this Tutorial
➢ Background and Notation
➢ Survey of Theories
➢ Theory Solvers
• Approaches to SMT Solving
  – Lazy Encoding to SAT
  ➢ Eager Encoding to SAT
• Conclusion

Eager Approach to SMT
[Figure: Input Formula → Satisfiability-preserving Boolean Encoder → Boolean Formula → SAT Solver → satisfiable / unsatisfiable]
EAGER ENCODING: SAT solver involved in theory reasoning
Key ideas:
• Small-domain encoding
  – Constrain model search
• Rewrite rules
• Abstraction-based methods (eager + lazy)
Example solvers: UCLID, STP, Spear, Boolector, Beaver, …

Theories
• Eager encoding methods have been demonstrated for the following theories:
  – Equality & uninterpreted functions
  – Integer linear arithmetic
  – Restricted lambda expressions
    • Arrays, memories, etc.
  – Finite-precision bit-vector arithmetic
  – Strings

UCLID Operation
[Figure: Input Formula → Lambda Expansion for Arrays → λ-free Formula → Function & Predicate Elimination → Linear/Bitvector Arithmetic Formula → Encoding Arithmetic → Boolean Formula → Boolean Satisfiability]
• Operation
  – Series of transformations leading to Boolean formula
  – Each step is validity (satisfiability) preserving
  – Each step performs optimizations
http://uclid.eecs.berkeley.edu

Rewrites: Eliminating Function Applications
– Two applications of an uninterpreted function f in a formula: f(x1) and f(x2)
Ackermann's Encoding
  f(x1) → vf1
  f(x2) → vf2
  with the added constraint x1 = x2 ⇒ vf1 = vf2
Bryant, German, Velev's Encoding
  f(x1) → vf1
  f(x2) → ITE(x1 = x2, vf1, vf2)
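A small experiment shows why the extra constraint (or the ITE) is needed. The sketch below takes the hypothetical formula x1 = x2 ∧ f(x1) ≠ f(x2), which is unsatisfiable in EUF, replaces the two applications by fresh variables vf1 and vf2, and checks satisfiability by enumeration over a four-element domain (enough for four equality-logic variables). The helper `satisfiable` and the example formula are illustrations of mine, not taken from the slides.

```python
from itertools import product

DOMAIN = range(4)  # four variables, so four values suffice for equality logic

def satisfiable(pred, num_vars):
    """Brute-force satisfiability over DOMAIN."""
    return any(pred(*vals) for vals in product(DOMAIN, repeat=num_vars))

def ite(cond, then_val, else_val):
    return then_val if cond else else_val

# Naive replacement of f(x1), f(x2) by vf1, vf2: functional consistency is lost.
def naive(x1, x2, vf1, vf2):
    return x1 == x2 and vf1 != vf2

# Ackermann: add the constraint x1 = x2 => vf1 = vf2.
def ackermann(x1, x2, vf1, vf2):
    consistency = (x1 != x2) or (vf1 == vf2)
    return consistency and x1 == x2 and vf1 != vf2

# Bryant-German-Velev: f(x2) becomes ITE(x1 = x2, vf1, vf2).
def bgv(x1, x2, vf1, vf2):
    f_x1 = vf1
    f_x2 = ite(x1 == x2, vf1, vf2)
    return x1 == x2 and f_x1 != f_x2

print(satisfiable(naive, 4), satisfiable(ackermann, 4), satisfiable(bgv, 4))
```

The naive replacement reports a spurious satisfying assignment, while both encodings preserve unsatisfiability.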

Small-Domain Encoding
• Consider an SMT formula φ(x1, x2, …, xn) where xi ∈ Di
• Small-domain encoding / finite instantiation: derive finite set Si ⊂ Di s.t. |Si| ≪ |Di|
  – In some cases, Si is finite where Di is infinite
• Encode each xi to take values only in Si
  – Could be done by encoding to SAT
• Example: Integer Linear Arithmetic (QF_LIA)

Solving QF_LIA is NP-complete
• In NP:
  – If a satisfying solution exists, then one exists within a bound d
    • log d is polynomial in input size
  – Expression for d [Papadimitriou, '82]:
    d = (n + m) · (bmax + 1) · (m · amax)^(2m+3)
  – Input size:
    • m: # constraints
    • n: # variables
    • bmax: largest constant (absolute value)
    • amax: largest coefficient (absolute value)

Small-domain encoding / Finite Instantiation: Naïve approach
• Steps
  – Calculate the solution bound d
  – Encode each integer variable with ⌈log d⌉ bits & translate to Boolean formula
  – Run SAT solver
• Problem: For QF_LIA, d is Ω(m^m)
  – Ω(m log m) bits per variable
• Solution: Exploit special cases and domain-specific structure

Special Case 1: Equality Logic
• Linear constraints are equalities xi = xj
• Result: d = n
• Example: x1 ≠ x2 ∧ x2 ≠ x3 ∧ x1 ≠ x3
  – A 3-valued domain is needed: {1, 2, 3}
• Example: x1 x2 ∧ x2 x3 ∧ x1 x3
  – Can find solution with domain {1, 2}
[Pnueli et al., Information and Computation, 2002]
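The bound d = n can be tried out by finite instantiation. This is a minimal sketch, assuming the three-variable example is the pairwise-disequality formula x1 ≠ x2 ∧ x2 ≠ x3 ∧ x1 ≠ x3; the helper `sat_over_domain` is hypothetical, and enumeration stands in for the SAT encoding.

```python
from itertools import product

def sat_over_domain(pred, num_vars, size):
    """Is pred satisfiable when every variable ranges over {1, ..., size}?"""
    domain = range(1, size + 1)
    return any(pred(*vals) for vals in product(domain, repeat=num_vars))

def all_distinct(x1, x2, x3):
    return x1 != x2 and x2 != x3 and x1 != x3

with_two_values = sat_over_domain(all_distinct, 3, 2)    # domain {1, 2}
with_three_values = sat_over_domain(all_distinct, 3, 3)  # domain {1, 2, 3} = d
print(with_two_values, with_three_values)
```

With only two values the search wrongly reports no solution, so the domain cannot be shrunk below n for this formula; formulas with fewer disequalities can get by with less.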

Special Case 2: Difference Logic
• Boolean combination of difference-bound constraints
  – xi ≥ xj + b, ±xi ≥ b
• Result: d = n · (bmax + 1) [Bryant, Lahiri, Seshia, CAV '02]
• Proof sketch: satisfying solution corresponds to shortest path in constraint graph
  – Longest such path has length ≤ n · (bmax + 1)
• Tighter formula-specific bounds possible
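The constraint-graph view in the proof sketch also gives a direct way to decide a plain conjunction of difference constraints: run Bellman-Ford and look for a negative cycle. Note that this is the standard graph-based check, not the small-domain SAT encoding itself, and it handles conjunctions only. The function name and the `(i, j, c)` tuple format, meaning x_i − x_j ≤ c, are assumptions for this sketch; a constraint xi ≥ xj + b is written as (j, i, −b).

```python
def difference_constraints_model(num_vars, constraints):
    """Return integer values satisfying every x_i - x_j <= c, or None if impossible.

    constraints is a list of (i, j, c) tuples. Each one is an edge j -> i of
    weight c; a virtual source with 0-weight edges to all nodes is implied by
    initializing every distance to 0.
    """
    dist = [0] * num_vars
    for _ in range(num_vars + 1):
        changed = False
        for i, j, c in constraints:
            if dist[j] + c < dist[i]:   # relax edge j -> i
                dist[i] = dist[j] + c
                changed = True
        if not changed:
            return dist                  # shortest-path distances form a model
    return None                          # still relaxing: negative cycle, unsat

# x1 >= x0 + 1, x2 >= x1 + 1, x0 >= x2 + 1 : a cycle of total weight -3
cyclic = [(0, 1, -1), (1, 2, -1), (2, 0, -1)]
# Dropping the last constraint makes the system satisfiable.
acyclic = [(0, 1, -1), (1, 2, -1)]

print(difference_constraints_model(3, cyclic))
print(difference_constraints_model(3, acyclic))
```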

Special Case 3: Generalized 2SAT
• Generalized 2SAT constraints
  – xi + xj ≥ b, −xi − xj ≥ b, xi ≥ b
• d = 2 · n · (bmax + 1) [Seshia, Subramani, Bryant, '04]

Full Integer Linear Arithmetic
• Can we avoid the m^m blow-up?
• In fact, yes. The idea is to derive a new parameterized solution bound d
  – Formalize parameters that the bound really depends on
  – Parameters characterize sparse structure
    • Occurs especially in software verification; also in many high-level hardware models
  – [Seshia & Bryant, LICS '04, LMCS '05]

Structure of Linear Constraints in Software Verification
• Characteristics of studied benchmarks
  – Mostly difference constraints
    • Only 3% of constraints were NOT difference constraints
  – Non-difference constraints are sparse
    • At most 6 variables per constraint (total number of variables in the 1000s)
• Some similar observations: Pratt '77, ESC/Java Simplify TR '03

Parameterized Solution Bound
• New parameters:
  – k non-difference constraints
  – w variables per constraint (width)
• Our solution bound: n · (bmax + 1) · (w · amax)^k
• Previous: (n + m) · (bmax + 1) · (m · amax)^(2m+3)
  – m: # constraints, n: # variables, bmax: |constant|, amax: |coefficient|
• Direct dependence on m eliminated (and k ≪ m)

Example
Formula: a Boolean combination (∧, ∨, ¬) of the constraints
  x1 − x2 ≥ 1
  x1 + 2x2 + x3 > −3
  x2 − x4 ≥ 0
Parameter   Meaning                         Value
m           # constraints                   3
k           # non-difference constraints    1
n           # variables                     4
w           width                           3
bmax        |constant|                      3
amax        |coefficient|                   2
d = 96 (previous bound: d = 282,175,488)
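Both values of d follow from plugging the example's parameters into the two bounds stated on the earlier slides. The sketch below does the arithmetic; the function names are mine, and `bits_per_variable` assumes the ⌈log d⌉ encoding uses base-2 logarithms.

```python
def papadimitriou_bound(n, m, b_max, a_max):
    """Previous bound: (n + m) * (bmax + 1) * (m * amax)^(2m + 3)."""
    return (n + m) * (b_max + 1) * (m * a_max) ** (2 * m + 3)

def parameterized_bound(n, b_max, a_max, w, k):
    """Parameterized bound: n * (bmax + 1) * (w * amax)^k."""
    return n * (b_max + 1) * (w * a_max) ** k

def bits_per_variable(d):
    """ceil(log2(d)) computed with exact integer arithmetic (d >= 1)."""
    return (d - 1).bit_length()

# Parameters of the example: m = 3, k = 1, n = 4, w = 3, bmax = 3, amax = 2
old_d = papadimitriou_bound(n=4, m=3, b_max=3, a_max=2)
new_d = parameterized_bound(n=4, b_max=3, a_max=2, w=3, k=1)

print(old_d, bits_per_variable(old_d))  # 282175488 29
print(new_d, bits_per_variable(new_d))  # 96 7
```

Under these assumptions the parameterized bound cuts the encoding from 29 to 7 bits per integer variable for this example.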

Summary of d Values
Logic                             Solution bound d
Equality logic                    n
Difference logic                  n · (bmax + 1)
Generalized 2SAT logic            2 · n · (bmax + 1)
Full integer linear arithmetic    n · (bmax + 1) · (amax^k · w^k)

Abstraction-Based Methods
• For some logics, one cannot easily compute a closed-form expression for the small domain
• Example: Bit-Vector Arithmetic
• In such cases, an abstraction-refinement approach can be used to compute formula-specific small domains

Bit-Vector Arithmetic: Some History
• B.C. (Before Chaff)
  – String operations (concatenate, field extraction)
  – Linear arithmetic with bounds checking
  – Modular arithmetic
• SAT-based “bit blasting”
  – Generate Boolean circuit based on bit-level behavior of operations
    • Handles arbitrary operations
  – Check with best available SAT solver
  – Effective in many applications
    • CBMC [Clarke, Kroening, Lerda, TACAS '04]
    • Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV '05]

Research Challenge
• Is there a better way than bit blasting?
• Requirements
  – Provide same functionality as with bit blasting
    • Must support all bit-vector operators
  – Exploit word-level structure
  – Improve on performance of bit blasting
• Current approaches are based on two core ideas:
  1. Simplification: simplify input formula using word-level rewrite rules and solvers
  2. Abstraction: can use automatic abstraction-refinement to solve simplified formula

Bit-Vector SMT Solvers, circa Spring 2009
Current techniques with sample tools:
– Proof-based abstraction-refinement: UCLID [Bryant et al., TACAS '07]
– Solver for linear modular arithmetic to simplify the formula: STP [Ganesh & Dill, CAV '07]
– Automatic parameter tuning for SAT: Spear [Hutter et al., FMCAD '07]
– Rewrites, underapproximation, efficient SAT engine: Boolector [Brummayer & Biere, TACAS '09]
– Equality/constant propagation, logic optimization, special rules for non-linear ops: Beaver [Jha et al., CAV '09]
– DPLL(T) framework, layered approach, rewriting: CVC3 [Barrett et al.], MathSAT [Bruttomesso et al.], Yices [Dutertre et al.], Z3 [de Moura et al.]

Abstraction-Refinement
• Deciding Bit-Vector Arithmetic with Abstraction [Bryant et al., TACAS '07, STTT '09]
  – Use bit blasting as core technique
  – Apply to simplified versions of formula: under- and overapproximations
  – Generate successive approximations until a solution is found or formula shown unsatisfiable
  – Inspired by McMillan & Amla's proof-based abstraction for finite-state model checking
• Small motivating example: (x + y ≠ y + x) ∧ (x * y ≠ y * x)
  – Sufficient to prove the left-hand conjunct unsat

Approximations to Formula
[Figure: overapproximation φ+, original formula φ, underapproximation φ−]
• φ+ has more solutions: if φ+ is unsatisfiable, then so is φ
• φ− has fewer solutions: a satisfying solution of φ− also satisfies φ
• Example approximation techniques
  – Underapproximating
    • Restrict word-level variables to smaller ranges of values
  – Overapproximating
    • Replace subformula with Boolean variable

Starting Iterations
[Figure: φ and its first underapproximation φ1−]
• Initial underapproximation φ1−
  – (Greatly) restrict ranges of word-level variables
  – Intuition: satisfiable formula often has small-domain solution

First Half of Iteration
[Figure: φ1− is checked; if SAT, then done; if UNSAT, the proof is used to generate the overapproximation φ1+]
• SAT result for φ1−
  – Satisfiable
    • Then have found solution for φ
  – Unsatisfiable
    • Use UNSAT proof to generate overapproximation φ1+

Second Half of Iteration
[Figure: φ1+ is checked; if UNSAT, then done; if SAT, the solution is used to generate the refined underapproximation φ2−]
• SAT result for φ1+
  – Unsatisfiable: then have shown φ unsatisfiable
  – Satisfiable: solution indicates variable ranges that must be expanded
    • Generate refined underapproximation φ2−

Example
φ := (x = y+2) ∧ (x² > y²)
φ1− := (x[1] = y[1]+2) ∧ (x[1]² > y[1]²)    UNSAT: look at proof
φ1+ := (x = y+2)                             SAT: x = 2, y = 0
φ2− := (x[2] = y[2]+2) ∧ (x[2]² > y[2]²)    SAT, done.
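The underapproximation side of this example can be reproduced with a simplified loop. The sketch below widens the variable ranges one bit at a time and stops at the first width with a solution. It is a deliberate simplification: it omits the proof-based overapproximation step entirely, widens all variables uniformly, and uses brute-force search where the real method calls a SAT solver. The 8-bit word size and the names `phi` and `solve_by_widening` are assumptions.

```python
from itertools import product

WORD_BITS = 8                      # assumed word size for the full formula
MASK = (1 << WORD_BITS) - 1

def phi(x, y):
    """(x = y + 2) and (x^2 > y^2), with full-width modular arithmetic."""
    return x == ((y + 2) & MASK) and ((x * x) & MASK) > ((y * y) & MASK)

def solve_by_widening(formula, num_vars):
    """Try underapproximations that restrict each variable to its low `bits` bits.

    Returns (bits, values) for the first satisfiable underapproximation, or
    None if even the full-width search finds nothing (formula unsatisfiable).
    """
    for bits in range(1, WORD_BITS + 1):
        for values in product(range(1 << bits), repeat=num_vars):
            if formula(*values):
                return bits, values
    return None

result = solve_by_widening(phi, 2)
print(result)  # (2, (2, 0)): 1-bit ranges fail, 2-bit ranges give x = 2, y = 0
```

With 1-bit ranges no pair satisfies x = y + 2, matching the UNSAT result for the first underapproximation; at 2 bits the search finds x = 2, y = 0, the same solution as on the slide.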

Iterative Behavior
[Figure: overapproximations φ1+, φ2+, …, φk+ above φ; underapproximations φ1−, φ2−, …, φk− below φ]
• Underapproximations
  – Successively more precise abstractions of φ
  – Allow wider variable ranges
• Overapproximations
  – No predictable relation
  – UNSAT proof not unique

Overall Effect
[Figure: overapproximations φ1+, φ2+, …, φk+ lead to UNSAT; underapproximations φ1−, φ2−, …, φk− lead to SAT]
• Soundness
  – Only terminate with solution on underapproximation
  – Only terminate as UNSAT on overapproximation
• Completeness
  – Successive underapproximations approach φ
  – Finite variable ranges guarantee termination
    • In worst case, get φk− ≡ φ

Roadmap for this Tutorial
➢ Background and Notation
➢ Survey of Theories
➢ Theory Solvers
➢ Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
➢ Conclusion

Summary of Ideas: Modeling
• Philosophy: model systems in first-order logic + suitable theories
• Widely-used theories:
  – Equality and uninterpreted functions
  – Linear arithmetic
  – Bit-vector arithmetic
  – Arrays

Summary of Ideas: Lazy Methods
• Philosophy: extend DPLL framework from SAT to SMT
• Literals assigned by SAT are sent to theory solver
• Theory solver determines if literals are satisfiable in theory
• Key optimizations: small explanations, early conflict detection, theory propagation

Summary of Ideas: Eager Methods
• Philosophy: constrain solution space with logic-specific methods
• Small-domain encoding
  – Compute bounds that work for any formula in the logic
• Abstraction-refinement of domains
  – Compute formula-specific small domains
• Rewrite rules: high level and bit level
  – Simplify formula before and after bit-blasting

Challenges and Opportunities
• Solvers for new theories
  – Strings
  – Non-linear arithmetic
  – Can we exploit domain-specific structure?
• Parallel SMT
• Better support for quantifiers
• Better proof/interpolant generation

Join the SMT Community
• We need your new, exciting applications!
• Contribute to SMT-LIB
• Create new solvers, compete in SMT-COMP
Slides and book chapter available on our websites:
  Clark: http://cs.nyu.edu/~barrett
  Sanjit: http://www.eecs.berkeley.edu/~sseshia