Automating Abstract Interpretation and Creating Improved Decision Procedures as a Bonus
Thomas Reps, University of Wisconsin and GrammaTech, Inc.
• The administrator of the U.S.S. Yorktown’s Standard Monitoring Control System entered 0 into a data field for the Remote Data Base Manager program
• That caused the database to overflow and crash all LAN consoles and miniature remote terminal units
• The Yorktown was dead in the water for about two hours and 45 minutes
Analysis must track numeric information
• A sailor on the U.S.S. Yorktown entered a 0 into a data field in a kitchen-inventory program
• That caused an overflow, which crashed all LAN consoles and miniature remote terminal units
• The Yorktown was dead in the water for about two hours and 45 minutes
x = 3; y = 1/(x-3);   need to track values other than 0
x = 3; px = &x; y = 1/(*px-3);   need to track pointers
x = 3; p = (int*)malloc(sizeof(int)); *p = x; q = p; y = 1/(*q-3);   need to track heap-allocated storage
Static Analysis in a Nutshell
• Determine information about the possible situations that can arise at execution time, without actually running the program on specific inputs
• Typically:
 – For each point in the program, find a descriptor that represents (a superset of) the stores that could possibly arise at that point
• Correctness of an analysis justified via abstract interpretation [Cousot & Cousot 77]
Automating Abstract Interpretation
• Abstract interpretation
 – A “black art” → hard to work with
• 20-year quest to raise the level of automation in abstract interpretation
 – 3-valued logic analysis (TVLA)
   • with M. Sagiv, R. Wilhelm, T. Lev-Ami, A. Loginov, & many others
 – machine-code analysis (TSL)
   • with J. Lim
 – symbolic-abstraction algorithms
   • with M. Sagiv, G. Yorsh, A. Thakur
Reps, T. and Thakur, A., “Automating abstract interpretation,” VMCAI, 2016. research.cs.wisc.edu/wpis/papers/vmcai16-invited.pdf
[Photos: Patrick Cousot, Radhia Cousot]
What Does It Mean to Automate Parsing?
• A parsing-problem instance Parse(L, s) has two inputs
 – L = a context-free language
 – s = a string to be parsed
 – The string changes more frequently than the language
• A context-free language has a context-free grammar
• Yacc (and later, GNU Bison)
 – Input: a context-free grammar that describes the language L to be parsed
 – Output: a parsing function, yyparse(), for which executing yyparse() on string s computes Parse(L, s)
[Photo: Steve Johnson; source: simple-talk interview]
What Does It Mean to Automate Program Analysis?
• Follow a similar scheme...
• But first, why would you even want to invest the time doing so?
Why is Program Analysis Difficult?
Sidestepping Undecidability
[Figure: the universe of states]
Sidestepping Undecidability
• Overapproximate the reachable states
• False positive!
[Figure: an overapproximation of the reachable states inside the universe of states]
Why is Program Analysis Difficult?
• Large/unbounded base types: int, float, string
• User-defined types/classes
• Pointers/aliasing + unbounded #’s of heap-allocated cells
• Procedure calls/recursion/calls through pointers/dynamic method lookup/overloading
• Concurrency + unbounded #’s of threads
Sources of Infinity
• Data
 – unbounded counters, integer variables, lists, queues
• Control structures
 – procedures, process creation
• Configuration parameters
 – unbounded number of processes, principals
• Real-time
 – discrete or continuous time
Some Successes of the Field
• Static Driver Verifier, a.k.a. SLAM (Microsoft)
 – Tool for finding possible bugs in Windows device drivers
 – Complicated back-out protocols in driver APIs when events cancelled or interrupted
• Astrée (ENS)
 – Established the absence of run-time errors in Airbus flight software
Outline
• Gentle introduction to abstract interpretation
• First glimmer of insight
• Second insight
• Third insight
• DiSSolve: A parallel SAT solver
• Wrap-up
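Example: Parity Analysis
f(a, b) = (16 * b + 3) * (2 * a + 1)
[Figure: expression tree of f]
Concrete operations:
 + | 0 1 2 3 ...
 0 | 0 1 2 3 ...
 1 | 1 2 3 4 ...
 2 | 2 3 4 5 ...
 3 | 3 4 5 6 ...
 ⋮ | ⋮ ⋮ ⋮ ⋮ ⋱

 * | 0 1 2 3 ...
 0 | 0 0 0 0 ...
 1 | 0 1 2 3 ...
 2 | 0 2 4 6 ...
 3 | 0 3 6 9 ...
 ⋮ | ⋮ ⋮ ⋮ ⋮ ⋱
Example: Parity Analysis
Abstract operations over E (even), O (odd), and ? (unknown):
 + | E O ?
 E | E O ?
 O | O E ?
 ? | ? ? ?

 * | E O ?
 E | E E E
 O | E O ?
 ? | E ? ?
[Figure: expression tree of f annotated with abstract values; the leaves 16 and 2 are E, 3 and 1 are O, a and b are ?, and the root evaluates to O]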
Abstract values, such as O, E, and ?, represent potentially infinite collections of concrete values
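The parity example can be exercised with a small abstract evaluator. The following is a minimal sketch, not code from the talk: the three-valued domain E/O/? follows the slide, while the helper names (`alpha`, `abs_add`, `abs_mul`, `f_abstract`) are hypothetical.

```python
E, O, TOP = "E", "O", "?"  # even, odd, unknown parity

def alpha(n: int) -> str:
    """Abstract a concrete integer to its parity."""
    return E if n % 2 == 0 else O

def abs_add(p: str, q: str) -> str:
    """Abstract addition: unknown if either operand is unknown."""
    if TOP in (p, q):
        return TOP
    return E if p == q else O

def abs_mul(p: str, q: str) -> str:
    """Abstract multiplication: an even factor forces an even product."""
    if E in (p, q):
        return E
    if TOP in (p, q):
        return TOP
    return O

def f_abstract(a: str, b: str) -> str:
    """Abstract version of f(a, b) = (16 * b + 3) * (2 * a + 1)."""
    left = abs_add(abs_mul(alpha(16), b), alpha(3))
    right = abs_add(abs_mul(alpha(2), a), alpha(1))
    return abs_mul(left, right)

def f_concrete(a: int, b: int) -> int:
    return (16 * b + 3) * (2 * a + 1)

# Even with both inputs unknown, the analysis concludes that f is odd.
print(f_abstract(TOP, TOP))
```

Note that `abs_mul(E, TOP)` returns E rather than ?, which is what lets the analysis recover a precise answer for unknown inputs.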
Constant Propagation
[Figure: control-flow graph annotated with transfer functions and abstract environments, first iterations]
• Entry: [i ↦ ?, j ↦ ?]
• i = 0, transfer function λe. e[i ↦ 0], gives [i ↦ 0, j ↦ ?]
• j = 0, transfer function λe. e[j ↦ 0], gives [i ↦ 0, j ↦ 0]
• while (i … 2), transfer function λe. e, sees [i ↦ 0, j ↦ 0] and then [i ↦ 1, j ↦ 0]
• j = (j+1)/4, transfer function λe. e[j ↦ (e(j)+1)/4]
• i = i+1, transfer function λe. e[i ↦ e(i) + 1]
• printf(i, j)
Constant Propagation
[Figure: the same control-flow graph at the fixed point]
• At the loop head, [i ↦ 0, j ↦ 0] is replaced by [i ↦ ?, j ↦ 0]
• i ↦ ? represents i ∈ {…, -2, -1, 0, 1, 2, …}
• j ↦ 0 represents j ∈ {0}
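The fixed-point computation on this slide can be sketched as follows. This is an illustration under stated assumptions: the loop condition is treated as the identity transfer function (as the λe. e annotation suggests), `/` is taken to be integer division, and the names `join_env` and `analyze` are hypothetical.

```python
TOP = "?"  # the variable is not known to be a single constant

def join_val(a, b):
    """Join in the flat constant lattice: equal constants survive."""
    return a if a == b else TOP

def join_env(e1, e2):
    """Pointwise join; None stands for an unreached program point."""
    if e1 is None:
        return e2
    if e2 is None:
        return e1
    return {v: join_val(e1[v], e2[v]) for v in e1}

def analyze():
    """Return the abstract environment at the loop head at the fixed point."""
    env = {"i": TOP, "j": TOP}
    env = {**env, "i": 0}          # i = 0
    env = {**env, "j": 0}          # j = 0
    incoming, back, head = env, None, None
    while True:
        new_head = join_env(incoming, back)
        if new_head == head:       # nothing changed: fixed point reached
            return head
        head = new_head
        body = dict(head)          # loop condition: identity transfer
        j = body["j"]
        body["j"] = TOP if j == TOP else (j + 1) // 4   # j = (j+1)/4
        i = body["i"]
        body["i"] = TOP if i == TOP else i + 1          # i = i+1
        back = body                # flows along the back edge

print(analyze())
```

The loop head first sees [i ↦ 0, j ↦ 0], then the join with [i ↦ 1, j ↦ 0] from the back edge, which drives i to ? while j stays 0.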
What Does It Mean to Automate Abstract Interpretation?
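Abstract Interpretation [CC77]
• Set of states: {(x ↦ 2, y ↦ 1), (x ↦ 5, y ↦ 3)}
• α maps this set to the abstract value (x ↦ [2, 5], y ↦ [1, 3])
• γ maps the abstract value back to {(2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3), (5, 1), (5, 2), (5, 3)}
[Figure: α and γ over the universe of states]
[Photos: Patrick Cousot, Radhia Cousot]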
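For the interval domain in this example, abstraction and concretization can be written directly. The sketch below is illustrative only: states are (x, y) pairs and the function names `alpha` and `gamma` simply mirror the slide's symbols.

```python
from itertools import product

def alpha(states):
    """Abstract a nonempty set of (x, y) states to one interval per variable."""
    xs = [x for x, _ in states]
    ys = [y for _, y in states]
    return {"x": (min(xs), max(xs)), "y": (min(ys), max(ys))}

def gamma(box):
    """Concretize a box to the set of all integer states it represents."""
    (xlo, xhi), (ylo, yhi) = box["x"], box["y"]
    return set(product(range(xlo, xhi + 1), range(ylo, yhi + 1)))

states = {(2, 1), (5, 3)}
box = alpha(states)
print(box)               # {'x': (2, 5), 'y': (1, 3)}
print(len(gamma(box)))   # 12
```

Two concrete states go in and twelve come back out: γ(α(S)) is a superset of S, which is the overapproximation the slide depicts.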
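Best Transformer [CC79]
[Figure: concretize with γ, apply the concrete transformer τ to each state, then abstract with α; a safe abstract transformer τ# overapproximates the result, with some loss of precision]
However, no algorithms to
• apply the best transformer
• create the best transformer
[Photos: Patrick Cousot, Radhia Cousot]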
Challenge: Abstract Interpretation is Inherently Non-Compositional
• In computer science, we rely on compositionality
 – languages are expressed using context-free grammars
 – many concepts and properties defined using inductive definitions
 – recursive tree traversals are a basic workhorse
 – software organized into layers
• Example: (x + (–x)), evaluated in (x ↦ [5, 10], y ↦ [10, 20])
 – [-5, 5] versus [0, 0]
 – Suppose that you have in hand a collection of “best” abstract-interpretation operators
 – Their composition may not provide the best (abstract) answer for the composition of the corresponding concrete operations
[Figure: expression tree for x + (–x); both leaves x are [5, 10], the negation is [-10, -5], and the sum is [-5, 5]]
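The [-5, 5] versus [0, 0] gap can be reproduced in a few lines. This is a sketch with hypothetical helper names: `add` and `neg` are the usual interval operators, and the "best" answer is computed by brute-force enumeration over the concretization, which is only feasible because the interval is small.

```python
def neg(iv):
    """Interval negation."""
    lo, hi = iv
    return (-hi, -lo)

def add(iv1, iv2):
    """Interval addition."""
    return (iv1[0] + iv2[0], iv1[1] + iv2[1])

def alpha(values):
    """Smallest interval containing all the given integers."""
    values = list(values)
    return (min(values), max(values))

def gamma(iv):
    """All integers represented by an interval."""
    return range(iv[0], iv[1] + 1)

x = (5, 10)

# Composing the operator-by-operator abstractions forgets that both
# occurrences of x denote the same value.
composed = add(x, neg(x))

# Abstracting the whole expression at once: concretize, evaluate, abstract.
best = alpha(v + (-v) for v in gamma(x))

print(composed, best)   # (-5, 5) (0, 0)
```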
A First Glimmer
“40% of the people in the Wisconsin CS department are doing machine learning, but don’t know it”
Jude Shavlik
[Photos: Mooly Sagiv, Greta Yorsh]
Symbolic Abstraction
• (x ↦ [5, 10]) ⇝ 5 ≤ x ∧ x ≤ 10
• Interval environment ⇝ conjunction of single-variable inequalities
[Figure: States, formulas L, and abstract values A]
• Typically, L is a rich language and A is an impoverished logic fragment L'
• Symbolic abstraction addresses a fundamental approximation problem: given φ ∈ L, find the strongest consequence of φ that is expressible in L'
Example
• Adds al, the low-order byte of 32-bit register eax, to bh, the second-to-lowest byte of 32-bit register ebx
[Figure: byte layout of eax and ebx]
From concrete semantics to formulas
[Figure: formula for the instruction, with callouts]
• Bitvector-and [with constant] may not be an operator in the impoverished logic of the abstract domain
• Bitvector-or may not be an operator in the impoverished logic of the abstract domain
• Multiplication [by constant] may not be an operator in the impoverished logic of the abstract domain
• Primed variables represent values in the post-state
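To make the callouts concrete, here is one plausible way to express the effect of adding al to bh using only bitvector-and with constants, bitvector-or, and multiplication by a constant. The exact shape of this encoding is an assumption rather than a transcription of the slide's formula, processor flags are ignored, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def add_bh_al(eax: int, ebx: int):
    """Return (eax', ebx') after adding al to bh, in 32-bit arithmetic."""
    al = eax & 0xFF                       # bitvector-and with a constant
    # Multiplying by 256 moves al into byte 1; the mask keeps only byte 1
    # of the sum, so carries out of bh are discarded.
    new_bh_field = ((ebx + 256 * al) & MASK32) & 0xFF00
    ebx_post = (ebx & 0xFFFF00FF) | new_bh_field   # bitvector-or
    return eax, ebx_post                  # eax is unchanged in the post-state

def add_bh_al_reference(eax: int, ebx: int):
    """Byte-level reference semantics, used to cross-check the encoding."""
    al = eax & 0xFF
    bh = (ebx >> 8) & 0xFF
    new_bh = (bh + al) & 0xFF
    return eax, (ebx & 0xFFFF00FF) | (new_bh << 8)

print(hex(add_bh_al(0x000000FF, 0x12345678)[1]))   # 0x12345578
```

In the returned pair, the two components play the role of the primed (post-state) variables.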
What Does It Mean to Automate Abstract Interpretation?
• Use logic, such as quantifier-free bitvector arithmetic (QF_ABV)
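[Figure sequence: three columns for concrete values C, formulas L, and abstract values A. Each step takes a concrete state S, maps it to an abstract value, and updates the formula; the sequence ends when the query is unsat]
[Photos: Mooly Sagiv, Greta Yorsh]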
Example (columns: Concrete Values, Formulas, Abstract Values)
1. Concrete state S = [x ↦ 43, y ↦ 0]; abstract value [x ↦ 43, y ↦ 0]
2. Formula for the abstract value: (x = 43) ∧ (y = 0)
3. Next concrete state S = [x ↦ 46, y ↦ 0]; abstract values [x ↦ 46, y ↦ 0] and [x ↦ 43, y ↦ 0]
4. Combined abstract value: [x ↦ ⊤, y ↦ 0]; formula (y = 0)
5. The query is unsat; the answer is [x ↦ ⊤, y ↦ 0]
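The trace above follows a loop that repeatedly asks for a state satisfying the formula but not yet covered by the current abstract value, and joins that state in. The sketch below imitates that loop for the constant-propagation domain. It is a toy under loud assumptions: `find_model` is a brute-force stand-in for a real solver query, the formula `phi` is a hypothetical example chosen to reproduce the values 43 and 46, and none of the names come from the talk.

```python
from itertools import product

TOP = "T"  # stands for ⊤: the variable may hold any value

def beta(state):
    """Abstract one concrete state: every variable is a known constant."""
    return dict(state)

def join(a, b):
    """Pointwise join of abstract values; None plays the role of bottom."""
    if a is None:
        return b
    return {v: a[v] if a[v] == b[v] else TOP for v in a}

def in_gamma(state, abs_val):
    """Is the concrete state represented by the abstract value?"""
    if abs_val is None:
        return False
    return all(abs_val[v] in (TOP, state[v]) for v in state)

def find_model(phi, abs_val, variables, domain):
    """Brute-force stand-in for a solver: a state that satisfies phi and
    is not yet covered by abs_val, or None when no such state exists."""
    for values in product(domain, repeat=len(variables)):
        state = dict(zip(variables, values))
        if phi(state) and not in_gamma(state, abs_val):
            return state
    return None  # corresponds to "unsat"

def symbolic_abstraction(phi, variables, domain):
    ans = None
    while True:
        s = find_model(phi, ans, variables, domain)
        if s is None:
            return ans
        ans = join(ans, beta(s))

phi = lambda s: s["y"] == 0 and s["x"] in (43, 46)   # hypothetical formula
result = symbolic_abstraction(phi, ["x", "y"], range(64))
print(result)   # {'x': 'T', 'y': 0}
```

Each iteration strictly enlarges the abstract value, so the loop terminates whenever the abstract domain has no infinite ascending chains. If the formula has no model at all, the result stays at bottom (`None` here).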
A First Glimmer
“40% of the people in the Wisconsin CS department are doing machine learning, but don’t know it”
Jude Shavlik
Find-S Algorithm for learning a concept in a concept lattice:
• Discard all negative examples
• Return the join of all positive examples
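The two Find-S steps translate almost directly into code. The following is a minimal sketch over a simple attribute-vector concept lattice, where a hypothesis either fixes an attribute value or leaves it unconstrained; the training data and names are hypothetical.

```python
TOP = "?"  # the attribute is unconstrained

def join(hypothesis, example):
    """Least hypothesis covering both the old hypothesis and the example."""
    if hypothesis is None:          # None is the bottom hypothesis
        return tuple(example)
    return tuple(h if h == e else TOP for h, e in zip(hypothesis, example))

def find_s(examples):
    """Discard negative examples, return the join of the positive ones."""
    hypothesis = None
    for attributes, is_positive in examples:
        if not is_positive:
            continue
        hypothesis = join(hypothesis, attributes)
    return hypothesis

examples = [
    (("sunny", "warm", "high"), True),
    (("sunny", "warm", "low"), True),
    (("rainy", "cold", "high"), False),
]
print(find_s(examples))   # ('sunny', 'warm', '?')
```

The structure mirrors the symbolic-abstraction loop: start at bottom and join in one example at a time.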
Symbolic Abstraction
[Figure: States, formulas L, and abstract values A]
• Typically, L is a rich language and A is an impoverished logic fragment L'
• Symbolic abstraction addresses a fundamental approximation problem: given φ ∈ L, find the strongest consequence of φ that is expressible in L'
The Story in a Nutshell
Reduced Product
[Figure: formulas L between abstract domains A1 and A2]
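Reduced Product
• Parity: (a ↦ even, b ↦ odd, c ↦ ⊤), as a formula in L: 2^31 a = 0 ∧ 2^31 b = 2^31
• Interval: (a ↦ [3, 12], b ↦ [5, 10], c ↦ [7, 7]), as a formula in L: 3 ≤ a ≤ 12 ∧ 5 ≤ b ≤ 10 ∧ 7 ≤ c ≤ 7
• After reduction:
 – Parity: (a ↦ even, b ↦ odd, c ↦ odd)
 – Interval: (a ↦ [4, 12], b ↦ [5, 9], c ↦ [7, 7])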
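The reduction in this example can be reproduced with a hand-written reduction operator. This sketch is a shortcut, not the approach the slides advocate: the talk routes the exchange of information through formulas in L, whereas the hypothetical `reduce_pair` below hard-codes the interaction between parity and intervals one variable at a time.

```python
EVEN, ODD, TOP = "even", "odd", "T"

def matches(n, parity):
    """Does the integer n have the given (possibly unknown) parity?"""
    return parity == TOP or (n % 2 == 0) == (parity == EVEN)

def reduce_pair(parity, interval):
    """Tighten a (parity, interval) pair; returns None for bottom."""
    lo, hi = interval
    while lo <= hi and not matches(lo, parity):
        lo += 1                      # drop a lower bound of the wrong parity
    while hi >= lo and not matches(hi, parity):
        hi -= 1                      # drop an upper bound of the wrong parity
    if lo > hi:
        return None                  # no integer satisfies both constraints
    if lo == hi:
        parity = EVEN if lo % 2 == 0 else ODD   # a singleton fixes the parity
    return parity, (lo, hi)

parities = {"a": EVEN, "b": ODD, "c": TOP}
intervals = {"a": (3, 12), "b": (5, 10), "c": (7, 7)}
reduced = {v: reduce_pair(parities[v], intervals[v]) for v in parities}
print(reduced)
```

Information flows in both directions: parity tightens the bounds of a and b, and the singleton interval for c sharpens its parity from ⊤ to odd.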
Stålmarck’s method: Propagation Rules
Stålmarck’s method: Dilemma Rule
• Split
• Propagate
• Merge
[Photo: Gunnar Stålmarck]
Stålmarck’s method: Inconsistent Facts
Key Insight
[Figure: Stålmarck’s method shown alongside abstract interpretation]
Dilemma Rule
• Split
• Propagate
• Merge
[Photo: Aditya Thakur]
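The split, propagate, and merge steps can be sketched for propositional CNF formulas. This is an illustration only and differs from Stålmarck's method proper: the abstract value here is a partial assignment (not an equivalence relation), propagation is plain unit propagation, and the function names are hypothetical. Clauses use DIMACS-style integer literals.

```python
from typing import Dict, List, Optional

Clause = List[int]             # 3 means x3, -3 means "not x3"
Assignment = Dict[int, bool]   # partial assignment: the abstract value

def propagate(clauses: List[Clause], facts: Assignment) -> Optional[Assignment]:
    """Unit propagation to a fixed point; None signals a contradiction."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var not in facts:
                    unassigned.append(lit)
                elif facts[var] == want:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None                    # every literal is false
            if len(unassigned) == 1:
                lit = unassigned[0]
                facts[abs(lit)] = lit > 0      # forced by the clause
                changed = True
    return facts

def merge(a: Optional[Assignment], b: Optional[Assignment]) -> Optional[Assignment]:
    """Join of two branches: keep only facts that hold in both."""
    if a is None:
        return b
    if b is None:
        return a
    return {v: val for v, val in a.items() if b.get(v) == val}

def dilemma(clauses: List[Clause], facts: Assignment, var: int) -> Optional[Assignment]:
    """Split on var, propagate in each branch, merge the results."""
    branch_true = propagate(clauses, {**facts, var: True})
    branch_false = propagate(clauses, {**facts, var: False})
    return merge(branch_true, branch_false)

# (x1 or x2) and (not x1 or x2): x2 holds whichever way x1 goes.
print(dilemma([[1, 2], [-1, 2]], {}, 1))   # {2: True}
```

If both branches are contradictory, `dilemma` returns `None`, meaning the formula is unsatisfiable under the given facts. Since the two branches are independent, they can run in parallel.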
Stålmarck = Stålmarck[Equivalence]
Key Insight
• propositional logic
Key Insight
• richer logic
 – QF_LRA logic (quantifier-free linear rational arithmetic)
 – QF_ABV logic (quantifier-free bit-vector arithmetic)
Stålmarck[Boolean, Polyhedra] for LRA: Dilemma Rule
Dilemma Rule
Generalize example to k “diamonds”
Comparison with Z3
Decision Procedures and Symbolic Abstraction
• Recipe for unsatisfiability checking: Satisfiability modulo abstraction
[Figure: States, formulas L, and abstract values A, with ⊥ marked]
A plea: Decision-procedure developers should generalize their APIs to make available the residuals of “failed” refutations
Symbolic Abstraction: Dual-Use
A Similar Contemporaneous Insight!
[Figure: DPLL/CDCL SAT solver shown alongside abstract interpretation]
[Photos: Vijay D’Silva, Leo Haller, Daniel Kroening]
DiSSolve: A Parallel SAT Solver
• Dilemma Rule is trivially parallelizable
• Stålmarck’s method not competitive with modern SAT solvers
 – Use an existing SAT solver
 – Use rule that is not-quite Dilemma rule
DiSSolve: First Round
[Figure: instances of a sequential SAT solver (glucose); the clauses learned via CDCL techniques are combined by union + subsumption check]
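The "union + subsumption check" step can be sketched independently of any particular solver. The code below is an assumption-laden illustration: learned clauses are plain lists of DIMACS-style literals rather than objects from glucose, and `merge_learned` is a hypothetical name. A clause C subsumes a clause D when every literal of C also appears in D, so D adds no information.

```python
def merge_learned(clause_sets):
    """Union the clauses learned by several solver runs, dropping any
    clause that is subsumed by a shorter (or identical) kept clause."""
    unique = {frozenset(c) for clauses in clause_sets for c in clauses}
    kept = []
    # Visiting shorter clauses first guarantees that any potential
    # subsumer has already been considered when a clause is examined.
    for clause in sorted(unique, key=len):
        if not any(k <= clause for k in kept):
            kept.append(clause)
    return kept

run_a = [[1, 2], [3]]
run_b = [[1, 2, 4], [3, 5], [2, -1]]
merged = merge_learned([run_a, run_b])
print(sorted(sorted(c) for c in merged))   # [[-1, 2], [1, 2], [3]]
```

This quadratic check is fine for a sketch; a production implementation would likely index clauses by literal to avoid comparing every pair.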
DiSSolve: Second Round
[Figure: second round of glucose runs]
Existing Parallel SAT Solvers
• Divide-and-conquer solvers
 – Static partitioning of state space
 – Problems with load balancing
• Portfolio solvers
 – Different sequential solvers competing
 – Problems with crafting diverse solvers
DiSSolve
• Dynamic partitioning of state space
• Each solver works on a separate portion of the search space
• Frequent communication of learned information
• Reuses engineering effort and heuristics from an existing solver (Glucose)
 – Variable ranking
 – Clause ranking
 – Restart schedule
 – Phase saving
 – Final-conflict clauses
DiSSolve on a 32-Core Machine
• Compare sequential and portfolio method with 2 variants of DiSSolve
 – dilemma-style 32-way case split
 – 32-way search using fresh random seeds
• Benchmarks from application track of SAT-COMP’13
• Timeout of 1000 seconds
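One way to obtain a 32-way case split is to pick five variables and enumerate all 2^5 sign combinations, handing each combination to a different solver as a set of assumptions. The slides do not say how DiSSolve chooses or encodes its split, so the sketch below is purely illustrative; the variable choice and the name `case_split` are hypothetical.

```python
from itertools import product

def case_split(variables):
    """All sign combinations over the chosen variables, each returned as a
    list of DIMACS-style assumption literals (one list per solver)."""
    return [
        [v if positive else -v for v, positive in zip(variables, signs)]
        for signs in product([True, False], repeat=len(variables))
    ]

cubes = case_split([1, 2, 3, 4, 5])
print(len(cubes))    # 32
print(cubes[0])      # [1, 2, 3, 4, 5]
print(cubes[-1])     # [-1, -2, -3, -4, -5]
```

The cubes are mutually exclusive and jointly exhaustive, so each solver explores a separate portion of the search space.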
[Figure: cactus plot; x-axis: number of benchmarks solved; y-axis: time (secs); lower and to the right is better]
DiSSolve on the Cloud
• 7 timeouts (DiSSolve[128])
What is Promising Here...
Decision Procedures for Logics
• Create more powerful solvers using concepts from abstract interpretation
• Create solvers for new logic fragments
• Satisfiability Modulo Abstraction
Symbolic Abstraction
• Fundamental technique for working with abstractions
Analysis/Verification of Programs
• Provides the means for automating abstract interpretation
• Correct-by-construction analyzers
Symbolic Abstraction: Dual-Use
[Figure: States, formulas L, and abstract values A, with ⊥ marked]
Symbolic Abstraction: Dual-Use
[Figure: States, formulas L, and A = L’; Dagstuhl Seminar 14351]
Connections, ...
• Automated reasoning & decision procedures
• Machine learning
• Knowledge compilation
• Consequence finding
• Data integration
• Constraint programming
[Figure: States, formulas L, and abstract values A]
• Typically, L is a rich language and A is an impoverished logic fragment L'
• Symbolic abstraction addresses a fundamental approximation problem: given φ ∈ L, find the strongest consequence of φ that is expressible in L'
A Plug for...
• Aditya Thakur
• Leo Haller
T. Reps and A. Thakur, Automating abstract interpretation. To appear in Proc. VMCAI, Jan. 2016. research.cs.wisc.edu/wpis/papers/vmcai16-invited.pdf