Refinements to techniques for verifying shape analysis invariants

Refinements to techniques for verifying shape analysis invariants in Coq Kenneth Roe GBO Presentation 9/30/2013 The Johns Hopkins University

Summary • Formal Methods for Imperative Languages Such as C – Many bugs caused by corruption of data structures – Use formal methods to document data structure invariants and then verify correct program execution – Framework being developed in the Coq interactive theorem prover

Research contribution • Extension of Coq based program verification to larger programs with more complex data structures. – Existing systems only work on small examples

A tree traversal example in C
#include <stdlib.h>
struct list { struct list *fld_n; struct tree *fld_t; };
struct tree { struct tree *fld_l, *fld_r; int value; };
struct list *p;  /* result list: visited tree nodes, most recent first */
void build_pre_order(struct tree *r) {
    struct list *i = NULL, *n;  /* i: right subtrees still to be visited */
    struct tree *t = r, *x;
    p = NULL;
    while (t) {
        /* record the current node at the head of p */
        n = p;
        p = malloc(sizeof(struct list));
        p->fld_t = t;
        p->fld_n = n;
        if (t->fld_l == NULL && t->fld_r == NULL) {
            /* leaf: continue from the head of i, or stop if i is empty */
            if (i == NULL) {
                t = NULL;
            } else {
                struct list *tmp = i->fld_n;
                t = i->fld_t;
                free(i);
                i = tmp;
            }
        } else if (t->fld_r == NULL) {
            t = t->fld_l;
        } else if (t->fld_l == NULL) {
            t = t->fld_r;
        } else {
            /* two children: save the right subtree on i, go left */
            n = i;
            i = malloc(sizeof(struct list));
            i->fld_n = n;
            x = t->fld_r;
            i->fld_t = x;
            t = t->fld_l;
        }
    }
}

What this program does [Sequence of twelve diagram frames; the images are not recoverable from the transcript. Each frame shows the tree rooted at r (nodes at addresses 10, 12, 14, 16, 18, 20), the current node t, and the lists p and i. Across the frames p grows from Nil to 20, 18, 16, 14, 12, 10, Nil, one tree node per iteration, while i holds the right subtrees still to be visited (first 18, then 16 and 18) and ends at Nil.]

Invariants to be formally proven • The program maintains two well formed linked lists, the heads of which are pointed to by i and p. – By well formed we mean that memory on the heap is properly allocated for the lists and there are no loops in the data structures. • The program maintains a well formed tree pointed to by r. • t is either NULL or points to an element in the tree rooted at r. • The two lists and the tree do not share any nodes. • Other than the memory used for the two lists and the tree, no other heap memory is allocated. • The fld_t field of every element in both list structures points to an element in the tree.
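The pointer relationships in this invariant can also be read as an executable check on one concrete program state. The following is a minimal sketch in C, not part of the Coq development: the helper names in_tree, list_points_into_tree, traversal_state_ok and assert_traversal_state are hypothetical, and the functions assume their inputs are already well formed (acyclic), so they only test the "points into the tree" parts of the invariant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct tree { struct tree *fld_l, *fld_r; int value; };
struct list { struct list *fld_n; struct tree *fld_t; };

/* True when `node` is one of the nodes of the tree rooted at `root`. */
bool in_tree(const struct tree *root, const struct tree *node)
{
    if (root == NULL) return false;
    return root == node
        || in_tree(root->fld_l, node)
        || in_tree(root->fld_r, node);
}

/* True when the fld_t field of every list element points into the tree. */
bool list_points_into_tree(const struct list *head, const struct tree *root)
{
    for (const struct list *e = head; e != NULL; e = e->fld_n)
        if (!in_tree(root, e->fld_t)) return false;
    return true;
}

/* Hypothetical helper: t is NULL or in the tree, and both lists point into it. */
bool traversal_state_ok(const struct tree *r, const struct tree *t,
                        const struct list *i, const struct list *p)
{
    return (t == NULL || in_tree(r, t))
        && list_points_into_tree(i, r)
        && list_points_into_tree(p, r);
}

/* Debug hook that could be called at the head of the traversal loop. */
void assert_traversal_state(const struct tree *r, const struct tree *t,
                            const struct list *i, const struct list *p)
{
    assert(traversal_state_ok(r, t, i, p));
}
```

A check like this only tests the states that actually occur in a run; the point of the formal proof is to establish the same facts for every execution.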

Coq Goal {?} WHILE not (T == 0) DO N := P; NEW P, 2; ... {?}

Program state Environment R=10 I=30 P=40 T=10 e = { R → 10, I → 30, P → 40, T → 10 } Heap h = {10 → 12, 11 → 18, 12 → 14, 13 → 16, 14 → 0, 15 → 0, 16 → 0, 17 → 0, 18 → 20, 19 → 0, 20 → 0, 21 → 0, …}

Program state Environment R=10 I=30 P=40 T=10 e = { R → 10, I → 30, P → 40, T → 10 }
Heap h = {10 → 12, 11 → 18, 12 → 14, 13 → 16, 14 → 0, 15 → 0, 16 → 0, 17 → 0, 18 → 20, 19 → 0, 20 → 0, 21 → 0}
struct tree { struct tree *left; struct tree *right; }
v0 = [10, [12, [14, [0]], [16, [0]]], [18, [20, [0]], [0]]]
∃v0. (e, h) ⊨ TREE(R, v0, 2, [0, 1])
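To make the reading of TREE(R, v0, 2, [0, 1]) concrete, here is a minimal sketch in C that checks the spatial part of the predicate on the example heap: each node occupies 2 cells, the cells at offsets 0 and 1 hold child pointers, and the tree must cover the heap exactly, with no sharing and no cycles. The struct heap model, the HEAP_CELLS bound and the function names are assumptions made for illustration. The sketch ignores the functional representation v0, and the exact-footprint reading is the usual separation logic convention, not something the slides spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_CELLS 64          /* hypothetical bound on addresses */

struct heap {
    bool defined[HEAP_CELLS];  /* dom(h) */
    int  value[HEAP_CELLS];    /* h(addr), meaningful only where defined */
};

/* Claims the cells of the tree rooted at `root`. Fails on a dangling
   pointer, or on a cell claimed twice (sharing or a cycle). */
static bool claim_tree(const struct heap *h, int root, int size,
                       const int *ptr_fields, size_t nfields, bool *claimed)
{
    if (root == 0) return true;                      /* null: empty tree */
    if (root < 0 || root + size > HEAP_CELLS) return false;
    for (int k = 0; k < size; k++) {
        if (!h->defined[root + k] || claimed[root + k]) return false;
        claimed[root + k] = true;
    }
    for (size_t f = 0; f < nfields; f++) {
        assert(ptr_fields[f] >= 0 && ptr_fields[f] < size);
        if (!claim_tree(h, h->value[root + ptr_fields[f]], size,
                        ptr_fields, nfields, claimed))
            return false;
    }
    return true;
}

/* True when the tree rooted at `root` is well formed and uses exactly
   the defined cells of h. */
bool tree_holds(const struct heap *h, int root, int size,
                const int *ptr_fields, size_t nfields)
{
    bool claimed[HEAP_CELLS] = { false };
    if (!claim_tree(h, root, size, ptr_fields, nfields, claimed)) return false;
    for (int a = 0; a < HEAP_CELLS; a++)
        if (h->defined[a] && !claimed[a]) return false;
    return true;
}

/* The heap from the slide: cells 10 through 21. */
struct heap slide_heap(void)
{
    static const int cells[][2] = {
        {10, 12}, {11, 18}, {12, 14}, {13, 16}, {14, 0}, {15, 0},
        {16, 0},  {17, 0},  {18, 20}, {19, 0},  {20, 0}, {21, 0}
    };
    struct heap h = { {false}, {0} };
    for (size_t i = 0; i < sizeof cells / sizeof cells[0]; i++) {
        h.defined[cells[i][0]] = true;
        h.value[cells[i][0]] = cells[i][1];
    }
    return h;
}
```

Calling tree_holds on slide_heap() with root 10, size 2 and pointer fields {0, 1} succeeds; redirecting cell 19 to address 12 makes two parents share a node, and the check fails.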

Separation logic h = {10 → 12, 11 → 18, 12 → 14, 13 → 16, 14 → 0, 15 → 0, 16 → 0, 17 → 0, 18 → 20, 19 → 0, 20 → 0, 21 → 0, 30 → 32, 31 → 10, 32 → 0, 33 → 12, 40 → 42, 41 → 14, 42 → 44, 43 → 12, 44 → 0, 45 → 10}
(e, h) ⊨ ∃v0 v1 v2. TREE(R, v0, 2, [0, 1]) ∗ TREE(I, v1, 2, [0]) ∗ TREE(P, v2, 2, [0])

Separation logic (e, h) ⊨ s1 ∗ s2 if and only if ∃h′, h′′. (e, h′) ⊨ s1 ∧ (e, h′′) ⊨ s2 ∧ dom(h′) ∩ dom(h′′) = ∅ ∧ h = h′ ∪ h′′
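The two side conditions in the definition above (disjoint domains, and the union giving back the whole heap) can be sketched as a check on finite heaps. This is an illustrative model only: struct heap, HEAP_CELLS, heap_empty, heap_set and is_disjoint_split are hypothetical names, and the check takes the two sub-heaps as given instead of searching for them as the existential quantifier does.

```c
#include <assert.h>
#include <stdbool.h>

#define HEAP_CELLS 64          /* hypothetical bound on addresses */

struct heap {
    bool defined[HEAP_CELLS];  /* dom(h) */
    int  value[HEAP_CELLS];    /* h(addr), meaningful only where defined */
};

struct heap heap_empty(void)
{
    struct heap h = { {false}, {0} };
    return h;
}

void heap_set(struct heap *h, int addr, int value)
{
    assert(addr >= 0 && addr < HEAP_CELLS);
    h->defined[addr] = true;
    h->value[addr] = value;
}

/* True when dom(h1) and dom(h2) are disjoint and h is their union. */
bool is_disjoint_split(const struct heap *h,
                       const struct heap *h1, const struct heap *h2)
{
    for (int a = 0; a < HEAP_CELLS; a++) {
        if (h1->defined[a] && h2->defined[a]) return false;  /* overlap */
        if (h->defined[a] != (h1->defined[a] || h2->defined[a])) return false;
        if (h1->defined[a] && h->value[a] != h1->value[a]) return false;
        if (h2->defined[a] && h->value[a] != h2->value[a]) return false;
    }
    return true;
}
```

The disjointness condition is what lets TREE(R, …) ∗ TREE(I, …) ∗ TREE(P, …) express that the tree and the two lists share no nodes.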

Data structure relationships (e, h) ⊨ ∃v0 v1 v2. TREE(R, v0, 2, [0, 1]) ∗ TREE(I, v1, 2, [0]) ∗ TREE(P, v2, 2, [0])

Data structure relationships (e, h) ⊨ ∃v0 v1 v2. TREE(R, v0, 2, [0, 1]) ∗ TREE(I, v1, 2, [0]) ∗ TREE(P, v2, 2, [0]) ∗ ∀v3 ∈ TreeRecords(v1). [nth(find(v1, v3), 2) inTree v0]

Data structure relationships (e, h) ⊨ ∃v0 v1 v2. TREE(R, v0, 2, [0, 1]) ∗ TREE(I, v1, 2, [0]) ∗ TREE(P, v2, 2, [0]) ∗ ∀v3 ∈ TreeRecords(v1). [nth(find(v1, v3), 2) inTree v0] ∗ ∀v3 ∈ TreeRecords(v2). [nth(find(v2, v3), 2) inTree v0]

Data structure relationships (e, h) ⊨ ∃v0 v1 v2. TREE(R, v0, 2, [0, 1]) ∗ TREE(I, v1, 2, [0]) ∗ TREE(P, v2, 2, [0]) ∗ ∀v3 ∈ TreeRecords(v1). [nth(find(v1, v3), 2) inTree v0] ∗ ∀v3 ∈ TreeRecords(v2). [nth(find(v2, v3), 2) inTree v0] ∗ [T = 0 ∨ T inTree v0]

Deep model • Process – Create data structure for predicates – Write semantic interpretation function – Write customized tactics • Advantage: greater flexibility in designing tactics – Tactics can be any function that transforms the data structure – Tactic is proven correct once and used for all verifications

Summary of tactics • Forward propagation • Fold/unfold • Merge – Works by pairing off identical pieces • Simplify • State implications – Also works by pairing off

Results (so far) • Tree traversal – Code size: ~30 lines – Invariant size: ~10 lines – Proof check time: ~5 minutes – Main proof size: ~220 lines – Status: top level complete, lemmas need to be proven • DPLL (A decision procedure for sentential satisfiability) – Code size: ~200 lines – Invariant size: ~52 lines – Status: Proof incomplete

Research contributions • Extension of Coq separation logic reasoning to larger programs with more complex data structures. • Creation of a library of useful predicates, functions and tactics – Deep model allows greater control over the design of tactics • Key challenge: Performance tuning – Tradeoffs between performance and automation • Development of a powerful simplification tactic – Simplification tactic executed after every major proof step – Based on term rewriting (with contextual rewriting) concepts – Automates reasoning about associativity, commutativity and other simple property classes • Design decisions in creating canonical form addressed

Related work

Proposed work for Ph.D. • Finish DPLL verification • Prove all underlying theorems • Create improved presentation framework for the environment

Deep model • Consider proving: a+c = a+b+c−b

Deep Model type expr = | Const int | Var id | Plus expr×expr …

Deep Model Prove the following: Plus a c = simplify (Plus a (Plus b (Minus c b)))

Deep Model • Parameterized predicate data types type expr = | Const int | Var id | Fun id × list expr
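The deep model idea can be illustrated outside Coq: expressions become ordinary data, and simplification is an ordinary function over that data. The sketch below is in C and is only an analogy. The struct layout, MAX_VARS, and the choice of a linear normal form (a constant plus one integer coefficient per variable) are assumptions made for this example; the simplifier described in the slides is based on term rewriting, and in Coq the function would come with a correctness proof that C cannot express.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 8  /* hypothetical bound on variable ids */

enum kind { CONST, VAR, PLUS, MINUS };

struct expr {
    enum kind kind;
    int n;                         /* CONST: the value; VAR: the variable id */
    const struct expr *lhs, *rhs;  /* PLUS and MINUS only */
};

/* Normal form: constant + sum of coeff[v] * Var v. */
struct linear { int constant; int coeff[MAX_VARS]; };

struct expr num(int v)  { struct expr e = { CONST, v, NULL, NULL }; return e; }
struct expr var(int id) { struct expr e = { VAR, id, NULL, NULL };  return e; }
struct expr plus(const struct expr *l, const struct expr *r)
{ struct expr e = { PLUS, 0, l, r };  return e; }
struct expr minus(const struct expr *l, const struct expr *r)
{ struct expr e = { MINUS, 0, l, r }; return e; }

/* Adds sign * e into the running normal form. */
static void accumulate(const struct expr *e, int sign, struct linear *out)
{
    switch (e->kind) {
    case CONST:
        out->constant += sign * e->n;
        break;
    case VAR:
        assert(e->n >= 0 && e->n < MAX_VARS);
        out->coeff[e->n] += sign;
        break;
    case PLUS:
        accumulate(e->lhs, sign, out);
        accumulate(e->rhs, sign, out);
        break;
    case MINUS:
        accumulate(e->lhs, sign, out);
        accumulate(e->rhs, -sign, out);
        break;
    }
}

struct linear simplify(const struct expr *e)
{
    struct linear out = { 0, {0} };
    accumulate(e, 1, &out);
    return out;
}

/* Two expressions are equal as integer expressions if their normal forms match. */
bool same_normal_form(const struct expr *a, const struct expr *b)
{
    struct linear x = simplify(a), y = simplify(b);
    if (x.constant != y.constant) return false;
    for (int v = 0; v < MAX_VARS; v++)
        if (x.coeff[v] != y.coeff[v]) return false;
    return true;
}
```

With a, b, c as variables 0, 1, 2, the terms Plus a c and Plus a (Plus b (Minus c b)) reduce to the same normal form, which is the a+c = a+b+c−b example; Plus a b does not.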

Verification of initialization {∃v0. TREE(R, v0, 2, [0, 1])} T := R; I := 0; P := 0; {∃v0. ∃v1. ∃v2. TREE(R, v0, 2, [0, 1]) ∗ TREE(I, v1, 2, [0]) ∗ TREE(P, v2, 2, [0]) ∗ ∀v3 ∈ TreeRecords(v1). [nth(find(v1, v3), 2) inTree v0] ∗ ∀v3 ∈ TreeRecords(v2). [nth(find(v2, v3), 2) inTree v0] ∗ [T = 0 ∨ T inTree v0]}

Verification of initialization {∃v0. TREE(R, v0, 2, [0, 1])} T := R; I := 0; P := 0; {?1234}

Verification of initialization {∃v0. TREE(R, v0, 2, [0, 1]) ∗ [T = R]} I := 0; P := 0; {?1234}

Verification of initialization {∃v0. TREE(R, v0, 2, [0, 1]) ∗ [T = R] ∗ [I = 0]} P := 0; {?1234}

Verification of initialization ∃v0. TREE(R, v0, 2, [0, 1]) ∗ [T = R] ∗ [I = 0] ∗ [P = 0] → ?1234

Verification of initialization ∃v0. TREE(R, v0, 2, [0, 1]) ∗ [T = R] ∗ [I = 0] ∗ [P = 0] → ?1234 = ∃v0. TREE(R, v0, 2, [0, 1]) ∗ [T = R] ∗ [I = 0] ∗ [P = 0]

Verification of initialization ?1234 → ∃v0. ∃v1. ∃v2. TREE(R, v0, 2, [0, 1]) ∗ TREE(I, v1, 2, [0]) ∗ TREE(P, v2, 2, [0]) ∗ ∀v3 ∈ TreeRecords(v1). [nth(find(v1, v3), 2) inTree v0] ∗ ∀v3 ∈ TreeRecords(v2). [nth(find(v2, v3), 2) inTree v0] ∗ [T = 0 ∨ T inTree v0]

Verification of initialization ∃v0. TREE(R, v0, 2, [0, 1]) ∗ [T = R] ∗ [I = 0] ∗ [P = 0] → ∃v0. ∃v1. ∃v2. TREE(R, v0, 2, [0, 1]) ∗ TREE(I, v1, 2, [0]) ∗ TREE(P, v2, 2, [0]) ∗ ∀v3 ∈ TreeRecords(v1). [nth(find(v1, v3), 2) inTree v0] ∗ ∀v3 ∈ TreeRecords(v2). [nth(find(v2, v3), 2) inTree v0] ∗ [T = 0 ∨ T inTree v0]
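The steps above push the precondition forward through one assignment at a time. A standard way to state this is Floyd's forward assignment rule, given here as a reference point and not necessarily the exact rule behind the forward propagation tactic. In the sketch, x' is a fresh variable standing for the old value of x, and the square brackets for pure facts follow the notation of the slides.

```latex
\[
\{P\}\; x := e \;\{\exists x'.\; P[x'/x] \,*\, [\,x = e[x'/x]\,]\}
\]
\[
\text{If } x \text{ occurs in neither } P \text{ nor } e:\qquad
\{P\}\; x := e \;\{P \,*\, [\,x = e\,]\}
\]
```

With P = ∃v0. TREE(R, v0, 2, [0, 1]) and the assignment T := R, the variable T occurs in neither P nor R, so the second form yields ∃v0. TREE(R, v0, 2, [0, 1]) ∗ [T = R], the state shown after the first assignment. The assignments I := 0 and P := 0 add [I = 0] and [P = 0] in the same way.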

Unfold example {∃v0 ∃v1 ∃v2 [Tmp l = 0] ∗ [l /= 0] ∗ [tmp r = 0] ∗ [Tmp r = 0 ∨ Tmp r ∈ TreeRecords(v0)] ∗ [nth(find(v0, T)), 2), 0) = (Tmp r)] ∗ [nth(find(v0, T)), 1), 0) = 0] ∗ [T ∈ TreeRecords(v0)] ∗ P + 0 → N ∗ P + 1 → T ∗ [T /= 0] ∗ TREE(R, v0, 2, [0, 1]) ∗ TREE(I, v1, 2, [0]) ∗ TREE(N, v2, 2, [0]) ∗ ∀v3 ∈ TreeRecords(v1). [nth(find(v1, v3), 2) inTree v0] ∗ ∀v3 ∈ TreeRecords(v2). [nth(find(v2, v3), 2) inTree v0]} T := ∗(I+1); ... {?1234}

Unfold example {∃v0 ∃v1 ∃v2 ∃v3 ∃v4 I + 1 → v1 ∗ I → nth(v0, 0) ∗ TREE(nth(v0, 0), nth([I, v0, v1], 1), 2, [0]) [Tmp r = 0 ∨ Tmp r ∈ TreeRecords(v0)] ∗ [nth(find(v2, T)), 2), 0) = (Tmp r)] ∗ [nth(find(v2, T)), 1), 0) = 0] ∗ [T ∈ TreeRecords(v2)] ∗ P + 0 → N ∗ P + 1 → T ∗ [T /= 0] ∗ TREE(I, v1, 2, [0]) ∗ [I + 1 → v1 ∗ I → nth(v0, 0)] ∗ TREE(nth(v0, 0), nth([I, v0, v1], 1), 2, [0]) ∗ TREE(R, v2, 2, [0, 1]) ∗ Empty ∗ TREE(N, v4, 2, [0]) ∗ ∀v5 ∈ TreeRecords([I, v0, v1]). [nth(find([I, v0, v1], v5), 2) inTree v2] ∗ ∀v5 ∈ TreeRecords(v4). [nth(find(v4, v5), 2) inTree v2]} T := ∗(I+1); ... {?1234}

Unfold example {∃v0 ∃v1 ∃v2 ∃v3 ∃v4 ∃v5 [T = v2] ∗ I + 1 → v2 ∗ I → nth(v1, 0) ∗ TREE(nth(v1, 0), nth([I, v1, v2], 1), 2, [0]) ∗ [Tmp r = 0 ∨ Tmp r ∈ TreeRecords(v1)] ∗ [nth(find(v3, T)), 2), 0) = (Tmp r)] ∗ [nth(find(v3, T)), 1), 0) = 0] ∗ [T ∈ TreeRecords(v3)] ∗ P + 0 → N ∗ P + 1 → T ∗ [T = 0] ∗ TREE(R, v3, 2, [0, 1]) ∗ Empty ∗ TREE(N, v5, 2, [0]) ∗ ∀v6 ∈ TreeRecords([I, v1, v2]). [nth(find([I, v1, v2], v6), 2) inTree v3] ∗ ∀v6 ∈ TreeRecords(v5). [nth(find(v5, v6), 2) inTree v3]} {?1234}

DPLL Efficient SAT solving algorithm for CNF expressions such as: (A ∨ ¬B ∨ C) ∧ (A ∨ ¬C ∨ D)
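For readers unfamiliar with the algorithm, here is a minimal sketch of plain recursive DPLL (unit propagation followed by branching) in C. It is not the implementation being verified in this work, which uses watch lists and explicit stacks. The clause layout borrows the positive_lit and negative_lit arrays from the data structure slide; the value encoding (0 unassigned, 1 true, 2 false) and the function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define VAR_COUNT 4
enum { UNASSIGNED = 0, VAL_TRUE = 1, VAL_FALSE = 2 };  /* assumed encoding */

struct clause {
    char positive_lit[VAR_COUNT];  /* nonzero: variable occurs positively */
    char negative_lit[VAR_COUNT];  /* nonzero: variable occurs negated */
};

enum status { CONFLICT, SATISFIED, UNIT, OPEN };

/* Classifies a clause under a partial assignment. For a UNIT clause,
   reports the one literal that must be made true. */
static enum status clause_status(const struct clause *c, const char *assign,
                                 int *unit_var, char *unit_val)
{
    int unassigned = 0;
    for (int v = 0; v < VAR_COUNT; v++) {
        if (c->positive_lit[v]) {
            if (assign[v] == VAL_TRUE) return SATISFIED;
            if (assign[v] == UNASSIGNED) { unassigned++; *unit_var = v; *unit_val = VAL_TRUE; }
        }
        if (c->negative_lit[v]) {
            if (assign[v] == VAL_FALSE) return SATISFIED;
            if (assign[v] == UNASSIGNED) { unassigned++; *unit_var = v; *unit_val = VAL_FALSE; }
        }
    }
    if (unassigned == 0) return CONFLICT;
    return unassigned == 1 ? UNIT : OPEN;
}

/* True when every clause is satisfied by the assignment. */
bool satisfies(const struct clause *clauses, int n, const char *assign)
{
    for (int i = 0; i < n; i++) {
        int var = 0; char val = UNASSIGNED;
        if (clause_status(&clauses[i], assign, &var, &val) != SATISFIED) return false;
    }
    return true;
}

/* Returns true and fills `assign` with a model if one exists.
   On failure the contents of `assign` are unspecified. */
bool dpll(const struct clause *clauses, int n, char *assign)
{
    assert(n >= 0);
    bool changed = true;
    while (changed) {                      /* unit propagation to a fixed point */
        changed = false;
        for (int i = 0; i < n; i++) {
            int var = 0; char val = UNASSIGNED;
            enum status st = clause_status(&clauses[i], assign, &var, &val);
            if (st == CONFLICT) return false;
            if (st == UNIT) { assign[var] = val; changed = true; }
        }
    }
    int v = 0;
    while (v < VAR_COUNT && assign[v] != UNASSIGNED) v++;
    if (v == VAR_COUNT) return true;       /* fully assigned, no conflict */
    const char choices[2] = { VAL_TRUE, VAL_FALSE };
    for (int k = 0; k < 2; k++) {          /* branch on the first free variable */
        char trial[VAR_COUNT];
        memcpy(trial, assign, VAR_COUNT);
        trial[v] = choices[k];
        if (dpll(clauses, n, trial)) { memcpy(assign, trial, VAR_COUNT); return true; }
    }
    return false;
}
```

Each recursive call works on a copy of the assignment, which keeps backtracking trivial at the cost of copying; the verified program instead records decisions on a stack so they can be undone in place.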

Data structure
#define VAR_COUNT 4
char assignments[VAR_COUNT];
struct clause {
    struct clause *next;
    char positive_lit[VAR_COUNT];
    char negative_lit[VAR_COUNT];
    char watch_var[VAR_COUNT];
    struct clause *watch_next[VAR_COUNT];
    struct clause *watch_prev[VAR_COUNT];
} *clauses;
struct clause *watches[VAR_COUNT];
struct assignments_to_do {
    struct assignments_to_do *next, *prev;
    int var;
    char value;
    int unit_prop;
} *assignments_to_do_head, *assignments_to_do_tail;
struct assignment_stack {
    struct assignment_stack *next;
    int var;
    char value;
    char unit_prop;
} *stack;

DPLL Data structure diagram

DPLL invariant The first part of the invariant consists of spatial constructs asserting the two arrays and three dynamic data structures in the heap. ARRAY(root, count, functional_representation) is a spatial predicate for arrays. The functional representation is a list of the elements.
AbsExistsT v0. AbsExistsT v1. AbsExists v2. AbsExistsT v3. AbsExistsT v4. TREE(clauses, v0, sizeof_clause, [next_offset]) * TREE(assignments_to_do_head, v1, sizeof_assignment_stack, [next_offset]) * TREE(stack, v2, sizeof_assignment_stack, [next_offset]) * ARRAY(assignments, var_count, v3) * ARRAY(watches, var_count, v4) *
Next, we add two assertions that guarantee that the assignment stack v2 and the assignment array v3 are consistent. We use (a, b)-->c as an abbreviation for nth(find(a, b), c).
… (AbsAll v5 in TreeRecords(v2). ([nth(v3, (v2, v5)-->stack_var_offset)==(v2, v5)-->stack_val_offset])) * (AbsAll v5 in range(0, var_count-1). ([nth(v3, v5)==0] *\/* AbsExists v6 in (TreeRecords(v2)). ([((v2, v6)-->stack_var_offset==v5 /\ (v2, v6)-->stack_val_offset==nth(v3, v5))]) )) *
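The two consistency assertions can be read as a runtime check on a concrete stack and assignment array. The following is a minimal sketch in C, assuming that 0 in the assignments array means "unassigned" (as the [nth(v3, v5)==0] disjunct suggests) and checking every index of the array. The function name stack_consistent is hypothetical, and the array is passed as a parameter here instead of being a global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VAR_COUNT 4

struct assignment_stack {
    struct assignment_stack *next;
    int var;
    char value;
    char unit_prop;
};

/* True when the stack and the assignment array agree in both directions. */
bool stack_consistent(const struct assignment_stack *stack,
                      const char assignments[VAR_COUNT])
{
    assert(assignments != NULL);

    /* First assertion: every stack record matches the array entry
       for its variable. */
    for (const struct assignment_stack *s = stack; s != NULL; s = s->next) {
        if (s->var < 0 || s->var >= VAR_COUNT) return false;
        if (assignments[s->var] != s->value) return false;
    }

    /* Second assertion: every variable is either unassigned (0) or
       has a stack record carrying the same value. */
    for (int v = 0; v < VAR_COUNT; v++) {
        if (assignments[v] == 0) continue;
        bool found = false;
        for (const struct assignment_stack *s = stack; s != NULL; s = s->next)
            if (s->var == v && s->value == assignments[v]) { found = true; break; }
        if (!found) return false;
    }
    return true;
}
```

The check fails both when a stack record disagrees with the array and when the array holds an assignment that no stack record accounts for, which are the two directions the invariant rules out.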