
Effective Program Verification for Relaxed Memory Models
Sebastian Burckhardt, Madanlal Musuvathi
Microsoft Research
CAV, July 10, 2008

Motivation: Memory Model Vulnerabilities
• Programmers do not always follow strict locking discipline in performance-critical code
  ◦ Ad-hoc synchronization with normal loads and stores or interlocked operations is faster
  ◦ Result: "benign" or "intentional" data races
• Such code can break on relaxed memory models
  ◦ Most multicore machines are not sequentially consistent
  ◦ Both compilers and actual hardware can contribute to the effect
• Vulnerabilities are hard to find, reproduce, and analyze
  ◦ May require a specific hardware configuration and schedule

C# Example
    volatile bool isIdling;
    volatile bool hasWork;

    // Consumer thread: announce idleness, then sleep unless work is pending
    void BlockOnIdle()
    {
        lock (condVariable)
        {
            isIdling = true;
            if (!hasWork)
                Monitor.Wait(condVariable);
            isIdling = false;
        }
    }

    // Producer thread: publish work, then wake the consumer if it is idle
    void NotifyPotentialWork()
    {
        hasWork = true;
        if (isIdling)
            lock (condVariable)
            {
                Monitor.Pulse(condVariable);
            }
    }

Example: Store Buffer Vulnerability
• Key pieces of code on the previous slide (ii = isIdling, hw = hasWork):
    volatile int ii = 0;
    volatile int hw = 0;

    Consumer          Producer
    Store ii, 1       Store hw, 1
    Load hw, 0        Load ii, 0   (SC would give 1)
• On x86, the hardware may perform the store late, so the Producer's load of ii returns 0 instead of 1
• Bug: the Producer thread does not notice the waiting Consumer and does not send a signal

Abstract View of Memory Models
Given a program P, a memory model Y defines the subset $T_{P,Y} \subseteq T$ of traces corresponding to some (partial or complete) execution of P on Y.
[Figure: nested trace sets $T_{P,SC} \subseteq T_{P,Y} \subseteq T$]
SC (sequential consistency) is the strongest memory model.
More executions may be possible on a relaxed memory model Y.

Example: TSO
Under TSO, processors can buffer stores in a FIFO queue.
[Figure: $T_{P,SC} \subseteq T_{P,TSO}$; the trace T lies in $T_{P,TSO}$ but not in $T_{P,SC}$]
Trace T, corresponding to the code of the store buffer example:
    1.1 Store ii, 1    2.1 Store hw, 1
    1.2 Load hw, 0     2.2 Load ii, 0
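A minimal Python sketch of this operational view, under simplifying assumptions (only loads and stores, memory initialized to 0, and a hypothetical `outcomes` helper that is not part of Sober): it explores every interleaving of thread steps and FIFO buffer drains and collects the final register values. On the store buffer example, the outcome in which both loads read 0 appears only under TSO.

```python
def outcomes(threads, tso):
    """Return the set of final register values over all executions.

    threads: list of instruction lists; ("store", addr, value) or ("load", addr, reg).
    Memory starts at 0. With tso=True, each thread has a FIFO store buffer.
    """
    init = (tuple(0 for _ in threads), (), tuple(() for _ in threads), ())
    seen, stack, results = {init}, [init], set()
    while stack:
        pcs, mem, bufs, regs = stack.pop()
        mem, regs = dict(mem), dict(regs)
        succs = []
        for t, prog in enumerate(threads):
            buf = bufs[t]
            if tso and buf:
                # Drain the oldest buffered store of thread t to memory.
                (a, v), rest = buf[0], buf[1:]
                succs.append((pcs, {**mem, a: v}, bufs[:t] + (rest,) + bufs[t + 1:], regs))
            if pcs[t] == len(prog):
                continue
            op = prog[pcs[t]]
            npcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if op[0] == "store":
                _, a, v = op
                if tso:  # the store becomes visible only when it drains
                    succs.append((npcs, mem, bufs[:t] + (buf + ((a, v),),) + bufs[t + 1:], regs))
                else:    # SC: the store reaches memory immediately
                    succs.append((npcs, {**mem, a: v}, bufs, regs))
            else:
                _, a, r = op
                # Store forwarding: a thread sees its own newest buffered store first.
                val = next((v for b, v in reversed(buf) if b == a), mem.get(a, 0))
                succs.append((npcs, mem, bufs, {**regs, r: val}))
        if not succs:  # all threads finished and all buffers drained
            results.add(tuple(v for _, v in sorted(regs.items())))
        for p, m, b, r in succs:
            state = (p, tuple(sorted(m.items())), b, tuple(sorted(r.items())))
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return results


# r1 = value of hw seen by the Consumer, r2 = value of ii seen by the Producer.
sb = [
    [("store", "ii", 1), ("load", "hw", "r1")],  # Consumer
    [("store", "hw", 1), ("load", "ii", "r2")],  # Producer
]
print(sorted(outcomes(sb, tso=False)))  # [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(sb, tso=True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Exhaustive exploration like this only scales to tiny litmus tests, which is one reason the deck checks traces rather than enumerating buffer states.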

Why TSO?
• Memory models are platform dependent and riddled with details
  [Figure: memory models SC, TSO, PSO, RMO, z6, Alpha, IA-32, IA-64]
• We focus on TSO because it models store buffers, the most common relaxation
• In practice, TSO is almost the same as the x86 hardware model

Model Checking Programs on Relaxed Memory Models
• Covering all relaxed executions is challenging
  ◦ Highly nondeterministic (exposed to low-level hardware concurrency)
  ◦ Memory models are usually not finite-state
  ◦ Memory models are often a matter of negotiation (formal descriptions are the exception)
• State of the art has limited scalability
  ◦ Model checking using simplified operational models
  ◦ Bounded model checking using axiomatic models (CheckFence)

Memory Model Safety
Observation: Programmers write code for SC
  ◦ They resort to {locks, fences, volatiles, interlocked operations} to maintain SC behavior where needed
  ◦ If a program P exhibits non-SC behavior, it is most likely a bug
Definition: A program P is Y-safe if $T_{P,SC} = T_{P,Y}$.
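For tiny programs, the definition can be probed by comparing observable outcomes instead of full trace sets. Equal outcome sets are a necessary but not sufficient condition for $T_{P,SC} = T_{P,TSO}$, so any difference proves that P is not TSO-safe. The Python sketch below uses a simplified operational store-buffer model (an assumption for illustration, not the formal TSO definition) with a hypothetical `("fence",)` instruction that waits until the issuing thread's buffer is empty; the helper names are also hypothetical. Adding a fence after each store of the store buffer example removes its non-SC outcome.

```python
def outcomes(threads, tso):
    """Final register values of all executions; memory starts at 0.

    Instructions: ("store", addr, value), ("load", addr, reg), ("fence",).
    With tso=True each thread owns a FIFO store buffer; a fence waits for it to drain.
    """
    init = (tuple(0 for _ in threads), (), tuple(() for _ in threads), ())
    seen, stack, results = {init}, [init], set()
    while stack:
        pcs, mem, bufs, regs = stack.pop()
        mem, regs = dict(mem), dict(regs)
        succs = []
        for t, prog in enumerate(threads):
            buf = bufs[t]
            if tso and buf:  # drain the oldest buffered store
                (a, v), rest = buf[0], buf[1:]
                succs.append((pcs, {**mem, a: v}, bufs[:t] + (rest,) + bufs[t + 1:], regs))
            if pcs[t] == len(prog):
                continue
            op = prog[pcs[t]]
            npcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if op[0] == "store" and tso:  # buffer (addr, value)
                succs.append((npcs, mem, bufs[:t] + (buf + (op[1:],),) + bufs[t + 1:], regs))
            elif op[0] == "store":        # SC: write memory directly
                succs.append((npcs, {**mem, op[1]: op[2]}, bufs, regs))
            elif op[0] == "load":         # forward from own buffer, else read memory
                val = next((v for b, v in reversed(buf) if b == op[1]), mem.get(op[1], 0))
                succs.append((npcs, mem, bufs, {**regs, op[2]: val}))
            elif not buf:  # fence: enabled only once this thread's buffer is empty
                succs.append((npcs, mem, bufs, regs))
        if not succs:
            results.add(tuple(v for _, v in sorted(regs.items())))
        for p, m, b, r in succs:
            state = (p, tuple(sorted(m.items())), b, tuple(sorted(r.items())))
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return results


def looks_tso_safe(threads):
    """Necessary condition for TSO-safety: TSO adds no new final outcome."""
    return outcomes(threads, tso=True) == outcomes(threads, tso=False)


sb = [[("store", "ii", 1), ("load", "hw", "r1")],
      [("store", "hw", 1), ("load", "ii", "r2")]]
sb_fenced = [[("store", "ii", 1), ("fence",), ("load", "hw", "r1")],
             [("store", "hw", 1), ("fence",), ("load", "ii", "r2")]]
print(looks_tso_safe(sb))         # False: (0, 0) is reachable only on TSO
print(looks_tso_safe(sb_fenced))  # True
```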

Decomposed Program Verification on Relaxed Memory Models
[Figure: nested trace sets $T_{P,SC} \subseteq T_{P,Y} \subseteq T$]
1. Verify sequentially consistent executions (show that all executions in $T_{P,SC}$ are correct)
2. Verify memory model safety (show that $T_{P,SC} = T_{P,Y}$)
Can we do 1 and 2 at the same time? Yes.


Borderline Executions
• Def.: A borderline execution for P is an execution with a successor in $T_{P,TSO} \setminus T_{P,SC}$.
• Thm.: A program P is TSO-safe if and only if it has no borderline executions.
• We can verify / falsify this as a safety property of sequentially consistent executions!
[Figure: $T_{P,SC} \subseteq T_{P,TSO}$]

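Example: TSO Borderline Execution
The execution consisting of 1.1 Store ii, 1; 1.2 Load hw, 0; and 2.1 Store hw, 1 is borderline.
Its successor in $T_{P,SC}$:
    1.1 Store ii, 1    2.1 Store hw, 1
    1.2 Load hw, 0     2.2 Load ii, 1
Its successor in $T_{P,TSO} \setminus T_{P,SC}$:
    1.1 Store ii, 1    2.1 Store hw, 1
    1.2 Load hw, 0     2.2 Load ii, 0
Successor traces are traces with one more instruction.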

Sober Tool Structure
[Figure: a stateless model checker (CHESS) acts as the scheduler and enumerates traces of the instrumented program; the resulting event stream (shared memory accesses, sync ops) feeds the borderline monitor.]
Outputs:
(1) P correct
(2) P not TSO-safe (+ counterexample)
(3) P has an SC bug (+ counterexample)
• Tool output is always sound.
• The tool may not finish exploration if the number of executions is too large.

Define SC Using the hb Relation
• Trace = set of instructions (vertices) with attributes
  ◦ [processor].[issue index] [operation] [address], [coherence index]
  ◦ The coherence index is the position of the value within the sequence of values written to the same location (i.e., "we replace each value with its sequence number")
• Add edges: program order $\to_p$ and conflict order $\to_c$
• Define the happens-before order $\to_{hb} = (\to_p \cup \to_c)$
• A trace is sequentially consistent if and only if $\to_{hb}$ is acyclic.
This trace is SC:
    1.1 Store ii, 1    2.1 Store hw, 1
    1.2 Load hw, 0     2.2 Load ii, 1
This trace is not SC (cycle 1.1 $\to_p$ 1.2 $\to_c$ 2.1 $\to_p$ 2.2 $\to_c$ 1.1):
    1.1 Store ii, 1    2.1 Store hw, 1
    1.2 Load hw, 0     2.2 Load ii, 0
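The SC criterion can be checked directly on small traces. This Python sketch (the `Event` tuple and all function names are hypothetical) derives $\to_p$ from processor and issue index, derives $\to_c$ from coherence indices, and tests $\to_{hb}$ for cycles with Kahn's topological sort. It assumes stores carry coherence indices 1, 2, and so on, and that a load's index names the value it reads, with 0 standing for the initial value.

```python
from collections import namedtuple

# proc.idx op addr, coh  (coh: index of the value written or read; 0 = initial value)
Event = namedtuple("Event", "proc idx op addr coh")


def program_order(trace):
    return {(a, b) for a in trace for b in trace if a.proc == b.proc and a.idx < b.idx}


def conflict_order(trace):
    edges = set()
    for a in trace:
        for b in trace:
            if a == b or a.addr != b.addr:
                continue
            if a.op == "Store" and b.op == "Store" and a.coh < b.coh:
                edges.add((a, b))  # coherence order between stores
            elif a.op == "Store" and b.op == "Load" and a.coh <= b.coh:
                edges.add((a, b))  # b reads a's value or a later one
            elif a.op == "Load" and b.op == "Store" and a.coh < b.coh:
                edges.add((a, b))  # a reads a value that b later overwrites
    return edges


def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indegree = {n: 0 for n in nodes}
    for _, b in edges:
        indegree[b] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for a, b in edges:
            if a == n:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed == len(nodes)


def is_sc(trace):
    hb = program_order(trace) | conflict_order(trace)
    return is_acyclic(trace, hb)


sc_trace = [Event(1, 1, "Store", "ii", 1), Event(1, 2, "Load", "hw", 0),
            Event(2, 1, "Store", "hw", 1), Event(2, 2, "Load", "ii", 1)]
non_sc_trace = sc_trace[:3] + [Event(2, 2, "Load", "ii", 0)]
print(is_sc(sc_trace))      # True
print(is_sc(non_sc_trace))  # False
```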

Define TSO by Relaxing hb
• Define the relaxed happens-before order $\to_{rhb} = (\to_p \cup \to_c) \setminus \{(s,l) \mid s \text{ is a store}, l \text{ is a load}, s \to_p l\}$
• A trace is possible on TSO if and only if
  (1) $\to_{rhb}$ is acyclic, and
  (2) there do not exist s, l such that $s \to_p l$ and $l \to_c s$
• This trace is TSO, but not SC:
    1.1 Store ii, 1    2.1 Store hw, 1
    1.2 Load hw, 0     2.2 Load ii, 0
  [Figure: the same trace drawn with $\to_{hb}$ edges (cyclic) and with $\to_{rhb}$ edges (acyclic)]
• Thm.: This definition is equivalent to the operational TSO model (see Tech Report).
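The two TSO conditions translate almost literally into code. This Python sketch (hypothetical names, same trace encoding as a coherence-indexed `Event` tuple) removes store-to-load program-order edges to form $\to_{rhb}$, tests it for cycles, and separately rejects a load that is conflict-ordered before an earlier store of its own processor. Three small traces are classified: the store buffer trace, a message-passing trace whose cycle survives the relaxation, and a load that skips its own earlier store.

```python
from collections import namedtuple

Event = namedtuple("Event", "proc idx op addr coh")  # coh: 0 = initial value


def program_order(tr):
    return {(a, b) for a in tr for b in tr if a.proc == b.proc and a.idx < b.idx}


def conflict_order(tr):
    return {(a, b) for a in tr for b in tr
            if a != b and a.addr == b.addr and (
                (a.op == "Store" and b.op == "Store" and a.coh < b.coh) or
                (a.op == "Store" and b.op == "Load" and a.coh <= b.coh) or
                (a.op == "Load" and b.op == "Store" and a.coh < b.coh))}


def is_acyclic(nodes, edges):
    """Kahn's algorithm: acyclic iff every node can be removed."""
    indegree = {n: 0 for n in nodes}
    for _, b in edges:
        indegree[b] += 1
    ready, removed = [n for n in nodes if indegree[n] == 0], 0
    while ready:
        n = ready.pop()
        removed += 1
        for a, b in edges:
            if a == n:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed == len(nodes)


def is_sc(tr):
    return is_acyclic(tr, program_order(tr) | conflict_order(tr))


def is_tso(tr):
    po, co = program_order(tr), conflict_order(tr)
    store_load = {(s, l) for s, l in po if s.op == "Store" and l.op == "Load"}
    rhb = (po | co) - store_load                      # drop store->load program order
    stale = any((l, s) in co for s, l in store_load)  # condition (2) violated
    return is_acyclic(tr, rhb) and not stale


sb = [Event(1, 1, "Store", "ii", 1), Event(1, 2, "Load", "hw", 0),
      Event(2, 1, "Store", "hw", 1), Event(2, 2, "Load", "ii", 0)]
mp = [Event(1, 1, "Store", "data", 1), Event(1, 2, "Store", "flag", 1),
      Event(2, 1, "Load", "flag", 1), Event(2, 2, "Load", "data", 0)]
own = [Event(1, 1, "Store", "x", 1), Event(1, 2, "Load", "x", 0)]
print(is_sc(sb), is_tso(sb))    # False True
print(is_sc(mp), is_tso(mp))    # False False: rhb keeps the cycle
print(is_sc(own), is_tso(own))  # False False: the load skips its own store
```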

Borderline Monitor Implementation
Receiving a stream of memory accesses:
• Record all stores to all locations.
• For each load L, check whether there exists a reordering of L with prior stores to the same location such that
  (1) $\to_{hb}$ has a cycle,
  (2) $\to_{rhb}$ is acyclic, and
  (3) there do not exist s, l such that $s \to_p l$ and $l \to_c s$.
• Implementation: use a standard vector clock to compute $\to_{hb}$, and a custom vector clock (twice the width) to compute $\to_{rhb}$.
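The per-load check can be prototyped by brute force before introducing vector clocks. In this Python sketch (hypothetical names; quadratic edge construction; not the Sober implementation), the received stream is assumed to be an SC execution. For each load, the sketch tries every older value the load could have read, which corresponds to reordering it before prior stores to the same location. A candidate successor that violates SC (condition 1) but satisfies conditions (2) and (3) marks the prefix as borderline.

```python
from collections import namedtuple

Event = namedtuple("Event", "proc idx op addr coh")  # coh: 0 = initial value


def program_order(tr):
    return {(a, b) for a in tr for b in tr if a.proc == b.proc and a.idx < b.idx}


def conflict_order(tr):
    return {(a, b) for a in tr for b in tr
            if a != b and a.addr == b.addr and (
                (a.op == "Store" and b.op == "Store" and a.coh < b.coh) or
                (a.op == "Store" and b.op == "Load" and a.coh <= b.coh) or
                (a.op == "Load" and b.op == "Store" and a.coh < b.coh))}


def is_acyclic(nodes, edges):
    indegree = {n: 0 for n in nodes}
    for _, b in edges:
        indegree[b] += 1
    ready, removed = [n for n in nodes if indegree[n] == 0], 0
    while ready:
        n = ready.pop()
        removed += 1
        for a, b in edges:
            if a == n:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed == len(nodes)


def is_sc(tr):  # hb acyclic
    return is_acyclic(tr, program_order(tr) | conflict_order(tr))


def is_tso(tr):  # rhb acyclic and no load reads past its own earlier store
    po, co = program_order(tr), conflict_order(tr)
    sl = {(s, l) for s, l in po if s.op == "Store" and l.op == "Load"}
    return is_acyclic(tr, (po | co) - sl) and not any((l, s) in co for s, l in sl)


def borderline_loads(stream):
    """Yield (load, older_value) for loads whose reordering gives a TSO-only successor."""
    for i, load in enumerate(stream):
        if load.op != "Load":
            continue
        for older in range(load.coh):  # values overwritten by prior stores
            successor = stream[:i] + [load._replace(coh=older)]
            if not is_sc(successor) and is_tso(successor):  # conditions (1)-(3)
                yield load, older


sb = [Event(1, 1, "Store", "ii", 1), Event(1, 2, "Load", "hw", 0),
      Event(2, 1, "Store", "hw", 1), Event(2, 2, "Load", "ii", 1)]
mp = [Event(1, 1, "Store", "data", 1), Event(1, 2, "Store", "flag", 1),
      Event(2, 1, "Load", "flag", 1), Event(2, 2, "Load", "data", 1)]
print(list(borderline_loads(sb)))  # load 2.2 of ii could read 0 on TSO
print(list(borderline_loads(mp)))  # []
```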

Equivalent Interleavings
• Typically, many different interleavings map to the same (Mazurkiewicz) trace.
• By construction, our monitor is insensitive to the choice of interleaving:
  ◦ It checks all $\to_{hb}$-equivalent interleavings simultaneously
  ◦ This makes it compatible with partial order reduction
  ◦ It improves the probability of finding bugs
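The monitor computes $\to_{hb}$ with a standard vector clock (the custom double-width clock for $\to_{rhb}$ is not reproduced here). As a rough illustration of why such a monitor does not depend on the chosen interleaving, this Python sketch computes standard happens-before vector clocks for a stream of loads and stores. Two interleavings of the same trace receive identical clocks, while an interleaving of a different trace does not. The event encoding `(proc, idx, op, addr)` and the function names are assumptions; the stream is taken to be an SC interleaving in which each load reads the latest earlier store to its address.

```python
def vector_clocks(stream, nprocs):
    """Map (proc, idx) -> vector clock; e ->hb f iff clock[e] <= clock[f] pointwise (e != f).

    stream: (proc, idx, op, addr) events of an SC interleaving, procs numbered 1..nprocs.
    """
    zero = (0,) * nprocs

    def join(u, v):
        return tuple(max(x, y) for x, y in zip(u, v))

    thread = {p: zero for p in range(1, nprocs + 1)}
    last_store = {}   # addr -> clock of the latest store
    loads_since = {}  # addr -> join of clocks of loads after that store
    clocks = {}
    for proc, idx, op, addr in stream:
        c = join(thread[proc], last_store.get(addr, zero))  # store->store, store->load
        if op == "Store":
            c = join(c, loads_since.get(addr, zero))        # load->store (value overwritten)
        c = c[:proc - 1] + (c[proc - 1] + 1,) + c[proc:]     # tick own component
        thread[proc] = clocks[(proc, idx)] = c
        if op == "Store":
            last_store[addr], loads_since[addr] = c, zero
        else:
            loads_since[addr] = join(loads_since.get(addr, zero), c)
    return clocks


def happens_before(clocks, e, f):
    return e != f and all(x <= y for x, y in zip(clocks[e], clocks[f]))


a = [(1, 1, "Store", "x"), (2, 1, "Store", "y"), (1, 2, "Load", "y"), (2, 2, "Load", "x")]
b = [(2, 1, "Store", "y"), (1, 1, "Store", "x"), (2, 2, "Load", "x"), (1, 2, "Load", "y")]
c = [(1, 1, "Store", "x"), (1, 2, "Load", "y"), (2, 1, "Store", "y"), (2, 2, "Load", "x")]
print(vector_clocks(a, 2) == vector_clocks(b, 2))  # True: same trace, same clocks
print(vector_clocks(a, 2) == vector_clocks(c, 2))  # False: 1.2 now reads y before 2.1
print(happens_before(vector_clocks(a, 2), (1, 1), (2, 2)))  # True: 2.2 reads 1.1's store
```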

Results
• Good at finding bugs even if only a small number of schedules is explored
  ◦ The monitor checks all hb-equivalent interleavings
  ◦ The CHESS heuristic (iterative context bounding) seems to mix well
• Found the expected store buffer vulnerabilities in standard examples (Dekker, Bakery)
• Detected 2 store buffer vulnerabilities in a production-level concurrency library
  ◦ Overall code size ~33 kloc
  ◦ Used the existing test harness written by the product team (slightly adapted for use with CHESS)
  ◦ The bugs were not previously known

Some Numbers
Columns: program name | context bound | # interleavings (total, borderline) | time ver. [s] | time [s] (SoBeR, CHESS)
Programs: Fig. 1(b); dekker (2 threads, 2 crit-sec) (loc 82); bakery (2 threads, 3 crit-sec) (loc 122); takequeue (2 threads, 6 ops) (loc 374)
Context bounds: ∞ 1 2 3 4 5 0 1 2 3 4 5
Values: 10 4 <0.1 <0.2 5 4 <0.1 <0.2 36 23 <0.1 0.39 0.37 183 50 <0.1 1.9 1.8 1,219 124 <0.1 13.2 13.0 8,472 349 <0.1 106.0 100.6 1 1 <0.2 25 20 <0.1 0.47 0.43 742 533 <0.1 10.3 9.8 12,436 8,599 <0.1 189.0 181.0 3 47 402 2,318 9,147 29,821 0 14 189 1,197 5,321 17,922 n.a. 0.34 0.43 0.74 0.86 <0.3 0.72 5.2 28.9 125.5 481.5 <0.3 0.69 4.9 27.8 118.9 461.6

Conclusion
• With the increasing use of multicores, more and more programs are likely to exhibit failures caused by the memory model.
• Such failures are hard to find by conventional means (code inspection, testing).
• Our combination of a borderline monitor and stateless model checking makes it practical to detect memory model safety violations in a unit test environment.

Future Work
• Run on larger programs (runtime verification)
• Handle more memory models
  ◦ Which memory models guarantee borderline executions?
• Prove memory model safety of concurrent data type implementations
• Develop borderline monitors for other relaxed concurrent APIs
  ◦ Transactional memory
  ◦ Concurrency libraries