Doctoral Dissertation Defense
Formalizing Memory Consistency Models for Program Analysis
Jason Yue Yang
This work was supported in part by NSF Research Grant No. CCR-0081406 and SRC Task 1031.001.

Motivation
Memory architectures: more aggressive
- Load/store
- Load-acquire/store-release
- Write atomicity
- Data dependence
- Memory fence
- Semaphore
Multithreaded software: popular, BUT hard to analyze
- Thread libraries: e.g., P-thread, Win32, Solaris
- Language-level support for threads: e.g., Java
Central problem: shared memory consistency models
- Need a clear specification of memory ordering rules
- Need an executable version of memory ordering rules
- Need a method to analyze thread executions against the rules

What Is a Memory Model?
It defines the legal orderings of memory operations that can be perceived at the user level.
Example (Itanium assembly code, initially a = b = 0)
Plain store/load (less restriction): observing 0 is OK
  CPU 1: st a, 1;  st b, 1;
  CPU 2: ld r1, b; <1>   ld r2, a; <0>
Store-release/load-acquire (more restriction): can't observe 0
  CPU 1: st a, 1;  st.rel b, 1;
  CPU 2: ld.acq r1, b; <1>   ld r2, a; <1>

Classical Memory Models
Sequential Consistency (SC)
Non-operational view:
1. Common total order
2. Program order
3. Read sees the "latest" write
Operational view: threads execute as if connected to a single memory through a non-deterministic switch.
Other weaker models: Parallel Random Access Memory (PRAM), Coherence, Causal Consistency, Processor Consistency, Release Consistency, Lazy Release Consistency, Location Consistency, and more …

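The operational view of SC can be illustrated with a small enumeration. The sketch below is not from the thesis tools: it simply merges the two threads of the earlier store/load example in every way that preserves program order, runs each merge against a single memory, and collects the observable register values. The tuple encoding of operations and the helper names are assumptions made for illustration.

```python
def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each thread's program order."""
    if not t1:
        yield list(t2)
        return
    if not t2:
        yield list(t1)
        return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest


def run(schedule):
    """Execute one interleaving against a single shared memory (the SC 'switch')."""
    memory = {"a": 0, "b": 0}
    regs = {}
    for kind, target, arg in schedule:
        if kind == "st":          # ("st", variable, value)
            memory[target] = arg
        else:                     # ("ld", register, variable)
            regs[target] = memory[arg]
    return regs["r1"], regs["r2"]


writer = [("st", "a", 1), ("st", "b", 1)]
reader = [("ld", "r1", "b"), ("ld", "r2", "a")]

# Set of (r1, r2) results that some SC interleaving can produce.
sc_outcomes = {run(s) for s in interleavings(writer, reader)}
print(sorted(sc_outcomes))
```

Under SC the result r1 = 1, r2 = 0 never appears, whereas the plain Itanium store/load version of the same test allows it.
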
Industrial Memory Models
Example: The Intel Itanium® Memory Model
• Intel application note contains more than 30 pages of semi-formal rules
• English plus a large amount of special notation
• Many non-obvious consequences
• Uses litmus tests to illustrate properties
• Cannot automatically execute litmus tests
• Relies on pencil-and-paper reasoning

Language Level Memory Models
Example: The Java Memory Model (JMM)
• Original JMM: Chapter 17 of the Java Language Specification
• Poorly understood
• Flawed
  - too weak (may introduce security holes)
  - too strong (prevents common optimizations)
• Currently under revision (JSR-133)
  - Extensive discussions for more than 3 years
  - Several replacement proposals
  - Issues still remain

Why Does a Memory Model Matter?
Example: Peterson's Algorithm for Mutual Exclusion
Initially, flag1 = flag2 = false, turn = 0.
Thread 1:
  flag1 = true;
  turn = 2;
  while (turn == 2 && flag2) ;
  <critical section>
  flag1 = false;
Thread 2:
  flag2 = true;
  turn = 1;
  while (turn == 1 && flag1) ;
  <critical section>
  flag2 = false;
Can both threads enter the critical section simultaneously?
• For sequential consistency: No (the "intended behavior" is guaranteed)
• For many weaker models: Yes (the algorithm would be broken)

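The claim can be checked by exhaustive state exploration. The following sketch is an illustration, not the thesis tooling: it explores Peterson's algorithm once under SC and once under a simple per-thread FIFO store buffer, which is an assumed stand-in for "a weaker model" rather than any specific architecture. Threads are numbered 0 and 1 here, `turn` starts at -1, and program counter 4 marks the critical section.

```python
from collections import deque

CRITICAL = 4  # program counter value meaning "inside the critical section"


def successors(state, buffered):
    """Yield every state reachable in one step (a thread step or a buffer flush)."""
    pcs, mem_items, bufs = state
    mem = dict(mem_items)
    for me in (0, 1):
        other = 1 - me

        def load(var):
            # A thread sees its own newest buffered store first, then memory.
            for buffered_var, value in reversed(bufs[me]):
                if buffered_var == var:
                    return value
            return mem[var]

        def step(new_pc, store=None):
            new_mem, new_buf = dict(mem), list(bufs[me])
            if store is not None:
                if buffered:
                    new_buf.append(store)          # store waits in the buffer
                else:
                    new_mem[store[0]] = store[1]   # SC: store reaches memory at once
            new_pcs, new_bufs = list(pcs), list(bufs)
            new_pcs[me], new_bufs[me] = new_pc, tuple(new_buf)
            return tuple(new_pcs), tuple(sorted(new_mem.items())), tuple(new_bufs)

        pc = pcs[me]
        if pc == 0:
            yield step(1, (f"flag{me}", True))
        elif pc == 1:
            yield step(2, ("turn", other))
        elif pc == 2:   # first half of the while condition: turn == other
            yield step(3 if load("turn") == other else CRITICAL)
        elif pc == 3:   # second half: the other thread's flag
            yield step(2 if load(f"flag{other}") else CRITICAL)
        elif pc == CRITICAL:
            yield step(5, (f"flag{me}", False))
        if bufs[me]:    # flush the oldest buffered store to memory
            (var, value), rest = bufs[me][0], bufs[me][1:]
            new_mem, new_bufs = dict(mem), list(bufs)
            new_mem[var], new_bufs[me] = value, rest
            yield pcs, tuple(sorted(new_mem.items())), tuple(new_bufs)


def violates_mutual_exclusion(buffered):
    """Breadth-first search for a state with both threads in the critical section."""
    memory = {"flag0": False, "flag1": False, "turn": -1}
    init = ((0, 0), tuple(sorted(memory.items())), ((), ()))
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        if state[0] == (CRITICAL, CRITICAL):
            return True
        for nxt in successors(state, buffered):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print("SC:", violates_mutual_exclusion(False))
print("store buffers:", violates_mutual_exclusion(True))
```

With store buffers, each thread can read the other's flag from memory while the other thread's store to that flag is still buffered, which is how both reach the critical section.
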
Do Programmers Really Care?
Another example: Double-Checked Locking for Singleton creation
  class foo {
      private static Helper helper = null;

      public static Helper get() {
          if (helper == null) {               // only use locking as needed
              synchronized (foo.class) {      // a static method has no "this" to lock on
                  if (helper == null)         // "double-check" the reference
                      helper = new Helper();
              }
          }
          return helper;
      }
  }

Broken Under the Current JMM
  class foo {
      private static Helper helper = null;

      public static Helper get() {
          if (helper == null) {               // only use locking as needed
              synchronized (foo.class) {      // a static method has no "this" to lock on
                  if (helper == null)         // "double-check" the reference
                      helper = new Helper();
              }
          }
          return helper;
      }
  }
Problem: broken under the JMM!
- on weak architectures
- with race conditions
- the reference can be "visible" before the constructor completes
Can't guarantee Helper is fully constructed!

Problems with Previous Approaches
For virtually all industrial weak memory models
• They don't have formal specifications
For those that do have a formal spec on paper
• They can't be executed
For those that have a machine-readable formal spec
• They use a "state machine" approach that
  - employs architecture-specific data structures
  - cannot be decomposed into orthogonal components
  - has not been verified against higher-level rules
No support for verifying "programmer expectations" in multithreaded software

Analysis of Multithreaded Software
[Diagram: analysis approaches arranged between "more precise" and "more scalable". Labels: intra-thread, intra-procedural; inter-thread, inter-procedural; memory-model insensitive; memory-model sensitive (my thesis work).]

Contributions
Operational Specification Method: UMM
- Operational-style framework
- Applications: Java Memory Model, classical memory models
Axiomatic Specification Method: Nemos
- Non-operational-style framework
- Applications: Intel Itanium Memory Model, classical memory models
Constraint Solving Method
- Prototype tools based on various solvers: CLP, SAT, QBF
- Incremental SAT solving; different encodings
Concurrency Analysis
- Language-level memory model issues
- Applications: execution validation, race detection, atomicity verification

Operational Approach: UMM (Uniform Memory Model)
1. Supports formal verification
   · Integrates a model checker (Murphi)
   · Inspired by Park & Dill's work on Sparc
2. Employs a generic memory abstraction
   · To eliminate architecture-specific complexities
   · Uniform notation
   · A parameterized method

UMM Abstract Machine
[Diagram: Thread i and Thread j, each with its own LIB (LIB i, LIB j), above a single GIB.]
- Only two layers
- LIB: Local Instruction Buffer
- GIB: Global Instruction Buffer
- GIB can grow as needed
Key insight: make it easy to configure program order and visibility order

General Strategy in UMM
Enabling mechanism
- Program order may be relaxed to enable certain interleavings
- Controlled via the bypassing table
Filtering mechanism
- Visibility order constructed from the GIB following proper ordering requirements
- Enforced in read selection rules

UMM Example: Sequential Consistency
Transition table
Event: read
  Condition: ∃ i ∈ LIB_t(i) : ready(i) ∧ op(i) = Read ∧ (∃ w ∈ GIB : legalWrite(i, w))
  Action: i.data := data(w); LIB_t(i) := delete(LIB_t(i), i);
Event: write
  Condition: ∃ i ∈ LIB_t(i) : ready(i) ∧ op(i) = Write
  Action: GIB := append(GIB, i); LIB_t(i) := delete(LIB_t(i), i);
Program order:
  ready(i) ≡ ¬∃ j ∈ LIB_t(i) : pc(j) < pc(i) ∧ BYPASS[op(j)][op(i)] = No
Visibility order:
  legalWrite(r, w) ≡ op(w) = Write ∧ var(w) = var(r) ∧ (¬∃ w' ∈ GIB : op(w') = Write ∧ var(w') = var(r) ∧ time(r) > time(w') > time(w))

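The two predicates in the transition table translate almost directly into code. The sketch below is an illustrative reading, not the Murphi model used in the thesis. It assumes instructions are dictionaries with `pc`, `op`, and `var` fields, that `time` is a position in the GIB, and that the bypassing table for SC answers "No" for every pair of operation types; all of these are assumptions made for the example.

```python
NO, YES = "No", "Yes"

# Assumed SC bypassing table: no instruction may bypass an earlier one.
BYPASS_SC = {"Read": {"Read": NO, "Write": NO},
             "Write": {"Read": NO, "Write": NO}}


def ready(i, lib, bypass):
    """i may issue unless an earlier instruction in its LIB forbids bypassing."""
    return not any(j["pc"] < i["pc"] and bypass[j["op"]][i["op"]] == NO
                   for j in lib)


def legal_write(r, r_time, w_time, gib):
    """The write at GIB position w_time is visible to read r if no write to the
    same variable sits strictly between it and the read."""
    w = gib[w_time]
    if w["op"] != "Write" or w["var"] != r["var"]:
        return False
    return not any(w2["op"] == "Write" and w2["var"] == r["var"]
                   and r_time > t2 > w_time
                   for t2, w2 in enumerate(gib))


# One thread's local buffer: a write to a followed by a read of a.
lib = [{"pc": 0, "op": "Write", "var": "a"},
       {"pc": 1, "op": "Read", "var": "a"}]
# Global buffer holding an initial write and a later write to a.
gib = [{"op": "Write", "var": "a", "data": 0},
       {"op": "Write", "var": "a", "data": 1}]

print(ready(lib[0], lib, BYPASS_SC), ready(lib[1], lib, BYPASS_SC))
print(legal_write(lib[1], len(gib), 0, gib), legal_write(lib[1], len(gib), 1, gib))
```

Changing a single table entry, for example letting a read bypass an earlier write, changes which instructions are `ready` without touching the rest of the machine, which is the configurability the UMM slides emphasize.
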
Non-Operational Approach: Nemos
(Non-operational yet Executable Memory Ordering Specifications)
Desired features and solutions
- Easy to understand, flexible: declarative (axiomatic)
- Precise: predicate logic
- Compositional, modular: "higher order" logic
- Executable: make "hidden" rules explicit
Key insights
(1) Make the rules higher order
  - Pass down the order relation through all the rules
  - Compositional, reusable, scalable, easy to compare
(2) Make all rules explicit
  - Executable using a constraint-programming system

Nemos Example: Sequential Consistency
(ops is the execution; order is the ordering relation)
Formal definition of SC
  legal ops order ≡
    requireProgramOrder ops order ∧       (program order)
    requireReadValue ops order ∧          (read sees "latest" write)
    requireWeakTotalOrder ops order ∧     (common total order)
    requireTransitiveOrder ops order ∧
    requireAsymmetricOrder ops order
  requireProgramOrder ops order ≡
    ∀ i, j ∈ ops. ((t i = t j ∧ pc i < pc j) ∨ (t i = t_init ∧ t j ≠ t_init)) ⇒ order i j
  requireTransitiveOrder ops order ≡
    ∀ i, j, k ∈ ops. (order i j ∧ order j k) ⇒ order i k
- order is repeatedly refined
- Hidden rules are explicit

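Because every rule takes `ops` and `order` as arguments, the SC definition can be executed directly, if naively. The sketch below is an illustration under stated assumptions, not the Nemos tool: it tries every permutation of the operations as a candidate `order` instead of using a constraint solver, the `Op` record layout is invented for the example, and `require_read_value` is a simplified stand-in since the slide does not spell that rule out.

```python
from collections import namedtuple
from itertools import permutations

Op = namedtuple("Op", "id t pc kind var data")
T_INIT = 0  # thread id reserved for the initial writes


def require_program_order(ops, order):
    return all(order(i, j) for i in ops for j in ops
               if (i.t == j.t and i.pc < j.pc) or (i.t == T_INIT and j.t != T_INIT))


def require_transitive_order(ops, order):
    return all(order(i, k) for i in ops for j in ops for k in ops
               if order(i, j) and order(j, k))


def require_asymmetric_order(ops, order):
    return all(not order(j, i) for i in ops for j in ops if order(i, j))


def require_weak_total_order(ops, order):
    return all(order(i, j) or order(j, i) for i in ops for j in ops if i.id != j.id)


def require_read_value(ops, order):
    """Simplified rule: a read returns the latest earlier write to its variable."""
    for r in (o for o in ops if o.kind == "R"):
        writes = [w for w in ops if w.kind == "W" and w.var == r.var and order(w, r)]
        latest = [w for w in writes if not any(order(w, w2) for w2 in writes)]
        if len(latest) != 1 or latest[0].data != r.data:
            return False
    return True


def legal(ops, order):
    return (require_program_order(ops, order) and require_read_value(ops, order)
            and require_weak_total_order(ops, order)
            and require_transitive_order(ops, order)
            and require_asymmetric_order(ops, order))


def validate_execution(ops):
    """True if some total order of ops satisfies every SC rule."""
    for perm in permutations(ops):
        pos = {op.id: n for n, op in enumerate(perm)}
        if legal(ops, lambda i, j: pos[i.id] < pos[j.id]):
            return True
    return False


def litmus(r1, r2):
    """st a,1; st b,1 on thread 1 and ld r1,b; ld r2,a on thread 2."""
    return [Op(0, T_INIT, 0, "W", "a", 0), Op(1, T_INIT, 1, "W", "b", 0),
            Op(2, 1, 0, "W", "a", 1), Op(3, 1, 1, "W", "b", 1),
            Op(4, 2, 0, "R", "b", r1), Op(5, 2, 1, "R", "a", r2)]


print(validate_execution(litmus(1, 0)), validate_execution(litmus(1, 1)))
```

Swapping one rule for another, for instance dropping part of `require_program_order`, yields a different memory model without changing the search, which mirrors the compositional style the slide describes.
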
The Itanium Memory Ordering Rules
  legal ops order ≡
    requireLinearOrder ops order ∧
    requireWriteOperationOrder ops order ∧
    requirePO ops order ∧
    requireMemoryDataDependence ops order ∧
    requireDataFlowDependence ops order ∧
    requireCoherence ops order ∧
    requireReadValue ops order ∧
    requireAtomicWBRelease ops order ∧
    requireNoUCBypass ops order

Specification Hierarchy for Itanium
- requireLinearOrder
  • Irreflexive
  • Transitive
  • Total
  • Asymmetric
- requireWriteOperationOrder
  • Local/Remote case
  • Remote/Remote case
- requireProgramOrder
  • Acquire Rule
  • Release Rule
  • Fence Rule
- requireMemoryDataDependence
  • MD: RAW
  • MD: WAR
  • MD: WAW
- requireDataFlowDependence
  • DF: RAW
  • DF: WAR
  • DF: WAW
- requireCoherence
  • Local/Local case
  • Remote/Remote case
- requireReadValue
  • ValidWr
  • ValidLocalWr
  • ValidRemoteWr
  • ValidDefaultWr
  • ValidRd
- requireAtomicWBRelease
- requireSequentialUC
  • RAR Rule
  • RAW Rule
  • WAR Rule
  • WAW Rule
- requireNoUCBypass

How to Make an Axiomatic Specification Executable?
[Diagram: Test Program + Memory Model Specification → Constraints → Solver (CLP, SAT, QBF) → SAT / UNSAT]
Execution validation:
  validateExecution ops ≡ ∃ order. legal ops order
- Effective for revealing critical properties
- Effective for verifying common programming patterns

Using Constraint Logic Programming (CLP)
• Implementation in FD-Prolog is straightforward
• Universal quantification handled via enumeration
• Existential quantification handled via backtracking
• Built-in constraint solver from FD-Prolog:
  - logical variables
  - finite-domain (FD) variables

How to Encode the Ordering Relation?
n×n encoding: precedence matrix M
[Diagram: matrix indexed by row i and column j, with all entries initially x]
Values of entry Mij:
  1: i is ordered before j
  0: i is not ordered before j
  x: value not bound yet
The method:
- Given a test program with N operations, use a 2D precedence matrix with N² constraint variables
- Interpret the symbolic execution and impose constraints on the 2D matrix
- When interpretation finishes, the remaining x values reveal the latitude in the weak order
- Once an x has changed to a 1, a later attempt to set it to 0 triggers backtracking

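A minimal sketch of the precedence matrix follows, assuming `None` stands for the unbound value x and that a Python exception stands in for the backtracking a CLP system would perform. The class and method names are hypothetical and are not taken from the prototype tools.

```python
class Conflict(Exception):
    """Raised where a constraint solver would backtrack."""


class PrecedenceMatrix:
    def __init__(self, n):
        self.n = n
        # None plays the role of x (not bound yet).
        self.m = [[None] * n for _ in range(n)]
        for i in range(n):
            self.m[i][i] = 0  # assumed irreflexive: no operation precedes itself

    def bind(self, i, j, value):
        """Bind entry (i, j) to 0 or 1; rebinding to a different value conflicts."""
        current = self.m[i][j]
        if current is not None and current != value:
            raise Conflict(f"M[{i}][{j}] is already {current}, cannot set {value}")
        self.m[i][j] = value

    def order_before(self, i, j):
        """Record that i precedes j. Binding (j, i) to 0 assumes asymmetry.
        Unlike a real solver, a failed second bind does not undo the first."""
        self.bind(i, j, 1)
        self.bind(j, i, 0)

    def unbound(self):
        """Entries still at x: the latitude left in the order."""
        return [(i, j) for i in range(self.n) for j in range(self.n)
                if self.m[i][j] is None]


matrix = PrecedenceMatrix(3)
matrix.order_before(0, 1)      # e.g. a program-order constraint
print(matrix.unbound())
```

A transitivity rule would be layered on top by calling `order_before(i, k)` whenever entries (i, j) and (j, k) are both bound to 1.
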
Example of Prolog Implementation
Formal specification (e.g., requireProgramOrder):
  requireProgramOrder ops order ≡
    ∀ i, j ∈ ops. ((t i = t j ∧ pc i < pc j) ∨ (t i = t_init ∧ t j ≠ t_init)) ⇒ order i j
SICStus Prolog code:
  % Apply doProgramOrder to every pair of operations.
  requireProgramOrder(Ops, Order) :-
      for_each_elem(Ops, Order, doProgramOrder).

  % Constrain entry (I, J) of the precedence matrix; thread 0 is the initial thread.
  elem_prog(doProgramOrder, Ops, Order, I, J) :-
      nth(I, Ops, Oi), nth(J, Ops, Oj),
      t(Oi, T_i), t(Oj, T_j),
      pc(Oi, PC_i), pc(Oj, PC_j),
      length(Ops, N), matrix_elem(Order, N, I, J, Oij),
      ((T_i #= T_j #/\ PC_i #< PC_j) #\/ (T_i #= 0 #/\ T_j #\= 0)) #=> Oij.

Interactive and Incremental Analysis
Itanium test program (initially, a = b = 0):
  P1: st a, 1;  st b, 1;
  P2: ld r1, b; <1>   ld r2, a; <0>
Execution (ops):
  P1: (1) st_local(a, 1); (2) st_remote1(a, 1); (3) st_remote2(a, 1);
      (4) st_local(b, 1); (5) st_remote1(b, 1); (6) st_remote2(b, 1);
  P2: (7) ld(1, b); (8) ld(0, a);
Can r1 = 1 and r2 = 0?
Result: legal
Interleaving: 8 4 5 6 7 1 2 3
[Two 8×8 precedence matrices over operations 1 to 8: an order satisfying all constraints, with some entries still x, and an instantiated order.]

The SAT/QBF Approach
Initially, we "retrofitted" our Prolog version with SAT-generating code
- Showed speed improvement in constraint solving, BUT …
- Still slow in CNF generation
- Very difficult to debug
So we re-engineered our tool (done by Prof. Ganesh Gopalakrishnan):
- "Stamping out" a finite execution as a QBF formula
- "Stamping out" a finite execution as a CNF formula
- Experimenting with different encoding methods: n×n vs. n log n
- Checkpointing SAT generation

Gist of Results
1. SAT seems to be better than QBF
2. The n×n encoding method is better than n log n
   - despite using more bits
   - many unit clauses, good for SAT solving
3. The checkpointing method does pay off, up to 64 tuples
4. We can easily handle 128 operations
5. Latest result: completed an Intel-provided test run (experiment done by Hemanthkumar Sivaraj)
   - test contains 500 Itanium memory operations
   - had to suppress the total-order constraint, UNSAT
   - takes 10 sec to generate the SAT instance; 0.1 sec to solve
   - still lots of room for improvement

How to Verify Programmer Expectations?
[Diagram: Test Program (program semantics + memory model semantics) and program properties (e.g., race / atomicity) → Constraints → Solver → SAT / UNSAT]
(1) Define both intra-thread and inter-thread semantics as constraints
(2) Model correctness properties as additional constraints
(3) Reduce a verification problem to a constraint satisfaction problem and solve it automatically

Race Detection
What's a data race? Informally: conflicting and concurrent accesses
Is this program race-free? Initially, a = b = 0.
  Thread 1: r1 = a;  if (r1 > 0) b = 1;
  Thread 2: r2 = b;  if (r2 > 0) a = 1;
Are the accesses to a (or to b) in the two threads conflicting and concurrent?
• Control flow is interwoven with memory consistency requirements
• Hence, the answer depends on the memory model
  - Under SC, this program is race-free
  - Under a weaker model, this program might contain races

Constraints for Control Flow
• Treat control operations like memory operations
  - Imagine "assigns" and "uses" of "control variables"
• Add an auxiliary control variable ck for each branch statement k, and convert the if-statement to an auxiliary assignment of ck
  - E.g., if (r1 > 0) becomes c1 = r1 > 0
• Every op k has a path predicate ctrExpr
  - k is a use of the control variables in ctrExpr
• k is feasible if ctrExpr evaluates to true
• Feasibility of ops is checked when setting the rules

Data and Control Dependence
Data/control flow can be treated like the global read-value rule, i.e., a read should see the "latest" write.
- Global reads: for all r = x, there exists an x = …
- Local reads: for all x = r, there exists an r = …
- Control reads: for all op that depends on c, there exists a c = …
  requireReadValue ops order ≡
    globalReadValue ops order ∧
    localReadValue ops order ∧
    controlReadValue ops order

How to Formalize Data-Race?
  detectDataRace ops ≡ ∃ scOrder, hbOrder.
    legalSC ops scOrder ∧
    requireHbOrder ops hbOrder ∧
    mapConstraints ops hbOrder scOrder ∧
    existDataRace ops hbOrder
  requireHbOrder ops hbOrder ≡
    requireProgramOrder ops hbOrder ∧
    requireSyncOrder ops hbOrder ∧
    requireTransitiveOrder ops hbOrder
  existDataRace ops hbOrder ≡
    ∃ i, j ∈ ops. conflictingAccess i j ∧ ¬(hbOrder i j) ∧ ¬(hbOrder j i)

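The `existDataRace` clause can be sketched for a single, already fixed execution. The code below is an illustration only: it takes the program-order and synchronization edges as given, closes them transitively to obtain a happens-before relation, and looks for a conflicting pair that the relation leaves unordered. Quantifying over all legal SC orders, which is what `detectDataRace` does, is the solver's job and is not attempted here. The operation encoding and the definition of `conflicting_access` (different threads, same variable, at least one write) are assumptions.

```python
from itertools import combinations


def happens_before(n, edges):
    """Transitive closure of the given program-order and sync edges."""
    hb = [[False] * n for _ in range(n)]
    for i, j in edges:
        hb[i][j] = True
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if hb[i][k] and hb[k][j]:
                    hb[i][j] = True
    return hb


def conflicting_access(a, b):
    """Assumed definition: different threads, same variable, at least one write."""
    return (a["t"] != b["t"] and a["var"] is not None
            and a["var"] == b["var"] and "W" in (a["kind"], b["kind"]))


def exist_data_race(ops, hb):
    return any(conflicting_access(ops[i], ops[j]) and not hb[i][j] and not hb[j][i]
               for i, j in combinations(range(len(ops)), 2))


ops = [{"t": 1, "kind": "W", "var": "x"},        # 0: thread 1 writes x
       {"t": 1, "kind": "Unlock", "var": None},  # 1: thread 1 releases a lock
       {"t": 2, "kind": "Lock", "var": None},    # 2: thread 2 acquires it
       {"t": 2, "kind": "R", "var": "x"}]        # 3: thread 2 reads x
program_order = [(0, 1), (2, 3)]
sync_order = [(1, 2)]

print(exist_data_race(ops, happens_before(len(ops), program_order + sync_order)))
print(exist_data_race(ops, happens_before(len(ops), program_order)))
```

With the unlock-to-lock edge the write and the read are ordered and no race is reported; without it the same pair is concurrent.
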
Atomicity Verification
What's atomicity?
· Informally: a block of code executed atomically
· Neither a necessary nor a sufficient condition for race-freedom
Our approach:
· Annotate the atomic block with AtomicEnter and AtomicExit
· Verify it automatically
· Our definition is generic and can be fine-tuned

Constraints for Atomicity
  verifyAtomicity ops ≡ ∃ order. legalSC ops order ∧ existsAtomicityViolation ops order
  existsAtomicityViolation ops order ≡
    ∃ i, j, k ∈ ops. matchedAtomicPair i j ∧ (t k ≠ t i) ∧ ¬(order k i) ∧ ¬(order j k)

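For one given total order, the violation predicate reduces to checking whether another thread's operation falls between a matched AtomicEnter and AtomicExit. The sketch below assumes the execution is supplied as a list of `(thread, kind)` pairs already in order, so that `order a b` is simply a comparison of list positions, and it pairs each AtomicEnter with the next AtomicExit of the same thread. Those choices are illustrative assumptions.

```python
def matched_atomic_pairs(schedule):
    """Pair each AtomicEnter with the next AtomicExit of the same thread."""
    pairs = []
    for i, (thread, kind) in enumerate(schedule):
        if kind == "AtomicEnter":
            for j in range(i + 1, len(schedule)):
                if schedule[j] == (thread, "AtomicExit"):
                    pairs.append((i, j))
                    break
    return pairs


def exists_atomicity_violation(schedule):
    """True if some op k of another thread is neither before i nor after j."""
    def order(a, b):
        return a < b  # positions in the schedule define the total order

    return any(schedule[k][0] != schedule[i][0]
               and not order(k, i) and not order(j, k)
               for i, j in matched_atomic_pairs(schedule)
               for k in range(len(schedule)))


interleaved = [(1, "AtomicEnter"), (1, "W"), (2, "W"), (1, "AtomicExit")]
serial = [(1, "AtomicEnter"), (1, "W"), (1, "AtomicExit"), (2, "W")]

print(exists_atomicity_violation(interleaved), exists_atomicity_violation(serial))
```

As written, any foreign operation inside the block counts as a violation; restricting `k` to conflicting accesses would be one way to fine-tune the definition.
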
Conclusion
My thesis addressed the following issues
- How to make memory ordering rules clear and executable?
- How to analyze thread executions against these rules?
Our methods have been shown to be practical
- A wide range of academic memory models as well as real-world models (Itanium, JMM)
- Validation of test cases far exceeded others' in both speed and scale
- Being applied to post-silicon verification in industry
Many "customers" can benefit from our methods
- Software developers, compiler writers, system designers

Publications
Operational Specification Method
• Analyzing the CRF Java Memory Model (APSEC'01)
• Specifying Java Thread Semantics Using a Uniform Memory Model (JGI'02)
• UMM: An Operational Memory Model Specification Framework with Integrated Model Checking Capability (CCPE)
Axiomatic Specification Method
• Analyzing the Intel Itanium Memory Ordering Rules Using Logic Programming and SAT (CHARME'03)
• Nemos: A Framework for Axiomatic and Executable Specifications of Memory Consistency Models (IPDPS'04)
• A Constraint-Based Approach for Specifying Memory Consistency Models (sent to TPLP)
Constraint Solving Method
• QB or not QB: An Efficient Execution Verification Tool for Memory Orderings (sent to CAV)
Concurrency Analysis
• Rigorous Concurrency Analysis of Multithreaded Programs (sent to ISSTA)

Continuing Research Opportunities
· Scale up our approach even further
  - Give up certain precision
  - Compositional methods
  - Create an assertion language to help abstraction
· Improve solving algorithms
  - Exploit the structural information
· "Memory-model-sensitive" compilers
  - Code synthesis, optimization
· Other application domains
  - Security, embedded systems

Thank You!
The dissertation is available at http://www.cs.utah.edu/~yyang/papers/thesis.pdf
The prototype tools are available at http://www.cs.utah.edu/~yyang/research.html