Analyzing the Intel Itanium Memory Ordering Rules using Logic Programming and SAT

Yue Yang, Ganesh Gopalakrishnan, Gary Lindstrom, Konrad Slind
School of Computing, University of Utah

Work supported in part by NSF Awards CCR-0081406 and 0219805, and SRC Contract 1031.001.

What are Memory Ordering Rules?

The effects of aggressive hardware optimizations…
  • Aggressive load/store reorderings
  • ‘Bypassing’ (read back own store before others)
  • Strong orderings only at acquires/releases

… that are visible as out-of-order executions to a programmer:

  cpu: st a,1 ; st b,2;        cpu: ld b,2; ld a,0;
  cpu: st c,1 ; st.rel d,2;    cpu: ld.acq d,2; ld c,1;

“Out of order” usually means “with respect to SC”.

[Figure: CPUs connected to a shared memory]
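To make “out of order with respect to SC” concrete, the following is a minimal sketch that enumerates every sequentially consistent interleaving of the first example (st a,1; st b,2 against ld b; ld a) and collects the load results. The names `P1`, `P2`, and `sc_outcomes` are hypothetical, and memory is assumed to start at 0.

```python
from itertools import combinations

# Hypothetical encoding of the two-processor example from the slide.
P1 = [("st", "a", 1), ("st", "b", 2)]
P2 = [("ld", "b"), ("ld", "a")]

def sc_outcomes(p1, p2):
    """Return the set of (ld b, ld a) results over all SC interleavings."""
    n, m = len(p1), len(p2)
    outcomes = set()
    # Choose which positions of the merged sequence belong to p1;
    # each processor keeps its own program order.
    for slots in combinations(range(n + m), n):
        mem = {"a": 0, "b": 0}   # assumption: memory is initially 0
        loads = []
        i = j = 0
        for pos in range(n + m):
            if pos in slots:
                _, addr, val = p1[i]
                i += 1
                mem[addr] = val
            else:
                _, addr = p2[j]
                j += 1
                loads.append(mem[addr])
        outcomes.add(tuple(loads))
    return outcomes

if __name__ == "__main__":
    print(sorted(sc_outcomes(P1, P2)))
```

The result (ld b = 2, ld a = 0) never appears in the enumerated set, which is why observing it on hardware counts as an out-of-order execution.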

Why Relaxed Ordering Rules?

• All modern high-end processors employ relaxed ordering rules
• Modern multi-threaded languages also follow suit

WHY?
• Aggressive updates are too expensive
  – CPU / Memory speed mismatch getting progressively worse
• Enables performance enhancing optimizations at the bus / interconnect level
• Simplifies directory protocols (less waiting, avoid deadlocks by relaxing message traffic rules, ...)

Contrast between `strict’ and `relaxed’ orderings

• Strict (e.g., Sequential Consistency)
  – Each processor’s instructions come according to program order
  – They execute as if connected to a single serial memory thru a non-deterministic switch
• Relaxed (e.g., PRAM)
  – One memory per processor in effect (details omitted)
  – No write-atomicity - only program order obeyed

Contrast between Relaxed Academic and Industrial Models

• Academic: Relaxed (e.g., PRAM)
• Industrial: Relaxed + Strict + Hybrid + ... (e.g., Itanium)
  – See our ICCD’99 paper for a very approximate operational model
  – Lamport et al. have one in TLA, too...

Who depends on Memory Orderings?

• Compiler / OS developers
  – many of the proposed high-performance kernels exploit weakness to a high degree
• People who port existing code-bases
  – code-bases must port between platforms
• Implementers of thread-based systems, JVMs, ...
  – it has to mesh with the language-level memory model as well
• It is a central issue even in “uniprocessors” in which multiple threads share memory

A taxonomy of methods to specify industrial Relaxed Memory Models

• Informal
  – “A Store Release flushes out earlier pended operations. All Store Releases appear to commit in a global total order. They allow Read Bypassing, except for non-Cacheable addresses.”
  – Full Intel spec available by searching `251429’ on Google
  – A dozen or so litmus tests also given as a supplement, e.g.:

      P1                      P2
      st.rel A,1;             st.rel B,1;
      ld.acq r1,A; [1]        ld.acq r3,B; [1]
      ld     r2,B; [0]        ld     r4,A; [0]

• Formal
  – Operational
  – Axiomatic

A taxonomy of Formal methods to specify industrial Relaxed Memory Models

• Operational
  – Operational models of industrial memory models are complex
  – Running them inside a standard model-checker is too slow!
  – Utility for verification is limited
  – Provides limited insight
• Axiomatic
  – Much more precise
  – Orderings must ideally be expressed thru an ORTHOGONAL set of rules
  – No such prior axiomatic specs of industrial memory models

How to Organize Axiomatic Memory Ordering Specs?

• Ad-hoc
• Visibility Order Based

Visibility Order Specs

A memory model (spec of Memory Ordering Rules) is a mapping from executions to a set of allowed total orders called visibility orders; it is a 1-to-many mapping:

  P1: st A,1 ; st B,2;        P2: ld B [v1] ; ld A [v2]

  [Figure: allowed visibility orders over the events st(A,1), st(B,2), ld(B,v1), ld(A,v2); annotations: “Strict Ordering Allowed”, “Relaxed Ordering allowed too”]

For “complex” instructions, we generate more visibility events:

  P1: st.rel A,1 ; st B,2;    P2: ld.acq B [v1] ; ld A [v2]

  { st.rel(A,1), ld.acq(B,v1), st(B,2), ld(A,v2) }
  [Figure annotations on the events: “seen in P1”, “seen in P2”]

After specifying allowed Visibility Orders, the Load-Value Rule specifies how Loads return their values:

  ld(A,?) st(A,1)              ld(A,?) returns 0 (initial memory)
  st(A,1) st(B,2) ld(B,?)      ld(B,?) returns 2
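The load-value idea in the two small examples above can be sketched as a walk along one visibility order. This is a deliberately simplified illustration, not the Itanium rule itself (which also has to account for bypassing and per-processor visibility); the function name `load_values` and the tuple encoding of events are assumptions made for the example.

```python
def load_values(order, initial=0):
    """Walk a visibility order and report what each load would return.

    Events are ("st", addr, value) or ("ld", addr). A load returns the
    value of the latest earlier store to the same address, or the
    initial memory value if there is none.
    """
    mem = {}
    results = []
    for event in order:
        kind, addr = event[0], event[1]
        if kind == "st":
            mem[addr] = event[2]
        else:
            results.append((addr, mem.get(addr, initial)))
    return results

if __name__ == "__main__":
    # ld(A,?) placed before st(A,1): the load sees initial memory.
    print(load_values([("ld", "A"), ("st", "A", 1)]))
    # st(A,1) st(B,2) ld(B,?): the load sees the store to B.
    print(load_values([("st", "A", 1), ("st", "B", 2), ("ld", "B")]))
```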

Our first contribution

• Developed Axiomatic, Visibility Order based Spec for most of Itanium Orderings (semaphores will be added in next version)
  – Orderings implicit in their document made explicit
  – 3 pages of HOL as opposed to 24 pages of prose + tables
  – Also developed an executable constraint-Prolog version
• Can reason using a theorem prover
  – will attempt to prove the claim about causality found in Intel’s manual
• Written in a generic style - several other memory models specified in the same framework
  – pre-requisite to formally comparing memory models
• Comprised of orthogonal sub-rules

legalItanium: Style of specification

  legalItanium(ops) =
    Exists order. ( constraint1 ops order /\ constraint2 ops order /\ ... )

• Visibility Order described by order : visevent -> bool
• We use the “id” of each visevent which is an int; so order : int -> bool
• Can selectively disable constraints and compare results
• Since the constraints are orthogonal, we can localize errors

  legalItanium(ops) =
    Exists order. (
      requireLinearOrder         ops order /\
      requireWriteOperationOrder ops order /\
      requireProgramOrder        ops order /\
      requireMemoryDataDependence ops order /\
      requireDataFlowDependence  ops order /\
      requireCoherence           ops order /\
      requireReadValue           ops order /\
      requireAtomicWBRelease     ops order /\
      requireSequentialUC        ops order /\
      requireNoUCBypass          ops order )

  requireProgramOrder ops order =
    Forall i, j : ops.
      ( orderedByAcquire i j \/
        orderedByRelease i j \/
        orderedByFence   i j ) ==> order i j
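The “exists an order satisfying a conjunction of constraints” shape of the spec can be sketched by brute force over all total orders. The two constraint functions below are toy stand-ins chosen for this illustration (an acquire orders later operations of the same processor, a release orders earlier ones, and a load reads the latest earlier store); they are not the HOL definitions, and every name here is hypothetical. Memory is assumed to start at 0.

```python
from itertools import permutations

# Each op: (id, processor, kind, address, value); list order is program order.

def require_program_order(ops, order):
    """Toy stand-in: ld.acq precedes later ops, st.rel follows earlier ops."""
    for a, i in enumerate(ops):
        for j in ops[a + 1:]:
            same_proc = i[1] == j[1]
            by_acquire = i[2] == "ld.acq"
            by_release = j[2] == "st.rel"
            if same_proc and (by_acquire or by_release) and not order(i[0], j[0]):
                return False
    return True

def require_read_value(ops, order, initial=0):
    """Toy stand-in: a load returns the latest earlier store to its address."""
    for ld in ops:
        if not ld[2].startswith("ld"):
            continue
        latest = None
        for s in ops:
            if s[2].startswith("st") and s[3] == ld[3] and order(s[0], ld[0]):
                if latest is None or order(latest[0], s[0]):
                    latest = s
        seen = latest[4] if latest else initial
        if seen != ld[4]:
            return False
    return True

def legal(ops, constraints):
    """True if some total order of the ops satisfies every constraint."""
    ids = [op[0] for op in ops]
    for perm in permutations(ids):
        pos = {op_id: k for k, op_id in enumerate(perm)}

        def order(i, j):
            return pos[i] < pos[j]

        if all(c(ops, order) for c in constraints):
            return True
    return False

def litmus(last_load_value):
    """st c,1; st.rel d,2  ||  ld.acq d,2; ld c,<last_load_value>"""
    return [
        (0, "P1", "st", "c", 1),
        (1, "P1", "st.rel", "d", 2),
        (2, "P2", "ld.acq", "d", 2),
        (3, "P2", "ld", "c", last_load_value),
    ]

if __name__ == "__main__":
    both = [require_program_order, require_read_value]
    print(legal(litmus(1), both))                  # ld c sees 1
    print(legal(litmus(0), both))                  # ld c sees 0
    print(legal(litmus(0), [require_read_value]))  # program-order rule disabled
```

Passing the constraints as a list mirrors the point that rules can be selectively disabled: dropping the toy program-order rule makes the stale read of c acceptable, which localizes the rule responsible for rejecting it.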

Where do we use our Formal Spec of Memory Orderings?

• To help solve one of the nastiest problems encountered during Post-Silicon Validation
  – An MP system has just been built (boards, fan, ...)
  – How do we certify that it obeys the memory ordering rules?

WHY IS POST-SILICON VERIFICATION HARD?
• Unverified inter-module assumptions examined for the first time at GHz speeds!
• Limited observability (forced to observe via “final effects” on programs)

Typical Post-Si Memory Ordering Verification Approach

• Manual reasoning of executions generated by random tests
  – Highly labor intensive
    • designers have to think through ALL ordering rules at EACH step
  – No systematic methods to write the tests
• Ad-hoc tools employed for behavior matching
  – No Formal Guarantees even on small executions
  – No insights provided upon failure
  – Cannot pinpoint onset of divergence from allowed behaviors

Our Idealized Approach to a solution (currently under development)

[Diagram: two inputs feed a box labeled “BUILD THIS BOX !!”, which produces one of two outputs]

• Input: An Arbitrary Specification of Memory Ordering Rules in HOL
• Input: An Arbitrary Litmus Test, e.g.:

      st.rel a,1;              st.rel b,1;
      ld.acq r1,a; [V2]        ld.acq r3,b; [V3]
      ld     r2,b; [0]         ld     r4,a; [0]

• Output: ILLEGAL! explanation script...
• Output: LEGAL! Explanation script + ALL bindings to V2 and V3

The first approach presented here

[Diagram: two inputs feed the checker, which produces one of two outputs]

• Input: Spec of Memory Ordering Rules Coded-up Nicely as a Constraint Logic Program
• Input: An Arbitrary Ground Litmus Test (only ground values allowed), e.g.:

      st.rel a,1;              st.rel b,1;
      ld.acq r1,a; [1]         ld.acq r3,b; [1]
      ld     r2,b; [0]         ld     r4,a; [0]

• Output: ILLEGAL!
• Output: LEGAL! explanation script...

The second approach presented here

[Diagram: two inputs, followed by a SAT checker]

• Input: Spec of Memory Ordering Rules Coded-up Nicely as a Constraint Logic Program
• Input: An Arbitrary Ground Litmus Test
• A SAT checker
  – SAT! implies LEGAL!
  – UNSAT! implies ILLEGAL!

How does Approach #1 work?

• Need to know a little bit about Constraint Logic Programs, e.g.:
  – GNU Prolog, SICStus Prolog, Mozart, ... support constraints directly
  – Available as “free-standing” packages callable from C, Java, OCaml, ...

Example, called with Y = W, X unbound:

  evens_below_Y(X, Y) :- X is in (0..10), X < Y, (X mod 2) = 0

  – X is in (0..10): Allocates constraint-store entry for X with some user-chosen initial range
  – X < Y: Imposes X=W-1; backtracking triggered if W is later found = 6
  – (X mod 2) = 0: Imposes constraint (W-1) mod 2 = 0 into constraint store
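For readers without a constraint-Prolog system at hand, the narrowing behaviour of the evens_below_Y example can be imitated with an explicit domain set. This is only an illustration of the idea for the case where Y is already known: a real finite-domain solver propagates bounds lazily and also handles an unbound Y, which this sketch does not attempt. The function name `evens_below` is an assumption.

```python
def evens_below(y, initial_range=range(0, 11)):
    """Return the values X may still take after each constraint is imposed."""
    domain = set(initial_range)                  # X is in (0..10)
    domain = {x for x in domain if x < y}        # X < Y
    domain = {x for x in domain if x % 2 == 0}   # (X mod 2) = 0
    # An empty result plays the role of failure, which would trigger
    # backtracking in a logic program.
    return sorted(domain)

if __name__ == "__main__":
    print(evens_below(7))
    print(evens_below(0))
```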

How to model requireProgramOrder as a Constraint Logic Program? (e.g.)

  requireProgramOrder ops order =
    Forall i, j : ops.
      ( orderedByAcquire i j \/
        orderedByRelease i j \/
        orderedByFence   i j ) ==> order i j

• Allocate 2D constraint-var array
• Interpret Litmus test, adding constraint to 2D array
• When Interpretation Finishes, all “x” reveals latitude in weak order
• When an “x” changes to a 1, an attempt to set it 0 later triggers backtracking

[Figure: 2D array indexed by i and j, entries initially “x”; an entry = 1 means i is ordered before j]
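The 2D array of constraint variables can be pictured with a small sketch in which an unconstrained entry (“x” on the slide) is `None`, and imposing a value that contradicts an earlier one raises an exception standing in for backtracking. The class and method names are hypothetical, and a real constraint store would undo bindings and retry rather than simply fail.

```python
class Conflict(Exception):
    """Stands in for the failure that would trigger backtracking."""

class OrderMatrix:
    def __init__(self, n):
        # None plays the role of "x": no ordering decided yet for (i, j).
        self.cells = [[None] * n for _ in range(n)]

    def impose(self, i, j, value):
        """Record that entry (i, j) must be 1 (i before j) or 0."""
        current = self.cells[i][j]
        if current is not None and current != value:
            raise Conflict(f"entry ({i},{j}) is already {current}")
        self.cells[i][j] = value

    def latitude(self):
        """Off-diagonal entries still undecided after interpretation."""
        n = len(self.cells)
        return [(i, j) for i in range(n) for j in range(n)
                if i != j and self.cells[i][j] is None]

if __name__ == "__main__":
    m = OrderMatrix(3)
    m.impose(0, 1, 1)            # some ordering rule fired for ops 0 and 1
    print(len(m.latitude()))     # remaining "x" entries
    try:
        m.impose(0, 1, 0)        # contradicts the earlier 1
    except Conflict as err:
        print("conflict:", err)
```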

Our Prolog Code is VERY close to the HOL spec!

  requireProgramOrder ops order =
    Forall i, j : ops.
      ( orderedByAcquire i j \/
        orderedByRelease i j \/
        orderedByFence   i j ) ==> order i j

  (  % Rule (ACQ): ACQ>>I
     ...
     #\/
     % Rule (REL):
     Op_j #= StRel #/\
     ( IsWr_i #==>
         ( WrType_i #= Local  #/\ WrType_j #= Local #\/
           WrType_i #= Remote #/\ WrType_j #= Remote #/\
           WrProc_i #= WrProc_j ) )
     ...
  ) #==> Oij      % IMPOSES CONSTRAINT ON MATRIX ENTRY Oij

Idea behind the SAT approach

  (  % Rule (ACQ): ACQ>>I
     ...
     #\/
     % Rule (REL):
     Op_j #= StRel #/\
     ( IsWr_i #==>
         ( WrType_i #= Local  #/\ WrType_j #= Local #\/
           WrType_i #= Remote #/\ WrType_j #= Remote #/\
           WrProc_i #= WrProc_j ) )
     ...
  ) #==> ..       % Emit Boolean Expression here
                  % (as opposed to imposing constraint on constraint-store)
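A minimal sketch of what “emit a Boolean expression instead of imposing a constraint” can look like: each matrix entry Oij becomes a propositional variable, the linear-order requirement becomes clauses, and each ordering rule that fires contributes a unit clause. The encoding, the function names, and the brute-force checker below are assumptions made for illustration; the tool described in the slides hands its formula to a real SAT solver, and its actual encoding is not shown in the source.

```python
from itertools import permutations, product

def order_cnf(n, required):
    """Build CNF clauses saying 'O is a total order containing `required`'.

    Variables are positive integers (DIMACS style); a negative literal
    is the negation. `required` lists (i, j) pairs that must satisfy Oij.
    """
    var = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                var[(i, j)] = len(var) + 1
    clauses = []
    for i in range(n):
        for j in range(i + 1, n):
            clauses.append([var[i, j], var[j, i]])      # one of the two holds
            clauses.append([-var[i, j], -var[j, i]])    # but not both
    for i, j, k in permutations(range(n), 3):
        clauses.append([-var[i, j], -var[j, k], var[i, k]])  # transitivity
    for i, j in required:
        clauses.append([var[i, j]])                     # rule fired: Oij
    return len(var), clauses

def satisfiable(num_vars, clauses):
    """Exhaustive check, adequate only for tiny instances."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

if __name__ == "__main__":
    print(satisfiable(*order_cnf(3, [(0, 1), (1, 2)])))          # consistent
    print(satisfiable(*order_cnf(3, [(0, 1), (1, 2), (2, 0)])))  # cyclic
```

A satisfying assignment corresponds to a legal visibility order, and unsatisfiability to an illegal execution, matching the SAT / UNSAT reading of the second approach.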

What did we learn?

• A really elegant approach to study Memory Ordering
• Many bugs in spec caught through finite executions
  – Formal `paper-and-pencil’ memory ordering specs are very unreliable!
• Prolog Code may not scale
  – Prolog Quirks (memory resources scattered in stack, trail-stack, constraint-store, ... - execution halts if one exhausted)
  – Prolog’s search may not be “as smart” as SAT’s (?)
• SAT generation time dominates
  – Pretty naive coding and CNF generation
  – Could scale considerably; for example:

      FD-solving   SAT-gen   SAT-vars   SAT-clauses   SAT-solving
      22 s         200 s     576        15 k          0.01 s

• Best long-term approach is the `ideal’ one mentioned earlier
  – (explain details if there is time)

Summary of Key Contributions

• We provide a formal specification of the entire Itanium memory ordering specification in Higher Order Logic (barring semaphores that change the ‘data structures’ we need)
  – Our Spec (3 pages of HOL) replaces 24 pages of Intel spec
  – Our Spec is EASIER to understand (said the CHARME reviewers!)
  – We can now prove theorems to increase confidence
• We present TWO ways to use this HOL spec to check executions obtained from the post-silicon environment
  – Encode as a Constraint-Logic program that interprets assembly executions and checks conformance with the rules
  – Constraint-Logic program that interprets assembly executions, and generates a SAT instance embodying conformance
• Our tool was given to engineers in Intel’s post-Si validation group
  – highly encouraging feedback obtained

Some of the Related Work

• Classical approaches
  – Mostly paper-and-pencil specs
  – Executable specs (Murphi) used to verify critical section codes
• Spec of the Alpha memory ordering rules in FOL/HOL
  – Yuan Yu (personal communication) - unpublished
  – VCs generated for assembly programs and given to ESC prover
  – Our work is for a modern system (Itanium) and uses SAT
• TLA+ spec of the Itanium ordering rules
  – Details are not published
  – Not amenable to execution (very slow execution speeds)
  – Impractical for use in checking assembly program executions

Questions?

Work in progress

[Diagram boxes, as labeled on the slide]

• An Arbitrary Specification of Memory Ordering Rules in HOL
• Generate a QBF formula for the size of the Litmus test
• QBF Solver
• Generate “compact” CNF
• An Arbitrary Litmus Test (non-ground values allowed)
• DNF representation of Litmus test (“ROM”)
• Output: ILLEGAL! explanation script...
• Output: LEGAL! Explanation script + ALL bindings to V2 and V3

QBF is natural for memory ordering rules