Butterfly Analysis: Adapting Dataflow Analysis to Dynamic Parallel Monitoring
Michelle L. Goodstein*, Evangelos Vlachos*, Shimin Chen†, Phillip B. Gibbons†, Michael A. Kozuch† and Todd C. Mowry*
*Carnegie Mellon University   †Intel Labs Pittsburgh
Butterfly Analysis | Michelle Goodstein
Catching Bugs: The Case for Dynamic Program Monitoring
Motivation: catch software bugs before they cause serious harm
• Before execution, static analysis: compilers, formal verification, etc.
  – Cannot statically catch all bugs
• During execution, dynamic analysis with lifeguards: DBI (e.g., PIN, Valgrind), LBA, DISE
• After a crash, post-mortem analysis: FDR, DeLorean, Strata, BugNet, etc.
  – We do not want to wait for a crash
We will focus on dynamic analysis via lifeguards.
Dynamic Program Monitoring
[Figure: the application executes ld 0x14, then st 0x10. The lifeguard checks its metadata ("Allocated?": 0x10 = 1, 0x14 = 0, 0x18 = …, 0x22 = …), asks "Did the app malloc 0x14?", and reports "ERROR: 0x14 unallocated".]
• The application is dynamically monitored by a lifeguard as it runs
  – The lifeguard monitors each dynamic instruction
• The lifeguard maintains a finite-state-machine model of correct execution
  – It checks metadata to see whether the program does something wrong
• Ex: Has memory location 0x14 been allocated?
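As a rough illustration of the metadata check described above, here is a minimal Python sketch of a sequential lifeguard that keeps one "Allocated?" bit per address. The class name `AllocationLifeguard`, the event names, and the per-address granularity are hypothetical simplifications, not the actual LBA lifeguard interface.

```python
from dataclasses import dataclass, field


@dataclass
class AllocationLifeguard:
    """Hypothetical sequential lifeguard: one 'Allocated?' bit per address."""
    allocated: dict[int, bool] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)

    def on_event(self, op: str, addr: int) -> None:
        # Addresses never seen before are treated as unallocated.
        is_alloc = self.allocated.get(addr, False)
        if op == "malloc":
            if is_alloc:
                self.errors.append(f"ERROR: {addr:#x} already allocated")
            self.allocated[addr] = True
        elif op == "free":
            if not is_alloc:
                self.errors.append(f"ERROR: free of unallocated {addr:#x}")
            self.allocated[addr] = False
        elif op in ("ld", "st"):
            # Reads and writes require the location to be allocated.
            if not is_alloc:
                self.errors.append(f"ERROR: {addr:#x} unallocated")
        else:
            raise ValueError(f"unknown event {op!r}")


lg = AllocationLifeguard()
lg.on_event("malloc", 0x10)
lg.on_event("ld", 0x14)  # flagged: 0x14 was never allocated
lg.on_event("st", 0x10)  # fine: 0x10 is allocated
print(lg.errors)         # ['ERROR: 0x14 unallocated']
```

Each event is checked against the metadata in the order the lifeguard receives it, which is exactly what becomes hard once several application threads run in parallel.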
Dynamically Monitoring Parallel Programs
[Figure: Thread 1 executes p=malloc and later *p=…; Thread 2 executes p=NULL; the threads run as a timesliced application, shown alongside Lifeguard 1 and Lifeguard 2.]
• Updating metadata is straightforward for sequential programs
• Parallel apps: inter-thread data dependences complicate lifeguards
• One solution: timeslice all application threads on one core
  + State of the art; requires only a sequential lifeguard
  - Slow; serializes the application
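Timeslicing can be pictured as merging the per-thread instruction streams into one sequential stream that a single sequential lifeguard consumes. The sketch below assumes a simple round-robin scheduler with a fixed, hypothetical `quantum`; a real system relies on the OS scheduler, but the effect on the lifeguard is the same: it sees one total order, at the cost of running the application serially.

```python
from itertools import islice


def timeslice(traces: list[list[str]], quantum: int) -> list[tuple[int, str]]:
    """Interleave per-thread traces round-robin, `quantum` instructions at a
    time, as if all threads shared one core. Returns (thread_id, instr) pairs
    in the single order a sequential lifeguard would observe."""
    iters = [iter(t) for t in traces]
    active = list(range(len(traces)))
    merged: list[tuple[int, str]] = []
    while active:
        still_active = []
        for tid in active:
            chunk = list(islice(iters[tid], quantum))
            merged.extend((tid, instr) for instr in chunk)
            # A short (or empty) chunk means this thread's trace is exhausted.
            if len(chunk) == quantum:
                still_active.append(tid)
        active = still_active
    return merged


print(timeslice([["p=malloc", "*p=…"], ["p=NULL"]], quantum=1))
# [(0, 'p=malloc'), (1, 'p=NULL'), (0, '*p=…')]
```

With `quantum=1`, the lifeguard sees p=NULL before *p=…, so a sequential NULL-dereference check would fire on this particular schedule.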
Dynamically Monitoring Parallel Programs
[Figure: Thread 1 executes p=malloc and later *p=…; Thread 2 executes p=NULL; each thread is monitored by its own lifeguard (Lifeguard 1, Lifeguard 2).]
• Updating metadata is straightforward for sequential programs
• Parallel apps: inter-thread data dependences complicate lifeguards
• ParaLog: expose inter-thread data dependences to the lifeguard
  + Parallel lifeguards for parallel applications
  - Requires specialized hardware
  - Requires sequential consistency or total store order
A Counter-Intuitive Proposal
• Intuition: the lifeguard should process the application's instructions in the same order in which the application retires them
• Counter-intuitive: proceed without capturing inter-thread data dependences
  – They cannot be measured using today's hardware
  – Relaxed memory consistency models: there is no total order
• Our approach: explicit windows of uncertainty
  – Outside the window: ordering is known
  – Within the window: ordering is unknown
  → We only have a partial order of application instructions
  → The analysis is conservative (assumes the worst case)
[Figure: p=malloc occurs strictly before *p=…, while p=NULL falls in the concurrent region.]
Handling Uncertainty
[Figure: p=malloc occurs strictly before *p=…; p=NULL falls in the concurrent region, a window of size W.]
• Only consider a window W of uncertainty
How big is the window?
• Must account for buffering in the pipeline and memory system
• Our experiments: 1,000s to 10,000s of instructions per thread
• The window is large relative to the reorder buffer (ROB) and memory access latency
• The window is small relative to total execution
Butterfly Analysis: Bounding Uncertainty
[Figure: the execution of Threads 1-3 (containing p=malloc, *p=…, and p=NULL) is cut into epochs, each at least W long; cuts are staggered by S, and a cut is not a barrier.]
• Concept: dynamically cut across all threads
  – Divide execution into epochs
• Cuts need only be roughly aligned: incorporate a "stagger" time S
• Can be done in software, using a token ring or fence; no special hardware required!
Epochs: Reasoning About Concurrency
[Figure: a sliding window, centered on the blue epoch, limited to 3 epochs; each epoch spans W.]
• From the perspective of the blue epoch:
• Most epochs are non-adjacent
  – Instructions in these epochs execute strictly before or strictly after
• Two epochs are adjacent to the blue epoch
• Result: a 3-epoch window of potentially concurrent instructions
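The 3-epoch rule above reduces to a one-line comparison of epoch numbers. This is a minimal sketch assuming epoch boundaries fall every `epoch_size` instructions in every thread (the real system only needs roughly aligned cuts with stagger S); `epoch_of` and `relation` are hypothetical helper names, and `relation` applies only to instructions on different threads, since program order already orders instructions within a thread.

```python
def epoch_of(instr_index: int, epoch_size: int) -> int:
    # Assumption: a cut every `epoch_size` instructions in each thread.
    return instr_index // epoch_size


def relation(epoch_a: int, epoch_b: int) -> str:
    """How an instruction in epoch_a is ordered relative to an instruction
    in epoch_b on a *different* thread, under the 3-epoch window rule."""
    if abs(epoch_a - epoch_b) <= 1:
        # Same or adjacent epoch: inside the window of uncertainty.
        return "potentially concurrent"
    return "strictly before" if epoch_a < epoch_b else "strictly after"


print(epoch_of(8191, 8192), epoch_of(8192, 8192))  # 0 1
print(relation(4, 5))                              # potentially concurrent
print(relation(3, 5))                              # strictly before
```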
Concurrency Within the Three-Epoch Window
[Figure: for thread t, epochs l-1, l, and l+1 are labeled Prior, Current, and Next; the blocks of other threads in these epochs are marked Concurrent.]
Anatomy of a Butterfly
[Figure: for thread t, epoch l-1 is the head, epoch l is the body, and epoch l+1 is the tail; the blocks of the other threads in epochs l-1 through l+1 form the wings.]
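The butterfly's parts can be written down directly as block coordinates. In this sketch a block is an (epoch, thread) pair, a hypothetical encoding chosen for illustration; boundary epochs (for example, the first epoch has no l-1) are not special-cased.

```python
from typing import NamedTuple

Block = tuple[int, int]  # (epoch, thread): hypothetical block encoding


class Butterfly(NamedTuple):
    head: Block
    body: Block
    tail: Block
    wings: list[Block]


def butterfly(epoch: int, thread: int, num_threads: int) -> Butterfly:
    """Butterfly centered on thread `thread` in epoch `epoch`: head, body, and
    tail come from the same thread; wings are every other thread's blocks in
    the 3-epoch window."""
    wings = [(e, t)
             for e in (epoch - 1, epoch, epoch + 1)
             for t in range(num_threads) if t != thread]
    return Butterfly((epoch - 1, thread), (epoch, thread), (epoch + 1, thread), wings)


b = butterfly(epoch=5, thread=1, num_threads=3)
print(b.head, b.body, b.tail)  # (4, 1) (5, 1) (6, 1)
print(b.wings)                 # [(4, 0), (4, 2), (5, 0), (5, 2), (6, 0), (6, 2)]
```

With T threads, a butterfly always has 3(T-1) wing blocks, independent of how many instructions each block holds.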
Butterfly Analysis: Avoiding Potential Pitfalls
[Figure: the butterfly for thread t: head (epoch l-1), body (epoch l), tail (epoch l+1), and wings.]
• Combinatorial explosion of potential interleavings
  – Enumerating all possible interleavings takes too long
• The lifeguard writer should not need to worry about application ordering
• Inspiration: interval analysis handles a similar problem
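To make the "combinatorial explosion" concrete, a standard counting argument (not from the slides) helps: if k threads each execute n instructions, and we count only the interleavings that preserve each thread's program order (i.e., sequentially consistent interleavings; relaxed models allow even more orderings), the count is the multinomial coefficient

```latex
N(k, n) = \frac{(kn)!}{(n!)^{k}},
\qquad
N(2, 3) = \frac{6!}{(3!)^{2}} = \frac{720}{36} = 20
```

Already for two threads with three instructions each there are 20 orders; with windows of thousands of instructions per thread, enumeration is hopeless, which is why the analysis summarizes the wings instead.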
Brief Review: Interval Analysis
[Figure: a basic control flow graph with blocks B1 through B4, annotated with F(B3) and its closure F*(B3).]
• Want to compute the closure F*(B3)
Interval Analysis vs. Butterfly Analysis
[Figure: side by side, a control flow graph with blocks B1 through B4, and the butterfly for thread t (head in epoch l-1, body in epoch l, tail in epoch l+1, and wings).]
Interval Analysis:
• Static analysis on a control flow graph
• Computes the closure F*(B3)
• Execution can only enter at the top and exit at the bottom of a basic block
• Specify the problem; an existing framework processes it
Butterfly Analysis:
• Dynamic analysis on an execution trace
• Computes a "closure" over the wings
• Execution can enter or exit anywhere, due to concurrency
• We built our own framework
Butterfly Analysis: Parallel Forward Dataflow Analysis
[Figure: the butterfly for thread t: head (epoch l-1), body (epoch l), tail (epoch l+1), and wings.]
• Introduces two new primitives: Side-Out and Side-In
  – Side-Out: the effects of concurrency that a block exposes to other threads
  – Side-In: the effects of concurrency that other threads expose to a block
Butterfly Analysis: Parallel Dataflow Analysis
[Figure: the butterfly for thread t: head (epoch l-1), body (epoch l), tail (epoch l+1), and wings.]
• Introduces two new primitives: Side-Out and Side-In
  – Side-Out: the effects of concurrency that a block exposes to other threads
  – Side-In: the effects of concurrency that other threads expose to a block
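One way to picture how the two primitives connect the passes: the first pass produces a Side-Out for every block, and each block's Side-In is then the meet of the Side-Outs of its wings. The sketch below is generic over a lifeguard-supplied `meet` and its `identity` element; the function name, the (epoch, thread) block encoding, and the choice to skip missing edge blocks are assumptions for illustration, and the meet each real lifeguard uses is defined in the paper.

```python
from functools import reduce
from typing import Callable, TypeVar

S = TypeVar("S")          # lifeguard-defined summary, e.g. a KILL set
Block = tuple[int, int]   # (epoch, thread): hypothetical block encoding


def compute_side_ins(side_out: dict[Block, S], num_threads: int,
                     meet: Callable[[S, S], S], identity: S) -> dict[Block, S]:
    """Given every block's Side-Out (output of the first pass), return each
    block's Side-In: the meet of the Side-Outs of its wings, i.e. the blocks
    of other threads in epochs l-1, l, and l+1. Missing blocks are skipped."""
    side_in: dict[Block, S] = {}
    for epoch, thread in side_out:
        wing_outs = [side_out[(e, t)]
                     for e in (epoch - 1, epoch, epoch + 1)
                     for t in range(num_threads)
                     if t != thread and (e, t) in side_out]
        side_in[(epoch, thread)] = reduce(meet, wing_outs, identity)
    return side_in


outs = {(0, 0): frozenset(), (1, 0): frozenset({"a-b"}),
        (0, 1): frozenset(), (1, 1): frozenset()}
ins = compute_side_ins(outs, 2, frozenset.union, frozenset())
print(ins[(0, 1)])  # frozenset({'a-b'})
```

The second pass would then re-run the lifeguard's check on each body, combining the state that flows in from its own head with this Side-In.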
Lifeguard Creation in Butterfly Analysis
• The lifeguard writer specifies:
  – The events of interest to track
  – The metadata format
  – The checking algorithm
  – A meet operation (inspired by dataflow analysis)
• Lifeguards become 2-pass algorithms (with respect to a butterfly)
• Example: Reaching (Available) Expressions
  – Computes whether an expression is available across all execution paths
  – An abstraction for computing properties that are true on all possible interleavings
Simple Example: Reaching Expressions
[Figure: over time, Thread 1 executes t=a-b and later b=b-1; Thread 2 executes y=a-b; Thread 3 executes z=a-b.]
Reaching Expressions Example: Butterfly Analysis
Initial state: a-b available
[Figure: the same three threads divided into epochs l-1, l, and l+1. Thread 1 executes t=a-b and later b=b-1; Thread 2 executes y=a-b; Thread 3 executes z=a-b.]
Reaching Expressions Example: Communication After 1st Pass
Initial state: a-b available
[Figure: Threads 1-3 in epochs l-1, l, and l+1, each block annotated with its Side-Out. Thread 1: SIDE-OUT(l-1, 1) KILL = {}, SIDE-OUT(l, 1) KILL = {}, SIDE-OUT(l+1, 1) KILL = {a-b} (the block containing b=b-1). Thread 2: SIDE-OUT(l-1, 2) KILL = {}, SIDE-OUT(l+1, 2) KILL = {}. Thread 3: SIDE-OUT(l-1, 3) KILL = {}, SIDE-OUT(l+1, 3) KILL = {}.]
After the first pass, every block has a Side-Out.
Reaching Expressions Example: Communication After 1st Pass
Initial state: a-b available
[Figure: the Side-Outs after the first pass; only SIDE-OUT(l+1, 1) is non-empty (KILL = {a-b}), and all others have KILL = {}. The body of the butterfly is Thread 2, epoch l, which contains y=a-b.]
From the perspective of the body of the butterfly
Reaching Expressions Example: Communication After 1st Pass
Initial state: a-b available
[Figure: the Side-Outs after the first pass; only SIDE-OUT(l+1, 1) is non-empty (KILL = {a-b}). The body is Thread 2, epoch l, with its head in epoch l-1 and its tail in epoch l+1.]
Instructions in the head and tail do not interleave with the body.
Reaching Expressions Example: Communication After 1st Pass
Initial state: a-b available
[Figure: the body (Thread 2, epoch l, containing y=a-b) applies a meet to the Side-Outs of the wings (Threads 1 and 3, epochs l-1 through l+1): SIDE-OUT(l+1, 1) KILL = {a-b}, all others KILL = {}. The result is SIDE-IN(l, 2) KILL = {a-b}. Note: the meet is defined in the paper; it is not the standard reaching-expressions meet.]
The body computes the meet of the wings' Side-Outs to get its Side-In.
Reaching Expressions: 2nd Pass
Initial state: a-b available
[Figure: the body (Thread 2, epoch l) with SIDE-IN(l, 2) KILL = {a-b}, executing y=a-b.]
• Initiate the 2nd pass, incorporating the Side-In
• The Side-In shows that a-b was not necessarily globally available
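The whole example can be replayed in a few lines. This is a toy sketch under stated assumptions: epoch 0 stands for l; the epochs of t=a-b and z=a-b are guesses (only b=b-1 in epoch l+1 of Thread 1 matters, per SIDE-OUT(l+1, 1)); the meet is a plain union of KILL sets, whereas the paper defines its own meet; GEN sets and the head's effect are not modeled, and `initially_available` stands in for the state flowing in from the head. All names are hypothetical.

```python
EXPRS = {"a-b"}  # the only expression tracked in this example


def kill_set(stmt: str) -> set[str]:
    """Expressions whose operands `stmt` overwrites (b=b-1 kills a-b)."""
    target = stmt.split("=", 1)[0].strip()
    return {e for e in EXPRS if target in e.split("-")}


# (epoch, thread) -> statements; epoch 0 stands for l.
BLOCKS = {
    (-1, 1): ["t=a-b"],  # placement assumed
    (+1, 1): ["b=b-1"],  # gives SIDE-OUT(l+1, 1) KILL = {a-b}
    (0, 2): ["y=a-b"],   # body of the butterfly
    (0, 3): ["z=a-b"],   # placement assumed
}
THREADS = (1, 2, 3)


def side_out_kill(block: tuple[int, int]) -> set[str]:
    # Pass 1: union of the KILLs of every statement in the block.
    return set().union(*(kill_set(s) for s in BLOCKS.get(block, [])))


def side_in_kill(epoch: int, thread: int) -> set[str]:
    # Meet over the wings; union is an assumption standing in for the paper's meet.
    return set().union(*(side_out_kill((e, t))
                         for e in (epoch - 1, epoch, epoch + 1)
                         for t in THREADS if t != thread))


def second_pass(epoch: int, thread: int, initially_available: set[str]):
    # Pass 2: an expression counts as available only if no wing may kill it.
    available = set(initially_available) - side_in_kill(epoch, thread)
    results = []
    for stmt in BLOCKS.get((epoch, thread), []):
        rhs = stmt.split("=", 1)[1].strip()
        results.append((stmt, rhs in available))
        available -= kill_set(stmt)
    return results


print(side_in_kill(0, 2))          # {'a-b'}
print(second_pass(0, 2, {"a-b"}))  # [('y=a-b', False)]
```

The `False` for y=a-b matches the slide's conclusion: because b=b-1 in a wing may interleave before y=a-b, a-b is not necessarily globally available there.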
Butterfly Lifeguards
• Canonical examples:
  – Reaching Definitions, Reaching Expressions
• Lifeguards:
  – ADDRCHECK, TAINTCHECK
• In all cases, zero false negatives are provably guaranteed
  – The lifeguard never misses a true error
• Lifeguards may experience false positives due to the conservative analysis
  – They occasionally mistake a safe event for an error
• Suitable for relaxed memory consistency models
  – Require that intra-thread data dependences and cache coherence are respected
• See the paper for details
Butterfly Analysis: ADDRCHECK As Prototype
• ADDRCHECK, a memory lifeguard
  – Checks that memory locations are unallocated before a malloc
  – Checks that memory locations are allocated before a free/read/write
  – An adaptation of Reaching Expressions
[Figure: the application executes ld 0x14, then st 0x10; the lifeguard checks its metadata ("Allocated?": 0x10 = 1, 0x14 = 0, 0x18 = …, 0x22 = …) and reports "ERROR: 0x14 unallocated".]
Experimental Framework
• Prototype built upon the Log-Based Architecture (LBA) framework
  – Full butterfly analysis stack implemented in software
  – Simulated hardware: a shared-memory CMP, using Simics
  – Used LBA for dynamic instruction traces, inserting epoch boundaries
• Measured 3 CMP configurations: 4, 8, 16 cores
  – Corresponding to 2, 4, 8 application threads plus the same number of lifeguard threads
• Measured two epoch sizes: 8K and 64K instructions/thread
ADDRCHECK Performance Results
[Figure: execution time normalized to sequential, unmonitored execution, for three configurations: 2 app/2 lifeguard threads (4 cores total), 4 app/4 lifeguard threads (8 cores total), and 8 app/8 lifeguard threads (16 cores total).]
ADDRCHECK Performance Results
[Figure: execution time normalized to sequential, unmonitored execution.]
ADDRCHECK Performance Results
[Figure: execution time normalized to sequential, unmonitored execution.]
On average:
• Butterfly Analysis performs well, much better than Timesliced
• Butterfly Analysis scales well and greatly outperforms Timesliced
• There is a small slowdown relative to parallel unmonitored execution, so there is room for improvement
Sensitivity to Epoch Sizes: Performance and False Positives
Restricting focus to the 16-core CMP configuration only (8 app/8 lifeguard threads)
[Figure: two panels comparing epoch sizes of 8K and 64K insts/thread across BARNES, BLACKS., FFT, FMM, LU, OCEAN, and GEO. MEAN. Performance: normalized execution time (normalized to sequential, unmonitored). Precision: false positives as a percentage of memory accesses (log scale).]
Contributions
• Butterfly analysis: a framework for dynamic parallel monitoring
  – Key insight: explicitly model regions of uncertainty
  – Inspired by interval analysis
  – New primitives: Side-Out and Side-In
• A prototype of a real lifeguard demonstrates:
  – Better performance than Timesliced (in most cases)
  – Lifeguards scale well with additional cores
  – Low false positive rates
• Epoch size trades off performance and accuracy
• The framework can be applied to tools beyond the ones shown here