CS 5204 Eraser A Dynamic Data Race Detector

CS 5204
Eraser: A Dynamic Data Race Detector for Multithreaded Programs
Savage et al (SOSP 1997, ACM TOCS)
Presenter: Godmar Back

16 Years Back
• Parallel Computing Wave
  – Several massively parallel designs
• Emergence of low-cost SMP
• OS support for multi-threaded programs
• Multi-threaded OS kernels
• Emergence of the Internet
CS 5204 Fall 2013

Why Threads Are A Bad Idea
• 1996 Usenix talk by John Ousterhout: “Why threads are a bad idea”
  – Threads are hard to program (synchronization is error-prone, prone to deadlock, convoying, etc.); experts only
  – Threads can make modularization difficult
  – Thread implementations are hard to make fast
  – Threads aren’t well-supported (as of 1996)
• (Ousterhout’s) Conclusion: use threads only when their power is needed, that is, for true CPU concurrency on an SMP; otherwise use a single-threaded event-based model

How to write safe programs
• Build safety into the language!
• Hoare’s Monitors
  – (see separate slide set)

Concurrency Errors vs Data Races
• Unintended sharing
• Atomicity violations
• Order violations
• Deadlocks/Livelocks
Likely result in data race
– Source: Freund et al

Data Race
• Conflicting accesses to the same memory location, at least one of them being a write
  – Read/write
  – Write/read
  – Write/write
• A race is a pair of conflicting accesses that happen concurrently

“Happens concurrently”?
• A & B are said to happen concurrently if, in a sequentially consistent execution, they could have occurred in either order. That is, there exist sequentially consistent executions in which they could have occurred one after the other.
• Sequential Consistency
  – Updates are seen in program order by each thread
  – All updates to shared memory are seen in the same order by all threads

Race Detection Approaches
• Happens-Before Relationship
  – Lamport 1978
  – Distributed Systems Idea
• First applied by Dinning and Schonberg 1991

HB in Distributed Systems
[Figure: space-time diagram of three processes P1, P2, P3 exchanging messages; events a through s are labeled with vector timestamps]
a → f: [1 0 0] < [5 5 3]
b → s: [2 0 0] < [3 1 6]
c → m: [3 0 0] < [3 6 5]

Vector Clocks
[Figure: space-time diagram of three processes P1, P2, P3 exchanging messages; events a through s are labeled with vector timestamps]
d || s: [4 3 0] and [3 1 6] are incomparable
q || i: [3 1 4] and [2 2 0] are incomparable
k || r: [2 4 3] and [3 1 5] are incomparable
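The comparisons on the two diagram slides can be checked mechanically. The following is a minimal sketch, assuming timestamps are equal-length tuples; the function names `happens_before` and `concurrent` are illustrative and do not come from the slides.

```python
def happens_before(a, b):
    """True if timestamp a precedes b: a <= b in every component and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither timestamp precedes the other (written a || b)."""
    return a != b and not happens_before(a, b) and not happens_before(b, a)


# Timestamps of events a, f, d, s as they appear on the slides.
a, f = (1, 0, 0), (5, 5, 3)
d, s = (4, 3, 0), (3, 1, 6)

ordered = happens_before(a, f)   # a -> f
unordered = concurrent(d, s)     # d || s
```

Because the order is only partial, two timestamps can be unequal and still unordered, which is exactly the case a race detector looks for.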

Vector Clocks
• Vector timestamps:
  – Each node i keeps its own logical time in Vi[i] and the logical time of every other node k (as far as it has seen messages from them) in Vi[k]
  – Send vector timestamp vt along with each message
  – Reconcile the received vector timestamp with the node's own vector upon receipt using MAX(vt[k], Vi[k]) for all k
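The update rules above can be sketched as a small class. This is an illustration under stated assumptions, not a definitive implementation: the class name `Node` and its method names are hypothetical, and the choice to advance the local component on both send and receive is one common convention that the slide does not spell out.

```python
class Node:
    """One process with a vector clock of n entries (hypothetical helper)."""

    def __init__(self, index, n):
        self.index = index          # position of this node in the vector
        self.clock = [0] * n        # Vi, all zeros initially

    def local_event(self):
        # Advance only this node's own component, Vi[i].
        self.clock[self.index] += 1
        return list(self.clock)

    def send(self):
        # Assumption: sending counts as an event; the copy returned is vt.
        return self.local_event()

    def receive(self, vt):
        # Reconcile with MAX(vt[k], Vi[k]) for all k, then count the receipt.
        self.clock = [max(a, b) for a, b in zip(self.clock, vt)]
        self.clock[self.index] += 1
        return list(self.clock)


p1, p2 = Node(0, 3), Node(1, 3)
first = p1.local_event()      # [1, 0, 0]
message = p1.send()           # [2, 0, 0]
before = p2.local_event()     # [0, 1, 0]
after = p2.receive(message)   # [2, 2, 0]
```

With this convention the values reproduce the timestamps of events a, b, g and i from the diagram slides.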

HB for Race Detection
• Happens-Before relationships between synchronization and thread creation/join events
• Code leading to thread create happens before thread body
• Thread body happens before code subsequent to join()
• Unlock() of a lock happens before the next lock() of the same lock

Happens Before Race Detection
• Monitor threads’ accesses to shared variables
• Find concurrent events (for instance, using vector clocks)
  – Output warning
• Sound: never reports a false race
• Does not require races to manifest in execution, but may miss some races depending on execution

Missed Race in HB
Thread 1            Thread 2
y := y+1;
Lock(mu);
v := v+1;
Unlock(mu);
                    Lock(mu);
                    v := v+1;
                    Unlock(mu);
                    y := y+1;
HB relationship is spurious – ‘mu’ doesn’t protect y
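The schedule dependence can be reproduced with a toy happens-before detector. The sketch below rests on simplifying assumptions: only lock and unlock create ordering, every access is treated as a write, and the class `HBDetector` with all of its method names is hypothetical. It replays the interleaving from the slide and then the opposite one.

```python
class HBDetector:
    """Toy vector-clock race detector (illustrative only)."""

    def __init__(self, threads):
        self.ids = {t: i for i, t in enumerate(threads)}
        self.clock = {t: [0] * len(threads) for t in threads}
        for t, i in self.ids.items():
            self.clock[t][i] = 1
        self.lock_clock = {}   # lock -> clock of the last unlock
        self.history = {}      # variable -> [(timestamp, thread), ...]
        self.races = []

    def lock(self, t, m):
        # unlock(m) happens before the next lock(m): merge the released clock.
        released = self.lock_clock.get(m)
        if released is not None:
            self.clock[t] = [max(a, b) for a, b in zip(self.clock[t], released)]

    def unlock(self, t, m):
        self.lock_clock[m] = list(self.clock[t])
        self.clock[t][self.ids[t]] += 1

    def write(self, t, v):
        now = list(self.clock[t])
        for earlier, other in self.history.get(v, []):
            # A race is an earlier access by another thread not ordered before now.
            if other != t and not all(a <= b for a, b in zip(earlier, now)):
                self.races.append((v, other, t))
        self.history.setdefault(v, []).append((now, t))


def run(schedule):
    detector = HBDetector(["T1", "T2"])
    for op, thread, target in schedule:
        getattr(detector, op)(thread, target)
    return detector.races


t1 = [("write", "T1", "y"), ("lock", "T1", "mu"),
      ("write", "T1", "v"), ("unlock", "T1", "mu")]
t2 = [("lock", "T2", "mu"), ("write", "T2", "v"),
      ("unlock", "T2", "mu"), ("write", "T2", "y")]

races_t1_first = run(t1 + t2)   # interleaving shown on the slide
races_t2_first = run(t2 + t1)   # the other interleaving
```

When Thread 1 runs first, the unlock/lock pair on mu happens to order the two accesses to y and nothing is reported; when Thread 2 runs first, the same program produces a report on y.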

Cost of HB (1997)
• Vector clocks
  – Need one per memory location
  – Of size O(t), where t is the number of threads ever created in the system; every operation on them costs O(t)

Eraser Intuition
• HB too expensive
• Observation: properly written programs follow lock discipline
• Lock l protects variable v.
• If v is ever accessed while l isn’t held -> race
• How to find out which l protects which v?
  – Infer it from the program

Lock Set Algorithm (Simple Version)
Let locks_held(t) be the set of locks held by thread t
For each shared memory location v, initialize C(v) to the set of all locks
On each access to v by thread t,
  Set C(v) := C(v) ∩ locks_held(t)
  If C(v) = {}, then issue a warning
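The simple version translates almost line for line into code. This is a minimal sketch, assuming locks, threads and variables are identified by strings; the class name `SimpleLockset` and its methods are illustrative names, not Eraser's interface.

```python
class SimpleLockset:
    """Simple lockset refinement: C(v) shrinks on every access."""

    def __init__(self, all_locks):
        self.all_locks = frozenset(all_locks)
        self.held = {}         # thread -> set of locks currently held
        self.candidates = {}   # variable -> C(v)
        self.warnings = []

    def lock(self, t, m):
        self.held.setdefault(t, set()).add(m)

    def unlock(self, t, m):
        self.held.setdefault(t, set()).discard(m)

    def access(self, t, v):
        # C(v) := C(v) ∩ locks_held(t); C(v) starts as the set of all locks.
        c = self.candidates.get(v, self.all_locks) & self.held.get(t, set())
        self.candidates[v] = c
        if not c:
            self.warnings.append((t, v))
        return c


# A thread updates v first under mu1 and then under mu2.
checker = SimpleLockset({"mu1", "mu2"})
checker.lock("t", "mu1")
after_first = checker.access("t", "v")    # only mu1 remains a candidate
checker.unlock("t", "mu1")
checker.lock("t", "mu2")
after_second = checker.access("t", "v")   # empty set, so a warning is recorded
checker.unlock("t", "mu2")
```

No single lock is held across both accesses, so the candidate set becomes empty on the second access even though no race actually occurred in this run.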

Example
Program          locks_held   C(v)
int v;           {}           {mu1, mu2}
lock(mu1);       {mu1}
v := v + 1;                   {mu1}
unlock(mu1);     {}
lock(mu2);       {mu2}
v := v + 1;                   {}   Warning!
unlock(mu2);     {}

Refinement
• Simple version flags false positives for
  1. Variable initialization without locks held
  2. Read sharing
  3. Read-write locking mechanisms
• Handle 1. and 2. by introducing states
• Handle 3. by changing the algorithm

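Refinement State Machine
[Figure: state diagram with the following transitions]
Virgin --wr--> Exclusive
Exclusive --rd/wr, first thread--> Exclusive
Exclusive --rd, new thread--> Shared
Exclusive --wr, new thread--> Shared-Modified
Shared --rd--> Shared
Shared --wr--> Shared-Modified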
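The state machine can be sketched in code as follows. The transitions come from the diagram; the treatment of C(v) per state follows the description in the Eraser paper (no refinement in Virgin or Exclusive, refinement without reports in Shared, refinement with reports in Shared-Modified). The names `State`, `Shadow` and `on_access` are hypothetical, and leaving a read in the Virgin state unchanged is an assumption of this sketch.

```python
from enum import Enum


class State(Enum):
    VIRGIN = 1
    EXCLUSIVE = 2
    SHARED = 3
    SHARED_MODIFIED = 4


class Shadow:
    """Per-variable bookkeeping (hypothetical helper)."""

    def __init__(self, all_locks):
        self.state = State.VIRGIN
        self.owner = None                       # first thread to write
        self.candidates = frozenset(all_locks)  # C(v)


def on_access(shadow, thread, is_write, locks_held):
    """Advance the state machine; return True if a warning is due."""
    if shadow.state is State.VIRGIN:
        if is_write:
            shadow.state, shadow.owner = State.EXCLUSIVE, thread
        return False
    if shadow.state is State.EXCLUSIVE:
        if thread == shadow.owner:
            return False                        # still private to one thread
        shadow.state = State.SHARED_MODIFIED if is_write else State.SHARED
    elif shadow.state is State.SHARED and is_write:
        shadow.state = State.SHARED_MODIFIED
    # Shared and Shared-Modified both refine C(v) ...
    shadow.candidates &= frozenset(locks_held)
    # ... but only Shared-Modified reports an empty candidate set.
    return shadow.state is State.SHARED_MODIFIED and not shadow.candidates


x = Shadow({"mu"})
init = on_access(x, "t1", True, set())         # initialization without locks
read_shared = on_access(x, "t2", False, set()) # read sharing, no warning
racy_write = on_access(x, "t2", True, set())   # unprotected write, warning
```

Unlocked initialization and read-only sharing pass silently, which removes the first two kinds of false positive from the simple version.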

Lock Set Algorithm (Extended)
• Let locks_held(t) be the set of locks held in any mode by thread t
• Let write_locks_held(t) be the set of locks held in write mode by thread t
• For each shared memory location v, initialize C(v) to the set of all locks
• On each read of v by thread t,
  • Set C(v) := C(v) ∩ locks_held(t)
  • If C(v) = {}, then issue a warning
• On each write of v by thread t,
  • Set C(v) := C(v) ∩ write_locks_held(t)
  • If C(v) = {}, then issue a warning
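The only change from the simple version is which set of held locks is intersected with C(v). A minimal sketch, assuming the caller tracks both sets per thread; the function name `refine` and its parameters are illustrative.

```python
def refine(candidates, is_write, locks_held, write_locks_held):
    """Return (new C(v), warn) for one access under the extended rule."""
    # A write only counts locks held in write mode; a read counts any mode.
    relevant = write_locks_held if is_write else locks_held
    new_candidates = frozenset(candidates) & frozenset(relevant)
    return new_candidates, not new_candidates


c = frozenset({"rw"})

# A reader holding rw in read mode keeps rw as a candidate.
c_read, warn_read = refine(c, False, {"rw"}, set())

# A writer that holds rw only in read mode is not protected against readers.
c_write, warn_write = refine(c_read, True, {"rw"}, set())
```

A lock held in read mode does not exclude other readers, so it cannot protect a write; dropping it from the candidate set on writes is what lets this version handle read-write locks.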

Implementation
• Hashtable of locksets, indexed by integer
• Shadow memory
  – Variable -> (state, lockset index)
  – Fast access using simple offset
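One way to picture the two structures is an interning table that gives each distinct lockset a small integer, plus a shadow word that packs that integer together with the state. The sketch below is an assumption-laden illustration: `LocksetTable`, `pack` and `unpack` are hypothetical names, and the layout of two state bits below the index mirrors the 32-bit shadow word described in the Eraser paper rather than anything stated on the slide.

```python
class LocksetTable:
    """Interns locksets so each distinct set is stored once."""

    def __init__(self):
        self._index_of = {}   # frozenset of locks -> integer index
        self._sets = []       # integer index -> frozenset of locks

    def intern(self, locks):
        key = frozenset(locks)
        idx = self._index_of.get(key)
        if idx is None:
            idx = len(self._sets)
            self._sets.append(key)
            self._index_of[key] = idx
        return idx

    def lookup(self, idx):
        return self._sets[idx]

    def intersect(self, idx, locks):
        # Refinement yields the index of the (possibly new) smaller set.
        return self.intern(self._sets[idx] & frozenset(locks))


def pack(state, lockset_index):
    """Shadow word: two low bits of state, lockset index above them."""
    return (lockset_index << 2) | state


def unpack(word):
    return word & 0b11, word >> 2


table = LocksetTable()
both = table.intern({"mu1", "mu2"})
only_mu1 = table.intersect(both, {"mu1"})
word = pack(3, only_mu1)   # state value 3 is an arbitrary example
```

Storing an index instead of a set keeps each shadow entry at one word, so finding it only needs a fixed offset from the variable's address.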

Shadow Memory

Missing Races
int[] shared = new int[1];
Thread t = new Thread() {
    public synchronized void run() {
        shared[0] = shared[0] + 1;
        ...
    }
};
...
shared[0] = 512;
t.start();
shared[0] = shared[0] + 256;
• Q: When is this race missed?

Unhandled Cases
• Memory reuse
• “Benign” races
  – Not really benign, see “You Don't Know Jack about Shared Variables or Memory Models” [Boehm/Adve 2011]
if (fptr == NULL) {
    lock(fptr_mu);
    if (fptr == NULL) {
        fptr = open(filename);
    }
    unlock(fptr_mu);
}

Race Detection Since Eraser
• Combined HB + LockSet
  – DJIT+
  – Google Thread Sanitizer
  – Helgrind+
  – RaceTrack [Yu 2005]
  – Goldilocks
  – Helgrind <= 3.3
• Pure HB
  – Helgrind 3.4 and later
  – Intel Thread Checker
  – FastTrack

Dynamic Data-Race Detection
[Figure: race detectors plotted by precision and cost. Labels: Happens-Before [Lamport 78], Vector Clocks [M 88], DJIT+ [ISZ 99, PS 03], TRaDe [CB 01], Goldilocks [EQT 07], FastTrack [Flanagan-Freund 09], RaceTrack [YRC 05], MultiRace [PS 03], Hybrid Race Detector [OC 03], Barriers [PS 03], Initialization [vPG 01], Eraser [SBN+ 97]]
• Design Criteria:
  - sound & complete (find at least 1st data race on each var)
  - efficient
• Insight:
  • HB relation is a partial order
  • But all accesses to a var are almost always totally ordered
Source: Freund et al

Source: Flanagan 2009

Views on precision

Savage et al 1997:
Unfortunately, tools based on happens-before have two significant drawbacks. First, they are difficult to implement efficiently because they require per-thread information about concurrent accesses to each shared-memory location. More importantly, the effectiveness of tools based on happens-before is highly dependent on the interleaving produced by the scheduler. While Eraser is a testing tool and therefore cannot guarantee that a program is free from races, it can detect more races than tools based on happens-before.

Flanagan & Freund 2010:
In this paper, we focus on online dynamic race detectors, which generally fall into two categories depending on whether they report false alarms. Precise race detectors never produce false alarms. (…) Precise dynamic race detectors do not reason about all possible traces, however, and may not identify races that occur only when other code paths are taken. (…) A variety of alternative imprecise race detectors have been developed, which may provide improved performance (and sometimes better coverage), but which report false alarms on some race-free programs. For example, Eraser's LockSet algorithm….

Aside on Helgrind

Project Idea
• Implement FastTrack-like algorithm for node.js based on
  Boris Petrov, Martin Vechev, Manu Sridharan, and Julian Dolby. 2012. Race detection for web applications. In Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI '12). ACM, New York, NY, USA, 251-262. DOI=10.1145/2254064.2254095 http://doi.acm.org/10.1145/2254064.2254095
  http://researcher.watson.ibm.com/researcher/files/us-msridhar/pldi12-wr.pdf

Conclusion
• Eraser was an influential paper that pioneered the lockset idea
• Influenced a decade of development and refinement of race detectors