Concurrent Revisions: A deterministic concurrency model. Daan Leijen, Alexandro Baldassin, and Sebastian Burckhardt Microsoft Research (OOPSLA 2010)

The concurrency elephant
• Task/Data Parallel: TPL, X10, Cilk, StreamIt, CUDA, OpenMP, etc.
• Concurrent: Threads, Locks, Promises, Transactions, etc.
• Our focus: Concurrent interactive applications with large shared data structures.

Application = Shared Data and Tasks
Example: Office application
• Save the document
• React to keyboard input by the user
• Perform a spellcheck in the background
• Exchange updates with remote users
[Diagram: Shared Data in the center, accessed by several Reader and Mutator tasks]

Spacewars!
• About 15k lines of C# code, using DirectX.
• The original game is sequential.

Examples from SpaceWars Game
Example 1: read-write conflict
§ Render task reads position of all game objects
§ Physics task updates position of all game objects
=> Render task needs to see consistent snapshot
Example 2: write-write conflict
§ Physics task updates position of all game objects
§ Network task updates position of some objects
=> Network has priority over physics updates

Conventional Concurrency Control
Conflicting tasks cannot efficiently execute in parallel.
§ pessimistic concurrency control (i.e. locks)
  • use locks to avoid parallelism where there are (real or potential) conflicts
§ optimistic concurrency control (i.e. TM)
  • speculate on absence of conflicts
  • rollback if there are real conflicts
Either way: true conflicts kill parallelism.

Our Proposed Programming Model: Revisions and Isolation Types
Revision: A logical unit of work that is forked and joined
Isolation Type: A type which implements automatic copying/merging of versions on write-write conflict
• Deterministic conflict resolution, never roll-back
• No restrictions on tasks (can be long-running, do I/O)
• Full concurrent reading and writing of shared data
• Clean semantics (see technical report)
• Fast and space-efficient runtime implementation

What's new

Traditional Task
  int x = 0;
  Task t = fork { x = 1; }
  assert(x==0 || x==1);
  join t;
  assert(x==1);
No isolation: We see either 0 or 1 depending on the schedule

Concurrent Revisions
  Versioned<int> x = 0;
  Revision r = rfork { x = 1; }
  assert(x==0);
  join r;
  assert(x==1);
§ Isolation types: declares shared data
§ fork revision: forks off a private copy of the shared state
§ isolation: Concurrent modifications are not seen by others
§ join revision: waits for the revision to terminate and writes back changes into the main revision

• Isolation: side effects are only visible when the revision is joined.
• Deterministic execution!
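The fork/join behaviour on this slide can be imitated in a few lines of Python. This is only a sketch under stated assumptions: the `Revision` class and its `read`, `write`, `fork`, and `join` methods are hypothetical names invented here, not the API of the C# library, and the forked body runs eagerly rather than on another thread. It models a revision as a fork-time snapshot plus a set of local writes, with the default "child wins" policy at the join.

```python
class Revision:
    """Hypothetical model of a revision: a fork-time snapshot plus local writes."""

    def __init__(self, snapshot):
        self.snapshot = dict(snapshot)  # state as seen at fork time
        self.writes = {}                # locations written since the fork

    def read(self, name):
        # Local writes shadow the snapshot.
        return self.writes.get(name, self.snapshot[name])

    def write(self, name, value):
        self.writes[name] = value

    def fork(self, body):
        # The child starts from the parent's current state. The body runs
        # eagerly here, but it only ever touches the child's private state.
        child = Revision({**self.snapshot, **self.writes})
        body(child)
        return child

    def join(self, child):
        # Default policy assumed here: the child's writes win.
        self.writes.update(child.writes)


main = Revision({"x": 0})
r = main.fork(lambda rev: rev.write("x", 1))
before_join = main.read("x")  # the child's write is still isolated
main.join(r)
after_join = main.read("x")   # the write becomes visible at the join
print(before_join, after_join)
```

Because the main revision never observes the child's write before the join, the first read is 0 no matter how the two would be scheduled.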

Puzzle time…
  int x = 0;
  int y = 0;
  Task t = fork { if (x==0) y++; }
  if (y==0) x++;
  join t;
Hard to read: let's use a diagram instead…

Sequential consistency
  int x = 0
  int y = 0
  [Diagram: after the fork, one branch runs "if (y==0) x++;" and the other runs "if (x==0) y++;"]
What are the possible values for x and y?
  assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1));

Transactional memory
  int x = 0
  int y = 0
  [Diagram: after the fork, one branch runs "atomic { if (y==0) x++; }" and the other runs "atomic { if (x==0) y++; }"]
  assert( (x==0 && y==1) || (x==1 && y==0));

Concurrent revisions
  Versioned<int> x = 0
  Versioned<int> y = 0
  [Diagram: after the fork, one branch runs "if (y==0) x++;" and the other runs "if (x==0) y++;"]
  assert(x==1 && y==1);
§ Isolation: y is always 0 in the branch that runs "if (y==0) x++;"
§ Isolation: x is always 0 in the branch that runs "if (x==0) y++;"
§ Determinism: only 1 possible result
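The three answers to the puzzle can be checked by brute force. The sketch below is an illustration written for this transcript, not code from the talk: it splits each branch into a read step and a write step, enumerates every interleaving that respects program order (sequential consistency), keeps only the interleavings where each branch runs without interruption (the `atomic` version), and finally evaluates both branches against the fork-time snapshot (revisions). All function and variable names are assumptions.

```python
from itertools import permutations

def run(schedule):
    """Execute one interleaving; each branch is a read step then a write step."""
    state = {"x": 0, "y": 0}
    saw_zero = {}
    for task, step in schedule:
        # main runs "if (y==0) x++", the forked task runs "if (x==0) y++"
        mine, other = ("x", "y") if task == "main" else ("y", "x")
        if step == "read":
            saw_zero[task] = (state[other] == 0)
        elif saw_zero[task]:
            state[mine] += 1
    return state["x"], state["y"]

steps = [("main", "read"), ("main", "write"), ("task", "read"), ("task", "write")]

def in_program_order(s):
    return all(s.index((t, "read")) < s.index((t, "write")) for t in ("main", "task"))

def is_atomic(s):
    # Both steps of a branch are adjacent: nothing interleaves inside it.
    return all(s.index((t, "write")) - s.index((t, "read")) == 1 for t in ("main", "task"))

schedules = [s for s in permutations(steps) if in_program_order(s)]
sequential_consistency = {run(s) for s in schedules}
transactional_memory = {run(s) for s in schedules if is_atomic(s)}

# Revisions: both branches read the fork-time snapshot, and the join has no
# write-write conflict because the branches write different variables.
snap = {"x": 0, "y": 0}
x = snap["x"] + (1 if snap["y"] == 0 else 0)  # main revision
y = snap["y"] + (1 if snap["x"] == 0 else 0)  # forked revision
concurrent_revisions = {(x, y)}

print(sorted(sequential_consistency), sorted(transactional_memory), concurrent_revisions)
```

The enumeration yields three outcomes under sequential consistency, two under transactions, and exactly one under revisions, matching the assertions on the slides.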

Conflict resolution
  Versioned<int> x;
By default, on a write-write conflict (only), the modification in the child revision wins.
[Revision diagrams: three fork/join scenarios that place the assignments x = 0, x = 1, and x = 2 on different branches, ending in assert(x==2), assert(x==0), and assert(x==1)]

Custom conflict resolution
  Cumulative<int, λ(main, join, orig). main + join - orig> x;
[Revision diagram]
  x = 0
  main revision: x += 1 (0 → 1)
  forked revision: x += 2 (0 → 2)
  join: merge(1, 2, 0) → 3
  assert(x==3)
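The merge function on this slide is a plain three-argument function, so its arithmetic is easy to check in isolation. A minimal sketch, with `cumulative_merge` as a name chosen here for illustration:

```python
def cumulative_merge(main, join, orig):
    """Merge from the slide: keep both increments relative to the fork value."""
    return main + join - orig

orig = 0            # value of x when the revision was forked
main = orig + 1     # main revision ran x += 1
joinee = orig + 2   # forked revision ran x += 2

merged = cumulative_merge(main, joinee, orig)
child_wins = joinee  # what the default Versioned policy would keep instead

print(merged, child_wins)
```

With the default policy the main revision's increment would be overwritten; the cumulative merge keeps both, which is why the slide asserts x == 3.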

Demo
  class Sample
  {
    [Versioned] int i = 0;

    public void Run()
    {
      var r = CurrentRevision.Fork(() => {
        i += 1;
      });
      i += 2;
      CurrentRevision.Join(r);
      Console.WriteLine("i = " + i);
    }
  }

Demo: Sandbox
  class Sandbox
  {
    [Versioned] int i = 0;

    public void Run()
    {
      // Fork a revision without forking an associated task/thread
      var r = CurrentRevision.Branch("FlakyCode");
      try {
        // Run code in a certain revision
        r.Run(() =>
        {
          i = 1;
          throw new Exception("Oops");
        });
        // Merge changes in a revision into the main one
        CurrentRevision.Merge(r);
      }
      catch {
        // Abandon a revision and don't merge its changes.
        CurrentRevision.Abandon(r);
      }
      Console.WriteLine("\n i = " + i);
    }
  }
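The branch / merge / abandon pattern of the sandbox demo can be approximated without the library. In the sketch below, `run_in_sandbox`, `flaky`, and `steady` are hypothetical helpers invented for illustration, and a dictionary copy stands in for the branched revision. Changes reach the shared state only if the body finishes without raising.

```python
def run_in_sandbox(state, body):
    """Run body on a private copy of state; merge on success, abandon on failure."""
    branch = dict(state)      # stand-in for branching a revision
    try:
        body(branch)
    except Exception:
        return False          # abandon: the private copy is simply dropped
    state.update(branch)      # merge the branch's changes into the main state
    return True

def flaky(s):
    s["i"] = 1
    raise RuntimeError("Oops")

def steady(s):
    s["i"] = 1

state = {"i": 0}
merged_flaky = run_in_sandbox(state, flaky)
i_after_flaky = state["i"]
merged_steady = run_in_sandbox(state, steady)
i_after_steady = state["i"]
print(merged_flaky, i_after_flaky, merged_steady, i_after_steady)
```

The failing body leaves the shared value untouched even though it assigned to it before throwing, which is the point of running flaky code in its own revision.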

A Software engineering perspective
• Transactional memory:
  § Code centric: put "atomic" in the code
  § Granularity:
    • too broad: too many conflicts and no parallel speedup
    • too small: potential races and incorrect code
• Concurrent revisions:
  § Data centric: put annotations on the data
  § Granularity: group data that have mutual constraints together, i.e. if (x + y > 0) should hold, then x and y should be versioned together.

Current Implementation: C# library
• For each versioned object, maintain multiple copies
  § Map revision ids to versions
  § 'mostly' lock-free array

  Revision   Value
  1          0
  40         2
  45         7

• New copies are allocated lazily
  § Don't copy on fork… copy on first write after fork
• Old copies are released on join
  § No space leak
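A rough Python sketch of the bookkeeping described above follows. It is an assumption-laden illustration, not the library's algorithm: the `Segment`, `Versioned`, and `Revision` classes are invented here, a dictionary replaces the lock-free array, and nested forks inside a child are not handled. It shows the three properties from the slide: a per-object map from revision (segment) ids to values, no copy at fork time, and release of the child's copy at the join.

```python
import itertools

_ids = itertools.count(1)

class Segment:
    """A stretch of one revision between forks; reads fall back to the parent."""
    def __init__(self, parent=None):
        self.id = next(_ids)
        self.parent = parent
        self.written = []            # versioned objects first written here

class Versioned:
    def __init__(self, segment, value):
        self.versions = {}           # segment id -> value
        self.set(segment, value)

    def get(self, segment):
        # Walk up the ancestry until some segment holds a version.
        while segment.id not in self.versions:
            segment = segment.parent
        return self.versions[segment.id]

    def set(self, segment, value):
        if segment.id not in self.versions:
            segment.written.append(self)   # a copy is allocated on first write
        self.versions[segment.id] = value

class Revision:
    def __init__(self, segment):
        self.current = segment

    def fork(self):
        # No data is copied: both sides just start fresh segments.
        child = Revision(Segment(self.current))
        self.current = Segment(self.current)
        return child

    def join(self, child):
        # Child wins (default policy); its copy is released via pop.
        for cell in child.current.written:
            cell.set(self.current, cell.versions.pop(child.current.id))

main = Revision(Segment())
x = Versioned(main.current, 0)
r = main.fork()
copies_after_fork = len(x.versions)   # fork alone allocates nothing
x.set(r.current, 1)
seen_by_main = x.get(main.current)    # main still reads the old version
child_id = r.current.id
main.join(r)
seen_after_join = x.get(main.current)
```

Giving the parent a fresh segment at each fork is what keeps the parent's later writes from leaking into the child's view of the snapshot.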

Full algorithm in the paper…

SpaceWars Game
[Diagram: Sequential Game Loop around the Shared State. Tasks: Process Inputs (Keyboard), Parallel Collision Detection, Simulate Physics, Play Sounds, Send / Receive (Network Connection), Render Screen (Graphics Card), Autosave (Disk)]

Revision Diagram for Parallelized Game Loop
[Diagram: revisions for autosave (long running), network, Physics, Render, and Coll. Det. 1 to 4]

"Problem Example 1" is solved
[Diagram: revisions for autosave (long running), network, Physics, Render, and Coll. Det. 1 to 4]
§ Render task reads position of all game objects
§ Physics task updates position of all game objects
§ No interference!

"Problem Example 2" is solved
[Diagram: revisions for autosave (long running), network, Physics, Render, and Coll. Det. 1 to 4]
§ Physics task updates position of all game objects
§ Network task updates position of some objects
§ Network updates have priority over physics updates
§ Order of joins establishes precedence!

Results
[Chart: Physics task, Render, Collision detection]
§ Autosave now perfectly unnoticeable in background
§ Overall Speed-Up: 3.03x on four-core (almost completely limited by graphics card)

Overhead: How much does all the copying and the indirection cost?
§ Only a 5% slowdown in the sequential case
§ Some individual tasks slow down much more (e.g. physics simulation)

Conclusion
Revisions and Isolation Types simplify the parallelization of applications with tasks that
§ Exhibit conflicting accesses to shared data
§ Have unpredictable latency
§ Have unpredictable data access pattern
§ May perform I/O that cannot be rolled back
Revisions and Isolation Types
§ are easy to reason about (determinism, isolation)
§ have low-enough overhead for many applications

Questions?
• daan@microsoft.com
• sburckha@microsoft.com
• External download available soon

Sequential Consistency
  int x = 0;
  int y = 0;
  task t = fork { if (x==0) y++; }
  if (y==0) x++;
  join t;
  assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1));

Transactional Memory
  int x = 0;
  int y = 0;
  task t = fork { atomic { if (x==0) y++; } }
  atomic { if (y==0) x++; }
  join t;
  assert( (x==0 && y==1) || (x==1 && y==0));

Concurrent Revisions
  versioned<int> x = 0;
  versioned<int> y = 0;
  revision r = rfork { if (x==0) y++; }
  if (y==0) x++;
  join r;
  assert(x==1 && y==1);

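[Revision diagram: cumulative merges across two joins]
  x = 0
  main revision: x += 1 (0 → 1)
  first joinee: x += 2 (0 → 2)
  first join: merge(1, 2, 0) → 3
  second joinee, whose snapshot value is 2: x += 3 (2 → 5)
  second join: merge(3, 5, 2) → 6
  assert( x==6 )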

By construction, there is no 'global' state: just local state for each revision.
State is simply a (partial) function from a location to a value.

Operational Semantics
§ For some revision r, with snapshot σ and local modifications τ and an expression context with hole: (λx.e) v
§ The state is a composition of the root snapshot σ and local modifications τ
§ On a fork, the snapshot of the new revision r' is the current state: σ::τ
§ On a join, the writes of the joinee r' take priority over the writes of the current revision: τ::τ'

Custom merge: per location (type)
On a join, using a merge function:
§ No conflict if a location was not written in the joinee
§ No conflict if a location was unmodified in the current revision: use the value of the joinee
§ Conflict otherwise: use a location/type specific merge function
Standard merges: Versioned (the joinee's value wins) and Cumulative (main + join - orig)

What is a conflict?
  Cumulative<int> x = 0
• Merge is only called if: (1) write in child, and (2) modification in main revision
[Revision diagram: a child revision runs x += 2 (0 → 2) while the main revision leaves x untouched. No conflict (merge function is not called), so the join takes the value 2. Then x += 3 takes x from 2 to 5.]
  assert( x == 5 )
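The rule for when the merge function runs can be written as a small three-way decision. The sketch below is an illustration only: `join_value` and the explicit `main_wrote` / `joinee_wrote` flags are assumptions made here (a real runtime would track writes itself), and the merge function records its calls so the "not called" case can be observed.

```python
def join_value(main, joinee, orig, main_wrote, joinee_wrote, merge):
    """Value of one location after a join, following the conflict rule above."""
    if not joinee_wrote:
        return main                      # nothing to bring back from the joinee
    if not main_wrote:
        return joinee                    # no conflict: take the joinee's value
    return merge(main, joinee, orig)     # write in both: ask the merge function

calls = []

def cumulative(main, joinee, orig):
    calls.append((main, joinee, orig))   # record that a real conflict occurred
    return main + joinee - orig

# Only the child writes (x += 2): the merge function is not called.
no_conflict = join_value(0, 2, 0, main_wrote=False, joinee_wrote=True, merge=cumulative)
calls_after_no_conflict = len(calls)

# Both write (main: x += 1, child: x += 2): the merge function decides.
conflict = join_value(1, 2, 0, main_wrote=True, joinee_wrote=True, merge=cumulative)

print(no_conflict, calls_after_no_conflict, conflict, calls)
```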

Merging with failure
On fail, we just ignore any writes in the joinee.

Snapshot isolation
• Widely used in databases, for example Oracle and Microsoft SQL Server
• In essence, in snapshot isolation a concurrent transaction can only complete in the absence of write-write conflicts.
• Our calculus generalizes snapshot isolation:
  § We support arbitrary nesting
  § We allow custom merge functions to resolve write-write conflicts deterministically

Snapshot isolation
We can succinctly model snapshot isolation as:
• Disallow nesting
• Use the default merge
Some versions of snapshot isolation do not treat silent writes in a transaction as a conflict.

Sequential merges
• We can view each location as an abstract data type (i.e. object) with certain operations (i.e. methods).
• If a merge function always behaves as if concurrent operations for those objects are sequential, we call it a sequential merge.
• Such objects always behave as if the operations in the joinee are all done sequentially at the join point.

Sequential merges
[Diagram: x = o, a shared operation sequence u, then two concurrent branches running w₁ and w₂]
• A merge is sequential if: merge(uw₁(o), uw₂(o), u(o)) = uw₁w₂(o)
• And uw₁w₂(o) � � merge(uw₁(o), uw₂(o), u(o))

Abelian merges
• For any abstract data type that forms an abelian group (associative, commutative, with inverses) with neutral element 0 and an operation ⊕, the following merge is sequential: merge(v, v', v₀) = v ⊕ v' ⊖ v₀
• This holds for example for additive integers and additive sets.
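The claim can be spot-checked numerically. The sketch below builds the merge from a group operation and its inverse and compares it against running the joinee's operation after the main revision's operation. The helper `abelian_merge` and the choice of integers under XOR as a second group are illustrative assumptions made here, not examples from the slides.

```python
import operator

def abelian_merge(op, inverse):
    """Build merge(v, v', v0) = v (+) v' (-) v0 for a given abelian group."""
    def merge(v, v_joinee, v0):
        return op(op(v, v_joinee), inverse(v0))
    return merge

add_merge = abelian_merge(operator.add, operator.neg)
xor_merge = abelian_merge(operator.xor, lambda v: v)  # every element is its own inverse

# o: initial value, u: operations before the fork,
# w1: operation in the main revision, w2: operation in the joinee.
o, u, w1, w2 = 10, 4, 7, 5

add_merged = add_merge(o + u + w1, o + u + w2, o + u)
add_sequential = o + u + w1 + w2          # u, then w1, then w2 in sequence

xor_merged = xor_merge(o ^ u ^ w1, o ^ u ^ w2, o ^ u)
xor_sequential = o ^ u ^ w1 ^ w2

print(add_merged, add_sequential, xor_merged, xor_sequential)
```

In both groups the fork-time value cancels out of the merge, leaving exactly the effect of the two branches applied one after the other.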