Determinate Imperative Programming: The CF Model, Vijay Saraswat

Determinate Imperative Programming: The CF Model
Vijay Saraswat, IBM TJ Watson Research Center
Joint work with Radha Jagadeesan, Armando Solar-Lezama, Christoph von Praun
http://www.saraswat.org/cf.html

Outline
  • Problem:
    - Many concurrent imperative programs are determinate.
    - Determinacy is not apparent from the syntax.
  • Basic idea
    - A variable is the stream of values written to it by a thread.
  • Many examples
  • Semantics
  • Implementation
  • Future work

Background: X10
  • Five basic themes:
    - Partitioned address space
    - Pervasive explicit asynchrony (Cilk-style recursive parallelism)
    - Java base
    - Guaranteed VM invariants
    - Explicit, distributed VM
  • Few language extensions
    - <s> = async <s>
    - <s> = finish <s>
    - <s> = foreach (<v>, …, <v> in <e>) <s>
    - Multidimensional arrays over distributions
Subsumes MPI, OpenMP, SPMD languages, Cilk …

X10: clocks, clocked final data structures
  • Clocks can be created dynamically.
  • Activities are registered with clocks. An activity may register a newly created activity with one of its clocks.
  • "next;" resumes each clock, then blocks until each clock advances.
    - This is sufficient for deadlock-freedom.
    - Adequate for parallel operations on arrays
    - But not dataflow
  • Clock advances when all activities registered on it resume the clock.
  • Operations
    - c.resume();
    - next;
    - c.drop();
  • Clocked final datum
    - In each phase of the clock the datum is immutable.
    - Read gets current value; write updates in next phase.
Clocks do not introduce deadlock; clocked finals are determinate.
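The clocked final rule (a read gets the current value, a write becomes visible in the next phase) can be pictured with a small sequential Python sketch. The class `ClockedCell` and its method names are hypothetical and are not X10 API; the "at most one write per phase" check is an assumption added for illustration.

```python
class ClockedCell:
    """A value that is immutable within one clock phase (hypothetical helper)."""

    def __init__(self, initial):
        self._current = initial   # what every read in this phase returns
        self._pending = None      # value written in this phase, if any
        self._written = False

    def read(self):
        return self._current

    def write(self, value):
        # Assumption for this sketch: at most one write per phase.
        if self._written:
            raise RuntimeError("assigned more than once in this phase")
        self._pending = value
        self._written = True

    def advance(self):
        """Called when the clock advances: publish the pending write."""
        if self._written:
            self._current = self._pending
        self._pending = None
        self._written = False


cell = ClockedCell(0)
cell.write(1)
before = cell.read()   # still the value of the current phase
cell.advance()
after = cell.read()    # the write is visible only now
```

Because a read can never observe a write made in the same phase, the order in which activities run inside a phase does not affect what they see.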

Clocked final example: Array relaxation

    // G elements are assigned to at most once in each phase of clock c.
    int clocked (c) final [0:M-1, 0:N-1] G = …;
    // Each activity is registered on c.
    finish foreach (int i, j in [1:M-2, 1:N-2]) clocked (c) {
      for (int p in [0:TimeStep-1]) {
        // Read current value of cell. Write visible (only) when clock advances.
        G[i, j] = omega/4*(G[i-1, j]+G[i+1, j]+G[i, j-1]+G[i, j+1])+(1-omega)*G[i, j];
        next;  // Wait for clock to advance.
      }
    }

Takeaway: Each cell is assigned a clocked stream of immutable values.
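A sequential Python sketch of the same relaxation, assuming that "write visible only when the clock advances" can be modelled by two buffers: every read in a phase comes from `current`, every write goes to `nxt`, and swapping them plays the role of `next;`. The function name `relax` and the sample grid are illustrative only.

```python
def relax(grid, omega, steps):
    """Run `steps` phases; each read in a phase sees the previous phase."""
    rows, cols = len(grid), len(grid[0])
    current = [row[:] for row in grid]
    for _ in range(steps):
        nxt = [row[:] for row in current]        # boundary cells carry over
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                neighbours = (current[i - 1][j] + current[i + 1][j]
                              + current[i][j - 1] + current[i][j + 1])
                nxt[i][j] = omega / 4 * neighbours + (1 - omega) * current[i][j]
        current = nxt                            # the clock advances
    return current


grid = [[1.0, 1.0, 1.0],
        [1.0, 0.0, 1.0],
        [1.0, 1.0, 1.0]]
result = relax(grid, omega=1.0, steps=1)
```

With `omega=1.0` the single interior cell becomes the average of its four neighbours, and the input grid is left untouched.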

Imperative Programming Revisited
  • Variables
    - Value in a Box
    - Read: fetch current value
    - Write: change value
    - Stability condition: Value does not change unless a write is performed
  • Very powerful
    - Permit repeated many-writer, many-reader communication through arbitrary reference graphs
  • Asynchrony introduces indeterminacy
    - May write out either 0 or 1.
    - Reader-reader, reader-writer, writer-writer conflicts.

    int x = 0;
    async x=1;
    print(x);

Determinate Concurrent Imperative Frameworks
  • Asynchronous Kahn networks
    - Nodes can be thought of as (continuous) functions over streams.
    - Pop/peek
    - Push
    - Node-local state may mutate arbitrarily
  • Concurrent Constraint Programming
    - Tell constraints
    - Ask if a constraint is true
    - Subsumes Kahn networks (dataflow).
    - Subsumes (determinate) concurrent logic programming, lazy functional programming
Do not support arbitrary mutable variables.

Determinate Concurrent Imperative Frameworks
  • Safe Asynchrony (Steele 1991)
    - Parent may communicate with children.
    - Children may communicate with parent.
    - Siblings may communicate with each other only through commutative, associative writes ("commuting writes").

Good:

    int x=0;
    finish foreach (int i in 1:N) { x += i; }
    print(x); // N*(N+1)/2

Bad:

    int x=0;
    finish foreach (int i in 1:N) { x += i; async print(x); }

Useful but limited. Does not permit dataflow synch.

The CF Basic model
  • A shared variable is a stream of immutable values.
  • Each activity maintains an index i + clean/dirty bit for every shared variable.
    - Initially i=1, v[0] contains initial value.
    - Read: If clean, block until v[i] is written and return v[i++] else return v[i-1]. Mark as clean.
    - Write: Write into v[i++]. Mark as dirty.
  • A read stutters (returns value in last phase) if no activity can write in this phase.
    - E.g. for local variables.
  • World Map = Collection of indices for an activity.
  • Index transmission rules.
    - Activity initialized with current world map of parent activity.
    - On finish, world map of activity is lubbed (joined by least upper bound) with world map of finished activities. (clean lub dirty = clean)
  • All programs are determinate and scheduler independent.
    - May deadlock … nexts are not conjunctive.
The clock of clocked final is made implicit.
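The read and write rules can be traced with a small sequential Python sketch. Everything here is hypothetical naming (`SharedVar`, `View`, `WouldBlock`); a real activity would block where this sketch raises `WouldBlock`, and the sketch assumes that a fresh view starts out dirty, so that the first read returns v[0]. World-map inheritance and the lub on finish are left out.

```python
class WouldBlock(Exception):
    """Raised where a real activity would block waiting for a writer."""


class SharedVar:
    """A shared variable: the stream v[0], v[1], ... of immutable values."""

    def __init__(self, initial):
        self.values = {0: initial}       # v[0] contains the initial value


class View:
    """One activity's index and clean/dirty bit for one shared variable."""

    def __init__(self, var):
        self.var = var
        self.index = 1                   # "Initially i=1"
        self.dirty = True                # assumption: first read sees v[0]

    def read(self):
        if self.dirty:                   # dirty: return v[i-1], mark clean
            self.dirty = False
            return self.var.values[self.index - 1]
        if self.index not in self.var.values:
            raise WouldBlock(f"v[{self.index}] has not been written yet")
        value = self.var.values[self.index]
        self.index += 1                  # clean: return v[i++]
        return value

    def write(self, value):
        self.var.values[self.index] = value
        self.index += 1                  # write into v[i++], mark dirty
        self.dirty = True


x = SharedVar(0)
writer, reader = View(x), View(x)
writer.write(1)
writer.write(2)
own = writer.read()                      # the writer sees its own last write
seen = [reader.read(), reader.read(), reader.read()]
try:
    reader.read()                        # nothing has written v[3]
    blocked = False
except WouldBlock:
    blocked = True
```

The reader observes every value of the stream in order, regardless of when the writes happened relative to its reads.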

CF example: Array relaxation

    shared int [0:M-1, 0:N-1] G = …;
    finish foreach (int i, j in [1:M-2, 1:N-2]) {
      for (int p in [0:TimeStep-1]) {
        G[i, j] = omega/4*(G[i-1, j]+G[i+1, j]+G[i, j-1]+G[i, j+1])+(1-omega)*G[i, j];
      }
    }

All clock manipulations are implicit.

Some simple examples

    shared int x=0;
    finish {
      async { int r1 = x; int r2 = x; println(r1); println(r2); }
      async { x=1; x=2; }
    }

Output: 0, 1

    i   x   A1         A2
    0   0   read r1
    1   1   read r2    write 1
    2   2              write 2

Only one result, independent of the scheduler!
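The scheduler independence of this example can be checked with a threaded Python sketch. The classes `SharedVar` and `View` are hypothetical, reads block on a `threading.Condition` until the indexed value exists, and a fresh view is assumed to start dirty so that the first read returns the initial value. The two runs start the threads in opposite orders.

```python
import threading


class SharedVar:
    def __init__(self, initial):
        self.values = {0: initial}
        self.cond = threading.Condition()

    def get(self, index):
        """Block until v[index] has been written, then return it."""
        with self.cond:
            self.cond.wait_for(lambda: index in self.values)
            return self.values[index]

    def put(self, index, value):
        with self.cond:
            self.values[index] = value
            self.cond.notify_all()


class View:
    def __init__(self, var):
        self.var, self.index, self.dirty = var, 1, True   # assumed start state

    def read(self):
        if self.dirty:
            self.dirty = False
            return self.var.get(self.index - 1)
        value = self.var.get(self.index)
        self.index += 1
        return value

    def write(self, value):
        self.var.put(self.index, value)
        self.index += 1
        self.dirty = True


def run_once(writer_first):
    x = SharedVar(0)
    printed = []

    def reader():                      # async { r1 = x; r2 = x; ... }
        view = View(x)
        printed.append(view.read())
        printed.append(view.read())

    def writer():                      # async { x=1; x=2; }
        view = View(x)
        view.write(1)
        view.write(2)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    if not writer_first:
        threads.reverse()
    for t in threads:
        t.start()
    for t in threads:
        t.join()                       # plays the role of "finish"
    return printed


results = [run_once(True), run_once(False)]
```

Both runs print 0 then 1: the reader's second read waits for v[1] if the writer has not produced it yet, and ignores v[2] if the writer has already run to completion.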

Some simple examples

    shared int x=0;
    finish {
      async { int r1 = x; int r2 = x; println(r1); println(r2); }
      async { x=1; }
      async { x=1; int r3 = x; async { x=2; } }
    }
    println(x);

Output: 0, 1, 2

    i   x   A1 (0)     A2 (0)   A3 (0)              A4 (2)
    0   0   read r1
    1   1   read r2             write 1; read r3
    2   2                                           write 2

All programs are determinate.

Some StreamIt examples

StreamIt:

    void->void pipeline Minimal {
      add IntSource;
      add IntPrinter;
    }
    void->int filter IntSource {
      int x;
      init { x=0; }
      work push 1 { push(x++); }
    }
    int->void filter IntPrinter {
      work pop 1 { print(pop()); }
    }

X10/CF:

    shared int x=0;
    async while (true) x++;
    async while (true) println(x);

Output: 0, 1, …

The communication is through assignment to x, so the same result is obtained with:

    shared int x=0;
    async while (true) ++x;
    async while (true) println(x);

Output: 0, 1, …

Each shared variable is a multi-reader, multi-writer stream.

Some StreamIt examples: Fibonacci

    shared int x=1, y=1;
    async while (true) y=x;    // Activity 1
    async while (true) x+=y;   // Activity 2

    i   0  1  2  3  …
    y   1  1  2  3  …
    x   1  2  3  5  …

Can express any recursive, asynchronous Kahn network.
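One way to read the table is as a pair of recurrences over the two streams: y[i] = x[i-1] and x[i] = x[i-1] + y[i-1]. This reading is an interpretation of the read and write rules, not something the slide states, and the function name `fib_streams` is made up for the sketch below, which computes a finite prefix of both streams in Python.

```python
def fib_streams(n):
    """Return the first n values of the streams x and y."""
    x, y = [1], [1]                       # shared int x=1, y=1
    for i in range(1, n):
        y.append(x[i - 1])                # activity 1: y = x
        x.append(x[i - 1] + y[i - 1])     # activity 2: x += y
    return x, y


x, y = fib_streams(4)
```

The prefix matches the columns i = 0..3 of the table, and each stream value depends only on earlier stream values, never on scheduling.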

StreamIt examples: Moving Average

StreamIt:

    void->void pipeline MovingAverage {
      add IntSource();
      add Averager(10);
      add IntPrinter();
    }
    int->int filter Averager(int n) {
      work pop 1 push 1 peek n {
        int sum=0;
        for (int i=0; i < n; i++)
          sum += peek(i);
        push(sum/n);
        pop();
      }
    }

X10/CF:

    shared int y=0;
    shared int x=0;
    async while (true) x++;
    async while (true) {
      int sum=x;
      for (int i in 1:N-1) sum += peek(x, i);
      y = sum/N;
    }

  • peek(x, i) reads the i'th future value, without popping it. Blocks if necessary.
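The windowing that the CF loop performs can be sketched over a finite list in Python. This assumes that `peek(x, i)` is counted relative to the value just read by `int sum=x`, which is one plausible reading of "the i'th future value"; the function name `moving_average` and the sample input are illustrative.

```python
def moving_average(xs, n):
    """Integer average of each window xs[k], ..., xs[k + n - 1]."""
    out = []
    for k in range(len(xs) - n + 1):
        total = xs[k]                 # int sum = x;  consumes one value
        for i in range(1, n):
            total += xs[k + i]        # sum += peek(x, i);  consumes nothing
        out.append(total // n)        # y = sum / N;
    return out


averages = moving_average([1, 2, 3, 4, 5, 6, 7], 3)
```

Each output consumes exactly one input value and looks ahead at the next n-1, which mirrors the `pop 1 push 1 peek n` rate declaration of the StreamIt filter.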

StreamIt examples: Bandpass filter

StreamIt:

    float->float pipeline BandPassFilter(float rate, float low, float high, int taps) {
      add BPFCore(rate, low, high, taps);
      add Subtracter();
    }
    float->float splitjoin BPFCore(float rate, float low, float high, int taps) {
      split duplicate;
      add LowPass(rate, low, taps, 0);
      add LowPass(rate, high, taps, 0);
      join roundrobin;
    }
    float->float filter Subtracter {
      work pop 2 push 1 {
        push(peek(1)-peek(0));
        pop(); pop();
      }
    }

X10/CF:

    float bandPassFilter(float rate, float low, float high, int taps, int in) {
      int tmp=in;
      shared int in1=tmp, in2=tmp;
      async while (true) in1=in;
      async while (true) in2=in;
      shared int o1 = lowPass(rate, low, taps, 0, in1),
                 o2 = lowPass(rate, high, taps, 0, in2);
      shared int o = o1-o2;
      async while (true) o = o1-o2;
      return o;
    }

Functions return streams.

Cannon matrix multiplication

    // <final int N>: parameters whose values are finalized.
    <final int N> void canon(double[N, N] c, double[N, N] a, double[N, N] b) {
      // i, j: local variables in each activity.
      finish foreach (int i, j in [0:N-1, 0:N-1]) {
        a[i, j] = a[i, (i+j) % N];
        b[i, j] = b[(i+j) % N, j];
      }
      for (int k in [0:N-1])
        finish foreach (int i, j in [0:N-1, 0:N-1]) {
          c[i, j] = c[i, j] + a[i, j]*b[i, j];
          a[i, j] = a[i, (j+1) % N];
          b[i, j] = b[(i+1) % N, j];
        }
    }

The natural sequential program works (for finish foreach).
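A sequential Python sketch of Cannon's shifting scheme. It assumes that within each `finish foreach` every right-hand side reads the values from before that phase, which the sketch models by rebuilding each matrix from the old one. Unlike the slide's procedure, which accumulates into a caller-supplied `c`, this hypothetical `cannon` function starts from a zero matrix and returns the product.

```python
def cannon(a, b):
    """Multiply two square matrices by skewing, then shifting n times."""
    n = len(a)
    # Initial skew: row i of a moves left by i, column j of b moves up by j.
    a = [[a[i][(i + j) % n] for j in range(n)] for i in range(n)]
    b = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Each cell multiplies the pair of elements currently aligned on it.
        c = [[c[i][j] + a[i][j] * b[i][j] for j in range(n)] for i in range(n)]
        # Shift a left by one column and b up by one row.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


product = cannon([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

After the skew, cell (i, j) holds a[i][k] and b[k][j] for k = (i + j) mod n, and every shift advances k by one, so n steps cover all terms of the dot product.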

Histogram
  • Permit "commuting" writes to be performed simultaneously in the same phase.
  • Phase is completed when all activities that can write have written.

    <int N> [1:N][] histogram([1:N][] A) {
      final int[] B = new int [1:N];
      finish foreach (int i in A) B[A[i]]++;
      return B;   // B's phase is not yet complete. A subsequent read will complete it.
    }
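Why increments count as commuting writes can be seen in a short Python sketch that applies the same increments in two different orders. The function name `histogram`, the `order` parameter, and the sample data are illustrative; the sketch is sequential and only demonstrates order independence, not the phase mechanism itself.

```python
import random


def histogram(values, n, order):
    """Apply the increments B[A[i]]++ in the given order of indices i."""
    counts = [0] * (n + 1)        # slot 0 is unused so buckets run 1..n
    for i in order:
        counts[values[i]] += 1
    return counts[1:]


values = [3, 1, 3, 2, 3, 1]
indices = list(range(len(values)))
in_order = histogram(values, 3, indices)

shuffled = indices[:]
random.Random(0).shuffle(shuffled)
out_of_order = histogram(values, 3, shuffled)
```

Because addition is commutative and associative, any interleaving of the increments yields the same bucket counts once all writers have finished.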

Cilk programs with races

    int x;
    cilk void foo() {
      x = x + 1;
    }
    cilk int main() {
      x = 0;
      spawn foo();
      sync;
      printf("x is %d\n", x);
      return 0;
    }

Determinate: Will always print 1 in CF.

CF smoothly combines Cilk and StreamIt.

Implementation
  • Each activity's world map increases monotonically with time.
  • Use garbage collection to erase past unreachable values.
  • Programs with no sibling communication may be executed in buffers with unit windows.
  • Considering permitting user to specify bounds on variables (cf. push/pop specifications in StreamIt).
    - This will force writes to become blocking as well.
Scheduling strategy affects size of buffers, not result.

Formalization
  • MJ/CF
Very straightforward additions to field read/write. Paper contains details. Surprisingly localized.

Future work
  • Paper contains ideas on detecting deadlock (stabilities) at runtime and recovering from them.
    - Programmability being investigated.
  • Implementation.
    - Leverage connection with StreamIt, and static scheduling.
  • Coarser granularity for indices.
    - Use same clock for many variables.
    - Permits "coordinated" changes to multiple variables.