Naiad: A Timely Dataflow System
Derek G. Murray, Michael Isard, Frank McSherry, Paul Barham, Rebecca Isaacs, Martín Abadi
Microsoft Research
[Figure: timely dataflow combines batch processing, stream processing, and graph processing]
[Figure: example application dataflow (input of #hashtags and @mentions, joins ⋈, max) with latency targets: batch updates < 1 s; iterations < 1 ms; interactive queries < 100 ms]
Outline
• Revisiting dataflow
• How to achieve low latency
• Evaluation
Dataflow: a graph of stages linked by connectors
Dataflow: parallelism. Each stage runs as a set of parallel vertices (e.g., B, C), linked by edges.
Dataflow: iteration
Batching (synchronous): ✗ requires coordination; ✓ supports aggregation
vs.
Streaming (asynchronous): ✓ no coordination needed; ✗ aggregation is difficult
Batch iteration
Streaming iteration
Timely dataflow: every message carries a timestamp. This supports both asynchronous and fine-grained synchronous execution.
How to achieve low latency
• Programming model
• Distributed progress tracking protocol
• System performance engineering
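Programming model: vertices (B, C, D) interact through operations, such as C.OPERATION(x, y, z), and through callbacks that the system invokes, such as C.ONCALLBACK(u, v).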
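Messages: vertex B calls B.SENDBY(edge, message, time) to send along an edge; the system later invokes C.ONRECV(edge, message, time) at the receiving vertex C (vertices B, C, D). Messages are delivered asynchronously.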
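To make the message interface concrete, here is a minimal single-process sketch in Python. The class names (`Vertex`, `Scheduler`, `Doubler`, `Collector`) and the snake_case methods `send_by`/`on_recv` are hypothetical stand-ins for the slide's SENDBY/ONRECV, not Naiad's actual C# API, and a FIFO queue stands in for the distributed runtime. Sending only enqueues a message; the receiver's callback runs later, when the scheduler drains the queue.

```python
from collections import deque


class Vertex:
    """Hypothetical vertex base class; names mirror the slide's SENDBY/ONRECV."""

    def __init__(self, name, scheduler):
        self.name = name
        self.scheduler = scheduler

    def send_by(self, edge, message, time):
        # Sending only enqueues the message; delivery happens later.
        self.scheduler.enqueue(edge, message, time)

    def on_recv(self, edge, message, time):
        raise NotImplementedError


class Scheduler:
    """Single-threaded stand-in for the runtime: a FIFO of pending messages."""

    def __init__(self):
        self.queue = deque()
        self.edges = {}  # edge name -> destination vertex

    def connect(self, edge, dst):
        self.edges[edge] = dst

    def enqueue(self, edge, message, time):
        self.queue.append((edge, message, time))

    def run(self):
        # Deliver messages one at a time, in the order they were sent.
        while self.queue:
            edge, message, time = self.queue.popleft()
            self.edges[edge].on_recv(edge, message, time)


class Doubler(Vertex):
    """Hypothetical vertex C: forwards each message, doubled, at the same time."""

    def on_recv(self, edge, message, time):
        self.send_by("c->d", 2 * message, time)


class Collector(Vertex):
    """Hypothetical vertex D: records everything it receives."""

    def __init__(self, name, scheduler):
        super().__init__(name, scheduler)
        self.received = []

    def on_recv(self, edge, message, time):
        self.received.append((message, time))
```

For example, after `b.send_by("b->c", 21, 1)` nothing has reached D; only when `sched.run()` drains the queue does C receive the message and forward `(42, 1)` to D.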
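Notifications: D calls D.NOTIFYAT(time) to request a notification. Messages sent with C.SENDBY(_, _, time) arrive at D via D.ONRECV(_, _, time). Once there can be no more messages at time or earlier, the system invokes D.ONNOTIFY(time). Notifications support batching.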
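The batching pattern can be sketched as a vertex that accumulates a count per timestamp in its receive callback and emits the final count only in its notification callback. This is a toy single-process sketch: `CountPerTime`, `Runtime`, `notify_at`, `on_notify`, and `advance_input` are hypothetical names, and instead of tracking progress itself the runtime assumes that every message with timestamp t has already been delivered before the input advances past t.

```python
from collections import defaultdict


class CountPerTime:
    """Hypothetical vertex D: buffers per-time counts, emits them on notification."""

    def __init__(self, runtime):
        self.runtime = runtime
        self.counts = defaultdict(int)
        self.output = []

    def on_recv(self, edge, message, time):
        if time not in self.counts:
            # First message at this time: ask to be told when the time is complete.
            self.runtime.notify_at(self, time)
        self.counts[time] += 1

    def on_notify(self, time):
        # No more messages at `time` or earlier can arrive, so the count is final.
        self.output.append((time, self.counts.pop(time)))


class Runtime:
    """Toy runtime: the input promises that all times below `frontier` are done.

    Assumption: callers deliver every message at time t before advancing past t.
    """

    def __init__(self):
        self.pending = []  # (vertex, time) notification requests
        self.frontier = 0

    def notify_at(self, vertex, time):
        self.pending.append((vertex, time))

    def advance_input(self, new_frontier):
        self.frontier = new_frontier
        ready = sorted((p for p in self.pending if p[1] < self.frontier),
                       key=lambda p: p[1])
        self.pending = [p for p in self.pending if p[1] >= self.frontier]
        for vertex, time in ready:  # deliver notifications in time order
            vertex.on_notify(time)
```

Receiving five messages at times 0, 0, 1, 0, 1 produces no output; advancing the input to 1 emits `(0, 3)`, and advancing it to 2 then emits `(1, 2)`. The vertex does per-message work cheaply and defers the aggregate until the notification, which is the batching the slide refers to.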
Programming frameworks
Example (LINQ): input.SelectMany(x => x.Split()).Where(x => x.StartsWith("#")).Count(x => x);
Frameworks: LINQ, GraphLINQ, AllReduce, Differential dataflow, BLOOM, BSP (Pregel)
These are built on the timely dataflow API, which runs on the distributed runtime.
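For readers less familiar with LINQ, the query above can be mirrored on a single batch in Python. This is an illustration only, assuming that `Count(x => x)` groups records by the key selector (here, the hashtag itself) and counts each group; the function name `hashtag_counts` is hypothetical, and the sketch ignores the distributed, incremental execution that Naiad provides.

```python
from collections import Counter
from typing import Iterable


def hashtag_counts(lines: Iterable[str]) -> Counter:
    """Single-machine analogue of the slide's LINQ query (one batch only)."""
    words = (word for line in lines for word in line.split())  # SelectMany(x => x.Split())
    hashtags = (w for w in words if w.startswith("#"))         # Where(x => x.StartsWith("#"))
    return Counter(hashtags)                                    # Count(x => x): count per key
```

For example, `hashtag_counts(["#naiad is #fast", "so is #naiad"])` counts `#naiad` twice and `#fast` once.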
How to achieve low latency
• Programming model: asynchronous and fine-grained synchronous execution
• Distributed progress tracking protocol
• System performance engineering
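Progress tracking: when is epoch t complete?
E calls E.NOTIFYAT(t) (vertices A, B, C, D, E). When C handles C.ONRECV(_, _, t), it may call C.SENDBY(_, _, t′) only with t′ ≥ t. Because no vertex can send a message earlier than the one it is processing, epoch t is complete at E once no messages at time t or earlier remain upstream of E.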
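The completeness argument can be sketched as a counter of undelivered messages per (vertex, time), assuming an acyclic graph. The edges in `EDGES`, the class `ProgressTracker`, and its methods are hypothetical, and the sketch assumes the input has already closed epoch t. Because a vertex only sends at t′ ≥ t, an undelivered message at a later time can never produce one at t, so only undelivered messages at time ≤ t that can still reach E block the notification.

```python
from collections import Counter

# Hypothetical acyclic graph; the slide's exact edges are not recoverable.
EDGES = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}


def reaches(src, dst):
    """True if dst is reachable from src (a vertex reaches itself)."""
    stack, seen = [src], set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(EDGES[v])
    return False


class ProgressTracker:
    """Counts undelivered messages per (destination vertex, time)."""

    def __init__(self):
        self.outstanding = Counter()

    def sent(self, dst, time):
        self.outstanding[(dst, time)] += 1

    def delivered(self, dst, time):
        self.outstanding[(dst, time)] -= 1
        if self.outstanding[(dst, time)] == 0:
            del self.outstanding[(dst, time)]

    def complete(self, vertex, t):
        # Epoch t is complete at `vertex` when no undelivered message with
        # time <= t sits at a vertex that can still reach `vertex`.
        return not any(time <= t and reaches(loc, vertex)
                       for (loc, time) in self.outstanding)
```

For example, after `sent("C", 1)`, `complete("E", 1)` is False. If C, while processing that message, calls `sent("D", 2)` before `delivered("C", 1)`, epoch 1 becomes complete while epoch 2 is still pending. Recording new messages before retiring the one being processed keeps the tracker from briefly reporting completeness too early.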
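Progress tracking with a loop: C calls C.NOTIFYAT(t), but C lies on a cycle (vertices A, B, C, D, E). Problem: C depends on its own output, so a message C sends at time t could return to C at time t, and C could never be sure that time t is complete.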
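Solution: structured timestamps in loops
Outside the loop, a timestamp is an epoch: A.SENDBY(_, _, 1) sends into epoch 1, and E.NOTIFYAT(1) asks to be notified when epoch 1 is complete. Inside the loop, a timestamp pairs the epoch with a loop counter: D.SENDBY(_, _, (1, 6)) and C.NOTIFYAT((1, 6)) refer to iteration 6 of epoch 1. The input advances the timestamp (epoch); going around the loop advances the loop counter, so B.SENDBY(_, _, (1, 7)) sends into iteration 7.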
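Structured timestamps can be sketched as an epoch plus a tuple of loop counters, one per enclosing loop. The `Timestamp` class and its `ingress`, `feedback`, and `egress` methods are hypothetical names for the three loop-boundary steps suggested by the slide (entering a loop, going around it, and leaving it); the ordering in `__le__` (epochs compared numerically, counters lexicographically) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Timestamp:
    epoch: int
    counters: Tuple[int, ...] = ()  # one loop counter per enclosing loop

    def ingress(self) -> "Timestamp":
        # Entering a loop: start a new loop counter at 0.
        return Timestamp(self.epoch, self.counters + (0,))

    def feedback(self) -> "Timestamp":
        # Going around the loop again: advance the innermost counter.
        return Timestamp(self.epoch, self.counters[:-1] + (self.counters[-1] + 1,))

    def egress(self) -> "Timestamp":
        # Leaving a loop: drop the innermost counter.
        return Timestamp(self.epoch, self.counters[:-1])

    def __le__(self, other: "Timestamp") -> bool:
        # Assumed partial order: epochs numerically, counters lexicographically
        # (timestamps at the same location have the same number of counters).
        return self.epoch <= other.epoch and self.counters <= other.counters
```

For example, `Timestamp(1).ingress()` equals `Timestamp(1, (0,))`; six calls to `feedback()` reach `Timestamp(1, (6,))`, the time in C.NOTIFYAT((1, 6)); one more gives `Timestamp(1, (7,))`, and `egress()` returns to plain epoch 1. Under this order, `Timestamp(1, (7,))` and `Timestamp(2, (0,))` are incomparable, which is why the order is only partial.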
Graph structure leads to an order on events
[Figure: the looped graph A–F with outstanding events at timestamps 1, (1, 5), (1, 6), and ⊤]
1. Maintain the set of outstanding events.
2. Sort events by the could-result-in (partial) order.
3. Deliver notifications in the frontier of the set.
ONNOTIFY(t) is called after all calls to ONRECV(_, _, t).
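The three steps can be sketched for a small hypothetical loop: A feeds C, C feeds D, D feeds back to C (advancing the loop counter) and exits to E (dropping it). Events are `(location, timestamp)` pairs. The hand-written `SUMMARY` table, which gives the minimal effect of any path between two locations, and all helper names are assumptions of this sketch; in a real system such a table would be derived from the graph structure.

```python
# Hypothetical loop: A -> C -> D, D -> C (feedback), D -> E (egress).
# Timestamps are (epoch,) outside the loop and (epoch, counter) inside it.


def ingress(t):
    return t + (0,)  # entering the loop adds a counter


def identity(t):
    return t


def feedback(t):
    return t[:-1] + (t[-1] + 1,)  # going around advances the counter


def egress(t):
    return t[:-1]  # leaving the loop drops the counter


# Minimal path summary between locations; missing pairs have no path.
SUMMARY = {
    ("A", "A"): identity, ("A", "C"): ingress, ("A", "D"): ingress, ("A", "E"): identity,
    ("C", "C"): identity, ("C", "D"): identity, ("C", "E"): egress,
    ("D", "C"): feedback, ("D", "D"): identity, ("D", "E"): egress,
    ("E", "E"): identity,
}


def le(s, t):
    # Epochs compared numerically, loop counters lexicographically.
    return s[0] <= t[0] and s[1:] <= t[1:]


def could_result_in(a, b):
    """True if event a = (location, time) could lead to event b."""
    summary = SUMMARY.get((a[0], b[0]))
    return summary is not None and le(summary(a[1]), b[1])


def frontier(events):
    """Outstanding events that no *other* outstanding event could result in."""
    return {b for b in events
            if not any(a != b and could_result_in(a, b) for a in events)}
```

With outstanding events D at (1, 5), C at (1, 6), and E at 1, only D's event is in the frontier. E's notification for epoch 1 enters the frontier, and can be delivered, only after the events at (1, 5) and (1, 6) are retired.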
Example: E.NOTIFYAT(1); C.SENDBY(_, _, (1, 5)); D.ONRECV(_, _, (1, 5)); only then E.ONNOTIFY(1), because the event at (1, 5) could still result in epoch 1. Optimizations make doing this practical.
How to achieve low latency
• Programming model: asynchronous and fine-grained synchronous execution
• Distributed progress tracking protocol: enables processes to deliver notifications promptly
• System performance engineering
Performance engineering: microstragglers are the primary challenge
• Garbage collection: O(1–10 s)
• TCP timeouts: O(10–100 ms)
• Data structure contention: O(1 ms)
For details on how we handled these, see the paper (Sec. 3).
How to achieve low latency
• Programming model: asynchronous and fine-grained synchronous execution
• Distributed progress tracking protocol: enables processes to deliver notifications promptly
• System performance engineering: mitigates the effect of microstragglers
System design
[Figure: processes containing several units labeled S plus a progress tracker, linked by data and control channels]
Cluster: 64 8-core 2.1 GHz AMD Opteron servers, 16 GB RAM per server, Gigabit Ethernet
Limitation: fault tolerance via checkpointing/logging (see paper)
Iteration latency
[Figure: iteration latency (ms) vs. number of computers]
Median: 750 μs; 95th percentile: 2.2 ms
Applications: PageRank, word count, iterative machine learning, interactive graph analysis
Frameworks: LINQ, GraphLINQ, AllReduce, Differential dataflow, BLOOM, BSP (Pregel)
Timely dataflow API
Distributed runtime
PageRank on the Twitter graph (42 million nodes, 1.5 billion edges)
[Figure: iteration length (s, log scale) vs. number of computers for Pregel (Naiad), GraphLINQ, GAS (PowerGraph), and GAS (Naiad)]
Interactive graph analysis
[Figure: the example dataflow with tweets (#x, @y) arriving at 32K tweets/s and queries (z?) at 10 queries/s, combined by joins (⋈) and max]
Query latency
[Figure: query latency (ms, log scale) vs. experiment time (30–50 s)]
Cluster: 32 8-core 2.1 GHz AMD Opteron servers, 16 GB RAM per server, Gigabit Ethernet
Max: 140 ms; 99th percentile: 70 ms; median: 5.2 ms
Conclusions
Low-latency distributed computation enables Naiad to:
• achieve the performance of specialized frameworks
• provide the flexibility of a generic framework
The timely dataflow API enables parallel innovation.
Now available for download: http://github.com/MicrosoftResearchSVC/naiad/
For more information, visit the project website and blog:
http://research.microsoft.com/naiad/
http://bigdataatsvc.wordpress.com/