Aalo: Efficient Coflow Scheduling Without Prior Knowledge
Mosharaf Chowdhury, Ion Stoica
UC Berkeley

Communication is Crucial
• Performance: Facebook jobs spend ~25% of runtime on average in intermediate communication¹
• As SSD-based and in-memory systems proliferate, the network is likely to become the primary bottleneck
[Figure: Map Stage and Reduce Stage]
¹ Based on a month-long trace with 320,000 jobs and 150 million tasks, collected from a 3000-machine Facebook production MapReduce cluster.

Flow-Based Solutions
[Timeline figure, 1980s to 2015: CSFQ, WFQ, GPS, RED, ECN, D3, XCP, RCP, DCTCP, DeTail, PDQ, D2TCP, pFabric, FCP; eras labeled Per-Flow Fairness and Flow Completion Time]
Independent flows cannot capture the collective communication patterns (e.g., shuffle) common in data-parallel applications

Coflow
Communication abstraction for data-parallel applications to express their performance goals
1. Minimize completion times,
2. Meet deadlines, or
3. Perform fair allocation

Benefits of Inter-Coflow Scheduling
[Figure: Coflow 1 and Coflow 2 with flows of 6 units, 2 units, and 3 units across Link 1 and Link 2]
 | Coflow 1 comp. time | Coflow 2 comp. time
Fair Sharing | 5 | 6
Smallest-Flow First | 5 | 6
Smallest-Coflow First | 3 | 6
Benefits increase with the number of coexisting coflows
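
The three schedules compared on this slide can be reproduced with a short simulation. The sketch below is illustrative only: the assignment of flows to links is not recoverable from the slide text, so the layout in FLOWS (Coflow 1 sends 3 units on Link 1; Coflow 2 sends 2 units on Link 1 and 6 units on Link 2) is an assumption, chosen because it yields the completion times the slide states. Unit link capacity and all function names are likewise assumptions.

```python
from collections import defaultdict

# Assumed layout, as (coflow, link, units); every link carries 1 unit per time step.
FLOWS = [("C1", "L1", 3), ("C2", "L1", 2), ("C2", "L2", 6)]


def fair_sharing(flows):
    """Each link splits its capacity equally among its unfinished flows."""
    by_link = defaultdict(list)
    for coflow, link, size in flows:
        by_link[link].append((size, coflow))
    finish = {}
    for items in by_link.values():
        now, sent = 0.0, 0.0
        for rank, (size, coflow) in enumerate(sorted(items)):
            active = len(items) - rank
            # Time for each of the `active` flows to go from `sent` to `size` units.
            now += (size - sent) * active
            sent = size
            finish[coflow] = max(finish.get(coflow, 0.0), now)
    return finish


def serve_in_order(flows, key):
    """Each link serves one flow at a time, in ascending order of key(flow)."""
    clock = defaultdict(float)
    finish = {}
    for coflow, link, size in sorted(flows, key=key):
        clock[link] += size
        finish[coflow] = max(finish.get(coflow, 0.0), clock[link])
    return finish


def smallest_flow_first(flows):
    return serve_in_order(flows, key=lambda f: f[2])


def smallest_coflow_first(flows):
    total = defaultdict(float)
    for coflow, _, size in flows:
        total[coflow] += size
    return serve_in_order(flows, key=lambda f: total[f[0]])


for policy in (fair_sharing, smallest_flow_first, smallest_coflow_first):
    print(policy.__name__, policy(FLOWS))
```

Under the assumed layout, a coflow's completion time is the finish time of its last flow, and only Smallest-Coflow First lets Coflow 1 finish at time 3 without delaying Coflow 2.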

Varys¹
Efficiently schedules coflows leveraging complete and future information
1. The size of each flow,
2. The total number of flows, and
3. The endpoints of individual flows
¹ Efficient Coflow Scheduling with Varys, SIGCOMM 2014.

Varys
Efficiently schedules coflows leveraging complete and future information
1. The size of each flow,
2. The total number of flows, and
3. The endpoints of individual flows
Side annotations: pipelining between stages; speculative executions; task failures and restarts

Aalo
Efficiently schedules coflows without complete and future information
1. The size of each flow,
2. The total number of flows, and
3. The endpoints of individual flows
Side annotations: pipelining between stages; speculative executions; task failures and restarts

Coflow Scheduling
Minimize Avg. Comp. Time
 | With complete knowledge | Without complete knowledge
Flows on a Single Link | Smallest-Flow-First | Least-Attained Service (LAS)
Coflows in the Entire Network | Varys¹, Smallest-Coflow-First¹ | ?
LAS: prioritize flow that has sent the least amount of data
¹ Efficient Coflow Scheduling with Varys, SIGCOMM 2014.

Coflow-Aware LAS (CLAS)
Prioritize coflow that has sent the least total number of bytes
• The more a coflow has sent, the lower its priority
• Smaller coflows finish faster
Challenges (also shared by LAS)
• Can lead to starvation
• Suboptimal for similar size coflows
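
The difference between per-flow LAS and CLAS comes down to what is being compared: bytes sent by a single flow, or bytes sent summed over all flows of a coflow. A minimal sketch, assuming per-flow byte counters keyed by a hypothetical (coflow, flow_id) pair; the function names and the tie-breaking rule are illustrative and not taken from Aalo:

```python
from collections import defaultdict

# Hypothetical counters: bytes sent so far, keyed by (coflow, flow_id).
SENT = {("A", 0): 5, ("A", 1): 5, ("B", 0): 8}


def las_pick(sent):
    """LAS: prioritize the single flow that has sent the least data."""
    return min(sent, key=lambda flow: (sent[flow], flow))


def clas_pick(sent):
    """CLAS: prioritize the coflow whose flows have sent the least in total."""
    totals = defaultdict(int)
    for (coflow, _flow_id), nbytes in sent.items():
        totals[coflow] += nbytes
    return min(totals, key=lambda coflow: (totals[coflow], coflow))


print(las_pick(SENT))   # a flow of coflow A: it has sent only 5
print(clas_pick(SENT))  # coflow B: 8 in total, versus 10 for coflow A
```

In this example the two policies disagree: each flow of coflow A looks small on its own, but coflow A as a whole has already sent more than coflow B.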

Suboptimal for Similar Coflows
Reduces to fair sharing
• Doesn’t minimize average completion time
• Coflow 1 comp. time = 6, Coflow 2 comp. time = 6
FIFO works well for similar coflows
• Optimal when coflows are identical
• Coflow 1 comp. time = 3, Coflow 2 comp. time = 6
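
The numbers on this slide follow from simple arithmetic. The sketch below assumes two identical coflows of 3 units each sharing one link of unit capacity; that setup is an assumption consistent with the completion times shown, and the function names are illustrative.

```python
from itertools import accumulate


def fair_share_completion(sizes):
    """Completion times when one unit-capacity link is split equally among unfinished coflows."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    done = [0.0] * len(sizes)
    now, sent = 0.0, 0.0
    for rank, i in enumerate(order):
        active = len(sizes) - rank
        # Time for each of the `active` coflows to go from `sent` to sizes[i] units.
        now += (sizes[i] - sent) * active
        sent = sizes[i]
        done[i] = now
    return done


def fifo_completion(sizes):
    """Completion times when coflows are served one at a time in arrival order."""
    return list(accumulate(sizes))


sizes = [3, 3]  # assumed: two identical coflows of 3 units each
for name, times in (("fair sharing", fair_share_completion(sizes)),
                    ("FIFO", fifo_completion(sizes))):
    print(name, times, "average:", sum(times) / len(times))
```

Fair sharing finishes both coflows at time 6 (average 6), while FIFO finishes them at 3 and 6 (average 4.5): the last coflow is no worse off, and the first is much better off.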

Between a “Rock” and a “Hard Place”
• Prioritize across dissimilar coflows
• FIFO schedule similar coflows

Discretized Coflow-Aware LAS (D-CLAS)
Priority discretization
• Change priority when total # of bytes sent exceeds predefined thresholds
Scheduling policies
• FIFO within the same queue
• Prioritization across queues
Weighted sharing across queues
• Guarantees starvation avoidance
[Figure: FIFO queues Q1 (Highest-Priority Queue), Q2, …, QK (Lowest-Priority Queue)]
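
The two scheduling policies can be sketched as a sort key plus a bandwidth split. This is an illustration under stated assumptions, not Aalo's implementation: coflows are hypothetical (name, queue, arrival_seq) tuples, and the weight K - q + 1 for queue q is a made-up choice, since the slide does not give the weights.

```python
def dclas_order(coflows):
    """Service order: higher-priority queue first, then FIFO by arrival within a queue."""
    return [name for name, queue, seq in sorted(coflows, key=lambda c: (c[1], c[2]))]


def queue_shares(coflows, num_queues):
    """Split bandwidth across non-empty queues in proportion to hypothetical weights."""
    nonempty = sorted({queue for _, queue, _ in coflows})
    weights = {q: num_queues - q + 1 for q in nonempty}  # assumed weights, larger for Q1
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}


# Hypothetical coflows: (name, queue index with 1 = highest priority, arrival sequence).
coflows = [("c3", 2, 3), ("c1", 1, 1), ("c4", 1, 4), ("c2", 3, 2)]
print(dclas_order(coflows))
print(queue_shares(coflows, num_queues=3))
```

Because every non-empty queue receives a non-zero share, coflows in low-priority queues keep making progress, which is the starvation-avoidance property the slide refers to.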

How to Discretize Priorities?
Exponentially spaced thresholds: A × E^i
• A, E : constants
• 1 ≤ i ≤ K : threshold constant
• K : number of queues
[Figure: FIFO queues by total bytes sent: Q1 (Highest-Priority Queue) from 0 to AE, Q2 from AE to AE^2, …, QK (Lowest-Priority Queue) from AE^(K-1) to ∞]
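
Mapping a coflow to its queue under exponentially spaced thresholds takes only a short loop. In the sketch below the values of A, E, and K are hypothetical, and the boundary rule (a coflow is demoted only once it strictly exceeds a threshold) is one reading of "exceeds predefined thresholds".

```python
def queue_index(total_bytes_sent, a, e, k):
    """Return the queue (1 = highest priority, k = lowest) for a coflow.

    Queue q covers totals up to a * e**q; the last queue has no upper bound.
    """
    threshold = a * e
    for q in range(1, k):
        if total_bytes_sent <= threshold:  # demote only when the threshold is exceeded
            return q
        threshold *= e
    return k


# Hypothetical constants: A = 10 MB, E = 10, K = 4 queues.
A, E, K = 10 * 2**20, 10, 4
for sent in (0, A * E, A * E + 1, A * E**3 + 1):
    print(sent, "-> Q%d" % queue_index(sent, A, E, K))
```

Integer arithmetic is used on purpose: computing the index with a floating-point logarithm can misplace a coflow that sits exactly on a threshold.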

Computing Total # of Bytes Sent
D-CLAS requires knowing the total # of bytes sent over all flows of a coflow
• Distributed computation over small time scales is challenging
How much do we lose if we don’t compute total # of bytes sent?
• D-LAS: make decisions based on total number of bytes sent locally

D-LAS Far From Optimal!
[Figure: Coflow 1 and Coflow 2 with flows of 6 units, 2 units, and 3 units across Link 1 and Link 2]
 | Coflow 1 comp. time | Coflow 2 comp. time
D-LAS (decision on # of bytes sent locally) | 6 | 6
D-CLAS | 3 | 6

Aalo
Efficiently schedules coflows without complete and future information
1. Implement D-CLAS using a centralized architecture
2. Expose a non-blocking coflow API

Aalo Architecture
[Figure: a Coordinator and several Workers; each Worker sits between senders (Sender 1, Sender 2) and the Network Interface and runs D-CLAS; labels: Timescale (μs, milliseconds), Local/Global Scheduling]

Details
Non-blocking: when a new coflow arrives at an output port
• Put its flow(s) in the highest-priority queue (Q1, since it has sent no bytes yet) and schedule them immediately
• No need to sync all flows of a coflow as in Varys
Compute total number of bytes sent
• Workers send info about active coflows periodically
• Coordinator computes total # of bytes sent, and relays this info back to workers
• Workers use this info to move coflows across queues
Minimal overhead for small flows
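
The coordination loop described above can be sketched in a few lines. Everything here is an assumption made for illustration: the Worker class, the coordinator_round function, the message format (a plain dict of per-coflow byte counts), and the THRESHOLDS values are hypothetical and are not Aalo's actual API. A coflow that has sent nothing sits below every threshold, so a worker can place it in Q1 without waiting for the coordinator.

```python
import bisect
from collections import Counter

THRESHOLDS = [10, 100]  # hypothetical queue boundaries (bytes sent); gives 3 queues


def coordinator_round(worker_reports, thresholds=THRESHOLDS):
    """Sum per-coflow bytes over all workers and map each total to a queue index."""
    totals = Counter()
    for report in worker_reports:
        totals.update(report)
    # bisect_left keeps a coflow in its queue until it strictly exceeds a threshold.
    return {c: bisect.bisect_left(thresholds, sent) + 1 for c, sent in totals.items()}


class Worker:
    def __init__(self):
        self.local_sent = Counter()  # bytes this worker has sent, per coflow
        self.queue_of = {}           # last known queue per coflow

    def on_bytes_sent(self, coflow, nbytes):
        # Non-blocking: an unseen coflow starts in Q1 without contacting the coordinator.
        self.queue_of.setdefault(coflow, 1)
        self.local_sent[coflow] += nbytes

    def report(self):
        return dict(self.local_sent)

    def apply(self, global_queues):
        for coflow in self.queue_of:
            self.queue_of[coflow] = global_queues.get(coflow, self.queue_of[coflow])


w1, w2 = Worker(), Worker()
w1.on_bytes_sent("A", 8)
w2.on_bytes_sent("A", 7)
w2.on_bytes_sent("B", 4)
queues = coordinator_round([w1.report(), w2.report()])
w1.apply(queues)
w2.apply(queues)
print(queues, w1.queue_of, w2.queue_of)
```

In this run neither worker alone sees coflow A cross the first threshold (8 and 7 bytes locally), but the coordinator's total of 15 does, so both workers demote A to Q2 after the round while B stays in Q1.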

Evaluation
A 3000-machine trace-driven simulation matched against a 100-machine EC2 deployment
1. Can it approach clairvoyant solutions?
2. Can it scale gracefully?
YES

On Par with Clairvoyant Approaches [EC2]
 | Comm. Improv. | Job Improv.
Per-Flow | 1.93X | 1.18X
Varys | 0.89X | 0.91X

Performance Breakdown [EC2]
[Figure: Fraction of Coflows vs. Coflow Completion Time (Seconds); legend text: Varys, Aalo, Non-Clairvoyant Scheduler]
• Similar for large coflows because they are in slow-moving queues
• Performance loss for medium coflows by mis-scheduling them
• Improvements for small coflows by avoiding coordination

What About Scalability? [EC2]
[Figure: Average Coordination Time (ms) vs. # (Emulated) Aalo Slaves; data labels 8, 17, 115, 495, 992]
[Figure: Normalized Completion Time of Per-Flow Fairness w.r.t. Aalo vs. Coordination Period (Δ), from 10 ms to 100 s]

Aalo
Efficiently schedules coflows without complete information
• Makes coflows practical in presence of failures and DAGs
• Improved performance over flow-based approaches
• Provides a simple, non-blocking API
https://github.com/coflow
Mosharaf Chowdhury – mosharaf@umich.edu