OTCP: SDN-Managed Congestion Control for Data Center Networks

Simon Jouet
simon.jouet@glasgow.ac.uk
https://netlab.dcs.gla.ac.uk
School of Computing Science

Background on TCP
“For a transport endpoint embedded in a network of unknown topology and with an unknown, unknowable and constantly changing population of competing conversations, only one scheme has any hope of working – exponential backoff […]”
Congestion Avoidance and Control, Van Jacobson, 1988
• Conservative Congestion Control Settings
  • Minimum Retransmission Timeout (RTOmin): 200 ms
  • Initial Retransmission Timeout (RTOinit): 1 s
  • Initial Congestion Window (IW): 10 segments
IEEE/IFIP NOMS - 26/04/2016

Partition Aggregate Traffic
• Light request to workers
• Synchronous replies
• Multiple flows
• Bottleneck link
Typical of DC applications
• MapReduce
• Memcached
• Apache Spark
• …
[Figure: Query k sent to the workers, Reply k returned]

TCP Throughput Incast Collapse
• Many flows share the same egress queue
• Packets are dropped when buffers are full
• RTO is used as the recovery mechanism
• Bursts of traffic are separated by long idle periods
• Results in low throughput and long flow completion times
[Figure: buffer occupancy over time with IW = 3; bursts of segments (S) separated by idle periods labelled RTO (>200 ms), RTOinit (1 s) and 2 x RTO]
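To put the idle periods in perspective, the following minimal sketch estimates the completion time of a short reply that has to wait for one or more retransmission timeouts, each doubling the previous one. The 1 Gb/s link rate and 64 KB reply size are illustrative assumptions, not values from the slides, and the function names are hypothetical.

```python
def backoff_schedule_ms(rto_ms, timeouts):
    """Successive timeout durations with exponential backoff (doubling)."""
    return [rto_ms * (2 ** i) for i in range(timeouts)]


def completion_time_ms(size_bytes, link_bps, rto_ms, timeouts):
    """Serialization time plus the idle time spent waiting on timeouts.

    Ignores propagation delay, slow start and queueing: a deliberately
    simplified model, not a TCP simulation.
    """
    transfer_ms = size_bytes * 8 * 1000 / link_bps
    return transfer_ms + sum(backoff_schedule_ms(rto_ms, timeouts))


# Assumed example: 64 KB reply on a 1 Gb/s link.
ideal = completion_time_ms(64_000, 1_000_000_000, 200, 0)    # no loss
one_rto = completion_time_ms(64_000, 1_000_000_000, 200, 1)  # one 200 ms RTO
print(f"ideal: {ideal:.3f} ms, with one RTO: {one_rto:.3f} ms")
```

Under these assumptions a single 200 ms timeout dominates a transfer that would otherwise take about half a millisecond, which is the effect the slide describes.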

DC Networks
“[…] a WSC server is deployed in a relatively well-known environment, leading to possible optimizations for increased performance. […] lower packet losses than in long-distance Internet connections. Thus we can tune transport or messaging parameters (timeouts, window sizes, etc.) for higher communication efficiency.”
The Datacenter as a Computer, Luiz André Barroso, Urs Hölzle, 2009
[Figure: topology with Controller, Core, Agg and ToR layers; link labels 1 G / 1 ms, 1 G / 0.2 ms, 10 x 1 G / 0.1 ms; x 10]
Compute environment specific settings
• RTOmin = Route latency
• RTOmax = Route + Buffer latency
• CWNDmax = Route BDP
• CWNDinit (IW) = BDP / Flow fan-in
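The four rules above can be sketched as a small calculation. This is an illustration under stated assumptions, not the authors' implementation: route latency is taken to be the measured round-trip time, the MSS is assumed to be 1460 bytes, buffer latency is assumed to be the time to drain the route's queues at link rate, and the buffer size and fan-in in the example call are made up. All function and key names are hypothetical.

```python
MSS_BYTES = 1460  # assumed segment payload size (not stated in the slides)


def bdp_segments(link_bps, rtt_us, mss=MSS_BYTES):
    """Bandwidth-delay product of a route, in whole segments."""
    bdp_bytes = link_bps * rtt_us // (8 * 1_000_000)
    return bdp_bytes // mss


def route_settings(link_bps, rtt_us, buffer_bytes, fan_in):
    """Apply the four rules from the slide to one route.

    buffer_bytes is the total queue capacity along the route and fan_in
    the number of concurrent flows; both are inputs the controller is
    assumed to know.
    """
    buffer_latency_us = buffer_bytes * 8 * 1_000_000 // link_bps
    cwnd_max = bdp_segments(link_bps, rtt_us)
    return {
        "rto_min_us": rtt_us,                      # RTOmin = route latency
        "rto_max_us": rtt_us + buffer_latency_us,  # RTOmax = route + buffer latency
        "cwnd_max": cwnd_max,                      # CWNDmax = route BDP
        "iw": max(1, cwnd_max // fan_in),          # IW = BDP / fan-in, at least 1
    }


# Example inputs: 1 Gb/s, 1485 us RTT, 128 KB of buffering, 40 flows (assumed).
print(route_settings(1_000_000_000, 1485, 128_000, 40))
```

With an assumed 1 Gb/s bottleneck, the RTTs of 1485 µs and 5571 µs reported for the Agg and Core routes later in the deck give 127 and 476 segments, which agrees with the CWNDmax values listed there. The ToR value (49) does not follow from these assumptions, so the exact calculation used by the authors likely differs in detail.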

OTCP Information Gathering
• Add timestamp to topology discovery (OFDP)
  • Controller – Switch – Controller
• OpenFlow Request/Reply
  • Controller – Switch – Controller
• ARP probe packets
  • Controller – Switch – Host – Switch – Controller
• Port status for link speed
• Queue config for buffer sizes
[Figure: topology diagram; label: x 10]
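One common way to turn timestamped discovery probes into per-link latency is to subtract the controller-to-switch delays from the total probe time. The sketch below assumes that approach; the slides do not spell out the exact arithmetic, and all names and measurements are hypothetical.

```python
def link_latency_us(probe_total_us, ctrl_rtt_a_us, ctrl_rtt_b_us):
    """Estimate the one-way latency of link A -> B.

    probe_total_us: time from the controller sending a timestamped
        discovery probe out via switch A until it returns from switch B.
    ctrl_rtt_a_us / ctrl_rtt_b_us: controller-switch-controller round
        trips measured with request/reply messages; half of each is
        taken as the one-way control-channel delay (assumes symmetry).
    """
    estimate = probe_total_us - ctrl_rtt_a_us / 2 - ctrl_rtt_b_us / 2
    return max(estimate, 0.0)  # clamp negative values caused by noise


def route_latency_us(link_latencies_us):
    """Latency of a route as the sum of its per-link estimates."""
    return sum(link_latencies_us)


# Made-up measurements, purely for illustration.
a_to_b = link_latency_us(probe_total_us=900, ctrl_rtt_a_us=600, ctrl_rtt_b_us=400)
print(a_to_b)
print(route_latency_us([a_to_b, 150.0, 150.0]))
```

The host-facing hop would be handled the same way, using the ARP probe round trip in place of the discovery probe.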

OTCP Calculations
[Figure: topology diagram; labels: Controller, x 10]

Parameters Propagation
• Controller exposes a northbound JSON/REST API
• Agents in the end hosts connect to the API endpoint
• Controller calculates per-route congestion control values
• Controller pushes them to the agents on topological changes
• Agent updates the host routing table

        RTT (µs)  RTOmin (ms)  RTOmax (ms)  RTOinit (ms)  CWNDmax (MSS)  IW (MSS)
ToR     629       1            2.069        4             49             1
Agg     1485      2            5.805        12            127            2
Core    5571      6            12.771       25            476            5
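As a rough sketch of the last step, an agent on a Linux host could translate a pushed route entry into an `ip route replace` command. The JSON payload shape, the prefix, gateway and device below are hypothetical, since the slides do not give the API schema. Only `rto_min` and `initcwnd` are used because they are standard iproute2 route metrics; how the remaining values (RTOmax, RTOinit, CWNDmax) are applied is not covered here.

```python
import json
import shlex


def route_command(entry):
    """Build an `ip route replace` command from one pushed route entry."""
    return [
        "ip", "route", "replace", entry["prefix"],
        "via", entry["gateway"], "dev", entry["dev"],
        "rto_min", f'{entry["rto_min_ms"]}ms',
        "initcwnd", str(entry["iw"]),
    ]


# Hypothetical payload; rto_min_ms and iw reuse the ToR row of the table.
payload = json.loads("""
{"routes": [{"prefix": "10.0.1.0/24", "gateway": "10.0.0.1", "dev": "eth0",
             "rto_min_ms": 1, "iw": 1}]}
""")

for entry in payload["routes"]:
    # Printed rather than executed; applying it would need root privileges.
    print(shlex.join(route_command(entry)))
```

Building the command as a list keeps it safe to hand to `subprocess.run` without a shell, should the agent apply it directly.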

OTCP Improvements
• Match the congestion control settings to the network
• Improve flow completion time
• Improve throughput and goodput
• Improve flow fairness
• Reduce latency jitter
[Figure: buffer occupancy over time with IW = 1; segments (S) separated by idle periods labelled RTO (1 ms) and RTOinit (4 ms)]

FCT Evaluation
[Figure: (a) Mean FCT (s); (b) 95th percentile FCT (s)]

Goodput Evaluation
[Figure: CDF of flow goodput experiencing incast collapse]

Conclusion
• Implemented OTCP
  • Centralized controller-based congestion control settings measurement
  • Calculate per-route parameters based on the operating environment
• Improve soft-realtime partition-aggregate traffic
  • 12x FCT improvement at the mean, 31x at the 95th percentile
  • Low and stable latency, no bursts from the IW
  • Higher and fairer goodput

Questions?