Multipath TCP
Costin Raiciu (University Politehnica of Bucharest) and Christoph Paasch (Université catholique de Louvain)
Joint work with Mark Handley and Damon Wischik (University College London), Olivier Bonaventure and Sébastien Barré (Université catholique de Louvain), and many others.

Networks are becoming multipath: mobile devices have multiple wireless connections.

Networks are becoming multipath: datacenters have redundant topologies.

Networks are becoming multipath: servers are multi-homed.

How do we use these networks? TCP. Used by most applications, it offers byte-oriented reliable delivery and adjusts its load to network conditions [Labovitz et al. – Internet Inter-Domain Traffic – SIGCOMM 2010].

TCP is single path. A TCP connection uses a single path in the network regardless of the network topology, and is tied to the source and destination addresses of the endpoints.

Mismatch between network and transport creates problems

Poor Performance for Mobile Users
A phone's connections all run via the 3G cell tower. Offload the phone to WiFi, and all ongoing TCP connections die.

Collisions in datacenters [Al-Fares et al. – A Scalable, Commodity Data Center Network Architecture – SIGCOMM 2008].

Single-path TCP collisions reduce throughput [Raiciu et al. – SIGCOMM 2011].

Multipath TCP

Multipath TCP (MPTCP) is an evolution of TCP that can effectively use multiple paths within a single transport connection.
• Supports unmodified applications
• Works over today's networks
• Standardized at the IETF (almost there)

Multipath TCP components:
• Connection setup
• Sending data over multiple paths
• Encoding control information
• Dealing with (many) middleboxes
• Congestion control
[Raiciu et al. – NSDI 2012] [Wischik et al. – NSDI 2011]

MPTCP Connection Management
The initial SYN carries the MP_CAPABLE option with the sender's key X; the SYN/ACK answers with MP_CAPABLE and key Y. This establishes subflow 1, which keeps its own CWND, send sequence number (Snd.SEQNO) and receive sequence number (Rcv.SEQNO).
To add a path, a SYN carrying the MP_JOIN option (with a token derived from the peer's key Y) is sent from the new address; the peer replies with a SYN/ACK carrying MP_JOIN and its side of the authentication. Subflow 2 then gets its own CWND, Snd.SEQNO and Rcv.SEQNO.
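For concreteness, here is a sketch of the two option layouts involved, as specified in RFC 6824. Struct packing is shown for illustration only; real implementations parse the options byte by byte in network order.

```c
#include <stdint.h>

#define TCPOPT_MPTCP 30   /* all MPTCP options share TCP option kind 30 */

struct mp_capable {            /* subtype 0, as sent in the SYN */
    uint8_t  kind;             /* 30 */
    uint8_t  len;              /* 12 in the SYN, 20 in the third ACK */
    uint8_t  subtype_version;  /* subtype (4 bits) = 0, version (4 bits) */
    uint8_t  flags;            /* e.g. A = checksum required */
    uint64_t sender_key;       /* key X above */
    /* uint64_t receiver_key;     present only in the third ACK */
};

struct mp_join_syn {           /* subtype 1, as sent in the joining SYN */
    uint8_t  kind;             /* 30 */
    uint8_t  len;              /* 12 */
    uint8_t  subtype_flags;    /* subtype (4 bits) = 1, B = backup-path bit */
    uint8_t  addr_id;          /* address identifier of the new subflow */
    uint32_t receiver_token;   /* derived from the peer's key Y */
    uint32_t sender_nonce;     /* random number used in the HMAC exchange */
};
```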

TCP Packet Header (fields laid out in 32-bit rows; 20 bytes of fixed header, 0–40 bytes of options):
• Source Port | Destination Port
• Sequence Number
• Acknowledgment Number
• Header Length | Reserved | Code bits | Receive Window
• Checksum | Urgent Pointer
• Options (0–40 bytes)
• Data
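The same layout as a C struct; a sketch rather than a kernel definition (real stacks use endian-aware, bit-exact declarations such as Linux's struct tcphdr):

```c
#include <stdint.h>
#include <stdio.h>

/* The 20-byte fixed TCP header (RFC 793), without options. */
struct tcp_header {
    uint16_t source_port;
    uint16_t dest_port;
    uint32_t seq_number;      /* sequence number of the first data byte */
    uint32_t ack_number;      /* next byte expected from the peer */
    uint8_t  data_offset;     /* header length in 32-bit words (upper 4 bits) */
    uint8_t  code_bits;       /* SYN, ACK, FIN, RST, PSH, URG flags */
    uint16_t receive_window;  /* flow-control window */
    uint16_t checksum;
    uint16_t urgent_pointer;
    /* followed by 0-40 bytes of options, then data */
};

int main(void)
{
    printf("fixed header: %zu bytes\n", sizeof(struct tcp_header)); /* 20 */
    return 0;
}
```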

Sequence Numbers
Packets go over multiple paths.
– Need sequence numbers to put them back in sequence.
– Need sequence numbers to infer loss on a single path.
Options:
– One sequence space shared across all paths?
– One sequence space per path, plus an extra one to put data back in the correct order at the receiver?

Sequence Numbers
• One sequence space per path is preferable.
– Loss inference is more reliable.
– Some firewalls/proxies expect to see all the sequence numbers on a path.
• The outer TCP header holds subflow sequence numbers.
– Where do we put the data sequence numbers?

MPTCP Packet Header: the same 20-byte layout, but at the subflow level (Subflow Source Port | Destination Port, Subflow Sequence Number, Subflow Acknowledgment Number, Header Length | Reserved | Code bits | Receive Window, Checksum | Urgent Pointer), followed by 0–40 bytes of options. The data sequence number and the Data ACK are carried inside the options.
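A sketch of the DSS option that carries those fields (RFC 6824, subtype 2); the 4-byte sequence-number variants are shown, and packing is illustrative only:

```c
#include <stdint.h>

/* MPTCP Data Sequence Signal (DSS) option: carries the data sequence
 * number and the Data ACK inside the TCP options field. Flag bits can
 * select 8-byte sequence numbers instead of the 4-byte ones shown. */
struct mptcp_dss {
    uint8_t  kind;         /* 30 (MPTCP) */
    uint8_t  len;
    uint8_t  subtype;      /* 2 = DSS, in the upper 4 bits */
    uint8_t  flags;        /* A = Data ACK present, M = mapping present, ... */
    uint32_t data_ack;     /* cumulative ACK at the data (connection) level */
    uint32_t data_seq;     /* DSEQ: position in the connection byte stream */
    uint32_t subflow_seq;  /* relative sequence number within this subflow */
    uint16_t data_len;     /* length of this mapping, in bytes */
    uint16_t checksum;     /* protects the mapping against middlebox rewrites */
};
```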

MPTCP Operation
A data segment sent on subflow 1 carries its subflow sequence number in the TCP header (SEQ 1000) and its data sequence number in the options (DSEQ 10000). The next segment goes out on subflow 2 with subflow SEQ 5000 and DSEQ 11000.
The receiver acknowledges on subflow 1 with ACK 2000 plus a Data ACK of 11000: the subflow ACK covers subflow 1's bytes, while the Data ACK cumulatively acknowledges the connection-level byte stream.
If the segment carrying DSEQ 11000 is lost on subflow 2, the same data can be retransmitted on subflow 1: it is re-sent with subflow 1's next sequence number (SEQ 2000) but with the unchanged DSEQ 11000.
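A minimal sketch of that retransmission step, using hypothetical types of our own (dss_map, subflow and reinject are not kernel names): the data sequence number is preserved while a fresh subflow sequence number is assigned.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct dss_map {
    uint64_t dseq;       /* data-level sequence number of the segment */
    uint16_t len;        /* segment length in bytes */
    bool     data_acked; /* covered by a cumulative Data ACK? */
};

struct subflow {
    int      id;
    uint32_t snd_nxt;    /* next subflow-level sequence number */
};

/* Re-send one data-level segment on a different subflow. */
static void reinject(struct dss_map *seg, struct subflow *sf)
{
    uint32_t ssn = sf->snd_nxt;  /* fresh subflow sequence number */
    sf->snd_nxt += seg->len;
    printf("resend DSEQ %llu (%u bytes) on subflow %d as SEQ %u\n",
           (unsigned long long)seg->dseq, (unsigned)seg->len,
           sf->id, (unsigned)ssn);
}

int main(void)
{
    /* the scenario above: DSEQ 11000 was lost on subflow 2 ... */
    struct dss_map lost = { .dseq = 11000, .len = 1000, .data_acked = false };
    /* ... and is reinjected on subflow 1, whose send sequence is at 2000 */
    struct subflow sf1 = { .id = 1, .snd_nxt = 2000 };
    if (!lost.data_acked)
        reinject(&lost, &sf1);
    return 0;
}
```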

Multipath TCP Congestion Control

Packet switching ‘pools’ circuits. Multipath ‘pools’ links. TCP controls how a link is shared; how should a pool be shared? (Two circuits vs. a link. Two separate links vs. a pool of links.)

Design goal 1: Multipath TCP should be fair to regular TCP at shared bottlenecks.
Picture a multipath TCP flow with two subflows competing with a regular TCP flow. To be fair, Multipath TCP should take as much capacity as TCP at a bottleneck link, no matter how many paths or subflows it is using.

Design goal 2: MPTCP should use efficient paths.
Flows cross 12 Mb/s links, and each flow has the choice of a 1-hop and a 2-hop path. How should it split its traffic?
• If each flow splits its traffic 1:1, each gets 8 Mb/s.
• Split 2:1, each gets 9 Mb/s.
• Split 4:1, each gets 10 Mb/s.
• Split ∞:1 (all traffic on the 1-hop path), each gets 12 Mb/s: the efficient allocation.

Design goal 3: MPTCP should get at least as much as TCP on the best path.
WiFi path: high loss, small RTT. 3G path: low loss, high RTT. Design goal 2 says to send all your traffic on the least congested path, in this case 3G. But 3G has a high RTT, hence it would give low throughput.

How does TCP congestion control work?
Maintain a congestion window w.
• Increase w for each ACK, by 1/w.
• Decrease w for each drop, by w/2.
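The same rule in code; a window-units sketch, not kernel logic:

```c
#include <stdio.h>

/* Classic TCP AIMD, with the window measured in segments. */
static double tcp_on_ack(double w)  { return w + 1.0 / w; } /* additive increase */
static double tcp_on_drop(double w) { return w / 2.0; }     /* halve on loss */

int main(void)
{
    double w = 10;
    w = tcp_on_ack(w);   /* 10 + 1/10 = 10.1 */
    w = tcp_on_drop(w);  /* 5.05 */
    printf("w = %.2f\n", w);
    return 0;
}
```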

How does MPTCP congestion control work?
Maintain a congestion window wr, one window for each path, where r ∊ R ranges over the set of available paths.
• Increase wr for each ACK on path r, by min(α/w_total, 1/wr), the coupled "linked increases" rule of RFC 6356, where w_total is the sum of all the windows. The coupled term α/w_total shifts traffic away from congested paths (Goal 2); the choice of α and the 1/wr cap keep the flow fair to TCP at shared bottlenecks and at least as fast as TCP on the best path (Goals 1 & 3).
• Decrease wr for each drop on path r, by wr/2.
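A sketch of this coupled update in floating-point window units; lia_alpha and lia_on_ack are our own illustrative names, and a real implementation works in bytes with integer arithmetic:

```c
#include <stddef.h>
#include <stdio.h>

struct path { double w, rtt; };

/* alpha caps aggressiveness so the MPTCP flow takes no more at a bottleneck
 * than a single TCP on its best path (design goals 1 & 3), per RFC 6356. */
static double lia_alpha(const struct path *p, size_t n)
{
    double w_total = 0, best = 0, denom = 0;
    for (size_t r = 0; r < n; r++) {
        w_total += p[r].w;
        double t = p[r].w / (p[r].rtt * p[r].rtt);
        if (t > best) best = t;
        denom += p[r].w / p[r].rtt;
    }
    return w_total * best / (denom * denom);
}

static void lia_on_ack(struct path *p, size_t n, size_t r)
{
    double w_total = 0;
    for (size_t i = 0; i < n; i++) w_total += p[i].w;
    double inc = lia_alpha(p, n) / w_total;  /* coupled increase (goal 2) */
    double cap = 1.0 / p[r].w;               /* never beat plain TCP's 1/w */
    p[r].w += inc < cap ? inc : cap;
}

static void lia_on_drop(struct path *p, size_t r)
{
    p[r].w /= 2.0;  /* standard multiplicative decrease, per subflow */
}

int main(void)
{
    struct path p[2] = { { .w = 10, .rtt = 0.01 }, { .w = 4, .rtt = 0.1 } };
    lia_on_ack(p, 2, 1);  /* an ACK arrives on path 1 (the 3G-like path) */
    lia_on_drop(p, 0);    /* a drop on path 0 halves only that window */
    printf("w0 = %.3f, w1 = %.3f\n", p[0].w, p[1].w);
    return 0;
}
```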

Applications of Multipath TCP

At a multihomed web server, MPTCP tries to share the ‘pooled access capacity’ fairly. Two 100 Mb/s links; 2 TCPs use one link, 4 TCPs the other.
• No MPTCP: 2 TCPs @ 50 Mb/s, 4 TCPs @ 25 Mb/s.
• 1 MPTCP: 2 TCPs @ 33 Mb/s and 1 MPTCP @ 33 Mb/s on one link, 4 TCPs @ 25 Mb/s on the other.
• 2 MPTCPs: 2 TCPs @ 25 Mb/s, 2 MPTCPs @ 25 Mb/s, 4 TCPs @ 25 Mb/s. The total capacity, 200 Mb/s, is shared out evenly between all 8 flows.
• 3 MPTCPs: all 9 flows @ 22 Mb/s.
• 4 MPTCPs: all 10 flows @ 20 Mb/s. It's as if they were all sharing a single 200 Mb/s link; the two links can be said to form a 200 Mb/s pool.

We confirmed in experiments that MPTCP nearly manages to pool the capacity of the two access links. Setup: two 100 Mb/s access links, 10 ms delay; 5 TCPs on one link and 15 TCPs on the other, joined first by 0 and then by 10 MPTCPs (first 20 flows, then 30). [Figure: throughput per flow in Mb/s over time in minutes.]

MPTCP makes a collection of links behave like a single large pool of capacity, i.e. if the total capacity is C and there are n flows, each flow gets throughput C/n.

Multipath TCP can pool datacenter networks. Instead of using one path for each flow, use many random paths. Don't worry about collisions; just don't send (much) traffic on colliding paths.

Multipath TCP in data centers (figures).

MPTCP better utilizes the FatTree network.

MPTCP on EC2
• Amazon EC2: infrastructure as a service. We can borrow virtual machines by the hour; these run in Amazon data centers worldwide; we can boot our own kernel.
• A few availability zones have multipath topologies: 2–8 paths available between hosts not on the same machine or in the same rack, reachable via ECMP.

Amazon EC2 Experiment
• 40 medium CPU instances running MPTCP.
• For 12 hours, we sequentially ran all-to-all iperf, cycling through TCP and MPTCP (with 2 and 4 subflows).

MPTCP improves performance on EC2. [Figure: flow throughputs; ‘Same Rack’ marks host pairs with only a single path between them.]

Implementing Multipath TCP in the Linux Kernel

Linux Kernel MPTCP
About 10,000 lines of code in the Linux kernel. Initially started by Sébastien Barré; now three people actively work on Linux kernel MPTCP: Christoph Paasch, Fabien Duchêne and Gregory Detal. Freely available at http://mptcp.info.ucl.ac.be

MPTCP session creation: the application creates regular TCP sockets, and the kernel creates the meta-socket on top of them to manage the MPTCP connection.
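A plain, unmodified client illustrates the point: with the MPTCP kernel, ordinary TCP socket code transparently gets a multipath connection. (Upstream Linux, since v5.6, instead requires IPPROTO_MPTCP to be passed explicitly; 192.0.2.1 below is a placeholder address.)

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    /* Nothing MPTCP-specific here: the kernel attaches the meta-socket and
     * adds subflows on its own. With upstream Linux you would instead write
     * socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP). */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(80) };
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);

    if (connect(fd, (struct sockaddr *)&dst, sizeof dst) == 0) {
        const char req[] = "GET / HTTP/1.0\r\n\r\n";
        write(fd, req, sizeof req - 1);  /* a byte stream, exactly as with TCP */
    }
    close(fd);
    return 0;
}
```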

MPTCP creating new subflows: the kernel creates and handles the different MPTCP subflows itself, transparently to the application.

MPTCP performance with Apache: 100 simultaneous HTTP requests, a total of 100,000 requests (figures).

MPTCP on multicore architectures
Flow-to-core affinity steers all packets from one TCP flow to the same core. MPTCP suffers lots of L1/L2 cache misses because the individual subflows of one connection are steered to different CPU cores.

Solution: send all packets from the same MPTCP session to the same CPU core. Based on the Receive Flow Steering implementation in Linux (author: Tom Herbert of Google).
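A sketch of the idea, with illustrative names of our own (pick_cpu and hash32 stand in for the kernel's flow-hash machinery): hashing on the per-connection MPTCP token instead of the 4-tuple sends every subflow of one session to the same core.

```c
#include <stdint.h>
#include <stdio.h>

/* A generic 32-bit integer mixer, standing in for the kernel's flow hash. */
static inline uint32_t hash32(uint32_t x)
{
    x ^= x >> 16; x *= 0x7feb352du;
    x ^= x >> 15; x *= 0x846ca68bu;
    x ^= x >> 16;
    return x;
}

static unsigned pick_cpu(uint32_t mptcp_token, unsigned ncpus)
{
    /* All subflows share the connection token, hence the same CPU:
     * no L1/L2 cache bouncing between cores. */
    return hash32(mptcp_token) % ncpus;
}

int main(void)
{
    uint32_t token = 0x12345678;  /* per-connection token (example value) */
    printf("all subflows of this session go to CPU %u\n", pick_cpu(token, 8));
    return 0;
}
```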

Multipath TCP on Mobile Devices

MPTCP over WiFi/3G, compared with TCP over WiFi/3G (figure sequence of throughput measurements).

WiFi to 3G handover with Multipath TCP
A mobile node may lose its WiFi connection, and regular TCP will break! Some applications can recover from a broken TCP connection (e.g. via the HTTP Range header). Thanks to the REMOVE_ADDR option, MPTCP is able to handle this without any application support.
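For reference, a sketch of that option's layout from RFC 6824 (subtype 4; packing illustrative only):

```c
#include <stdint.h>

/* REMOVE_ADDR: the mobile host announces that an address (e.g. the lost
 * WiFi one) is gone; the peer tears down the subflows using it while the
 * MPTCP connection itself survives on the remaining 3G subflow. */
struct mp_remove_addr {
    uint8_t kind;        /* 30 (MPTCP) */
    uint8_t len;         /* 3 + number of address IDs */
    uint8_t subtype;     /* 4 = REMOVE_ADDR, in the upper 4 bits */
    uint8_t addr_id[1];  /* IDs of the addresses being withdrawn */
};
```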

WiFi to 3G handover with Multipath TCP (figure sequence).

Related Work
Multipath TCP has been proposed many times before: first by Huitema (1995), then CMT, pTCP, M-TCP, …
You can solve mobility differently: at a different layer (Mobile IP, HTTP range) or at the transport layer (Migrate TCP, SCTP).
You can deal with datacenter collisions differently: Hedera (OpenFlow + centralized scheduling).

Multipath topologies need multipath transport. Multipath TCP can be used by unchanged applications over today's networks. MPTCP moves traffic away from congestion, making a collection of links behave like a single pooled resource.

Backup Slides

Packet-level ECMP in datacenters

How does MPTCP congestion control work?
Maintain a congestion window wr, one window for each path, where r ∊ R ranges over the set of available paths.
• Increase wr for each ACK on path r, by min(α/w_total, 1/wr).
– Design goals 1 & 3: at any potential bottleneck S that path r might be in, look at the best that a single-path TCP could get, and compare to what I'm getting; this comparison fixes α.
– Design goal 2: we want to shift traffic away from congestion. To achieve this, we increase windows in proportion to their size.
• Decrease wr for each drop on path r, by wr/2.
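The window-update formula these slides refer to, reconstructed from Wischik et al. (NSDI 2011) and RFC 6356, in window units:

```latex
% Per ACK on path r (coupled "linked increases" rule):
\[
  w_r \leftarrow w_r + \min\!\left(\frac{\alpha}{w_{\mathrm{total}}},\ \frac{1}{w_r}\right),
  \qquad
  w_{\mathrm{total}} = \sum_{s \in R} w_s,
  \qquad
  \alpha = w_{\mathrm{total}} \cdot
           \frac{\max_{s}\; w_s/\mathrm{RTT}_s^{2}}
                {\left(\sum_{s \in R} w_s/\mathrm{RTT}_s\right)^{2}}
\]
% Per drop on path r: \( w_r \leftarrow w_r / 2 \).
```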