Flowlet Switching Srikanth Kandula Shan Sinha Dina Katabi
ISPs Want to Split Traffic Across Multiple Paths
• Load balancing to remove hot spots
• Rebalance traffic when unpredictable events occur (outages, DoS, BGP reroutes, flash crowds, …)
[Figure: traffic split 70%/30% across two paths; when unpredictable traffic arrives, the split is rebalanced to 30%/70%]
• Much research on balancing and rebalancing load
• But implementation is hard, particularly with dynamic ratios
  o Either sacrifice accuracy or reorder TCP packets
Problem
1. Given the desired split ratios (possibly dynamic),
2. split traffic accurately, at the edge router, without reordering TCP's packets
Existing Scheme 1: Packet-Based Splitting
• Assign packets to paths in proportion to the desired ratios
Reorders TCP packets, causing bad throughput
Existing Scheme 2: Flow-Based Splitting
• Assign TCP flows to each path in proportion to the desired ratio
1. Flows are not all equal: elephants and mice
2. So, estimate the rate of each TCP flow
3. But rates change with time
4. Too complex
5. Very inaccurate if desired ratios change
How to Split Traffic?
Packet-Based                      Flow-Based
• Accurate                        • Inaccurate
• Reorders TCP packets            • No packet reordering
• Easily tracks dynamic ratios    • Hard to track if ratios change
Can we combine the best of the two approaches?
This Talk
• Show how to send a single TCP flow down multiple paths without reordering
• Accurately split traffic even when desired ratios are dynamic
• Easy to implement
Flowlet Switching
[Figure: a TCP flow and two paths, labeled 1 and 2]
• If the previous packet from the flow has already left the merging point, the flow can be reassigned to a different path
Flowlet Switching
[Figure: two paths with delays D1 and D2]
Given δ > |D2 - D1|:
Flowlets are bursts from the same flow separated by an idle period of at least δ; they can be switched independently!
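The definition above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the parameter name `delta` for the idle threshold, and the sample timestamps are assumptions made for the example. The second helper checks the ordering argument: a packet sent `send_gap` seconds after its predecessor cannot overtake it as long as the gap exceeds the difference in path delays.

```python
def split_into_flowlets(arrival_times, delta):
    """Group one flow's packet arrival times (seconds, ascending) into flowlets.

    A new flowlet starts whenever the idle gap since the previous packet
    is at least `delta`.
    """
    flowlets = []
    for t in arrival_times:
        if flowlets and t - flowlets[-1][-1] < delta:
            flowlets[-1].append(t)   # still inside the current burst
        else:
            flowlets.append([t])     # idle gap >= delta: new flowlet
    return flowlets


def arrives_in_order(send_gap, d_prev, d_next):
    """True if a packet sent `send_gap` seconds after its predecessor still
    arrives after it, given the one-way delays of the two paths used."""
    return send_gap + d_next > d_prev


# Hypothetical timestamps: two bursts separated by a 60 ms idle period.
times = [0.000, 0.001, 0.002, 0.062, 0.063]
bursts = split_into_flowlets(times, delta=0.050)
```

With a 50 ms threshold the five packets fall into two flowlets, and the second one could be sent on a path whose delay differs by up to 50 ms without overtaking the first.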
Implementing Flowlet Switching is Simple
[Figure: hash of (SRCip, DSTip, SRCPort, DSTPort) indexes a table with columns Last_Seen (s) and Path; example entry: 9920.2659, 3]
• Router at the split point hashes the packet header
• If (Now - Last_Seen) > δ, the flowlet separation threshold, the flow can change path
• Reassign path in proportion to the desired split ratios
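The three steps on this slide can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `FlowletSwitch`, the use of CRC32 as the hash, the table size, and in particular the path-selection policy (pick the path furthest below its target byte share) are all illustrative choices; the slide only says that paths are reassigned in proportion to the desired split ratios.

```python
import zlib


class FlowletSwitch:
    """Sketch of a flowlet table at the split point (assumed design)."""

    def __init__(self, ratios, delta, table_size=1024):
        self.ratios = ratios              # desired split ratios, summing to 1
        self.delta = delta                # flowlet separation threshold (s)
        self.table_size = table_size
        self.table = {}                   # slot -> (last_seen, path)
        self.sent = [0] * len(ratios)     # bytes sent on each path so far

    def _pick_path(self):
        # Assumed policy: choose the path furthest below its target share.
        total = sum(self.sent) or 1
        deficits = [r - s / total for r, s in zip(self.ratios, self.sent)]
        return deficits.index(max(deficits))

    def route(self, src_ip, dst_ip, src_port, dst_port, now, size):
        # Hash the header fields into a fixed number of table slots.
        key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
        slot = zlib.crc32(key) % self.table_size
        entry = self.table.get(slot)
        if entry is None or now - entry[0] > self.delta:
            path = self._pick_path()      # idle long enough: may change path
        else:
            path = entry[1]               # same flowlet: keep the path
        self.table[slot] = (now, path)
        self.sent[path] += size
        return path
```

Because the table is indexed by hash slot rather than by flow, two flows that collide share an entry and therefore a path; the sketch accepts that in exchange for a fixed-size table.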
Does it Really Work?
• Traces collected on a peering link, an edge link and two core links
• Split vectors (3 paths)
  o Static (.3, .4)
  o Dynamic: sinusoidal with amplitude 60%, period 20 min [Akella 04, Chuah 02]
Is Flowlet Switching Accurate?
[Figure: splitting error]
Flowlet switching is much more accurate than flow-based switching
Can do Flowlet Switching without Per-Flow State
[Figure: error vs. hash table entries (4 to 8192); shows avg. and max. over many traces]
Errors stabilize for a small table
#Active flows ~ 50,000, but the router maintains a hash table of < 1000 entries (5 KB).
Understanding Flowlets
But Where do Flowlets come from?
• Can't be just timeouts or short flows; most of the bytes are in the elephants
• Why can a large flow be broken into many small flowlets?
Flowlets exist because TCP is bursty at RTT and sub-RTT scales
• Well known that TCP usually sends a window in one or a few bursts and waits for acks [Zhang 91, Zhang 03, Jiang 04]
• Some reasons
  o Slow-start
  o Ack compression
  o Window is much smaller than delay-BW product
Flowlets exist because TCP is Bursty
Most flowlets have inter-arrivals less than an RTT, so most flowlets are sub-windows
Why is Flowlet Switching Accurate?
• 80% of bytes are in flowlets smaller than 10 KB
• Assigning a flowlet to a path isn't a long commitment
Why can Flowlets Track Dynamics?
Arrival rate of flows and flowlets (/sec)
Trace      Flows      Flowlets
Edge       143.16     1454.98
Peering    611.95     8661.43
Core 1     3784.10    35287.04
Core 2     111.33     2848.76
An order of magnitude more opportunities to rebalance!
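The "order of magnitude" claim can be checked directly from the numbers on this slide. The short script below is just that arithmetic; it assumes the first number in each row is the flow arrival rate and the second is the flowlet arrival rate, which matches the order in the slide's caption.

```python
# Arrival rates (per second) copied from the slide: (flows, flowlets).
# Assumption: the first column is flows, the second is flowlets.
arrival_rates = {
    "Edge": (143.16, 1454.98),
    "Peering": (611.95, 8661.43),
    "Core 1": (3784.10, 35287.04),
    "Core 2": (111.33, 2848.76),
}

# How many flowlet arrivals there are per flow arrival on each trace.
ratios = {
    trace: flowlets / flows
    for trace, (flows, flowlets) in arrival_rates.items()
}

for trace, ratio in ratios.items():
    print(f"{trace}: {ratio:.1f}x more flowlet arrivals than flow arrivals")
```

Every trace shows at least a ninefold increase, so a flowlet-based splitter gets roughly ten or more times as many chances per second to correct the split as a flow-based one.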
Why doesn't flowlet switching need per-flow state?
[Figure: number of active flowlets (0 to 3) over time for Flow 1, Flow 2 and Flow 3]
Trace      #Active Flowlets    #Active Flows
Edge       18.41               1450.42
Peering    28.08               8477.33
Core 1     240.12              47883.33
Core 2     50.66               1559.33
#Active flowlets is 2 orders of magnitude smaller than #active flows, so a very small hash table suffices
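One way to read "active flowlets" is sketched below. The definition used here is an assumption for illustration (a flow counts as having an active flowlet if its most recent packet arrived within the flowlet separation threshold), and the function name, the `delta` parameter and the sample timestamps are all hypothetical.

```python
def active_flowlets(last_seen, now, delta):
    """Count flows whose most recent packet arrived within `delta` of `now`.

    `last_seen` maps a flow identifier to the arrival time (seconds) of its
    latest packet. Flows idle for longer than `delta` have no active flowlet
    and need no table entry.
    """
    return sum(1 for t in last_seen.values() if now - t <= delta)


# Hypothetical snapshot: three ongoing flows, one of them currently idle.
last_seen = {"flow1": 0.98, "flow2": 0.40, "flow3": 0.99}
count = active_flowlets(last_seen, now=1.0, delta=0.050)
```

In this snapshot all three flows are still open, but only two have sent a packet in the last 50 ms, so only two entries would need to be live in the table.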
Why is Flowlet Switching Possible?
• Why can a large flow be broken into many small flowlets?
  o TCP burstiness at small time scales
• Why is flowlet switching accurate?
  o Small commitment; many more chances to rebalance
• Why does flowlet switching not need per-flow state?
  o Few simultaneously active flowlets
Configuring Flowlet Switching
Flowlet separation > delay difference
But how to find the delay difference?
For our traces, which are a diverse collection of traffic within the continental US:
  o ~50 ms is a good and safe choice!
  o Our procedure is a constructive way to find a suitable flowlet separation
Flowlet Separation of 50 ms is Good
~50 ms results in accurate splitting
Any flowlet timeout in [50, 100] ms yields highly accurate splits
Flowlet Separation of 50 ms is Safe
[Figure: y-axis ranges from 0% to 1%]
Even if the delay difference >> 50 ms, the probability of reordering is negligible compared to the drop rate in the Internet (about 1%)
Conclusion
• Harness TCP burstiness to split traffic at a finer resolution than a flow, without reordering
• Flowlet Switching:
  o Splitting errors are a few percent
  o Reordering probability is negligible compared to drop probability in the Internet
  o Easy to implement
• Enable ISPs to do dynamic load balancing