All-Path Bridging Update
Jun Tanaka (Fujitsu Labs Ltd.), Guillermo Ibanez (UAH, Madrid, Spain), Vinod Kumar (Tejas Networks)
IEEE Plenary meeting, Atlanta, 7-10 Nov.
Contents
• All-Path Basics
• Issues
• Report of All-Path Demos
• Report of proposal to AVB WG
2021/10/31
Problem Statement
IEEE 802.1D RSTP has the following limitations:
– Not all links can be used
– The shortest path is not always used
– No multipath available
– The root bridge tends to be highly loaded
– Not scalable
Objectives
• To overcome the RSTP limitations:
– Loop free
– All links to be used
– Provide shortest path
– Provide multipath
– Compatible with 802.1D/Q: no new tag or new frame to be defined
– Zero configuration
(cf. TRILL, SPB)
All-Path Basics (One-way)
[Diagram: ARP_req broadcast from S towards D across bridges 1-5. Legend: S = port locked to S, D = port locked to D]
All-Path Basics (One-way)
The first port to receive the frame is locked to S:
– Register S in the first-come table
– Start the lock timer
– Learn S at the port
A port that receives the frame later discards it:
– Check S against the table while the lock timer is effective
[Diagram: late copies of ARP_req are discarded (X) at already-locked bridges]
All-Path Basics (One-way)
[Diagram: ARP_req propagation completes; each bridge 1-5 has one port locked to S and late copies (X) are discarded]
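The one-way locking rule above can be sketched in Python (a minimal sketch; the class and method names are illustrative, not taken from any real implementation):

```python
import time

LOCK_TIMEOUT = 1.0  # lock timer: short, e.g. ~1 s (vs. ~300 s FDB aging)

class FirstComeTable:
    """Minimal sketch of the All-Path (ARP-Path) one-way locking rule."""

    def __init__(self):
        self.entries = {}  # source MAC -> (locked port, lock start time)

    def on_broadcast(self, src_mac, in_port, now=None):
        """Return True if the frame may be flooded, False if it is a
        late copy arriving on a non-locked port and must be discarded."""
        now = time.monotonic() if now is None else now
        entry = self.entries.get(src_mac)
        if entry is not None:
            port, t0 = entry
            if now - t0 < LOCK_TIMEOUT:
                # Lock still valid: accept only on the first-come port.
                return port == in_port
        # First copy seen (or lock expired): lock this port to src_mac.
        self.entries[src_mac] = (in_port, now)
        return True

t = FirstComeTable()
print(t.on_broadcast("S", in_port=1, now=0.0))  # True: port 1 locked, flooded
print(t.on_broadcast("S", in_port=2, now=0.1))  # False: late copy, discarded
```

Note how the discard decision needs only the small first-come table and the short lock timer, exactly the extra state listed later under "Needful Things".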
All-Path Basics (Two-way)
If the DA is in the FDB, unicast forwarding works the same as in 802.1D.
[Diagram: ARP_reply from D travels back towards S over the ports locked to S, locking ports to D on the way]
All-Path Basics (Two-way)
[Diagram: ARP_reply delivery completes; the path now has ports locked to both S and D]
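The two-way forwarding decision can be sketched as follows (a sketch with illustrative names; `fdb` stands for the forwarding database mapping destination MACs to the port they were learned on):

```python
def forward_ports(fdb, dst_mac, in_port, all_ports):
    """All-Path unicast handling, same as 802.1D: if the DA is in the
    FDB (learned while the ARP_req locked the path), unicast out of
    the learned port; otherwise flood on every port except the ingress."""
    out = fdb.get(dst_mac)
    if out is not None:
        return [out]                                   # unicast along the locked path
    return [p for p in all_ports if p != in_port]      # unknown DA: flood

# The ARP_reply from D is unicast back towards S over the locked port:
fdb = {"S": 2}  # S was learned on port 2 during the request phase
print(forward_ports(fdb, "S", in_port=4, all_ports=[1, 2, 3, 4]))  # [2]
```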
Needful Things
802.1D:
• Forwarding database (large: e.g. 16k+ entries)
• Aging timer (long: e.g. 300 s)
+ All-Path:
• First-come table (small: e.g. ~1k entries)
• Lock timer (short: e.g. ~1 s)
• Filtering logic (for late-coming frames)
Minimum Aging Time of the Lock Timer
FP: First Port, SP: Second Port
[Diagram: a frame is received at FP (learning), goes through processing (forwarding, learning, classification, tagging, queuing, etc.), and a copy later arrives at SP, where it is discarded]
The lock timer must still be valid when the late copy reaches the second port, so the first-come table aging time shall be longer than 2 × (one-way link delay + processing delay).
For a data center, this can be less than 1 ms.
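The bound above can be checked numerically (the delay figures below are illustrative assumptions, not measurements from the slides):

```python
def min_lock_aging(one_way_link_delay_s, processing_delay_s):
    """Lower bound on the first-come table aging time: the lock must
    outlive a late copy of the frame that crosses one more link and is
    processed once more before reaching the second port."""
    return 2 * (one_way_link_delay_s + processing_delay_s)

# Illustrative data-center numbers: 5 us link delay, 20 us processing
# delay give a 50 us lower bound, well under the 1 ms mentioned above.
bound = min_lock_aging(5e-6, 20e-6)
print(bound)
```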
Scope of All-Path
[Chart: manageability vs. scalability. All-Path (simple, less operation, natural load balance) targets LANs: enterprise, campus, small data centers, home networks. SPB/ECMP and TRILL (both loop free, shortest path) target MAN/WAN: large-area provider networks and large data centers. RSTP/MSTP sits at the low-scalability end.]
Issues
1. Path recovery
2. Server edge
3. Load balance
1. Path Recovery – Original Idea
[Diagram: ARP_req re-flooded from S after a link failure on the locked path to D]
• Mechanism: when an unknown unicast frame arrives at a bridge with a failed link, a path-fail message is generated per MAC entry towards the source bridge, which generates a corresponding ARP to re-establish the tree.
• Question: if 10k MAC entries exist in the FDB, 10k path-fail frames must be generated. Is that feasible processing for a local CPU, especially on a high-speed link (e.g. 10GE)?
1. Path Recovery – Original Idea
[Diagram: a Path_fail message travels from the bridge with the failed link back towards the source bridge. Legend: S = port locked to S, D = port locked to D]
1. Path Recovery – Selective Flush (Fujitsu)
[Diagram: SW1-SW6 with hosts a (MAC=a) and b (MAC=b). A flush "b" message is terminated at a switch where "b" is not bound to the receiving port; otherwise the switch deletes entry "b" from its FDB and re-sends the flush message towards SW1.]
• A flush message may include two or more MAC addresses, e.g. hundreds, to be flushed as a list.
• When a link failure is detected, MAC flush lists are flooded: 54 frames (187 MACs per 1500 B frame) for 10k MAC entries.
• To avoid unnecessary flooding, MAC entries are deleted to shorten the list.
• Issues: how to prevent flush-frame loss; may require CPU processing power.
• Experience: 15 ms to flush 10k MACs in a node (1 GHz MIPS core).
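The frame count on the slide can be reproduced with a quick check (a sketch; the figure of 187 MAC addresses per 1500 B frame is taken directly from the slide):

```python
import math

MACS_PER_FRAME = 187  # per the slide: 187 MAC addresses fit in a 1500 B frame

def flush_frame_count(num_macs, macs_per_frame=MACS_PER_FRAME):
    """Number of flush frames flooded to delete num_macs FDB entries."""
    return math.ceil(num_macs / macs_per_frame)

print(flush_frame_count(10_000))  # 54 frames for 10k MAC entries
```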
1. Path Recovery – Loopback (UAH)
• Low processing at failed-link bridges: loopback is part of the standard forwarding table
• Processing load is distributed among the source edge bridges involved in the flows. Only one side (SA > DA) asks for repair.
• Resiliency: if the first looped-back packet is lost, the following looped-back frames will follow.
2. Server Edge
• Question: if a server has two or more NICs, how does it find which port is first?
• vswitch: only the vswitch needs to support All-Path
• VEB: both the VEB and the vswitch need to support All-Path
• VEPA: only the external switch needs to support All-Path
[Diagram: three edge configurations: vswitch + NIC; vswitch + VEB + NIC; VEPA + NIC + external switch]
3. Load Balance (Fujitsu)
[Chart: throughput vs. elapsed time for SW1-SW5]
• Load balance occurs naturally because highly loaded links tend not to be selected, due to queuing delay.
• Pros: zero-configuration load balance
• Cons: load balance cannot be controlled in the way SPB/ECMP allows
Load Distribution (UAH Simulations)
• Objectives:
– Explain the native load distribution results of the Singapore presentation
– Visualize how on-demand path selection avoids the loaded links
• Topology:
– Links are a subset of a small data center topology, to show path selection at the core
– Core link capacity is lower (100 Mbps) to force load distribution and congestion only at the core
– Queues support up to 100,000 frames (so that they add delay rather than discard frames)
• Traffic: stepped in sequence, left to right
– Green servers send UDP packets towards red servers
– Groups of 25 servers initiate communication every second: the first at second 1, the second at second 2, and so on; finally, the last group is a single server that starts at second 4 of the simulation
– UDP packets (1 packet every 1 ms, simultaneous for all servers); the packet size varies between 90 and 900 bytes in the different simulations to simulate increasing traffic loads
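A back-of-the-envelope calculation from the numbers above shows why the path choices differ between packet sizes (Ethernet framing overhead is ignored for simplicity):

```python
def group_load_mbps(servers=25, pkts_per_second=1000, payload_bytes=90):
    """Offered load of one server group: each server sends one UDP
    packet per millisecond; framing overhead is ignored."""
    return servers * pkts_per_second * payload_bytes * 8 / 1e6

print(group_load_mbps(payload_bytes=90))   # 18.0 Mbps: fits a 100 Mbps core link
print(group_load_mbps(payload_bytes=300))  # 60.0 Mbps: two groups exceed one link
print(group_load_mbps(payload_bytes=900))  # 180.0 Mbps: one group saturates a link
```

This matches the simulations that follow: at 90 B the s3-s4 link can absorb several groups, while at 900 B each group fills a core link on its own.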
Simulation I – UDP packet size: 90 B
[Diagram: groups of 25 servers start at 1 s, 2 s, 3 s and a single server at 4 s; core switches S1-S4; per-link flow counts 51, 1, 1, 24, 0, 24; paths used by the server groups include s3-s4, s3-s2-s4 and s3-s1-s4]
Note: the path s3-s4 is reused several times because it is still not heavily loaded (low traffic).
Simulation I – UDP packet size: 300 B
[Diagram: core switches S1-S4; per-link flow counts 26, 14, 14, 36, 0, 36; paths per server group: s3-s4; s3-s1-s4 and s3-s2-s4; s3-s4]
Note: the path s3-s4 is not reused when the 2nd group starts; s3-s1-s4 and s3-s2-s4 are used instead, and similarly for the 3rd group. The 4th group reuses s3-s4 because it is again the fastest once S1 and S2 are loaded by groups 2 and 3.
Simulation I – UDP packet size: 900 B
[Diagram: groups of 25 servers start at 1 s, 2 s, 3 s and a single server at 4 s; per-link flow counts 26, 25, 25, 25, 0, 25; paths per server group: s3-s4; s3-s1-s4; s3-s2-s4; s3-s4]
At 900 B some frames are being discarded at the queues (too much traffic). Group 1 chooses s3-s4 and fully loads it; group 2 chooses s3-s1-s4, with the same result; group 3 chooses s3-s2-s4 likewise. When group 4 starts, every link (except the S1-S2 link) is fully loaded, so s3-s4 is again the fastest path and is chosen.
Load Distribution Conclusions
• Notice how the number of flows gets distributed across the core links as traffic increases, due to increased latency.
– Load distribution starts with low loads
– Path diversity increases with load
• A similar balancing effect is observed on redundant links from an access switch to two core switches
• On-demand path selection finds paths adapted to current, instantaneous conditions, not to a past or assumed traffic matrix
Report on Proposal for AVB TG
• May 12, Thu, morning session @ AVB
• Dr. Ibanez presented the materials as used in the IW session (Singapore and Santa Fe)
• Questions and comments:
– Any metric other than latency, e.g. bandwidth?
– Path recovery time compared with RSTP?
– Any broadcast storm when a link fails?
– What is the status in the IW session; has any PAR been created?
• AVB status:
– They try to solve it in their own way, using SRP
– Not only latency but also bandwidth can be used as a metric
– Redundant paths can also be calculated
Path Selection with SRP
at-phkl-SRP-Stream-Path-Selection-0311-v01.pdf
Report of All-Path Demos
– Toronto: SIGCOMM, August 2011
– Bonn: LCN, October 2011
Demo at SIGCOMM 2011
• HW NetFPGA implementation
• Four NetFPGAs (4 × 1 Gbps)
• Demo:
– Zero configuration
– Video streaming, high throughput
– Robustness, no frame loops
– Fast path recovery
– Internet connection, standard hosts
• http://conferences.sigcomm.org/sigcomm/2011/papers/sigcomm/p444.pdf
Demo at IEEE LCN 2011 (October, Bonn)
OpenFlow and Linux (OpenWRT) All-Path switches
[Diagram: NOX OpenFlow controller connected to the All-Path switches through an Ethernet switch]
Demo at IEEE LCN 2011 (October, Bonn)
OpenFlow and Linux (OpenWRT) All-Path switches
• One NEC switch split into 4 OpenFlow switches
• Four Soekris boards as 4 OpenFlow switches
• Two Linksys WRT routers running the ARP-Path over Linux implementation
• Video streaming and internet access without host changes
– Some video limitations at the OpenWRT routers
– Smooth operation on Soekris and NEC
• Reference: A Small Data Center Network of ARP-Path Bridges Made of Openflow Switches. Guillermo Ibáñez (UAH); Jad Naous (MIT/Stanford Univ.); Elisa Rojas (UAH); Bart De Schuymer (Art in Algorithms, Belgium); Thomas Dietz (NEC Europe Ltd., Germany)
Feedback from All-Path UAH Demos
• At every demo most people asked for an explanation of how ARP-Path works (the available video was shown)
• Intrigued by the mechanism, with interest in the reconfiguration of flows and in the native loop avoidance
• Amount of state stored per bridge: per host or per bridge (encapsulating versions Q-in-Q, M-in-M are possible, but not the target; already covered by SPB)
• Questions on compatibility and miscibility with standard bridges (automatic core-island mode, no full miscibility)
• Collateral questions on NetFPGA and on the LCN demo topology
• Next step: implementation on a commodity Ethernet switch (FPGA) (chip/switch manufacturers are invited to provide a switch platform) and implementation of interoperability with 802.1D bridges in the Linux version
Conclusions
• All-Path bridging is a reality
– A new class of transparent low-latency bridges
• Do not compute the path; find it by direct probing
• Zero configuration
• Robust, loop free
• Native load distribution
• Paths are not predictable, but resilient: paths adapt to traffic, and traffic is not predictable
• Low latency