Netbench: Testing network devices with real-life traffic patterns
CHEP 2018
S. Stancu (stefan.stancu@cern.ch), A. Krajewski, M. Cadeddu, M. Antosik, B. Panzer-Steindel

Outline
• The problem: evaluate network devices
• Test approaches
• Netbench
  • Design
  • Statistics visualization
  • Sample results

The problem
• Network performance is key
  • Datacentre
  • Campus
  • Etc.
• When selecting new devices, evaluation is crucial:
  • Required features
  • Required performance
  • Traffic patterns
[Figure: network diagram with N, S, W, E traffic directions]

Luxurious approach
• Use one tester port for each device port
  • N tester ports (tens or a few hundreds)
  • Cost explosion (tester port $$$)
• RFC tests [1][2] create synthetic, reproducible traffic patterns
  • Exercises all ports
  • Line-rate
  • Packet size scan
  • Forwarding of traffic with complex distributions [3]: partial mesh, full mesh
  • Synchronized traffic generation, with minimal congestion
  • Buffering only lightly exercised
• Such tests are rarely performed and are likely biased
  • Manufacturers rent equipment
  • Third-party tests at the manufacturers' demand
[Figure: one tester (Tst) port per DUT port, with partial-mesh and full-mesh traffic]
Tst = Tester, DUT = Device Under Test

Typical approach (affordable) – snake test
• Use 2 tester ports and loop the traffic back through the DUT
  • Contained cost (tester port $$$): only 2 tester ports
• Snake test
  • Exercises all ports
  • Line-rate
  • Packet size scan
  • Forwarding on simple linear paths
    • Sequential path: easy to predict and optimize
    • Random path: "impossible" to predict
  • Buffering is not exercised
    • No congestion, because the path is linear
[Figure: sequential-path and random-path snake tests, 2 tester ports]
Tst = Tester, DUT = Device Under Test
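
To make the "linear path, no congestion" point concrete, here is a minimal sketch of a snake over an even number of DUT ports. The wiring is an assumption for illustration: the first port in the order faces tester port A, the last faces tester port B, the DUT forwards each pair order[2k] -> order[2k+1], and a cable loops order[2k+1] back into order[2k+2]. The function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def snake_order(num_ports: int, randomize: bool = False,
                seed: Optional[int] = None) -> List[int]:
    """Order in which the snake visits DUT ports 0..num_ports-1."""
    order = list(range(num_ports))
    if randomize:
        # Random path: harder for the device to predict or optimize for.
        random.Random(seed).shuffle(order)
    return order


def forwarding_hops(order: List[int]) -> List[Tuple[int, int]]:
    """(ingress, egress) pairs the DUT must forward, under the assumed wiring."""
    if len(order) % 2:
        raise ValueError("a snake needs an even number of DUT ports")
    return [(order[i], order[i + 1]) for i in range(0, len(order), 2)]


def cable_loops(order: List[int]) -> List[Tuple[int, int]]:
    """External loop-back cables: each egress port feeds the next ingress port."""
    return [(order[i], order[i + 1]) for i in range(1, len(order) - 1, 2)]
```

Every egress port in `forwarding_hops` is fed by exactly one ingress port, so no output queue ever has two competing sources: this is why a snake test reaches line-rate without exercising the buffers.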

Netbench
• Use commodity servers and NICs
  • N server NIC ports (tens or a few hundreds)
  • Time-share the servers
  • Manageable cost: a reasonably sized testbed becomes affordable
• Orchestrate TCP flows (e.g. iperf3 [4])
  • Exercises all ports
  • Mostly maximum-size packets, similar to real life
  • Forwarding of traffic with complex distributions [3]: partial mesh, full mesh
  • Multiple TCP flows, similar to real-life traffic
  • Buffering exercised: congestion due to competing TCP flows
[Figure: server NICs connected to the DUT, with partial-mesh and full-mesh traffic]
Tst = Tester, DUT = Device Under Test
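
A full mesh of TCP flows can be planned as one flow per ordered pair of servers. The sketch below is a hypothetical plan generator, not Netbench's own code: it builds iperf3 command lines with the real options `-s`, `-c`, `-p`, `-t` and `-J` (JSON output). Because an iperf3 server handles one test at a time, it assumes each destination runs one server per sender, on `base_port` plus the sender's index.

```python
from itertools import permutations
from typing import Dict, List, Sequence


def full_mesh_pairs(hosts: Sequence[str]) -> List[tuple]:
    """Every ordered (src, dst) pair: N * (N - 1) flows."""
    return list(permutations(hosts, 2))


def iperf3_client_cmd(dst: str, port: int, seconds: int = 60) -> List[str]:
    """iperf3 client sending to dst:port for `seconds`, reporting in JSON."""
    return ["iperf3", "-c", dst, "-p", str(port), "-t", str(seconds), "-J"]


def plan_full_mesh(hosts: Sequence[str], base_port: int = 5201,
                   seconds: int = 60) -> List[Dict]:
    """Hypothetical flow plan: which server/client commands run where."""
    index = {h: i for i, h in enumerate(hosts)}
    flows = []
    for src, dst in full_mesh_pairs(hosts):
        # One iperf3 server per sender on each destination (assumption).
        port = base_port + index[src]
        flows.append({
            "src": src,
            "dst": dst,
            "server_cmd": ["iperf3", "-s", "-p", str(port)],  # runs on dst
            "client_cmd": iperf3_client_cmd(dst, port, seconds),  # runs on src
        })
    return flows
```

A partial mesh is simply a subset of these pairs, and time-sharing the servers amounts to running different subsets in turn.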

Netbench design
[Figure: Netbench architecture. Components shown: central node, agents running iperf3 on the servers attached to the DUT, XML-RPC communication, traffic statistics, web server with a REST API.]
DUT = Device Under Test
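
The slides name the components but not the agent's interface. As a sketch only, an agent can expose methods over XML-RPC with Python's standard `xmlrpc.server` module. The method names `start_flow` and `list_flows` are hypothetical, and this agent performs a dry run: it builds and records the iperf3 command instead of launching it.

```python
from typing import List
from xmlrpc.server import SimpleXMLRPCServer


class Agent:
    """Hypothetical agent; not Netbench's actual API."""

    def __init__(self) -> None:
        self._flows: List[List[str]] = []

    def start_flow(self, dst: str, port: int, seconds: int) -> List[str]:
        # Dry run: a real agent would launch this command (e.g. via subprocess).
        cmd = ["iperf3", "-c", dst, "-p", str(port), "-t", str(seconds), "-J"]
        self._flows.append(cmd)
        return cmd

    def list_flows(self) -> List[List[str]]:
        return self._flows


def make_agent_server(host: str = "127.0.0.1", port: int = 0) -> SimpleXMLRPCServer:
    """XML-RPC server exposing the Agent's public methods (port 0 = any free port)."""
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_instance(Agent())
    return server
```

On each server one would call `make_agent_server("0.0.0.0", 8000).serve_forever()`, and the central node would drive it with `xmlrpc.client.ServerProxy("http://<agent-host>:8000/").start_flow(...)`. The host and port here are placeholders.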

Graphs/plots – expected flows
• Plot the difference between expected and observed flows
• Goal: flat at 0 (i.e. all flows have started correctly)
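
The quantity behind this plot can be computed per node. The sketch below assumes flows are given as (src, dst) pairs and counts, for each node, the flows in which it takes part (as sender or receiver). The function name and input format are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def flow_diff(expected: Iterable[Tuple[str, str]],
              seen: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Per-node (expected - observed) flow count; all zeros = every flow started."""
    def per_node(flows):
        counts = Counter()
        for src, dst in flows:
            counts[src] += 1
            counts[dst] += 1
        return counts

    exp, obs = per_node(expected), per_node(seen)
    return {node: exp[node] - obs[node] for node in sorted(set(exp) | set(obs))}
```

For example, if the flow a -> c never started, both `a` and `c` show 1 instead of 0.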

Graphs/plots – per-node BW
• Plot the overall TX/RX bandwidth of each node
• Goal: flat across all nodes (fair treatment of all nodes)
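
A sketch of the aggregation behind this plot, assuming each flow measurement is a hypothetical (src, dst, Gb/s) record: the sender's TX total and the receiver's RX total both include the flow's rate.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_node_bandwidth(flows: Iterable[Tuple[str, str, float]]) -> Dict[str, Tuple[float, float]]:
    """Map each node to its (total TX, total RX) bandwidth in Gb/s."""
    tx: Dict[str, float] = defaultdict(float)
    rx: Dict[str, float] = defaultdict(float)
    for src, dst, gbps in flows:
        tx[src] += gbps  # traffic leaves the sender...
        rx[dst] += gbps  # ...and arrives at the receiver
    nodes = sorted(set(tx) | set(rx))
    return {n: (tx[n], rx[n]) for n in nodes}
```

A flat plot means the TX (and RX) totals are close to each other across all nodes.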

Graphs/plots – per-node flow BW
• Plot the average flow bandwidth of each node (and its standard deviation)
• Goal: flat across all nodes, with a small standard deviation
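
A sketch of the per-node statistics, assuming the same hypothetical (src, dst, Gb/s) flow records, grouped by sender (or by receiver). It uses the population standard deviation (`statistics.pstdev`) so that a node with a single flow gets 0 instead of an error. That choice is an assumption, not taken from the slides.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def per_node_flow_stats(flows: Iterable[Tuple[str, str, float]],
                        direction: str = "tx") -> Dict[str, Tuple[float, float]]:
    """Map each node to (mean, population stdev) of its per-flow bandwidth."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for src, dst, gbps in flows:
        node = src if direction == "tx" else dst
        groups[node].append(gbps)
    return {n: (mean(v), pstdev(v)) for n, v in sorted(groups.items())}
```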

Graphs/plots – per-pair flow BW
• Plot the average flow bandwidth for each node pair
• Goal: uniform colour (no hot/cold spots)
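
The per-pair plot is a source-by-destination matrix. The sketch below averages repeated (src, dst, Gb/s) samples per pair and leaves `None` where there is no data, such as the diagonal. The function name and record format are hypothetical. The resulting matrix can be rendered with any heatmap tool.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def pair_matrix(samples: Iterable[Tuple[str, str, float]],
                nodes: Sequence[str]) -> List[List[Optional[float]]]:
    """Rows = sources, columns = destinations, cells = average Gb/s."""
    acc: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for src, dst, gbps in samples:
        acc[(src, dst)].append(gbps)
    return [[mean(acc[(s, d)]) if (s, d) in acc else None for d in nodes]
            for s in nodes]
```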

Sample results
• CERN has tendered (RFP) for datacentre routers
• TCP flow fairness evaluation
  • Netbench with 64 × 40G NICs

Free-running TCP
• Clear unfairness pattern
  • Correlated with the device's internal architecture
• TCP streams compete freely
  • Device buffers are fully exercised
  • TCP needs to see drops to back off
Netbench provides good insight into the DUT's buffering and internal architecture.
DUT = Device Under Test
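
The slides show unfairness graphically. To summarize it as a single number, one common choice (not used in the slides, and swapped in here only for illustration) is Jain's fairness index, J = (Σx)² / (n·Σx²). It equals 1 when all flows get the same bandwidth and 1/n when a single flow gets everything.

```python
from typing import Sequence


def jain_index(rates: Sequence[float]) -> float:
    """Jain's fairness index of per-flow (or per-node) rates."""
    n = len(rates)
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero rate")
    return total * total / (n * squares)
```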

Capped TCP window*
• All nodes achieve ~line-rate
• Stable, flat flow distribution
  • Small standard deviation
• Good measure of TCP flow fairness
  • When network congestion is controlled
* Capped TCP window: iperf3 -w 64K
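
Why capping the window controls congestion: a TCP flow can have at most about one window of unacknowledged data in flight, so its throughput is bounded by W / RTT, and the data that N senders can push toward one egress port is bounded by N·W. The sketch below computes both bounds. The 50 µs RTT is a hypothetical value. Also, iperf3's `-w` sets the socket buffer size, so the kernel's effective window is only approximately 64 KiB.

```python
KIB = 1024


def max_throughput_gbps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one flow's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_s / 1e9


def max_inflight_bytes(senders: int, window_bytes: int) -> int:
    """Upper bound on data in flight toward one egress port from capped flows."""
    return senders * window_bytes
```

With a 64 KiB window and a 50 µs RTT, one flow is capped near 10.5 Gb/s. In a 64-node full mesh, each receiver has 63 senders, so at most 63 × 64 KiB = 4,128,768 bytes (about 3.9 MiB) can be in flight toward it. If the device can buffer that much, flows see few drops and share the link evenly.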

Summary
• Netbench: affordable, large-scale testing of network devices
  • Traffic patterns closely resemble real-life conditions
  • Exercises the device buffering capabilities (congestion handling)
[Table: specialized HW vs. "snake" vs. Netbench, compared on test @line-rate, packet size scan, full mesh traffic, test buffering, and cost at large scale (specialized HW: prohibitive; "snake": compromise; Netbench: affordable)]
Contact stefan.stancu@cern.ch if interested in Netbench
• Some manufacturers have expressed interest

References
[1] RFC 2544, Bradner, S. and McQuaid, J., "Benchmarking Methodology for Network Interconnect Devices"
[2] RFC 2889, Mandeville, R. and Perser, J., "Benchmarking Methodology for LAN Switching Devices"
[3] RFC 2285, Mandeville, R., "Benchmarking Terminology for LAN Switching Devices"
[4] iperf3, http://software.es.net/iperf/

Backup material: server tuning for fair flows

Iperf and IRQ affinity (1)
[Figure: results for different CPU affinity (No/Yes) and IRQ affinity (No/Yes) settings]

Iperf and IRQ affinity (2)
[Figure: results for different CPU affinity (No/Yes) and IRQ affinity (No/Yes) settings]
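
The backup slides compare runs with and without CPU and IRQ affinity but do not show the exact settings. As a hedged sketch of the two knobs: iperf3's `-A` option pins the iperf3 process to a CPU, and on Linux an interrupt's CPUs are set by writing a hex bitmask to `/proc/irq/<N>/smp_affinity`. The helper names are hypothetical, and the sketch does not say which CPUs were chosen in the tests.

```python
from typing import Iterable, List


def cpu_mask_hex(cpus: Iterable[int]) -> str:
    """Hex CPU bitmask as used by /proc/irq/<N>/smp_affinity (bit i = CPU i)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    digits = format(mask, "x")
    # The kernel groups the mask into 32-bit words (8 hex digits) separated by commas.
    digits = "0" * (-len(digits) % 8) + digits
    return ",".join(digits[i:i + 8] for i in range(0, len(digits), 8))


def pinned_iperf3_cmd(dst: str, port: int, cpu: int) -> List[str]:
    """iperf3 client pinned to one CPU with iperf3's -A (affinity) option."""
    return ["iperf3", "-c", dst, "-p", str(port), "-A", str(cpu)]
```

Applying the IRQ mask requires root, e.g. `echo 00000005 > /proc/irq/<N>/smp_affinity` to allow CPUs 0 and 2. The irqbalance daemon, if running, may later rewrite these affinities.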