The Universal Fast Dataplane: A Fast Data Network Platform for Native Cloud Network Services

A Fast Data Network Platform for Native Cloud Network Services – most efficient on the planet, superior performance, flexible and extensible, cloud native, open source.
• EFFICIENCY: the most efficient software packet processing on the planet
• PERFORMANCE: FD.io on x86 servers outperforms specialized packet processing hardware
• SOFTWARE DEFINED NETWORKING: software programmable, extendable and flexible
• CLOUD NETWORK SERVICES: foundation for cloud-native network services
• LINUX FOUNDATION: open source collaborative project in the Linux Foundation
Breaking the barrier of software defined network services: 1 Terabit of services on a single Intel® Xeon® server.

FD.io: The Universal Fast Dataplane
• Project at the Linux Foundation: multi-party, multi-project
• Software dataplane: high throughput, low latency, feature rich, resource efficient; runs on bare metal/VM/container; multiplatform
• FD.io scope:
  • Network IO – NIC/vNIC <-> cores/threads
  • Packet Processing – classify / transform / prioritize / forward / terminate
  • Dataplane Management Agents – control plane

FD.io in the overall stack (top to bottom): Application Layer / App Server; Orchestration; Network Controller; Data Plane Services (Dataplane Management Agent, Packet Processing, Network IO); Operating System; Hardware.

Multiparty: Broad Membership – service providers, network vendors, chip vendors, integrators.

Multiparty: Broad Contribution – including Qiniu, Yandex, and Universitat Politècnica de Catalunya (UPC).

Code Activity
• In the period since its inception (2016-02-11 to 2017-04-03), FD.io has more commits than OVS and DPDK combined, and more contributors than OVS.

                 FD.io   OVS    DPDK
  Commits        6283    2395   3289
  Contributors   163     146    245
  Organizations  42      52     78

Multiproject: FD.io Projects
• Dataplane Management Agent: Honeycomb, hc2vpp
• Testing/Support: CSIT, puppet-fdio, trex
• Packet Processing: VPP, VPP Sandbox, NSH_SFC, ONE, TLDK, CICN, odp4vpp
• Network IO: deb_dpdk, rpm_dpdk

FD.io Integrations
• Control plane: OpenStack Neutron (via the FD.io ML2 plugin and agent) and OpenDaylight applications (GBP app, Lispflowmapping app, VBD app, SFC), with integration work done at the ODL FD.io plugin.
• Protocols between the layers: Netconf/Yang, REST, LISP Mapping Protocol.
• Data plane: the Honeycomb agent programs VPP.

Vector Packet Processor – VPP
• Packet processing platform: bare metal / VM / container
• High performance, Linux user space
• Runs on commodity CPUs (x86, ARM, Power)
• Shipping at volume in server and embedded products since 2004

Aside [1/4]: Computer Evolution For Packet Processing – it started with the ALU ...
(1) It started simple: an Arithmetic Logic Unit (ALU).
(2) Became universal and faster: a simple general-purpose computer.
(3) Then it got much faster: the pipeline of a modern high-performance Xeon CPU*.
(4) ... but far from simple: a modern two-CPU-socket server, annotated with optimization points A, B, C, D.
* Intel Top-Down Microarchitecture Analysis Method for tuning applications, https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win

Aside [2/4]: Computer Evolution For Packet Processing – and we arrived at the modern multi-socket COTS server ...
Four main functional dimensions are important for processing packets:
A. CPUs executing the program(s): a) minimize instructions per packet – efficient software logic to perform the needed packet operations; b) maximize instructions per CPU core clock cycle – execution efficiency of the underlying CPU micro-architecture.
B. Memory bandwidth: minimize memory bandwidth utilization – memory access is slow.
C. Network I/O bandwidth: make efficient use of PCIe I/O bandwidth – it is a limited resource.
D. Inter-socket transactions: minimize cross-NUMA connection utilization – it slows things down.
Hint: start by optimizing the use of the CPU micro-architecture => use vectors!
(Figure panels (1)-(4) as in Aside [1/4]; points A-D are annotated on the two-CPU-socket server diagram.)

VPP Architecture: Vector Packet Processing
• Packets 0, 1, 2, 3, ... n are gathered into a vector of n packets by an input graph node (dpdk-input, vhost-user-input, af-packet-input, ...).
• The vector then moves through the packet processing graph node by node: ethernet-input; then ip4-input, ip6-input, mpls-input, arp-input, ...; then ip4-lookup / ip6-lookup; then ip4-local, ip4-rewrite, ip6-local, ip6-rewrite; and so on.
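To make the vector idea concrete, here is a minimal generic C sketch (not the actual VPP API; packet_t and the next-node constants are simplified stand-ins): one graph node classifies a whole vector of packets before handing it on, so the node's instructions and data structures stay hot in the CPU caches across all n packets.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for VPP's buffer/frame types (illustrative only). */
typedef struct { uint8_t *data; uint32_t len; } packet_t;

enum next_node { NEXT_IP4_INPUT, NEXT_DROP };

/* A toy "ethernet-input"-style node: classify every packet in the vector,
 * recording which node each packet should visit next.  Real VPP nodes also
 * prefetch packet i+1 while working on packet i and unroll the loop to
 * process two or four packets per iteration. */
void ethernet_input_node(packet_t *pkts, int *next, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* EtherType lives at bytes 12-13 of an untagged Ethernet header. */
        uint16_t ethertype = (uint16_t)((pkts[i].data[12] << 8) | pkts[i].data[13]);
        next[i] = (ethertype == 0x0800) ? NEXT_IP4_INPUT : NEXT_DROP;
    }
}
```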

VPP Architecture: Splitting the Vector
• Same packet processing graph: after a node processes the vector, the vector is split across the possible next nodes (for example, from ethernet-input the IPv4 packets continue to ip4-input, IPv6 packets to ip6-input, MPLS packets to mpls-input, ARP packets to arp-input), and each sub-vector is handed to its next graph node (ip4-lookup / ip6-lookup, then ip4-local / ip4-rewrite / ip6-local / ip6-rewrite, ...).
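Continuing the sketch above (same simplified types; the fixed 256-packet capacity per next node is purely illustrative), "splitting the vector" just means scattering the classified packet indices into one sub-vector per next node, each of which is then handed to that node as a whole:

```c
#include <stddef.h>

#define N_NEXT   2     /* e.g. NEXT_IP4_INPUT and NEXT_DROP from the sketch above */
#define MAX_VEC  256   /* illustrative per-frame capacity */

/* Scatter packet indices 0..n-1 into per-next-node sub-vectors, based on the
 * next[] decisions made by the previous graph node. */
void split_vector(const int *next, size_t n,
                  size_t out_idx[N_NEXT][MAX_VEC], size_t out_len[N_NEXT])
{
    for (int j = 0; j < N_NEXT; j++)
        out_len[j] = 0;
    for (size_t i = 0; i < n && i < MAX_VEC; i++)
        out_idx[next[i]][out_len[next[i]]++] = i;
}
```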

VPP Architecture: Plugins
• Plugins are shared libraries (e.g. /usr/lib/vpp_plugins/foo.so) loaded into the packet processing graph.
• Plugins are first-class citizens that can: add graph nodes (custom-1, custom-2, custom-3, ...), add APIs, and rearrange the graph.
• Plugins can be built independently of the VPP source tree.
• Example: a hardware plugin can add an hw-accel-input node and skip software nodes where the work is already done by hardware.
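A heavily abbreviated sketch of what adding a graph node from a plugin can look like. The macro and field names follow VPP's vlib conventions, but exact signatures and required fields vary between releases, so treat this as an illustration rather than a buildable plugin:

```c
#include <vlib/vlib.h>
#include <vnet/plugin/plugin.h>

/* Node function: called with a frame holding a vector of buffer indices.
 * A real node walks the vector, does its work, and enqueues each packet
 * to one of its declared next nodes. */
static uword
custom1_node_fn (vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
{
  return frame->n_vectors;   /* "processed" everything, did nothing */
}

/* Register the node and splice it into the graph ahead of ip4-lookup. */
VLIB_REGISTER_NODE (custom1_node) = {
  .function = custom1_node_fn,
  .name = "custom-1",
  .vector_size = sizeof (u32),
  .n_next_nodes = 1,
  .next_nodes = { [0] = "ip4-lookup" },
};

/* Mark the shared library as a VPP plugin so it can be dropped into
 * /usr/lib/vpp_plugins/ and built outside the VPP source tree. */
VLIB_PLUGIN_REGISTER () = {
  .version = "1.0",
  .description = "example plugin adding the custom-1 graph node",
};
```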

Aside [3/4]: Computer Evolution For Packet Processing – we then optimize software for networkloads ...
• Networkloads are very different from compute ones: they are all about processing packets, at rate. Packet processing efficiency is essential.
• Moving packets:
  • At 10GE, 64 B packets can arrive at 14.88 Mpps => 67 nsec per packet.
  • With a 2 GHz CPU, a core clock cycle is 0.5 nsec => 134 clock cycles per packet.
  • Accessing memory takes ~70 nsec => too slow to do it per packet!
  • Packets arrive on physical interfaces (NICs) and virtual interfaces (VNFs) – need CPU-optimized drivers for both.
  • Drivers and buffer management software must not rely on memory access – see the time budget above; they MUST use the CPU core caching hierarchy well.
• Processing packets:
  • Need packet processing optimized for CPU platforms: header manipulation, encaps/decaps, lookups, classifiers, counters.
AND => (4a) ... let's make it Simple Again! (Annotated two-CPU-socket server diagram: CPU cores, memory controller and DDR SDRAM memory channels, LLC, PCIe, NICs; rx/tx descriptor and packet flows; legend: core operations, NIC packet operations, NIC descriptor operations. Goal: no-to-minimal memory bandwidth per packet.)
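The per-packet budget quoted above follows from Ethernet framing overhead (64 B frame plus the standard 20 B of preamble and inter-frame gap) and the CPU clock; as a worked check:

$$\frac{10\times 10^{9}\ \mathrm{bit/s}}{(64+20)\ \mathrm{B}\times 8\ \mathrm{bit/B}} \approx 14.88\ \mathrm{Mpps}
\;\Rightarrow\; \frac{1}{14.88\times 10^{6}\ \mathrm{pps}} \approx 67\ \mathrm{ns/packet}
\;\Rightarrow\; \frac{67\ \mathrm{ns}}{0.5\ \mathrm{ns/cycle}} \approx 134\ \mathrm{cycles/packet}$$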

Aside [4/4]: Computer Evolution For Packet Processing – and use FD.io VPP to make them fast for packets.
Life of a packet through the NIC and a VPP vector packet processing software worker thread running on a CPU core:
(1) Core writes an Rx descriptor in preparation for receiving a packet.
(2) NIC reads the Rx descriptor to get control flags and the buffer address.
(3) NIC writes the packet.
(4) NIC writes the Rx descriptor.
(5) Core reads the Rx descriptor (polling or irq or coalesced irq).
(6) Core reads the packet header to determine the action.
(7) Core performs the action on the packet header.
(8) Core writes the packet header (MAC swap, TTL, tunnel, foobar ...).
(9) Core reads the Tx descriptor.
(10) Core writes the Tx descriptor and writes the Tx tail pointer.
(11) NIC reads the Tx descriptor.
(12) NIC reads the packet.
(13) NIC writes the Tx descriptor.
How VPP handles this:
• VPP software worker threads run on CPU cores.
• Use local caching with no-to-minimal memory bandwidth per packet.
• Get speed and smarts with predictive prefetching algos.
• And keep the CPU cache hierarchy always "hot" => packet processing at rate.
Steps 1-to-13 in a nutshell: making VPP simply tick the A-B-C-D server optimization points!
(Annotated two-CPU-socket server diagram as before: core operations, NIC packet operations, NIC descriptor operations.)
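The "predictive prefetching" point above, as a minimal generic C sketch (not VPP code; the TTL offset assumes an untagged Ethernet + IPv4 frame, and the IPv4 checksum update is omitted): while the core works on packet i it issues a prefetch for packet i+1's header, so the next header read hits the cache instead of DRAM.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t *data; uint32_t len; } packet_t;

/* Toy "prefetch one ahead" pattern: touch packet i+1's header while packet i
 * is being processed, so its header is already in cache on the next iteration. */
void process_vector(packet_t *pkts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(pkts[i + 1].data);   /* GCC/Clang builtin */

        /* Example per-packet action: decrement the IPv4 TTL.  Offset 22 =
         * 14 B Ethernet header + 8 B into the IPv4 header (checksum update
         * deliberately left out of this sketch). */
        pkts[i].data[22]--;
    }
}
```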

• Let's look at performance data at scale: packet throughput for IPv4 routing, IPv6 routing, and L2 switching with VXLAN tunnelling.

VPP Universal Fast Dataplane: Performance at Scale [1/2]
Per-CPU-core throughput with linear multi-thread (multi-core) scaling. Topology: Phy-VS-Phy.
• Hardware: Cisco UCS C240 M4, Intel® C610 series chipset, 2 x Intel® Xeon® Processor E5-2698 v3 (16 cores, 2.3 GHz, 40 MB cache), 2133 MHz, 256 GB total, 6 x 2p40GE Intel XL710 = 12 x 40GE.
• Software: Ubuntu 16.04.1 LTS, kernel 4.4.0-45-generic, FD.io VPP v17.01-5~ge234726 (DPDK 16.11).
• Resources: 1 physical CPU core per 40GE port; other CPU cores available for other services and other work; 20 physical CPU cores available in the 12 x 40GE setup. Lots of headroom for much more throughput and features.
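A quick arithmetic check on the quoted core headroom, using only the figures above:

$$2\ \mathrm{sockets}\times 16\ \mathrm{cores} = 32\ \mathrm{cores},\qquad 12\ \mathrm{ports}\times 1\ \mathrm{core/port} = 12\ \mathrm{cores},\qquad 32 - 12 = 20\ \mathrm{cores\ free}$$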

VPP Universal Fast Dataplane: Performance at Scale [2/2]
Per-CPU-core throughput with linear multi-thread (multi-core) scaling; same topology, hardware, software and resource setup as in [1/2].

Native Cloud Network Services – Sample Use Cases (based on supported FD.io VPP functionality)
• SECURE PRIVATE NETWORKING – SD-WAN and DC overlays: IPVPN, L2VPN, IPSec/SSL encryption; scalable L2 switching and IPv4/IPv6 routing vNFs; million-routes scale and service features; performance at max network I/O of an Intel® Xeon® server. Production-grade secure private networking.
• SUBSCRIBER MANAGEMENT – CG-NAT and softwires: Carrier Grade NAT for subscriber IPv4 addressing control; softwires for subscriber IPv4-over-IPv6 transport; millions of subscribers and service features; performance at max network I/O of an Intel® Xeon® server.
Substantial performance and efficiency gains vs. alternatives.

SD-WAN – with the FD.io Universal Fast Data Plane
• SERVICE VIEW: per-enterprise vRouter vNFs (Enterprise 1, 2, 3) serve sites 1..N with IPVPN and L2VPN overlays* and IPSec/SSL crypto services.
• PHYSICAL VIEW: enterprise sites connect over a private or public IP network to SD-WAN hub servers in Data Center 1 and Data Center 2 (server CPUs with Net I/O and Crypto I/O), giving secure and fast private networking for HQ-to-HQ, Branch-to-Branch, HQ-to-PrivateCloud, Branch-to-PrivateCloud and HomeUser-to-PrivateCloud traffic.
• FD.io SD-WAN service properties: A. native cloud network data plane; B. fast and efficient; C. over 20 GE of net I/O per CPU core.
*Overlay encapsulations: VXLAN, LISP GPE

SD-WAN – with the FD.io Universal Fast Data Plane (physical testbed view)
• Host-1 and Host-2 (Server-SKL): 2 CPUs per host, network I/O of 480 Gbps and crypto I/O of 100 Gbps per host, with a mix of 10GE, 25GE and 100GE ports carrying IPv4/IPv6 sites and services traffic.
• The hosts connect across a private or public IP network (IPVPN service) and are exercised by a traffic simulator / traffic generator over 25GE and 2 x 100GE links.
• Service view as on the previous slide: per-enterprise vRouter vNFs with IPVPN and L2VPN overlays* and IPSec/SSL crypto.
*Overlay encapsulations: VXLAN, LISP GPE

Fast SD-WAN Services – with the FD.io Fast Network Data Plane
• Same service view and Server-SKL physical testbed as on the previous slide, with additional 10GE/25GE/100GE links attached to both hosts.
Breaking the barrier of software defined network services: 1 Terabit of services on a single Intel® Xeon® server.

Scaling Up the Packet Throughput with FD.io VPP – can we squeeze more from a single 2RU server?
1. Today's Intel® Xeon® CPUs (E5 v3/v4): a. 40 lanes of PCIe Gen 3 per socket; b. 2 x 160 Gbps of packet I/O per socket.
2. Tomorrow's Intel® Xeon® CPUs: a. more lanes of PCIe Gen 3 per socket; b. 2 x 280 Gbps of packet I/O per socket.
VPP enables linear multi-thread (multi-core) scaling up to the packet I/O limit per CPU => on a path to a one-terabit software router (1TFR).
Breaking the barrier of software defined network services: 1 Terabit of services on a single Intel® Xeon® server.
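Taking the per-socket packet I/O figures above at face value, the one-terabit claim is simple arithmetic for a two-socket server:

$$2\times(2\times 160\ \mathrm{Gbps}) = 640\ \mathrm{Gbps\ (today)},\qquad 2\times(2\times 280\ \mathrm{Gbps}) = 1120\ \mathrm{Gbps}\approx 1.1\ \mathrm{Tbps}$$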

VPP Architecture: Programmability (example: Honeycomb architecture)
• A control plane protocol (Netconf/Restconf/Yang) sends request messages to the Honeycomb agent running on the Linux host.
• The agent exchanges request and response messages with VPP over shared-memory request/response queues, sustaining around 900 k requests/s, with responses returned asynchronously.
• Agents can use the C, Java, Python or Lua language bindings.
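A minimal sketch of the asynchronous request/response pattern described above. The message layout and the toy in-process "queue" are hypothetical stand-ins, not the real VPP or Honeycomb API; the point is only that each request carries a context id that the asynchronous reply echoes back so the agent can match them.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical message layout and a toy in-process queue, standing in for
 * the real shared-memory binary API between a management agent and VPP. */
typedef struct { uint16_t msg_id; uint32_t context; } api_msg_t;

static api_msg_t queue[16];
static int q_len = 0;

static void shm_queue_send(api_msg_t m) { queue[q_len++] = m; }   /* toy stand-in */

static uint32_t next_context = 1;

/* Fire off a request without blocking; the caller keeps the returned context
 * and matches it against the context echoed back in the asynchronous reply. */
static uint32_t send_request(uint16_t msg_id)
{
    api_msg_t m = { .msg_id = msg_id, .context = next_context++ };
    shm_queue_send(m);
    return m.context;
}

int main(void)
{
    uint32_t ctx = send_request(42);
    printf("request sent with context %u; the async reply will echo it back\n", ctx);
    return 0;
}
```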

Universal Fast Dataplane: Features
• Hardware platforms: pure userspace – x86, ARM 32/64, Power, Raspberry Pi.
• Interfaces: DPDK / Netmap / AF_Packet / TunTap; vhost-user – multi-queue, reconnect, jumbo frame support.
• Language bindings: C / Java / Python / Lua.
• Tunnels/Encaps: GRE / VXLAN-GPE / LISP-GPE, NSH, L2TPv3, SRv6; IPSec tunnel and transport mode, including HW offload if present; MPLS over Ethernet/GRE, deep label stacks supported.
• Routing: IPv4/IPv6, 14+ Mpps on a single core; hierarchical FIBs; multimillion FIB entries; source RPF; thousands of VRFs; controlled cross-VRF lookups; multipath – ECMP and unequal cost; IP multicast; Segment Routing – SR MPLS/IPv6, including multicast.
• LISP: xTR/RTR; L2 overlays over LISP and GRE encaps; multitenancy; multihome; Map/Resolver failover; source/dest control plane support; Map-Register / Map-Notify / RLOC-probing; IPSec transport mode.
• Switching: VLAN support, single/double tag; L2 forwarding with EFP/BridgeDomain concepts; VTR – push/pop/translate (1:1, 1:2, 2:1, 2:2); MAC learning – default limit of 50 k addresses; bridging with split-horizon group support / EFP filtering; proxy ARP termination; IRB – BVI support with RouterMac assignment; flooding; input ACLs; interface cross-connect; L2 GRE over IPSec tunnels.
• Security: mandatory input checks (TTL expiration, header checksum, L2 length < IP length, ARP resolution/snooping, ARP proxy); NAT; ingress port range filtering; per-interface whitelists; policy / security groups / GBP (classifier).
• Network services: DHCPv4 client/proxy; DHCPv6 proxy; MAP/LW46 – IPv4aas; CGNAT; MagLev-like load balancer; Identifier Locator Addressing; NSH SFC SFFs & NSH proxy; LLDP; BFD; QoS policer 1R2C, 2R3C; multiple million classifiers – arbitrary N-tuple; inband iOAM – telemetry export infra (raw IPFIX), iOAM for VXLAN-GPE (NGENA), SRv6 and iOAM co-existence, iOAM proxy mode / caching, iOAM probe and responder.
• Monitoring: Simple Port Analyzer (SPAN); IP Flow Export (IPFIX); counters for everything; lawful intercept.

Rapid Release Cadence – ~3 months
• 16-02: fd.io launch.
• 16-06 Release: VPP. New features: enhanced switching & routing (IPv6 SR multicast support, LISP xTR support, VXLAN over IPv6 underlay, per-interface whitelists, shared adjacencies in FIB); improved interface support (vhost-user jumbo frames, Netmap interface support, AF_Packet interface support); improved programmability (Python API bindings, enhanced JVPP Java API bindings, enhanced debugging CLI); hardware and software (support for ARM 32 targets, support for Raspberry Pi, support for DPDK 16.04).
• 16-09 Release: VPP, Honeycomb, NSH_SFC, ONE. New features: enhanced LISP support for L2 overlays, multitenancy, multihoming, Re-encapsulating Tunnel Router (RTR) support, Map-Resolver failover algorithm; new plugins for SNAT, MagLev-like load balancing, Identifier Locator Addressing, NSH SFC SFFs & NSH proxy, port range ingress filtering; dynamically ordered subgraphs.
• 17-01 Release: VPP, Honeycomb, NSH_SFC, ONE. New features: hierarchical FIB; performance improvements (DPDK input and output nodes, L2 path, IPv4 lookup node); IPSec software and HW crypto support; HQoS support; Simple Port Analyzer (SPAN); BFD; IPFIX improvements; L2 GRE over IPSec tunnels; LLDP; LISP enhancements (source/dest control plane, L2 over LISP and GRE, Map-Register/Map-Notify, RLOC-probing); ACL; flow per packet; SNAT – multithread, flow export; LUA API bindings.

New in 17.04 – released Apr 19
• VPP userspace host stack: TCP stack.
• DHCPv4 relay multi-destination; DHCPv4 option 82; DHCPv6 relay multi-destination; DHCPv6 relay remote-id; ND proxy.
• NAT: CGN – configurable port allocation; CGN – configurable address pooling; CPE – external interface DHCP support; NAT44, NAT64, LW46.
• Segment Routing v6: SR policies with weighted SID lists; binding SID; SR steering policies; SR LocalSIDs; framework to expand LocalSIDs with plugins.
• Security groups: stateful security groups; routed interface support; L4 filters with IPv6 extension headers.
• API: move to CFFI for Python binding; Python packaging improvements; CLI over API; improved C/C++ language binding.
• iOAM: UDP pinger with path fault isolation; iOAM as type 2 metadata in NSH; iOAM raw IPFIX collector and analyzer; anycast active server selection.
• IPFIX: collect IPv6 information; per-flow state.
Release notes: https://docs.fd.io/vpp/17.04/release_notes_1704.html. Images at: https://nexus.fd.io/

Continuous Quality, Performance, Usability – built into the development process, patch by patch: Submit -> Build/Unit Testing -> Automated Verify -> Code Review -> System Functional Testing -> Merge -> Performance Testing -> Publish Artifacts.
• Build/Unit Testing (120 tests/patch): build binary packaging for Ubuntu 14.04, Ubuntu 16.04 and CentOS 7; automated style checking; unit tests: IPFIX, IPv6, BFD, IP multicast, classifier, L2 FIB, DHCP, L2 bridge domain, FIB, MPLS, GRE, SNAT, IPv4, SPAN, IPv4 IRB, VXLAN, IPv4 multi-VRF.
• System Functional Testing (252 tests/patch): DHCP – client and proxy; GRE overlay tunnels; L2BD Ethernet switching; L2 cross-connect Ethernet switching; LISP overlay tunnels; IPv4-in-IPv6 softwire tunnels; COP address security; IPSec; IPv6 routing – NS/ND, RA, ICMPv6; uRPF security; tap interface; telemetry – IPFIX and SPAN; VRF routed forwarding; iACL security – ingress – IPv6/MAC; IPv4 routing; QoS policer metering; VLAN tag translation; VXLAN overlay tunnels.
• Performance Testing (144 tests/patch, 841 tests): L2 cross-connect; L2 bridging; IPv4 routing; IPv6 routing; IPv4 scale – 20k, 200k, 2M FIB entries; IPv6 scale – 20k, 200k, 2M FIB entries; VM with vhost-user (PHY-VPP-VM-VPP-PHY) – L2 cross-connect/bridge, VXLAN with L2 bridge domain, IPv4 routing; COP – IPv4/IPv6 whitelist; iACL – ingress IPv4/IPv6 ACLs; LISP – IPv4-over-IPv6 / IPv6-over-IPv4; VXLAN; QoS policer; L2 cross-connect over L2 bridging. Run on real hardware in the fd.io Performance Lab.
• Usability – merge-by-merge: apt-installable deb packaging; yum-installable rpm packaging; autogenerated code documentation; autogenerated CLI documentation; merge-by-merge packaging feeds; downstream consumer CI pipelines. Per release: autogenerated testing reports; reported performance improvements; Puppet modules; training/tutorial videos; hands-on use-case documentation.

Universal Fast Dataplane: Infrastructure
• Bare metal: FD.io runs on the server, above the kernel/hypervisor.
• Cloud/NFVi: FD.io runs on the host below the VMs, above the kernel/hypervisor.
• Container infra: FD.io runs on the host below the containers, above the kernel.

Universal Fast Dataplane: xNFs
• FD.io-based vNFs: FD.io runs inside the VM.
• FD.io-based cNFs: FD.io runs inside the container.

Universal Fast Dataplane: Embedded
• Device: FD.io runs on the device, above the kernel/hypervisor and hardware acceleration.
• SmartNIC: FD.io runs on the SmartNIC (with its hardware acceleration) inside the server.

Universal Fast Dataplane: CPE Example
• Physical CPE: FD.io on the device, with hardware acceleration.
• vCPE in a VM: FD.io inside a VM on the server.
• vCPE in a container: FD.io inside a container on the server.

A Fast Data Network Platform for Native Cloud Network Services (closing recap of the opening slide): efficiency, performance, software defined networking, cloud network services, Linux Foundation.
Breaking the barrier of software defined network services: 1 Terabit of services on a single Intel® Xeon® server.

Opportunities to Contribute
• Areas: firewall, IDS, DPI, flow and user telemetry, hardware accelerators, container integration, integration with OpenCache, control plane (support your favorite SDN protocol agent), test tools, Cloud Foundry integration, packaging, testing.
We invite you to participate in fd.io:
• Get the code, build the code, run the code
• Try the vpp user demo
• Install vpp from binary packages (yum/apt)
• Install Honeycomb from binary packages
• Read/watch the tutorials
• Join the IRC channels
• Join the mailing lists
• Explore the wiki
• Join fd.io as a member
FD.io git repos: https://git.fd.io/vpp/  https://git.fd.io/csit/
FD.io project wiki pages: https://wiki.fd.io/view/Main_Page  https://wiki.fd.io/view/VPP  https://wiki.fd.io/view/CSIT

Q&A

Thank You !