Improve Performance of Kube-proxy and GTP-U using VPP

Improve Performance of Kube-proxy and GTP-U using VPP Hongjun Ni (hongjun.ni@intel.com), Danny Zhou (danny.zhou@intel.com), Johnson Li (johnson.li@intel.com) Network Platform Group, DCG, Intel Acknowledgement: Jianfeng Tan

Agenda § Enabling high performance kube-proxy § Six ways to improve performance of GTP-U § Key takeaway

Introducing Vector Packet Processor • Packet processing development platform • Runs on commodity CPUs and leverages DPDK • Runs as a Linux user-space application

VPP Graph Node and Plugin Framework • Figure: a packet vector flows through the graph nodes dpdk-input, ethernet-input, mpls-ethernet-input, ip4-lookup, ip4-load-balance, ip4-rewrite, ethernet-output, arp-input, ip6-input, ip4-udp-lookup, ip4-local • Plugins attached to the graph: Kube-proxy, GTP-U

Enabling high performance kube-proxy § Introducing kube-proxy § Current pain point § Proposed kube-proxy solution § Kube-proxy dataplane § GoVPP § Performance

Introducing kube-proxy • Watches the addition and removal of Services and Endpoints • Installs iptables rules • Captures traffic and selects a Pod • Redirects traffic to the selected Pod Reference: https://kubernetes.io/docs/concepts/services-networking/service/

Current Pain Point § Supports userspace and iptables modes § Uses kernel iptables NAT § Performance degrades as the number of service/endpoint pairs increases
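A minimal sketch of why throughput can fall as service/endpoint pairs grow, assuming rules are matched sequentially in a chain while a hash-based data plane does one keyed lookup. The function names and the synthetic service list are illustrative only and are not taken from kube-proxy or VPP.

```python
def make_services(n):
    """Build n synthetic (VIP, port) -> backend rules (illustrative data only)."""
    return [((f"10.96.{i // 256}.{i % 256}", 80), f"pod-{i}") for i in range(n)]


def linear_match(rules, dst):
    """Walk the rules in order, the way a sequential rule chain is evaluated.

    Returns (backend, number_of_comparisons).
    """
    comparisons = 0
    for vip_port, backend in rules:
        comparisons += 1
        if vip_port == dst:
            return backend, comparisons
    return None, comparisons


def hashed_match(table, dst):
    """Single keyed lookup in a hash table; counted here as one operation."""
    return table.get(dst), 1


rules = make_services(5000)
table = dict(rules)
last = rules[-1][0]  # worst case for the sequential walk
```

With 5000 services, the sequential walk needs 5000 comparisons to reach the last rule, while the hashed lookup stays at one operation regardless of table size.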

Proposed kube-proxy solution • Figure legend: CP = Control Plane, DP = Data Plane

Kube-proxy Data Plane • Distributes traffic evenly • Each flow sticks to one Pod • Three service types: (1) ClusterIP:Port (2) NodeIP:NodePort (3) External LoadBalancer
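The two data-plane goals above (even distribution, per-flow stickiness) can be sketched as a hash over the flow 5-tuple plus a session table. This is an illustration under assumptions: the class name, the use of CRC32 as the hash, and the tuple layout are hypothetical and do not describe the plugin's actual code.

```python
import zlib


class FlowBalancer:
    """Pick a Pod per flow by hashing, then pin the flow in a session table."""

    def __init__(self, pods):
        self.pods = list(pods)
        self.sessions = {}  # flow 5-tuple -> chosen Pod

    def select(self, flow):
        """flow = (src_ip, src_port, dst_ip, dst_port, proto)."""
        pod = self.sessions.get(flow)
        if pod is None:
            # New flow: hash the 5-tuple to spread flows across Pods.
            key = "|".join(map(str, flow)).encode()
            pod = self.pods[zlib.crc32(key) % len(self.pods)]
            # Remember the choice so later packets of the flow reuse it.
            self.sessions[flow] = pod
        return pod
```

Because the session entry is consulted first, an existing flow keeps its Pod even if the Pod list changes afterwards; only new flows are hashed against the updated list.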

Kube-proxy Data Plane Internals • Supported graph nodes: kp4-nat4, kp4-nat6, kp6-nat4, kp6-nat6, kp4-nodeport, kp6-nodeport • Surrounding graph nodes (figure): udp-lookup, ip4-lookup, ip6-lookup, ip-lookup, other input graph nodes • VIP->Pod table: stores VIP->Pod sessions

Kube-proxy: External LoadBalancer

GoVPP • Golang toolset for VPP management • Generates Go structures from the VPP binary API (JSON) definitions • Handles 250,000 binary API requests per second Reference: https://wiki.fd.io/images/f/fa/GoVPP-intro.pdf

Performance • Figure: Kube-proxy throughput for 64-byte packets (Mpps); values shown: 9.83 and 6.62; test cases: Phy NIC and vhost-user • Linux iptables perf: < 400 kpps • Scaling • Test case: Load balance + DNAT + Routing; input packet size 64 bytes; output packet size 64 bytes Reference: https://people.netfilter.org/kadlec/nftest.pdf
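A quick arithmetic check of the gap between the reported figures. It assumes that 9.83 Mpps belongs to the Phy NIC case and 6.62 Mpps to the vhost-user case (the chart order suggests this, but the mapping is an inference), and it treats the "< 400 kpps" iptables figure as an upper bound, so the ratios are lower bounds.

```python
def speedup(new_mpps, old_mpps):
    """Ratio of two throughputs expressed in the same unit."""
    return new_mpps / old_mpps


IPTABLES_UPPER_BOUND_MPPS = 0.4  # "< 400 kpps" from the slide
PHY_NIC_MPPS = 9.83              # assumed mapping: physical NIC bar
VHOST_USER_MPPS = 6.62           # assumed mapping: vhost-user bar

phy_gain = speedup(PHY_NIC_MPPS, IPTABLES_UPPER_BOUND_MPPS)
vhost_gain = speedup(VHOST_USER_MPPS, IPTABLES_UPPER_BOUND_MPPS)
print(f"Phy NIC: at least {phy_gain:.1f}x, vhost-user: at least {vhost_gain:.1f}x")
```

Under these assumptions the VPP data plane is at least about 24x (Phy NIC) and 16x (vhost-user) faster than the quoted iptables figure.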

Six Ways to Improve Performance of GTP-U § Cache table lookup result § Bypass second ip-input § Bypass first route lookup § Dual-loop and Quad-loop § Packet prefetching § Bypass second route lookup

GTP-U Plugin Internals • Typical GTP-U packet processed by VPP and the GTP-U plugin: Outer MAC header | Outer IP header | UDP header | GTP-U header | Inner IP header | L4 header | Payload • GTP-U plugin supported graph nodes: gtpu-input, gtpu-encap, gtpu-bypass • Surrounding graph nodes (figure): ip-lookup, udp-lookup, ip4-input, ip6-input, other input graph nodes • GTP-U tunnel: stores PDP context
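To make the GTP-U layer of the packet layout above concrete, here is a minimal sketch of adding and removing the mandatory 8-byte GTP-U header (flags, message type, length, TEID) around an inner packet. It covers only the GTP-U header, not the outer MAC/IP/UDP headers, and it rejects headers with optional fields. The function names are hypothetical and this is not the plugin's code.

```python
import struct

GTPU_UDP_PORT = 2152   # registered UDP port for GTP-U
GTPU_FLAGS = 0x30      # version 1, protocol type GTP, no optional fields
GTPU_MSG_GPDU = 0xFF   # G-PDU: the message type that carries user packets
GTPU_HDR_LEN = 8


def gtpu_encap(teid, inner):
    """Prepend the mandatory GTP-U header to an inner IP packet (bytes)."""
    # The length field counts the bytes that follow the 8-byte header.
    header = struct.pack("!BBHI", GTPU_FLAGS, GTPU_MSG_GPDU, len(inner), teid)
    return header + inner


def gtpu_decap(data):
    """Strip the GTP-U header and return (teid, inner_packet)."""
    if len(data) < GTPU_HDR_LEN:
        raise ValueError("truncated GTP-U header")
    flags, msg_type, length, teid = struct.unpack("!BBHI", data[:GTPU_HDR_LEN])
    if flags >> 5 != 1:
        raise ValueError("unsupported GTP version")
    if flags & 0x07:
        raise ValueError("optional header fields are not handled in this sketch")
    if msg_type != GTPU_MSG_GPDU:
        raise ValueError("not a G-PDU")
    return teid, data[GTPU_HDR_LEN:GTPU_HDR_LEN + length]
```

The TEID returned by `gtpu_decap` is the key a data plane would use to find the tunnel (PDP context) for the inner packet.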

Device Under Test for Performance
• Hardware Configuration
  CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30 GHz
  DIMM: 2133 MHz, 64 GB total
  NIC: 2 x 82599ES 10-Gigabit SFI/SFP+ Network Connection
  PacketGen: Ixia* 10 Gigabit Ethernet Traffic Generator
• Software Configuration
  OS: Ubuntu 16.04.2 LTS
  Kernel: Linux version 4.4.0-62-generic
  DPDK: 17.08
  VPP: 17.10-rc0
  GTP-U: 17.10-rc0
• BIOS Configuration
  Enhanced Intel SpeedStep: Enabled
  Turbo Boost: Enabled
  Processor C3: Disabled
  Processor C6: Disabled
  Hyper-Threading: Disabled
  Intel VT-d: Enabled
  CPU Power and Performance Policy: Performance
  Memory Freq.: 2133 MHz
  Total Memory Size: 64 GB
  Memory RAS and Performance Configuration -> NUMA Optimized: ENABLED
  QPI B/W: 9.6 GT/s
  MLC Streamer: ENABLED
  MLC Spatial Prefetcher: ENABLED
  DCU Data Prefetcher: ENABLED
  DCU Instruction Prefetcher: ENABLED
  Direct Cache Access (DCA): ENABLED
• Network Topology (figure): Traffic Gen (TRex) Port A and Port B connect to DUT Port A and Port B; the DUT runs VPP + GTP-U on Core 0; legend: Flow A, Flow B

Cache table lookup result
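The slide gives only the title, so the following is one plausible reading rather than the authors' implementation: remember the most recent tunnel lookup so that back-to-back packets of the same tunnel skip the full table lookup. The class name, the key layout `(src_ip, teid)`, and the counter are hypothetical.

```python
class TunnelTable:
    """Tunnel lookup with a single-entry cache for the last key seen."""

    def __init__(self, tunnels):
        self._table = dict(tunnels)  # (src_ip, teid) -> tunnel index
        self._last_key = None
        self._last_value = None
        self.full_lookups = 0        # how often the full table was consulted

    def lookup(self, key):
        # Fast path: same tunnel as the previous packet.
        if key == self._last_key:
            return self._last_value
        # Slow path: full table lookup, then refresh the cache on a hit.
        self.full_lookups += 1
        value = self._table.get(key)
        if value is not None:
            self._last_key, self._last_value = key, value
        return value
```

A burst of packets from one tunnel then costs one full lookup instead of one per packet, which is where traffic with long same-tunnel runs would benefit most.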

Bypass second ip-input • Figure: packet vector path through dpdk-input, ethernet-input, ip4-input, ip4-lookup, ip4-local, ip4-udp-lookup, gtpu-decap, then directly to ip4-lookup, ip4-load-balance, ip4-rewrite, ethernet-output (other nodes shown: ip6-input, arp-input) • Pros: boosts performance • Cons: security issue

Bypass first route lookup • GTP-U packets: decap processing is accelerated • Non-GTP-U packets: an overhead of about 13 clocks per packet • Figure: after ethernet-input and ip4-input, gtpu-bypass sends GTP-U packets straight to gtpu-decap and non-GTP-U packets to ip4-lookup; the remaining path is ip4-lookup, ip4-load-balance, ip4-rewrite, ethernet-output (other nodes shown: dpdk-input, ip6-input, ip4-local, arp-input, ip4-udp-lookup)
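A rough sketch of the bypass decision and of what a 13-clock penalty means for non-GTP-U traffic. The classifier assumes GTP-U is recognised by UDP destination port 2152 and a locally owned destination address; the local-address check, the dictionary packet format, and the single-core cycle model in `mpps_with_overhead` are assumptions for illustration, not details taken from the slides.

```python
IPPROTO_UDP = 17
GTPU_UDP_PORT = 2152


def next_node(pkt, local_addrs):
    """Return the name of the next graph node for a parsed IPv4 packet."""
    is_gtpu = (
        pkt["proto"] == IPPROTO_UDP
        and pkt["dst_port"] == GTPU_UDP_PORT
        and pkt["dst_ip"] in local_addrs
    )
    # GTP-U skips the first route lookup; everything else takes the normal path.
    return "gtpu-decap" if is_gtpu else "ip4-lookup"


def mpps_with_overhead(base_mpps, extra_cycles, cpu_ghz):
    """Throughput after adding extra_cycles per packet on one core."""
    base_cycles = cpu_ghz * 1e3 / base_mpps  # cycles spent per packet today
    return cpu_ghz * 1e3 / (base_cycles + extra_cycles)
```

For example, a hypothetical non-GTP-U flow running at 10 Mpps on one 2.3 GHz core spends 230 cycles per packet; 13 more cycles would lower it to roughly 9.47 Mpps.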

GTPU-Decap Performance and Analysis • Figure: GTPU-Decap throughput for 64-byte packets (Mpps); values in ascending order: 6.84, 7.29, 8.32, 10.4; test cases in slide order: raw, cache table lookup, bypass ip-input, bypass route lookup • Test case: Transport IP Routing + GTPU-Decap + Inner IP Routing; input packet size 98 bytes; output packet size 64 bytes
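The decap figures can be turned into per-step and cumulative gains. This assumes the four throughput values map to the four test cases in ascending order (raw lowest, bypass route lookup highest), which the chart suggests but the extracted text does not state outright.

```python
def gains(results):
    """For each (name, mpps), return (name, % gain vs previous, % gain vs first)."""
    base = results[0][1]
    prev = base
    out = []
    for name, mpps in results:
        step = round((mpps / prev - 1) * 100, 1)
        total = round((mpps / base - 1) * 100, 1)
        out.append((name, step, total))
        prev = mpps
    return out


# Assumed mapping of chart values to test cases.
decap = [
    ("raw", 6.84),
    ("cache table lookup", 7.29),
    ("bypass ip-input", 8.32),
    ("bypass route lookup", 10.4),
]

for name, step, total in gains(decap):
    print(f"{name:22s} step +{step}%  cumulative +{total}%")
```

Under that mapping, bypassing the first route lookup is the largest single step (+25.0%), and the three optimizations together add about 52% over the raw case.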

Dual-loop and Quad-loop, Packet Prefetching • Reduce read latency • Process packets in parallel
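The loop structure behind dual-loop processing can be sketched as follows: handle two packets per iteration, then finish any leftover packet in a single-packet tail loop. Python gains nothing from this shape; in C the point is that the two independent packet bodies can overlap in the CPU pipeline and that the packets of the next iteration can be prefetched while the current pair is processed. `process_one` is a hypothetical stand-in for the per-packet work.

```python
def process_one(pkt):
    """Stand-in per-packet work: decrement the TTL."""
    return pkt["ttl"] - 1


def process_vector_dual(packets):
    """Process a packet vector two at a time, with a single-packet tail loop."""
    out = []
    n = len(packets)
    i = 0
    # Dual loop: two independent packets per iteration.
    while n - i >= 2:
        p0, p1 = packets[i], packets[i + 1]
        # In C, packets[i + 2] and packets[i + 3] would be prefetched here.
        out.append(process_one(p0))
        out.append(process_one(p1))
        i += 2
    # Tail loop: at most one packet remains.
    while i < n:
        out.append(process_one(packets[i]))
        i += 1
    return out
```

A quad-loop follows the same pattern with four packets per iteration and a tail loop that handles up to three leftovers.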

Bypass second route lookup • Figure: packet vector path through dpdk-input, ethernet-input, ip4-lookup, ip4-rewrite, gtpu-encap, then ip4-load-balance, ip4-rewrite, ethernet-output, skipping the second ip4-lookup (other nodes shown: ip6-input, arp-input)

GTPU-Encap Performance and Analysis • Figure: GTPU-Encap throughput for 64-byte packets (Mpps); values in ascending order: 7.76, 7.85, 8.13, 8.26, 8.48; test cases in slide order: One-loop, Dual-loop, Quad-loop, w/ Prefetch, Bypass route • Test case: Inner IP Routing + GTPU-Encap + Transport Routing; input packet size 64 bytes; output packet size 114 bytes
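For context, the encap numbers can be compared with what a 10 Gigabit Ethernet port can carry. The sketch assumes the quoted packet sizes are full Ethernet frame sizes, that each frame costs 20 extra bytes on the wire (preamble, start-of-frame delimiter, inter-frame gap), and that 7.76 and 8.48 Mpps are the first and last test cases; none of these is stated on the slide.

```python
ETH_WIRE_OVERHEAD = 20  # bytes: 7 preamble + 1 SFD + 12 inter-frame gap


def line_rate_mpps(frame_bytes, link_gbps=10.0):
    """Maximum frames per second (in Mpps) for a given frame size and link speed."""
    return link_gbps * 1e3 / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8)


def utilisation(mpps, frame_bytes, link_gbps=10.0):
    """Fraction of line rate reached at the given frame size."""
    return mpps / line_rate_mpps(frame_bytes, link_gbps)


OUTPUT_FRAME_BYTES = 114  # encapsulated output size from the slide
print(f"line rate at 114 B: {line_rate_mpps(OUTPUT_FRAME_BYTES):.2f} Mpps")
print(f"7.76 Mpps -> {utilisation(7.76, OUTPUT_FRAME_BYTES):.1%} of line rate")
print(f"8.48 Mpps -> {utilisation(8.48, OUTPUT_FRAME_BYTES):.1%} of line rate")
```

Under these assumptions the output side tops out at about 9.33 Mpps, so the encap results move from roughly 83% to roughly 91% of what one 10 GbE port can transmit.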

Key Takeaway § Easy-to-use and flexible VPP plugin framework § Kube-proxy plugin boosts DP’s performance in cloud environments § A combination of techniques optimizes data plane performance § Better platform for developing open source dataplane ingredients

Q&A