Building Compilers for Reconfigurable Switches
Lavanya Jose, Lisa Yan, Nick McKeown, and George Varghese
Research funded by AT&T, Intel, Open Networking Research Center.
In the next 20 minutes
• Fixed-function switch chips will be replaced by reconfigurable switch chips
• We will program them using languages like P4
• We need a compiler to compile P4 programs to reconfigurable switch chips.
Fixed-Function Switch Chips
[Figure: a packet enters the parser, then passes through fixed L2, IPv4, IPv6, and ACL stages to the queues]
[Figure: control flow graph over L2, IPv4, IPv6, and ACL, next to the switch pipeline: parser, then L2, IPv4, IPv6, and ACL table stages with fixed actions, then queues]
Fixed-Function Switch Chips Are Limited
1. Can’t add new forwarding functionality
2. Can’t add new monitoring functionality
Fixed-Function Switch Chips
[Figure: control flow graph with a new MyEncap table next to L2, IPv4, IPv6, and ACL; the fixed switch pipeline shows a "?" where the new table would have to go]
Fixed-Function Switch Chips Are Limited
1. Can’t add new forwarding functionality
2. Can’t add new monitoring functionality
3. Can’t move resources between functions
Reconfigurable Switch Chips
[Figure: control flow graph (L2, IPv4, IPv6, ACL) and a switch pipeline of parser, match tables with action macros, and queues]
Mapping Control Flow to Reconfigurable Chip
[Figure: each table in the control flow graph (L2, IPv4, IPv6, ACL) mapped to a match table and action macro in the switch pipeline]
Reconfigurable Switch Chips
[Figure: control flow graph extended with a MyEncap table, mapped onto the switch pipeline together with the existing tables]
Protocol Independent Switch
[Figure: parser followed by stages, each with match memory and an action ALU, holding the L2, IPv4, IPv6, and ACL tables]
[Figure: reconfigured switch pipeline; labels include parser, L2 table, IPv4 table, and ACL table]
Match + Action Processor: pipelined and in-parallel
Reconfigurability: the norm in 5 years
• Reconfigurability adds mostly to logic.
• Logic is getting relatively smaller.
• The cost of reconfigurability is going down.
Fixed switch chip area today:
– I/O (40%), Memory (40%)
– Wires, Logic
[Figure: switch chip area split into I/O (30%), Memory (30%), Wires (20%), Logic (20%)]
Fixed function: Broadcom Tomahawk, 3.2 Tbps
Reconfigurable: Cavium Xpliant, 3.2 Tbps
Reconfigurable chips are inevitable.
Configuring Switch Chips
[Figure: P4 code goes through the compiler to the target: a switch pipeline of parser, match tables with action macros, and queues]
P4 (http://p4.org/) (ANCS’13)
Parser:
    parser parse_ethernet {
        extract(ethernet);
        select(latest.etherType) {
            0x800 : parse_ipv4;
            0x86DD : parse_ipv6;
        }
    }
Match Action Tables:
    table ipv4_lpm {
        reads { ipv4.dstAddr : lpm; }
        actions { set_next_hop; drop; }
    }
Control Flow Graph:
    control ingress {
        apply(l2_table);
        if (valid(ipv4)) { apply(ipv4_table); }
        if (valid(ipv6)) { apply(ipv6_table); }
        apply(acl);
    }
[Figure: control flow graph (L2, IPv4, IPv6, ACL) and the switch pipeline of parser, match tables with action macros, and queues]
What does reconfigurability buy us?
Benefits of Reconfigurability
• Use resources efficiently
– Multiple tables per stage
– Big table in multiple stages
• Use fewer stages
Naïve Mapping: Control Flow Graph
[Figure: control flow graph (L2, IPv4, IPv6, ACL) mapped table by table onto the switch pipeline]
Table Dependency Graph (TDG)
[Figure: control flow graph and the corresponding table dependency graph over L2, IPv4, IPv6, and ACL]
Efficient Mapping: TDG
[Figure: table dependency graph over L2, IPv4, IPv6, and ACL mapped onto the switch pipeline]
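The gain from mapping the TDG instead of the control flow graph can be sketched with a longest-path computation. This is an illustration only, not the authors' compiler: the function `earliest_stages` is a hypothetical helper, and the dependency edges in `hypothetical_tdg` are invented because the figure's edges do not survive in this transcript. It also ignores all resource constraints.

```python
from collections import defaultdict


def earliest_stages(tables, deps):
    """Return {table: earliest stage index} for a DAG of table dependencies.

    deps is a list of (before, after) pairs, meaning `after` must be placed
    in a strictly later stage than `before`. Resources are ignored.
    """
    preds = defaultdict(list)
    for before, after in deps:
        preds[after].append(before)

    stage = {}

    def visit(table, path=()):
        if table in path:
            raise ValueError(f"dependency cycle through {table!r}")
        if table not in stage:
            # A table sits one stage after its latest predecessor;
            # a table with no predecessors goes in stage 0.
            stage[table] = 1 + max(
                (visit(p, path + (table,)) for p in preds[table]), default=-1
            )
        return stage[table]

    for table in tables:
        visit(table)
    return stage


tables = ["l2", "ipv4", "ipv6", "acl"]

# Naive mapping: treat program order as a chain, one table per stage.
control_flow_order = [("l2", "ipv4"), ("ipv4", "ipv6"), ("ipv6", "acl")]

# ASSUMPTION: invented TDG edges, only ACL depends on the other tables.
hypothetical_tdg = [("l2", "acl"), ("ipv4", "acl"), ("ipv6", "acl")]

naive = earliest_stages(tables, control_flow_order)
efficient = earliest_stages(tables, hypothetical_tdg)
print("stages, naive:", max(naive.values()) + 1)
print("stages, TDG:  ", max(efficient.values()) + 1)
```

Under these invented edges, the chain needs four stages while the TDG lets three tables share the first stage.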
Resource constraints
[Figure: control flow graph and switch pipeline with the L2, IPv4, IPv6, and ACL tables]
More resource constraints
[Figure labels: table parallelism, action memory, memory type, action ALU input, header widths]
The Compiler Problem
Map match-action tables in a TDG to a switch pipeline while respecting dependency and resource constraints.
Step 1: P4 Program
Step 2: Control Flow Graph
Step 3: Table Dependency Graph
Step 4: Table Configuration
Is that it?
Two Switches We Studied
• RMT: 32 stages (SIGCOMM 2013)
• FlexPipe: 5 stages (Intel FM6000)
Additional switch features
• Table shaping in RMT
• Table sharing in FlexPipe
The Compiler Problem
Map match-action tables in a TDG to a switch pipeline while respecting dependency and resource constraints.
[Figure labels: action memory, memory type, table parallelism, header widths, action ALU input, table shaping, table sharing]
First approach: Greedy
• Prioritize one constraint
• Sort tables
• Map tables one at a time
Example: sort by # dependencies
[Figure: sorted tables placed one at a time into the pipeline between parser and queues]
First approach: Greedy
• Prioritize one constraint
• Sort tables
• Map tables one at a time
Example: sort by match width
[Figure: sorted tables placed one at a time into the pipeline between parser and queues]
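The sort-then-place loop can be sketched as follows. This is a minimal illustration under strong assumptions, not the heuristics evaluated in the talk: each stage has a single capacity number, a table is never split across stages, and the names `greedy_map`, `tables`, and `deps` as well as all sizes are hypothetical. "Largest table first" stands in for a resource-driven sort key.

```python
def greedy_map(tables, deps, num_stages, stage_capacity, sort_key):
    """Place tables one at a time into the earliest stage that fits.

    tables: {name: size in memory blocks}
    deps:   list of (before, after); `after` must go in a later stage
    Returns {name: stage index}, or None if some table cannot be placed.
    """
    preds = {t: [] for t in tables}
    for before, after in deps:
        preds[after].append(before)

    free = [stage_capacity] * num_stages
    placed = {}
    remaining = sorted(tables, key=sort_key)
    while remaining:
        # Highest-priority table whose predecessors are already placed.
        ready = next(
            (t for t in remaining if all(p in placed for p in preds[t])), None
        )
        if ready is None:
            return None  # dependency cycle
        remaining.remove(ready)
        first = 1 + max((placed[p] for p in preds[ready]), default=-1)
        stage = next(
            (s for s in range(first, num_stages) if free[s] >= tables[ready]),
            None,
        )
        if stage is None:
            return None  # ran out of stages or memory
        free[stage] -= tables[ready]
        placed[ready] = stage
    return placed


# ASSUMPTION: invented program with a three-table dependency chain.
tables = {"acl": 3, "stats": 1, "l2": 1, "ipv4": 1, "nexthop": 1}
deps = [("l2", "ipv4"), ("ipv4", "nexthop")]
n_dependents = {t: sum(1 for b, _ in deps if b == t) for t in tables}

by_size = greedy_map(tables, deps, 3, 4, sort_key=lambda t: -tables[t])
by_dependents = greedy_map(tables, deps, 3, 4, sort_key=lambda t: -n_dependents[t])
print("largest first:       ", by_size)
print("most dependents first:", by_dependents)
```

In this invented case, placing the largest tables first fills the first stage and pushes the dependency chain past the last stage, while placing the chain first fits. Other programs can favor the opposite order, which is the difficulty with committing to one fixed sort metric.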
Too many constraints for Greedy
• Any greedy heuristic must sort tables by a metric that is a fixed function of the constraints.
• As the number of constraints grows, it gets harder for a fixed function to capture the interplay between all of them.
• Can we do better than greedy?
Second approach: Integer Linear Programming (ILP)
Find an optimal mapping.
Pros:
• Takes in all constraints
• Different objectives
• Solvers exist (CPLEX)
Cons:
• Black-box solver
• Encoding is an art
• Slow
ILP Setup
min # stages
subject to:
• table sizes assigned ≥ table sizes specified
• memories assigned ≤ memories in physical stage
• dependency constraints
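One way to write the setup above as an ILP is sketched below. It is an assumed encoding for illustration, not necessarily the one used in this work. All symbols are introduced here: stages are numbered $1,\dots,N$; $E_t$ is the specified size of table $t$ in entries; $M_s$ is the number of memory blocks in stage $s$; $c$ is the number of entries per block; $D$ is the set of TDG edges; $e_{t,s}$, $m_{t,s}$, $\mathit{first}_t$, $\mathit{last}_t$, and $S$ are integer variables and $y_{t,s}$ is binary.

```latex
\begin{align*}
\min\ & S \\
\text{s.t.}\ 
 & \sum_{s=1}^{N} e_{t,s} \ge E_t
   && \forall t
   && \text{(table sizes assigned $\ge$ specified)} \\
 & e_{t,s} \le c\, m_{t,s}
   && \forall t, s
   && \text{(entries need memory blocks)} \\
 & \sum_{t} m_{t,s} \le M_s
   && \forall s
   && \text{(memories assigned $\le$ memories in stage)} \\
 & e_{t,s} \le E_t\, y_{t,s}
   && \forall t, s
   && \text{($y_{t,s}=1$ if $t$ uses stage $s$)} \\
 & s\, y_{t,s} \le \mathit{last}_t, \quad
   \mathit{first}_t \le s + N\,(1 - y_{t,s})
   && \forall t, s
   && \text{(first and last stage of $t$)} \\
 & \mathit{last}_a + 1 \le \mathit{first}_b
   && \forall (a,b) \in D
   && \text{(dependency constraints)} \\
 & \mathit{last}_t \le S
   && \forall t
   && \text{(number of stages used)}
\end{align*}
```

Swapping the objective (for example, to a latency or power term) leaves the constraints unchanged, which is the sense in which an ILP supports different objectives.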
Experiment Setup
• 4 datacenter use cases from Intel, Barefoot
• Differ in tables, table sizes, and dependencies
Example Use Case
[Figure: "A Typical TDG" with ingress and egress tables (IPv4/IPv6 uRPF, unicast host and LPM, multicast, ECMP, next hop, SMAC/DMAC, ACLs, and others), and its "Configuration for RMT"]
Metrics: Greedy vs ILP
1. Ability to fit program in chip
2. Optimality
3. Runtime
Setup: Greedy vs ILP
1. Ability to fit: FlexPipe
– Variants of use cases in 5-stage pipeline.
2. Optimality: RMT
– Minimum stages, pipeline latency, power
3. Runtime: both switches
Results: Greedy vs ILP
1. Can Greedy fit my program?
– Yes, if resources are plentiful (RMT, 32 stages)
– No, if resources are constrained (FlexPipe, 5 stages): Greedy can’t fit 25% of programs.
2. How close to optimal is Greedy?
– 30% more time for a packet to get through the RMT pipeline.
3. Hmm... looks like I need ILP. How slow is it?
– 100x slower than Greedy
– Reasonable if programs don’t change often.
If we have time, we should run ILP.
Use ILP to suggest best Greedy for program type.
Critical constraints
• Dependency critical: 16 → 13 stages
• Additional resource constraints less important
Critical resources
• TCAM memories critical: 16 → 14 stages
– Results for one of our datacenter L2/L3 use cases
Conclusion
• Challenge: Parallelism and constraints in reconfigurable chips make compiling difficult.
• TDG: highlights parallelism in program.
• ILP: better if there is enough time, fitting is critical, or objectives are complicated.
• Best Greedy: ILP can choose via notion of critical constraints and critical resources.
Thank you!
ILP Run time
• Number of constraints? Not obvious. E.g., RMT
– Min. stage: few secs.
– Min. power: few secs.
– Min. pipeline latency: 10x slower
• Number of variables? How fine-grained is the resource assignment? E.g., FlexPipe
– One match entry at a time: many days
– 100-500 match entries at a time: < 1 hr