Efficient Network Reachability Analysis using a Succinct Control Plane Representation
Seyed K. Fayaz, Tushar Sharma, Ari Fogel, Ratul Mahajan, Todd Millstein, Vyas Sekar, George Varghese
Network configuration is hard
• Network operator's reachability policy: A can talk to B
• Reality: what the network does (figure: A and B connected through routers R1-R4)
• Does the network do what we want it to do?
State of the art in network verification: Data plane verification
• Network operator's reachability policy: A can talk to B
• Data plane verification asks "Can A talk to B?" against the routers' data planes (forwarding tables, DP1-DP4) for traffic from A to B ✔
• Prior work: HSA, NSDI'12; ATPG, CoNEXT'12; NOD, NSDI'15; …
• Are we done?
The data plane keeps changing!
• Time = t1: Can A talk to B?
• Time = t2: Can A talk to B?
• Time = t3: Can A talk to B?
• (Figure: the data planes DP1-DP4 of routers R1-R4 differ at each point in time, while the operator's reachability policy for traffic from A to B stays the same: A can talk to B)
Motivating example: Reachability bug triggered by a BGP announcement
• Before the incident: routers A and B connect W to DCA and DCB; DCA hosts services in 10.0.0/16; the /16 is reachable ✔
• After the incident: a new service in DCB is announced as 10.1.160/28; the /28 is reachable ✔ but the /16 is not ✗; router B is the culprit
• Root cause: Router B's configuration had an aggregate route 10.0.0/16 pointing to DCB. The /28 advertisement activated the aggregate route!
• How can we proactively find such latent reachability bugs?
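The activation of a dormant aggregate route can be illustrated with a minimal Python sketch. It assumes a simplified aggregation rule (an aggregate is advertised only once a strictly more-specific route inside it has been learned). The function name `advertised_routes` and the prefixes below are hypothetical and chosen only for illustration; they are not taken from the incident.

```python
import ipaddress

def advertised_routes(learned, aggregates):
    """Return the prefixes a router advertises under a simplified rule:
    all learned routes, plus every configured aggregate that has at
    least one strictly more-specific learned route inside it."""
    out = set(learned)
    for agg in aggregates:
        # The aggregate stays dormant until a contributing route shows up.
        if any(r != agg and r.subnet_of(agg) for r in learned):
            out.add(agg)
    return out

# Hypothetical prefixes for illustration only.
AGG = ipaddress.ip_network("10.1.0.0/16")
NEW_SERVICE = ipaddress.ip_network("10.1.1.160/28")

before = advertised_routes(set(), [AGG])          # aggregate is latent
after = advertised_routes({NEW_SERVICE}, [AGG])   # /28 activates the /16

print(sorted(map(str, before)))
print(sorted(map(str, after)))
```

The configuration is identical in both calls; only the incoming advertisement differs, which is why inspecting a single data plane snapshot before the incident would not reveal the problem.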
A data plane is just the current incarnation of the control plane!
• A router's control plane (implementation of BGP, OSPF, etc.) takes the router configuration and incoming route advertisements (e.g., BGP, OSPF, and RIP advertisements), sends advertisements to neighbors, and produces the data plane (entries such as <prefix P, port 1>, <prefix P, port 2>, <prefix P, port 3>, …)
• Each new route advertisement can yield a different data plane (Data Plane 1, 2, 3, …)
• Prior work on control plane verification: rcc, NSDI'05; Batfish, NSDI'15; Bagpipe, OOPSLA'16; ARC, SIGCOMM'16; …
• Limitations:
– Incomplete: Focus on just one routing protocol
– Unscalable: Detailed modeling of message passing
• To find latent reachability bugs, we should focus on the control plane!
Our contributions
• ERA: A tool for finding latent router configuration bugs in seconds based on control plane analysis
– Expressive-yet-tractable control plane model
– Scalable exploration of control plane model
• Implementation as an open source tool
ERA: System overview
• Inputs from the operator: router configurations and reachability policies
• Output: Pass/Fail
• ERA: A tool to find latent reachability bugs due to router misconfiguration
• Scope: Reachability bugs occurring in the steady state
Outline
• Background and motivation
• Design of ERA
• Implementation and evaluation
Challenges in control plane analysis
• ERA takes router configurations and reachability policies from the operator, builds a control plane model, explores the model, and reports Pass/Fail
• Challenge 1: Expressive and tractable model?
• Challenge 2: Scalable exploration
Challenge 1: Expressive and tractable control plane model
A route as a succinct bit-vector
• What should the I/O of the router control plane model be?

Control plane I/O model                                  Expressive   Tractable
Actual protocol's messages (e.g., Batfish, NSDI'15)      ✔            ✗
Protocol-agnostic I/O model (e.g., ARC, SIGCOMM'16)      ✗            ✔
Route as a compact bit vector                            ✔            ✔

• Bit-vector fields: Dst IP (32 bits), Dst mask (5 bits), Administrative distance (4 bits), Protocol attributes (87 bits)
• A route as a succinct and unifying control plane I/O unit
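The field widths above add up to 32 + 5 + 4 + 87 = 128 bits. The following Python sketch packs and unpacks such a route. The field order, the function names `encode_route` and `decode_route`, and the handling of out-of-range values are assumptions made for illustration; the slides give only the widths.

```python
import ipaddress

# Field widths from the slide; the ordering of fields is an assumption.
IP_BITS, MASK_BITS, AD_BITS, ATTR_BITS = 32, 5, 4, 87

def encode_route(prefix, admin_distance, attrs=0):
    """Pack a route into one 128-bit integer (hypothetical layout)."""
    net = ipaddress.ip_network(prefix)
    # A 5-bit mask field holds lengths 0..31; how /32 is handled is not
    # stated in the slides, so this sketch simply rejects it.
    if net.version != 4 or net.prefixlen >= 1 << MASK_BITS:
        raise ValueError("prefix does not fit the sketch's 5-bit mask field")
    if not 0 <= admin_distance < 1 << AD_BITS:
        raise ValueError("administrative distance must fit in 4 bits")
    if not 0 <= attrs < 1 << ATTR_BITS:
        raise ValueError("protocol attributes must fit in 87 bits")
    value = int(net.network_address)
    value = (value << MASK_BITS) | net.prefixlen
    value = (value << AD_BITS) | admin_distance
    value = (value << ATTR_BITS) | attrs
    return value

def decode_route(value):
    """Inverse of encode_route: return (prefix, admin_distance, attrs)."""
    attrs = value & ((1 << ATTR_BITS) - 1)
    value >>= ATTR_BITS
    admin_distance = value & ((1 << AD_BITS) - 1)
    value >>= AD_BITS
    prefix_len = value & ((1 << MASK_BITS) - 1)
    value >>= MASK_BITS
    return f"{ipaddress.ip_address(value)}/{prefix_len}", admin_distance, attrs

route = encode_route("10.0.0.0/16", admin_distance=1, attrs=5)
print(route.bit_length() <= 128, decode_route(route))
```

Because every protocol's route fits the same fixed-width vector, a set of routes becomes a boolean function over 128 variables, which is what makes a BDD representation possible.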
Control plane as a fast pipeline of boolean operators: Intuition
• Why not the actual router's code? Hard to explore
• Router as a fast route processing pipeline
• An example (figure): a toy 4-bit route X3 X2 X1 X0 carrying a protocol attribute, the admin. distance (RIP=1, BGP=0), and the prefix. The router configuration (RIP, static 10, set RIP attr. to 1) is applied stage by stage as boolean operations on the input set of routes; the output is a disjunction of two terms, one over X3, X1, X0 and one over X2, X1, X0
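To make the intuition concrete, here is a small Python sketch that stores a set of 4-bit routes as the truth table of a boolean function, so that filtering is a bitwise AND and adding routes is a bitwise OR. The specific filter and originated route are made up and do not reproduce the slide's example; a truth table stands in for a BDD only because the toy route has 4 bits.

```python
N = 4                        # toy route width, as in the slide's example
FULL = (1 << (1 << N)) - 1   # truth table of "True": all 16 routes

def var(i):
    """Truth table of the function 'bit X_i of the route is 1'."""
    return sum(1 << r for r in range(1 << N) if (r >> i) & 1)

def neg(f):
    """Logical negation, restricted to the 16 valid table entries."""
    return ~f & FULL

def routes(f):
    """List the routes (as bit strings X3 X2 X1 X0) contained in set f."""
    return [format(r, "04b") for r in range(1 << N) if (f >> r) & 1]

X0, X1, X2, X3 = (var(i) for i in range(N))

# Hypothetical stage 1: an input filter that keeps routes with X1 X0 = 01.
incoming = FULL
after_filter = incoming & (neg(X1) & X0)

# Hypothetical stage 2: the router additionally originates route 1110.
after_origin = after_filter | (1 << 0b1110)

print(routes(after_filter))
print(routes(after_origin))
```

Each stage costs a single bitwise operation regardless of how many routes the set contains, which is the property the pipeline view relies on.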
Control plane as a fast pipeline of boolean operators: Complete pipeline
• Input: BDD of input routes
1. AND with supported protocols
2. Apply input filters
3. OR with routes originated by router
4. OR with redistributed routes
5. OR with aggregate routes
6. AND with NEG. of static routes
7. Select best route per dst prefix
8. Apply output filters
• Output: BDD of output routes
• Compact representation of a collection of routes using Binary Decision Diagrams (BDDs)
• The pipeline captures key control plane behaviors that are the source of many bugs
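The eight stages can be sketched in Python with explicit sets standing in for BDDs (intersection for AND, union for OR, set difference for AND with NEG). This is an illustrative simplification, not the tool's implementation: the `Route` type, the `run_pipeline` signature, the precomputed redistributed and aggregate sets, and the tie-break on protocol name are all assumptions.

```python
from typing import NamedTuple

class Route(NamedTuple):
    prefix: str
    protocol: str
    admin_distance: int

def run_pipeline(inp, *, protocols, in_filter, originated, redistributed,
                 aggregates, static_prefixes, out_filter):
    r = {x for x in inp if x.protocol in protocols}        # 1. supported protocols
    r = {x for x in r if in_filter(x)}                     # 2. input filters
    r |= originated                                        # 3. originated routes
    r |= redistributed                                     # 4. redistributed routes
    r |= aggregates                                        # 5. aggregate routes
    r = {x for x in r if x.prefix not in static_prefixes}  # 6. AND NOT static routes
    best = {}
    for x in r:                                            # 7. best route per prefix
        cur = best.get(x.prefix)
        key = (x.admin_distance, x.protocol)  # protocol name breaks ties (assumption)
        if cur is None or key < (cur.admin_distance, cur.protocol):
            best[x.prefix] = x
    return {x for x in best.values() if out_filter(x)}     # 8. output filters

# Made-up inputs for illustration.
inputs = {
    Route("10.0.0.0/8", "bgp", 20),
    Route("10.0.0.0/8", "rip", 120),
    Route("192.168.0.0/16", "ospf", 110),
    Route("172.16.0.0/12", "bgp", 20),
}
result = run_pipeline(
    inputs,
    protocols={"bgp", "rip"},
    in_filter=lambda x: True,
    originated={Route("10.2.0.0/16", "bgp", 20)},
    redistributed=set(),
    aggregates=set(),
    static_prefixes={"172.16.0.0/12"},
    out_filter=lambda x: x.protocol == "bgp",
)
print(sorted(result))
```

In a fuller model the redistributed and aggregate sets would themselves be computed from the routes present at that stage; they are passed in here only to keep the sketch short.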
Challenge 2: Scalable control plane exploration
Reachability analysis by exploring the control plane model
• Intuition: To see what traffic can reach from A to B, just find out what route prefixes advertised by B can reach A!
• (Figure: route advertisements, represented as BDDs, flow from RB through routers R1-R5 toward RA, opposite to the direction of traffic from A to B; the network environment advertises "True", i.e., all routes)
• Prepare for the worst!
• Optimizations to scale control plane exploration:
– Equivalence classes of routes
– Fast AVX2 instructions to implement conjunction/disjunction
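The intuition can be sketched as a worklist propagation of route sets from B toward A. The Python below is a simplified illustration under stated assumptions: plain sets of prefixes replace BDDs, a single hypothetical `export_filter` callback replaces the per-router pipeline, and the environment's advertisements are omitted. The topology and prefixes are made up.

```python
from collections import deque

def routes_reaching(topology, origin, originated, export_filter):
    """Propagate advertised prefixes from `origin` until nothing changes.

    topology: dict mapping each router to a list of its neighbors.
    export_filter(u, v, prefix): True if router u advertises prefix to v.
    Returns a dict mapping each router to the set of prefixes it learned.
    """
    known = {router: set() for router in topology}
    known[origin] = set(originated)
    work = deque([origin])
    while work:
        u = work.popleft()
        for v in topology[u]:
            new = {p for p in known[u] if export_filter(u, v, p)} - known[v]
            if new:                 # sets only grow, so this terminates
                known[v] |= new
                work.append(v)
    return known

# Hypothetical chain RA - R1 - R2 - RB; R2 does not export one prefix to R1.
topology = {"RA": ["R1"], "R1": ["RA", "R2"], "R2": ["R1", "RB"], "RB": ["R2"]}

def export_filter(u, v, prefix):
    return not (u == "R2" and v == "R1" and prefix == "10.2.0.0/16")

known = routes_reaching(topology, "RB", {"10.1.0.0/16", "10.2.0.0/16"},
                        export_filter)

# A can send traffic only toward prefixes whose routes reached RA.
print(sorted(known["RA"]))
```

Here the route for 10.2.0.0/16 never reaches RA, so traffic from A to that prefix has no path even though the links are physically up.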
Implementation
• Inputs from the operator:
– Router configurations, parsed with the router config. parser from Batfish (NSDI'15), which parses Cisco and Juniper
– Network topology (custom format)
– Reachability policies (e.g., A → B, valley-free, blackhole)
– Environment assumptions (default: "all routes")
• Control plane model (custom Java code and BDD library)
• Model exploration (Java and Intel AVX2 optimizations)
• Output: Pass/Fail
• https://github.com/Network-verification/ERA
Evaluation
• ERA is effective in finding latent reachability bugs
– Found known and new bugs in synthetic scenarios
– Found known and new bugs in real scenarios
– These bugs were caused by router misconfiguration with respect to:
  • Incorrect route redistribution
  • Incorrect route aggregation
  • Unintended cross-protocol effects
  • Interaction between SDN and traditional routing protocols
  • …
• ERA is fast and scalable
– ERA analyzes networks with over 1,600 routers in < 7 seconds
– Finding a latent bug using state-of-the-art data plane analysis techniques in a 2-router network would take up to 1022 days!
Conclusions
• Problem: How to find latent network reachability bugs?
• Data plane verification is fundamentally limited
• Current control plane analysis tools are incomplete or unscalable
• ERA: A fast control plane analysis tool
– Modeling the control plane's I/O as compact BDDs
– Modeling control plane processing logic using fast boolean arithmetic
• ERA can help find latent bugs and is scalable