Efficient Network Reachability Analysis using a Succinct Control Plane Representation Seyed K. Fayaz
Creativity
• Once again, analogy: after reading Batfish and HSA: header space control space?
• Asked Vyas Sekar at Seyed and Ratul that Vyas’s student Seyed would intern at µSoft.
• Seyed looks at BGP bugs and we decide to start with KLEE to explore possibilities
• Seyed goes back to CMU to explore scalability and gets an OSDI paper ready
• OSDI hiccups
What’s different
• From other data plane papers?
• From Batfish
• From rcc
• From cellular verification paper
Network configuration is hard
[Figure: a network operator states a reachability policy, "A can talk to B"; the reality is what the network of routers R1–R4 between A and B actually does.]
Does the network do what we want it to do?
State of the art in network verification
Data plane verification: can A talk to B? ✔
[Figure: the network operator's reachability policy, "A can talk to B", is checked against the data planes (forwarding tables) DP1–DP4 of routers R1–R4 that carry traffic from A to B.]
Prior work:
• HSA, NSDI'12
• ATPG, CoNEXT'12
• NOD, NSDI'15
• …
Are we done?
The data plane keeps changing!
[Figure: the data planes of routers R1–R4 change over time (t1, t2, t3, …); at each point in time the question "Can A talk to B?" has to be asked again for traffic from A to B under the operator's reachability policy, "A can talk to B".]
A data plane is just the current incarnation of the control plane!
[Figure: a router's control plane (implementation of BGP, OSPF, etc.) takes the router configuration and route advertisements 1, 2, 3, … (e.g., BGP, OSPF, and RIP advertisements) as input, sends advertisements to neighbors, and produces successive data planes with entries such as <prefix P, port 1>, <prefix P, port 2>, <prefix P, port 3>, …]
Prior work on control plane verification:
– rcc, NSDI'05
– Bagpipe, OOPSLA'16
– ARC, SIGCOMM'16
– Batfish, NSDI'15
– …
Limitations:
• Incomplete: focus on just one routing protocol
• Unscalable: detailed modeling of message passing
To find latent reachability bugs, we should focus on the control plane!
Bug 1: Failure Triggered
What is the route specification?
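Bug 2: BGP announcement triggered
[Figure: before and after the incident, with routers A and B connecting data centers DCA and DCB to W. Before the incident: services in 10.0.0/16 are hosted in DCA and the /16 is reachable (✔). After the incident: a new service 10.1.160/28 is announced; the /28 is reachable at DCB (✔), but the /16 now also points to DCB (✗), with router B as the culprit.]
Root cause: Router B's configuration had an aggregate route 10.0.0/16 pointing to DCB.
The /28 advertisement activated the aggregate route!
How can we proactively find such latent reachability bugs?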
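The activation condition behind this bug can be sketched in a few lines. This is a minimal illustration, assuming the usual aggregate-route rule that an aggregate becomes active only once at least one strictly more-specific route inside it is present. The function name `active_aggregates` and the prefixes in the usage note are hypothetical and are not taken from the incident.

```python
import ipaddress


def active_aggregates(configured_aggregates, learned_prefixes):
    """Return the configured aggregates that currently have a contributor.

    An aggregate is treated as active when at least one learned route is
    strictly more specific than the aggregate and falls inside it.
    (Assumed activation rule; prefixes are given as CIDR strings.)
    """
    learned = [ipaddress.ip_network(p) for p in learned_prefixes]
    active = []
    for agg in configured_aggregates:
        agg_net = ipaddress.ip_network(agg)
        # The aggregate itself does not count as its own contributor.
        if any(r != agg_net and r.subnet_of(agg_net) for r in learned):
            active.append(agg)
    return active
```

With an illustrative aggregate `10.0.0.0/16`, the call `active_aggregates(["10.0.0.0/16"], [])` returns an empty list, while adding a learned `10.0.1.160/28` makes the aggregate active. A latent bug of this kind stays invisible in the data plane until such an announcement arrives.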
Aggregation triggered 1
What is the route specification?
Route Packet Black Holes
What is the route specification?
Violation of Waypointing
What is the route specification?
Valley Free Property
What is the route specification?
Violation of Isolation
What is the route specification?
Spine
• A tool for finding latent router configuration bugs in seconds based on control plane analysis
  – Correctness based on route reachability, not data plane reachability
  – Compact encoding of routes from all protocols and of the route transfer function (via BDDs)
  – Scalable (but incomplete, see later) exploration of the set of route announcements and the flow of routes
Route Reachability and Data Plane
• A cannot reach B unless there exists a physical path such that, on every node I in the path:
  – a route from B reaches A
  – there are no data plane ACLs dropping packets from A to B on that path
• So why not check for route reachability instead of data plane reachability across all possible routes?
ERA: System overview
[Figure: the operator supplies router configurations and reachability policies to ERA, which reports Pass or Fail.]
ERA: A tool to find latent reachability bugs due to router misconfiguration.
Scope: Reachability bugs occurring in the steady state
Outline
• Background and motivation
• Design of ERA
• Implementation and evaluation
Challenges in control plane analysis
[Figure: the operator supplies router configurations and reachability policies; ERA builds a control plane model, explores the model, and reports Pass or Fail.]
• Challenge 1: Expressive and tractable model?
• Challenge 2: Scalable exploration
Challenge 1: Expressive and tractable control plane model
[Figure: the same ERA workflow: router configurations and reachability policies in, control plane model (Challenge 1: expressive and tractable model?), model exploration (Challenge 2: scalable exploration), Pass or Fail out.]
A route as a succinct bit-vector
Control plane I/O model                                  Expressive   Tractable
Actual protocol's messages (e.g., Batfish, NSDI'15)      ✔            ✗
Protocol-agnostic I/O model (e.g., ARC, SIGCOMM'16)      ✗            ✔
Route as a compact bit vector                            ✔            ✔
Route layout: Dst IP (32 bits) | Dst mask (5 bits) | Administrative distance (4 bits) | Protocol attributes (87 bits)
A route as a succinct and unifying control plane I/O unit
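The field widths above add up to 128 bits. A minimal sketch of packing one route into such a word is given here, assuming the fields are laid out in the order listed, with the destination IP in the most significant bits. The function names and the bit order are assumptions for illustration; the slide gives only the widths. The 5-bit mask field holds values 0 to 31 in this sketch.

```python
import ipaddress

# Field widths as listed on the slide.
IP_BITS, MASK_BITS, AD_BITS, ATTR_BITS = 32, 5, 4, 87
TOTAL_BITS = IP_BITS + MASK_BITS + AD_BITS + ATTR_BITS  # 128


def pack_route(dst_ip, mask, admin_distance, attrs=0):
    """Pack one route into a 128-bit integer (assumed field order)."""
    word = int(ipaddress.IPv4Address(dst_ip))
    for value, width in ((mask, MASK_BITS),
                         (admin_distance, AD_BITS),
                         (attrs, ATTR_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        # Shift the word left and append the next field in the low bits.
        word = (word << width) | value
    return word


def unpack_route(word):
    """Inverse of pack_route: recover (dst_ip, mask, admin_distance, attrs)."""
    attrs = word & ((1 << ATTR_BITS) - 1)
    word >>= ATTR_BITS
    admin_distance = word & ((1 << AD_BITS) - 1)
    word >>= AD_BITS
    mask = word & ((1 << MASK_BITS) - 1)
    word >>= MASK_BITS
    return str(ipaddress.IPv4Address(word)), mask, admin_distance, attrs
```

A set of routes is then a boolean function over these 128 bits, which is what makes a BDD a natural container for it.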
Control plane as a fast pipeline of boolean operators: Intuition
• Why not the actual router's code? Hard to explore
• Router as a fast route processing pipeline
• An example: a route is a 4-bit vector X3 X2 X1 X0 with prefix, admin. distance (RIP=1, BGP=0), and protocol attribute fields
[Figure: a set of input routes, written as a boolean formula over X3 X2 X1 X0, enters the router control plane; each step of the router configuration (RIP, static 10, "set RIP attr. to 1") applies a boolean operation, and the output is again a boolean formula over the same bits.]
Control plane as a fast pipeline of boolean operators: Complete pipeline
BDD of input routes
  1. AND with supported protocols
  2. Apply input filters
  3. OR with routes originated by router
  4. OR with redistributed routes
  5. AND with NEG. of static routes
  6. OR with aggregate routes
  7. Select best route per dst prefix
  8. Apply output filters
BDD of output routes
• Compact representation of a collection of routes using Binary Decision Diagrams (BDDs)
• The pipeline captures key control plane behaviors that are the source of many bugs.
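The eight stages can be mimicked with ordinary set operations, where AND is intersection, OR is union, and AND with a negation is set difference. The sketch below is an illustration only: it uses explicit Python sets instead of BDDs, the `Route` type and `process_routes` function are hypothetical names, step 5 is read literally as removing the given static routes, and ties in step 7 are broken by sort order.

```python
from typing import NamedTuple


class Route(NamedTuple):
    prefix: str          # destination prefix, e.g. "10.0.0.0/16"
    protocol: str        # e.g. "bgp", "rip"
    admin_distance: int  # lower value is preferred


def process_routes(inputs, supported, in_filter, originated, redistributed,
                   static_routes, aggregates, out_filter):
    """Run a set of input routes through the eight pipeline stages."""
    routes = {r for r in inputs if r.protocol in supported}  # 1. AND supported protocols
    routes = {r for r in routes if in_filter(r)}             # 2. input filters
    routes |= set(originated)                                # 3. OR originated routes
    routes |= set(redistributed)                             # 4. OR redistributed routes
    routes -= set(static_routes)                             # 5. AND NEG. of static routes
    routes |= set(aggregates)                                # 6. OR aggregate routes
    best = {}
    for r in sorted(routes):                                 # 7. best route per prefix
        current = best.get(r.prefix)
        if current is None or r.admin_distance < current.admin_distance:
            best[r.prefix] = r
    return {r for r in best.values() if out_filter(r)}       # 8. output filters
```

Replacing the explicit sets with BDDs keeps the same structure while letting one operation act on a very large collection of routes at once.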
Challenge 2: Scalable control plane exploration
[Figure: the same ERA workflow: router configurations and reachability policies in, control plane model (Challenge 1: expressive and tractable model?), model exploration (Challenge 2: scalable exploration), Pass or Fail out.]
Reachability analysis by exploring the control plane model
• Intuition: To see what traffic can reach from A to B, just find out what route prefixes advertised by B can reach A!
[Figure: route advertisements (represented as BDDs) flow from RB through routers R1–R5 toward RA, opposite to the direction of traffic from A to B; the network environment injects "True" at the network's edges.]
• Prepare for the worst!
• Optimizations to scale control plane exploration:
  – Equivalence classes of routes
  – Fast AVX2 instructions to implement conjunction/disjunction
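The intuition above amounts to propagating the routes announced at B hop by hop until nothing changes, then inspecting what arrived at A. A minimal worklist sketch follows, assuming each link has an export function from a set of routes to a set of routes and that exports never invent new routes, so the loop terminates. The names `propagate`, `neighbors`, and `export` are hypothetical, and plain sets stand in for BDDs.

```python
from collections import deque


def propagate(neighbors, export, origin, announced):
    """Compute the set of routes each router ends up knowing.

    neighbors: dict mapping every router to a list of adjacent routers.
    export(u, v, routes): routes that u passes on to v (assumed to be a
        subset of `routes`, which bounds the search).
    origin: router that announces the initial routes.
    announced: iterable of routes announced at the origin.
    """
    routes = {router: frozenset() for router in neighbors}
    routes[origin] = frozenset(announced)
    worklist = deque([origin])
    while worklist:
        u = worklist.popleft()
        for v in neighbors[u]:
            merged = routes[v] | export(u, v, routes[u])
            if merged != routes[v]:
                # v learned something new, so its neighbors must be revisited.
                routes[v] = merged
                worklist.append(v)
    return routes
```

A policy "A can talk to B" then reduces to checking that the prefixes of interest appear in the result for A's router. Modeling the environment as "all routes" would correspond to seeding the external neighbors with the full route set.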
Outline
• Background and motivation
• Design of ERA
• Implementation and evaluation
Implementation
• Router config. parser from Batfish (NSDI'15), parsing Cisco and Juniper
• Control plane model (custom Java code and BDD library)
• Model exploration (Java and Intel AVX2 optimizations)
• Inputs: network topology (custom format); reachability policies from the operator (e.g., A → B, valleyfree, blackhole); environment assumptions (default: "all routes")
• Output: Pass or Fail
https://github.com/Network-verification/ERA
Evaluation
• ERA is effective in finding latent reachability bugs
  – Found known and new bugs in synthetic scenarios
  – Found known and new bugs in real scenarios
  – These bugs were caused by router misconfiguration with respect to:
    • Incorrect route redistribution
    • Incorrect route aggregation
    • Unintended cross-protocol effects
    • Interaction between SDN and traditional routing protocols
    • …
• ERA is fast and scalable
  – ERA analyzes networks with over 1,600 routers in < 7 seconds
  – Finding a latent bug using state-of-the-art data plane analysis techniques in a 2-router network would take up to 1022 days!
Conclusions
• Problem: How to find latent network reachability bugs?
• Data plane verification is fundamentally limited
• Current control plane analysis tools are incomplete or unscalable
• ERA: A fast control plane analysis tool:
  • Modeling control plane's I/O as compact BDDs
  • Modeling control plane processing logic using fast boolean arithmetic
• ERA can help find latent bugs and is scalable