Validating Datacenters at Scale
Karthick Jayaraman, Nikolaj Bjørner, Jitu Padhye, Amar Agrawal, Ashish Bhargava, Paul-Andre C Bissonnette, Shane Foster, Andrew Helwer, Mark Kasten, Ivan Lee, Anup Namdhari, Haseeb Niaz, Aniruddha Parkhi, Hanukumar Pinnamraju, Adrian Power, Neha Milind Raje, Parag Sharma
Microsoft Azure Networking
Hyperscale Azure Datacenter Network
• 54 regions worldwide
• 140 countries
• Network devices
• Servers
• Policies
• Maintenance changes/day
Reliability at Hyperscale
• Is the network operating as expected?
• Will my change affect the network?
Reality Checker for Datacenters (RCDC)
• What is the Reality?
• What is the Intent?
• How to scale verification?
• What do we do with the results?
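Forwarding Information Base (FIB)
• Determines the forwarding behavior of each device, using longest prefix matching on the destination IP
• The FIBs of all devices collectively determine the forwarding behavior of the network
Example FIB for a device with interfaces i1 to i4:
Prefix           NextHops
100.25.0.0/24    {i1, i2}
0.0.0.0/0        {i4}
A packet with dstIp = 100.25.0.1 matches 100.25.0.0/24 and leaves on i1 or i2; a packet with dstIp = 100.26.0.1 matches only the default route 0.0.0.0/0 and leaves on i4.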
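As a minimal sketch of longest prefix matching, the Python below performs lookups in the example FIB above using the standard `ipaddress` module. It scans every entry linearly for clarity; the function name `lookup` is hypothetical, and real devices use specialized lookup structures rather than a linear scan.

```python
import ipaddress

# Example FIB from the slide: prefix -> set of next-hop interfaces.
FIB = {
    ipaddress.ip_network("100.25.0.0/24"): {"i1", "i2"},
    ipaddress.ip_network("0.0.0.0/0"): {"i4"},
}


def lookup(fib, dst_ip):
    """Longest prefix match: next hops of the most specific prefix containing dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in fib if addr in net]
    if not matches:
        return set()  # no matching route: the packet is dropped
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]


print(sorted(lookup(FIB, "100.25.0.1")))  # ['i1', 'i2']
print(sorted(lookup(FIB, "100.26.0.1")))  # ['i4']
```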
What is the intent?
✓ All pairs ToR reachability
[Figure: two-cluster datacenter topology with routers R1 to R4 and D1 to D4. Cluster 1: routers A1 to A4, ToR1 (10.0.0.0/16), ToR2 (11.0.0.0/16). Cluster 2: routers B1 to B4, ToR3 (12.0.0.0/16), ToR4 (13.0.0.0/16).]
What is the intent?
✓ All pairs ToR reachability
✓ Traffic must follow shortest paths
  ✓ Intra-cluster path length = 2
  ✓ Intra-datacenter (cross-cluster) path length = 4
What is the intent?
✓ All pairs ToR reachability
✓ Traffic must follow shortest paths
✓ All Equal Cost Multi Paths (ECMP) must be available
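The shortest-path and ECMP intents can be checked on a topology graph with a breadth-first search that also counts shortest paths. This is a sketch under an assumed wiring, since the slides do not list every link: each ToR connects to all four leaf routers of its cluster, and leaf routers A_i and B_i both connect to router D_i; R1 to R4 are omitted. The functions `build_topology` and `shortest_paths` are hypothetical names.

```python
from collections import deque


def build_topology():
    """Hypothetical wiring: each ToR links to all four leaf routers of its
    cluster, and leaf routers A_i and B_i both link to router D_i."""
    graph = {}

    def link(u, v):
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    for i in range(1, 5):
        for tor in ("ToR1", "ToR2"):
            link(tor, f"A{i}")
        for tor in ("ToR3", "ToR4"):
            link(tor, f"B{i}")
        link(f"A{i}", f"D{i}")
        link(f"B{i}", f"D{i}")
    return graph


def shortest_paths(graph, src):
    """BFS from src: hop distance and number of shortest (ECMP) paths per node."""
    dist = {src: 0}
    count = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:             # first time reached: one level deeper
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:  # another shortest path into v
                count[v] += count[u]
    return dist, count


dist, count = shortest_paths(build_topology(), "ToR1")
print(dist["ToR2"], count["ToR2"])  # 2 4
print(dist["ToR3"], count["ToR3"])  # 4 4
```

Under this wiring, ToR1 reaches ToR2 in 2 hops over 4 equal-cost paths and ToR3 in 4 hops over 4 equal-cost paths, matching the path-length intents. BFS path counting is correct here because every node at distance d is dequeued before any node at distance d + 1, so a node's count is final when it is dequeued.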
Where does the intent come from?
Topology → Network Graph Service → Automatic intent extraction
✓ All pairs ToR reachability
✓ Traffic must follow shortest paths
✓ ECMP redundancy
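A minimal sketch of automatic intent extraction follows. It assumes the topology service can report each ToR's cluster and hosted prefix; the `TORS` dictionary, the `ReachabilityIntent` type, and `extract_intents` are hypothetical stand-ins, not the Network Graph Service's actual interface. Each ordered ToR pair yields one reachability intent whose expected path length is 2 within a cluster and 4 across clusters.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class ReachabilityIntent:
    src_tor: str
    dst_prefix: str
    expected_path_length: int


# Hypothetical topology metadata: ToR -> (cluster, hosted prefix).
TORS = {
    "ToR1": ("Cluster 1", "10.0.0.0/16"),
    "ToR2": ("Cluster 1", "11.0.0.0/16"),
    "ToR3": ("Cluster 2", "12.0.0.0/16"),
    "ToR4": ("Cluster 2", "13.0.0.0/16"),
}


def extract_intents(tors):
    """One intent per ordered ToR pair: path length 2 within a cluster, 4 across."""
    intents = []
    for src, dst in permutations(tors, 2):
        dst_cluster, dst_prefix = tors[dst]
        length = 2 if tors[src][0] == dst_cluster else 4
        intents.append(ReachabilityIntent(src, dst_prefix, length))
    return intents


intents = extract_intents(TORS)
print(len(intents))  # 12
```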
Challenges
• All pairs ToR reachability analysis is O(N^3)
• Obtaining a composite FIB snapshot of the network is a hard engineering problem
Prior work: Anteater [Mai 2011], HSA [Kazemian 2012], VeriFlow [Khurshid 2013], Libra [Zeng 2014], NetKAT [Anderson 2014], NoD [Lopes 2015], Symmetries [Plotkin 2016, Beckett 2018]
Local Validation
Exploit the Azure network's regular structure:
✓ Each router has a fixed role for a set of addresses
✓ It is enough to verify that this role is enforced on each router
Decompose the verification problem into local contracts.
[Figure: the two-cluster topology, annotated with the backbone, a spine router, and the leaf routers.]
What are the contracts?
ToR1 contracts:
Prefix         NextHops
0.0.0.0/0      {A1, A2, A3, A4}    (default contract)
11.0.0.0/16    {A1, A2, A3, A4}    (specific contracts)
12.0.0.0/16    {A1, A2, A3, A4}
13.0.0.0/16    {A1, A2, A3, A4}
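A sketch of checking one device against its contracts follows, assuming FIBs and contracts are plain dictionaries from prefix strings to next-hop sets (a hypothetical format). For each contract prefix, it finds the longest FIB route that covers the whole prefix, which is the route traffic to that prefix actually uses, and compares the next hops. FIB entries more specific than the contract prefix are ignored, which is a simplification. `covering_route` and `check_contracts` are illustrative names.

```python
import ipaddress


def covering_route(fib, prefix):
    """Return the longest FIB prefix that contains all of `prefix`, or None."""
    target = ipaddress.ip_network(prefix)
    covering = [p for p in fib if target.subnet_of(ipaddress.ip_network(p))]
    if not covering:
        return None
    return max(covering, key=lambda p: ipaddress.ip_network(p).prefixlen)


def check_contracts(device, fib, contracts):
    """Compare each contract with the FIB route that actually serves its prefix.

    Returns rows of (device, prefix, expected hops, actual prefix, actual hops).
    """
    violations = []
    for prefix, expected in contracts.items():
        route = covering_route(fib, prefix)
        actual = fib[route] if route is not None else set()
        if actual != expected:
            violations.append((device, prefix, expected, route, actual))
    return violations


# ToR1 contracts from the slide; the FIB is a hypothetical faulty snapshot.
LEAVES = {"A1", "A2", "A3", "A4"}
TOR1_CONTRACTS = {p: LEAVES for p in
                  ("0.0.0.0/0", "11.0.0.0/16", "12.0.0.0/16", "13.0.0.0/16")}
faulty_fib = {"0.0.0.0/0": {"A1", "A2"}}

for row in check_contracts("ToR1", faulty_fib, TOR1_CONTRACTS):
    print(row)
```

Against this hypothetical FIB, which holds only a default route to {A1, A2}, all four ToR1 contracts are violated, and each violation reports 0.0.0.0/0 as the actual prefix, the same pattern as in the latent-error table.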
What are the contracts?
A1 contracts:
Prefix         NextHops
0.0.0.0/0      {D1}
10.0.0.0/16    {ToR1}
11.0.0.0/16    {ToR2}
12.0.0.0/16    {D1}
13.0.0.0/16    {D1}
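Because each router has a fixed role, its contracts can be derived from the topology rather than written by hand. The sketch below uses a hypothetical rule set that reproduces the A1 table: prefixes of ToRs attached to the leaf go straight down to that ToR, and everything else, including the default route, goes up. `leaf_contracts` and `TOR_PREFIXES` are illustrative names, not part of RCDC.

```python
def leaf_contracts(uplinks, own_cluster_tors, tor_prefixes):
    """Derive a leaf router's contracts from its role (hypothetical rule set).

    uplinks: upper-tier routers this leaf connects to.
    own_cluster_tors: ToRs directly attached to this leaf.
    tor_prefixes: ToR name -> the prefix it hosts, for the whole datacenter.
    """
    contracts = {"0.0.0.0/0": set(uplinks)}  # default route points up
    for tor, prefix in tor_prefixes.items():
        if tor in own_cluster_tors:
            contracts[prefix] = {tor}         # local prefix: go straight down
        else:
            contracts[prefix] = set(uplinks)  # remote prefix: go up
    return contracts


TOR_PREFIXES = {
    "ToR1": "10.0.0.0/16",
    "ToR2": "11.0.0.0/16",
    "ToR3": "12.0.0.0/16",
    "ToR4": "13.0.0.0/16",
}
a1 = leaf_contracts({"D1"}, {"ToR1", "ToR2"}, TOR_PREFIXES)
for prefix, hops in a1.items():
    print(prefix, sorted(hops))
```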
Live Monitoring of Forwarding Behavior
Network Graph Service → Reachability invariants → Error reports
• Validation time for one datacenter: < 3 minutes
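Since every contract refers to a single device's FIB, the per-device checks are independent and can run in parallel before being merged into one error report. The sketch below shows only that structure. `validate_datacenter`, `exact_match_check`, and the snapshot format are hypothetical. `ThreadPoolExecutor` keeps the example simple, but in CPython threads do not speed up CPU-bound checks; `ProcessPoolExecutor` offers the same interface.

```python
from concurrent.futures import ThreadPoolExecutor


def validate_datacenter(snapshots, check_device, max_workers=8):
    """Check every device independently and merge the results into one report.

    snapshots: device name -> (fib, contracts).
    check_device: function (name, fib, contracts) -> list of violations.
    Each contract mentions a single device's FIB, so the checks need no
    coordination and can run in any order or in parallel.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(check_device, name, fib, contracts)
                   for name, (fib, contracts) in snapshots.items()]
        report = []
        for future in futures:
            report.extend(future.result())
    return report


def exact_match_check(name, fib, contracts):
    """Deliberately simple check: each contract prefix must be in the FIB as-is."""
    return [(name, prefix) for prefix, hops in contracts.items()
            if fib.get(prefix) != hops]


snapshots = {
    "A1": ({"0.0.0.0/0": {"D1"}}, {"0.0.0.0/0": {"D1"}}),
    "A2": ({}, {"0.0.0.0/0": {"D2"}}),  # hypothetical: default route missing
}
print(validate_datacenter(snapshots, exact_match_check))  # [('A2', '0.0.0.0/0')]
```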
Latent Error
Device   Prefix         Expected NextHops   Actual Prefix   Actual NextHops
ToR1     11.0.0.0/16    {A1, A2, A3, A4}    0.0.0.0/0       {A1, A2}
ToR2     10.0.0.0/16    {A1, A2, A3, A4}    0.0.0.0/0       {A1, A2}
The specific routes are missing, so traffic falls through to the default route, which has only two of the four expected next hops.
Latent Errors
Device   Prefix       Expected NextHops   Actual Prefix   Actual NextHops
A1       0.0.0.0/0    {D1}
A2       0.0.0.0/0    {D2}
A3       0.0.0.0/0    {D3}
What did we do about the errors?
Risk categorization:
• Role of device
• Number of additional faults required to cause an impact
[Figure: errors over days 0 to 15; annotation: O(100).]
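The slide names two risk factors but gives no formula. As a hypothetical sketch only, the Python below ranks violations by an assumed "faults to impact" metric: the number of next hops still present, since losing all of them would blackhole the prefix. Ties are broken by an assumed role ranking. `ROLE_RANK`, `faults_to_impact`, and `prioritize` are illustrative names, not RCDC's.

```python
# Hypothetical ranking of device roles: lower rank = more critical role.
ROLE_RANK = {"spine": 0, "leaf": 1, "tor": 2}


def faults_to_impact(actual_next_hops):
    """Assumed metric: how many more next-hop failures would leave the prefix
    with no next hop at all (0 means traffic is already being dropped)."""
    return len(actual_next_hops)


def prioritize(errors):
    """Sort errors, most urgent first.

    errors: (device, role, prefix, actual_next_hops) tuples. Fewer remaining
    faults to impact comes first; ties are broken by role criticality.
    """
    return sorted(errors, key=lambda e: (faults_to_impact(e[3]), ROLE_RANK[e[1]]))


errors = [
    ("ToR1", "tor", "11.0.0.0/16", {"A1", "A2"}),  # from the latent-error table
    ("A1", "leaf", "0.0.0.0/0", set()),            # hypothetical missing route
    ("D1", "spine", "10.0.0.0/16", {"A1"}),        # hypothetical
]
for device, role, prefix, hops in prioritize(errors):
    print(device, prefix, faults_to_impact(hops))
```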
Experience: Types of Errors
• Software bugs
• Hardware failures
• Operational drift
• Migrations
Examples:
• A software bug that caused a RIB-FIB inconsistency
• Operationally down links
• BGP sessions that are shut
• Port channels not configured on T1s
• Two T1 sets configured with the same ASN
Reliability at Hyperscale
• Is the network operating as expected?
• Will my change affect the network?
Verifying Device Access-Control Lists (ACL)
srcIp   dstIp            protocol   action
*       100.64.0.0/16    UDP        deny
*       *                *          permit
*       *                *          deny
ACL → Parsers → SECGURU
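To make the ACL semantics concrete, the sketch below evaluates the example ACL with first-match semantics: the first rule that matches a packet decides its fate. The `Rule` type, the `evaluate` function, and the implicit deny when nothing matches are assumptions for illustration. Unlike a verifier, which must reason about every possible packet, this sketch only evaluates the individual packets it is given.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src: str       # prefix, or "*" for any address
    dst: str       # prefix, or "*" for any address
    protocol: str  # e.g. "UDP", or "*" for any protocol
    action: str    # "permit" or "deny"


def _ip_matches(field, ip):
    return field == "*" or ipaddress.ip_address(ip) in ipaddress.ip_network(field)


def evaluate(acl, src_ip, dst_ip, protocol):
    """First-match semantics: the first matching rule decides the action."""
    for rule in acl:
        if (_ip_matches(rule.src, src_ip) and _ip_matches(rule.dst, dst_ip)
                and rule.protocol in ("*", protocol)):
            return rule.action
    return "deny"  # assumed implicit deny when no rule matches


# The example ACL from the slide.
ACL = [
    Rule("*", "100.64.0.0/16", "UDP", "deny"),
    Rule("*", "*", "*", "permit"),
    Rule("*", "*", "*", "deny"),
]

print(evaluate(ACL, "203.0.113.7", "100.64.1.1", "UDP"))  # deny
print(evaluate(ACL, "203.0.113.7", "100.64.1.1", "TCP"))  # permit
```

Under first-match semantics, the third rule of this ACL can never fire, because the second rule already matches every packet.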
Refactoring a Large Legacy ACL
Edge ACL before refactoring: several thousand lines
• Intent was poorly understood
• Difficult to make changes
Refactor: move out service-specific protections
Edge ACL after refactoring: a few hundred lines
Refactoring a Large Legacy ACL: Regression Contracts
SECGURU checks the refactored ACL against regression contracts: if the check passes, deploy the refactored ACL; otherwise, fix the errors in the policy.
Example violation:
• Contract expects: srcIp = *, dstIp = 10.0.0.0/16: Allow
• Policy only allows: srcIp = *, dstIp = 10.0.0.0/26: Allow
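The regression violation in the example can be found with prefix arithmetic. The sketch below computes which parts of the contract's destination prefix no allowed prefix covers; an empty result means the contract holds. It assumes, as in the example, that srcIp is * on both sides and only allow rules matter. Earlier deny rules and other fields are ignored, which is a simplification. The function name `uncovered` is hypothetical.

```python
import ipaddress


def uncovered(contract_dst, allowed_dsts):
    """Return the parts of contract_dst that no allowed prefix covers.

    An empty result means the allow contract holds for destinations.
    Relies on CIDR prefixes being either nested or disjoint.
    """
    remaining = [ipaddress.ip_network(contract_dst)]
    for allowed in map(ipaddress.ip_network, allowed_dsts):
        next_remaining = []
        for block in remaining:
            if block.subnet_of(allowed):
                continue                      # fully allowed
            if allowed.subnet_of(block):
                next_remaining.extend(block.address_exclude(allowed))
            else:
                next_remaining.append(block)  # disjoint: untouched
        remaining = next_remaining
    return remaining


gaps = uncovered("10.0.0.0/16", ["10.0.0.0/26"])
print(len(gaps), sum(g.num_addresses for g in gaps))  # 10 65472
```

The policy allows only the 64 addresses of 10.0.0.0/26, so 65,472 addresses that the contract expects to be allowed are left uncovered, split into 10 prefixes that serve as concrete counterexamples.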
Refactoring a Large Legacy ACL
[Figure: line modifications (Added, Deleted, Total) versus days, from day 0 to day 124.]
Summary
• Captured and checked intent in Azure datacenters
• Incorporated verification to monitor drift and check the impact of changes
• Optimized for hyperscale
More Challenges
• Wide area networks
• Better abstractions for intent
• Model-based testing of device firmware
• Verifying virtual network policies
Contact
• dmaltz@microsoft.com
• karjay@microsoft.com
• padhye@microsoft.com