Internet Routing (COS 598A)
Today: Non-Convergence: Policy Conflicts
Jennifer Rexford
http://www.cs.princeton.edu/~jrex/teaching/spring2005
Tuesdays/Thursdays 11:00am-12:20pm

Outline
• Stable Paths Problem
  – The problem BGP is solving
  – Abstract model for BGP
  – Translating reality into SPP
• Conflicting routing policies
  – Examples of policy conflicts
  – Difficulty of identifying conflicts
• Guaranteeing convergence
  – Guidelines based on business relationships
  – Provable convergence without global control
• Recent work and a project idea

What Problem Does a Routing Protocol Solve?
• Most do shortest-path routing
  – Shortest hop count
    • Distance vector routing (e.g., RIP)
  – Shortest path as sum of link weights
    • Link-state routing (e.g., OSPF and IS-IS)
• Policy makes BGP more complicated
  – An AS might not tell a neighbor about a path
    • E.g., Sprint can’t reach UUNET through AT&T
  – An AS might prefer one path over a shorter one
    • E.g., ISP prefers to send traffic through a customer
What is a good model for BGP?

Could Use A Simulation Model
• Simulate the message passing
  – Advertisements and withdrawals
  – Message format
  – Timers
• Simulate the routing policy on each session
  – Filter certain route advertisements
  – Manipulate the attributes of others
• Simulate the decision process
  – Each router applying all the steps per prefix
Feasible, but tedious and ill-suited to formal arguments

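Stable Paths Problem (SPP) Instance
• Node
  – BGP-speaking router
  – Node 0 is destination
• Edge
  – BGP adjacency
• Permitted paths
  – Set of routes to 0 at each node
  – Ranking of the paths
[Figure: example instance with nodes 0 to 5; each node is labeled with its permitted paths, most preferred to least preferred. Labels shown: 210, 20 (node 2); 5210 (node 5); 420, 430 (node 4); 30 (node 3); 10 (node 1)]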

A Solution to a Stable Paths Problem
• Solution
  – Path assignment per node
  – Can be the “null” path
• If node u has path uwP
  – {u, w} is an edge in the graph
  – Node w is assigned path wP
• Each node is assigned
  – The highest ranked path consistent with the assignment of its neighbors
A solution need not represent a shortest path tree, or a spanning tree.
[Figure: the example instance with nodes 0 to 5. Permitted paths: 130, 10 (node 1); 210, 20 (node 2); 30 (node 3); 420, 430 (node 4); 5210 (node 5)]
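
The solution conditions above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `is_solution`, the tuple encoding of paths, and the use of `()` for the null path are all assumptions made for illustration, and the rankings are read off the figure labels.

```python
def is_solution(permitted, assignment):
    """Check whether `assignment` is a stable solution of an SPP instance.

    permitted:  {node: [permitted paths, most preferred first]}; each path is a
                tuple of nodes ending at the destination 0.
    assignment: {node: chosen path}, with () standing for the null path.
    """
    chosen = dict(assignment)
    chosen[0] = (0,)  # the destination always holds the trivial path
    for u, ranked in permitted.items():
        # Path u w P is usable only if neighbor w is currently assigned w P.
        usable = [p for p in ranked if chosen.get(p[1]) == p[1:]]
        best = usable[0] if usable else ()
        if chosen.get(u, ()) != best:
            return False
    return True


# Rankings as read from the figure labels (an assumption about the layout).
permitted = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 0)],
    4: [(4, 2, 0), (4, 3, 0)],
    5: [(5, 2, 1, 0)],
}
stable = {1: (1, 3, 0), 2: (2, 0), 3: (3, 0), 4: (4, 2, 0), 5: ()}

print(is_solution(permitted, stable))                 # node 5 holds the null path
print(is_solution(permitted, {**stable, 1: (1, 0)}))  # node 1 ignores a better path
```

In this sketch node 5 ends up with the null path because its only permitted path goes through (2, 1, 0), which node 2 does not select.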

Translating a Real Configuration into SPP
• Permitted paths at a node
  – Composition of export policies at other nodes
• Ranking of paths at a node
  – Import policies at the node
  – Rank in terms of BGP decision process (i.e., local preference, AS path length, origin type, MED, …)
[Figure: nodes 0, 1, 2, and 5. Node 0 exports route to node 2; node 1 exports “1 0” to node 2; node 2 exports “2 1 0” but not “2 0”. Permitted paths: 210, 20 (node 2); 5210 (node 5)]

An SPP May Have Multiple Solutions
[Figure: nodes 1 and 2 and destination 0, drawn three times: the instance, the first solution, and the second solution. Permitted paths: 120, 10 (node 1); 210, 20 (node 2)]
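
For an instance this small, the solutions can be enumerated by brute force. This is an illustrative sketch only; the helper names `best_available` and `solutions` and the tuple encoding of paths are assumptions, not something the slides define.

```python
from itertools import product


def best_available(ranked, chosen):
    """Highest-ranked path u w P whose tail w P is what neighbor w holds, else ()."""
    for p in ranked:
        if chosen.get(p[1]) == p[1:]:
            return p
    return ()


def solutions(permitted):
    """Enumerate every stable path assignment by exhaustive search."""
    nodes = sorted(permitted)
    options = [permitted[u] + [()] for u in nodes]  # () is the null path
    found = []
    for combo in product(*options):
        chosen = dict(zip(nodes, combo))
        chosen[0] = (0,)  # destination
        if all(chosen[u] == best_available(permitted[u], chosen) for u in nodes):
            found.append({u: chosen[u] for u in nodes})
    return found


# Each node prefers the path through the other node over its direct path.
two_node_instance = {1: [(1, 2, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)]}
for s in solutions(two_node_instance):
    print(s)
```

Exhaustive search is exponential in the number of nodes, so it is only useful for toy instances like this one.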

An SPP May Have No Solution
[Figure: nodes 0 to 4. Permitted paths: 130, 10 (node 1); 210, 20 (node 2); 320, 30 (node 3)]
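
One way to see the lack of a solution is to let the nodes repeatedly pick their best available path and watch for a repeated state. The sketch below makes several assumptions: the round-robin activation order, the names `simulate` and `best_available`, and the restriction to nodes 1 to 3 (node 4's paths are not legible in the figure) are illustrative choices, not the slides' model of BGP dynamics.

```python
def best_available(ranked, chosen):
    """Highest-ranked path u w P whose tail w P is what neighbor w holds, else ()."""
    for p in ranked:
        if chosen.get(p[1]) == p[1:]:
            return p
    return ()


def simulate(permitted, max_rounds=100):
    """Activate nodes round-robin until a fixed point or a repeated state."""
    nodes = sorted(permitted)
    chosen = {u: () for u in nodes}  # everyone starts with the null path
    chosen[0] = (0,)
    seen = set()
    for _ in range(max_rounds):
        state = tuple(chosen[u] for u in nodes)
        if state in seen:
            return "oscillates", chosen  # deterministic updates, so this is a cycle
        seen.add(state)
        changed = False
        for u in nodes:
            best = best_available(permitted[u], chosen)
            if best != chosen[u]:
                chosen[u] = best
                changed = True
        if not changed:
            return "converged", chosen
    return "undecided", chosen


# Each node prefers the two-hop path through its neighbor over its direct path.
three_node_cycle = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 2, 0), (3, 0)],
}
print(simulate(three_node_cycle)[0])
```

Because no assignment satisfies all three nodes at once, any deterministic activation order must eventually revisit a state; other orders would cycle through different states.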

Stable System Unstable After Failure
BGP is not robust: it is not guaranteed to recover from network failures.
Becomes a BAD GADGET if link (4, 0) goes down.
[Figure: nodes 0 to 4. Permitted paths: 130, 10 (node 1); 210, 20 (node 2); 3420, 30 (node 3); 40, 420, 430 (node 4)]
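
The effect of the failure can be reproduced on the instance in the figure by counting solutions before and after removing link (4, 0). This is a sketch under assumptions: the rankings are read off the figure labels, and `fail_link`, `solutions`, and `best_available` are hypothetical helper names.

```python
from itertools import product


def best_available(ranked, chosen):
    """Highest-ranked path u w P whose tail w P is what neighbor w holds, else ()."""
    for p in ranked:
        if chosen.get(p[1]) == p[1:]:
            return p
    return ()


def solutions(permitted):
    """Enumerate every stable path assignment by exhaustive search."""
    nodes = sorted(permitted)
    found = []
    for combo in product(*[permitted[u] + [()] for u in nodes]):
        chosen = dict(zip(nodes, combo))
        chosen[0] = (0,)
        if all(chosen[u] == best_available(permitted[u], chosen) for u in nodes):
            found.append({u: chosen[u] for u in nodes})
    return found


def fail_link(permitted, a, b):
    """Drop every permitted path that traverses the undirected link {a, b}."""
    def uses(path):
        return any({x, y} == {a, b} for x, y in zip(path, path[1:]))
    return {u: [p for p in ranked if not uses(p)] for u, ranked in permitted.items()}


before = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 4, 2, 0), (3, 0)],
    4: [(4, 0), (4, 2, 0), (4, 3, 0)],
}
after = fail_link(before, 4, 0)

print(len(solutions(before)), solutions(before))
print(len(solutions(after)))
```

While node 4 can use its direct path, node 3 never sees (4, 2, 0) and the conflict among nodes 1, 2, and 3 stays dormant; removing the link exposes it.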

Strawman Solution Doesn’t Work
• Create a global Internet routing registry
  – Store the AS-level graph and all routing policies
  – But, ASes may be unwilling to divulge them
• Check for conflicting policies
  – Analyze the global system and identify conflicts
  – Contact the affected ASes to resolve them
  – But, checking is an NP-complete problem
  – … and, a safe system may be unsafe after failure
Goal: sufficient condition for convergence with local control

Guaranteeing Convergence

Think Globally, Act Locally
• Key features of a good solution
  – Flexibility: allow diverse local policies for each AS
  – Privacy: do not force ASes to divulge their policies
  – Backwards-compatibility: no changes to BGP
  – Guarantees: convergence even if system changes
• Restrictions based on AS relationships
  – Path selection rules: which route you prefer
  – Export policies: who you tell about your route
  – AS graph structure: who is connected to whom

Customer-Provider Relationship
• Customer pays provider for Internet access
  – Provider exports customer’s routes to everybody
  – Customer exports provider’s routes only to downstream customers
[Figure: two panels, “Traffic to the customer” and “Traffic from the customer”, each showing the direction of advertisements and traffic between provider and customer for destination d]

Peer-Peer Relationship
• Peers exchange traffic between customers
  – AS exports only customer routes to a peer
  – AS exports a peer’s routes only to its customers
[Figure: “Traffic to/from the peer and its customers”, showing the direction of advertisements and traffic between two peers for destination d]
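
The export rules on the customer-provider and peer-peer slides reduce to one small predicate. A minimal sketch, assuming each neighbor is labeled by its relationship to the local AS; the function name `should_export` and the string labels are illustrative.

```python
RELATIONSHIPS = ("customer", "peer", "provider")


def should_export(learned_from, send_to):
    """Decide whether a route learned from one neighbor is exported to another.

    Both arguments name the neighbor's relationship to the local AS.
    Customer routes go to everybody; peer and provider routes go only to customers.
    """
    if learned_from not in RELATIONSHIPS or send_to not in RELATIONSHIPS:
        raise ValueError("unknown relationship")
    return learned_from == "customer" or send_to == "customer"


# Print the full export matrix.
for learned_from in RELATIONSHIPS:
    for send_to in RELATIONSHIPS:
        verdict = "export" if should_export(learned_from, send_to) else "filter"
        print(f"learned from {learned_from:8} -> send to {send_to:8}: {verdict}")
```

Routes the AS originates itself are not covered by this sketch.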

Hierarchical AS Relationships
• Provider-customer graph is directed & acyclic
  – If u is a customer of v and v is a customer of w
  – … then w is not a customer of u
[Figure: chain of ASes w, v, u, with w at the top]
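
The acyclicity condition can be verified with a depth-first search over the customer-to-provider edges. This is a sketch only; the function name `is_hierarchical` and the `providers_of` mapping are assumptions, and in practice no single party has the global relationship data this check would need.

```python
def is_hierarchical(providers_of):
    """Return True if the customer -> provider graph contains no cycle.

    providers_of: {AS: set of ASes it is a customer of}.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}

    def visit(u):
        color[u] = GREY
        for v in providers_of.get(u, ()):
            state = color.get(v, WHITE)
            if state == GREY:  # back edge: v is (indirectly) a customer of itself
                return False
            if state == WHITE and not visit(v):
                return False
        color[u] = BLACK
        return True

    return all(color.get(u, WHITE) != WHITE or visit(u) for u in providers_of)


chain = {"u": {"v"}, "v": {"w"}, "w": set()}
cycle = {"u": {"v"}, "v": {"w"}, "w": {"u"}}
print(is_hierarchical(chain), is_hierarchical(cycle))
```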

Local Path Selection Rules
• Classify routes based on next-hop AS
  – Customer routes, peer routes, and provider routes
• Rank routes based on classification
  – Prefer customer routes over peer/provider routes
• Allow any ranking of routes within a class
  – E.g., rank one customer route higher than another
  – Gives network operators the flexibility they need
• Consistent with traffic engineering practices
  – Customers pay for service, and providers are paid
  – Peer relationship based on balanced traffic load
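
The selection rule amounts to a two-part sort key: route class first, then whatever ranking the operator likes. A minimal sketch, assuming a hypothetical `(next_hop_class, local_rank, as_path)` tuple per route, where a lower `local_rank` means more preferred; the AS numbers are made up.

```python
def best_route(routes):
    """Pick the best route: customer routes first, then the operator's own ranking.

    routes: list of (next_hop_class, local_rank, as_path) tuples.
    """
    def key(route):
        next_hop_class, local_rank, _ = route
        # The guideline only requires customer routes to beat peer/provider routes;
        # ranking inside each group is left entirely to the operator.
        return (0 if next_hop_class == "customer" else 1, local_rank)

    return min(routes, key=key)


routes = [
    ("peer", 0, (7, 0)),
    ("customer", 5, (3, 4, 9, 0)),
    ("customer", 2, (8, 6, 0)),
    ("provider", 1, (2, 0)),
]
print(best_route(routes))
```

Note that the chosen customer route is longer than the peer route, which the guideline allows.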

Two Interpretations
• System is stable because ASes act like this
  – High-level argument
    • Export and topology assumptions are reasonable
    • Path selection rule matches with financial incentives
  – Empirical results
    • BGP routes for popular destinations stable for ~10 days
    • Most instability from a few flapping destinations
• ASes should follow rules for system stability
  – Encourage operators to obey these guidelines
  – … and provide ways to verify the configuration
  – Need to consider more complex relationships

Playing One Condition Off Against Another
• All three conditions are important
  – Path ranking, export policy, and graph structure
• Allowing more flexibility in ranking routes
  – Allow same preference for peer and customer routes
  – Never choose a peer route over a shorter customer route
• … at the expense of stricter AS graph assumptions
  – Hierarchical provider-customer relationship (as before)
  – No private peering with (direct or indirect) providers
[Figure: AS graph with an edge labeled “Peer-peer”]

Extension to Backup Relationships
• Backups: liberal export and ranking policies
  – The motivation is increased reliability
  – … but ironically it may cause routing instability!
[Figure: two panels, “Backup Provider” and “Peer-Peer Backup [RFC 1998]”, showing a primary provider, a failure, and the backup path through a backup provider or a peer]

Backup Path Needs Global Significance
• Peer-backup relationship between 0 and 1
  – Adds backup paths (2, 1, 0), (3, 1, 0), …
• When link {2, 0} fails…
  – Node 2 prefers (2, 3, 1, 0) through a peer over the backup path (2, 1, 0)
  – Leads to the “bad gadget” example
[Figure: AS graph with nodes 0 to 4]

Backup Paths: Keeping Count of Backup Edges
• Solution
  – Prefer routes with fewest backup links
  – Then, break ties by preferring customer routes
• Mechanism
  – Tag BGP route advertisement with a counter
  – Increment the count as you cross a backup edge
[Figure: AS graph with nodes 0 to 4; node 2’s paths 20, 210, 2310, and 2410, with the labels “No backup”, “One backup customer”, and “One backup peer”]
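
The counter mechanism and the resulting ranking can be sketched as follows. Everything named here is hypothetical: the `Route` class, its `backups` field (the slides do not say how the counter is carried in an advertisement), the `receive` helper, and the example AS numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    path: tuple                  # AS path, nearest AS first
    backups: int = 0             # number of backup edges crossed so far
    from_customer: bool = False  # whether the next hop is a customer


def receive(route, me, over_backup_edge, from_customer):
    """Model receiving `route` from a neighbor: prepend `me` and update the counter."""
    return Route(
        path=(me,) + route.path,
        backups=route.backups + (1 if over_backup_edge else 0),
        from_customer=from_customer,
    )


def preference_key(route):
    """Fewest backup edges first, then customer routes ahead of the rest."""
    return (route.backups, 0 if route.from_customer else 1)


origin = Route(path=(0,))
candidates = [
    receive(receive(origin, 9, True, False), 2, False, False),  # one backup, via a peer
    receive(receive(origin, 8, True, True), 2, False, True),    # one backup, via a customer
    receive(origin, 2, False, False),                           # no backup edge at all
]
for r in sorted(candidates, key=preference_key):
    print(r.path, "backups:", r.backups, "customer:", r.from_customer)
```

The sorted order matches the labels in the figure: no backup, then one backup via a customer, then one backup via a peer.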

Recent Work

Recent Work: Relaxing Export Rules
• Goal: no restrictions on export and topology
  – Allow an AS to decide whether to export
  – Do not require hierarchical relationships
• Question
  – How much do you have to restrict path ranking to have a guarantee that the system is safe?
• Answer
  – Limited to shortest-path routing
• Implications
  – Trade-off in safety, autonomy, & expressiveness
Recent work by Nick Feamster and Ramesh Johari

Recent Work: MED Oscillation (RFC 3345)
• MED is compared only when the next-hop AS is the same
• No total ordering at the leftmost router
  – B > A: preferring smaller router-id
  – C > B: preferring smaller MED attribute
  – A > C: preferring eBGP-learned over iBGP
[Figure: AS 1 and AS 2; routes B (Id=1, MED=20), C (MED=10), and A (Id=2); iBGP session]
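
The missing total order can be reproduced with a simplified pairwise comparison. This sketch keeps only three steps of the decision process (MED within the same neighbor AS, eBGP over iBGP, router-id) and omits the rest. Several attributes are assumptions not given in the figure: which AS each route comes from (B and C are placed in the same AS so that their MEDs are comparable), that A and B are eBGP-learned, and the MED of A and router-id of C, which never decide a comparison here.

```python
from collections import namedtuple

Route = namedtuple("Route", "name neighbor_as med ebgp router_id")


def prefer(x, y):
    """Return the preferred of two routes under a simplified three-step comparison."""
    # Step 1: MED is compared only between routes from the same neighbor AS.
    if x.neighbor_as == y.neighbor_as and x.med != y.med:
        return x if x.med < y.med else y
    # Step 2: prefer eBGP-learned routes over iBGP-learned routes.
    if x.ebgp != y.ebgp:
        return x if x.ebgp else y
    # Step 3: prefer the smaller router-id.
    return x if x.router_id < y.router_id else y


A = Route("A", neighbor_as=1, med=0, ebgp=True, router_id=2)
B = Route("B", neighbor_as=2, med=20, ebgp=True, router_id=1)
C = Route("C", neighbor_as=2, med=10, ebgp=False, router_id=3)

for x, y in [(A, B), (B, C), (C, A)]:
    print(f"{x.name} vs {y.name}: prefer {prefer(x, y).name}")
```

Each pairwise winner loses to the third route, so the preference is cyclic and no single best route exists at that router.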

Project Idea: Stable Paths Problem and Root-Cause Analysis

Project Idea: Root-Cause Analysis
• Root-cause analysis
  – Identify location and cause of routing changes
  – Inference from BGP protocol messages
• Active area of research
  – Several proposed algorithms
  – Limited accuracy in making inferences
• Research question
  – Is the problem just very hard?
  – Does the data not reveal enough information?
• Project idea: study using SPP

Project Idea, Continued
• Model root-cause analysis
  – Start with an SPP instance
  – Fail a link (or a node)
  – See what path changes would occur
• What events might cause these changes?
[Figure: nodes 0 to 4. Permitted paths: 120, 10 (node 1); 2340, 20 (node 2); 340, 320 (node 3); 40 (node 4)]
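
The three modeling steps can be tried on the instance in the figure. A rough sketch, assuming the rankings read off the figure labels and a simple round-robin fixed-point iteration; `stabilize`, `fail_link`, and `path_changes` are hypothetical helper names, and failing link (4, 0) is just one example event.

```python
def stabilize(permitted, max_rounds=50):
    """Iterate best-path selection round-robin until no node changes its path."""
    chosen = {u: () for u in permitted}  # () is the null path
    chosen[0] = (0,)                     # destination
    for _ in range(max_rounds):
        changed = False
        for u in sorted(permitted):
            best = next((p for p in permitted[u] if chosen.get(p[1]) == p[1:]), ())
            if best != chosen[u]:
                chosen[u], changed = best, True
        if not changed:
            return {u: chosen[u] for u in permitted}
    raise RuntimeError("no stable assignment reached")


def fail_link(permitted, a, b):
    """Drop every permitted path that traverses the undirected link {a, b}."""
    def uses(path):
        return any({x, y} == {a, b} for x, y in zip(path, path[1:]))
    return {u: [p for p in ranked if not uses(p)] for u, ranked in permitted.items()}


def path_changes(before, after):
    """Map each node whose path changed to its (old path, new path) pair."""
    return {u: (before[u], after[u]) for u in before if before[u] != after[u]}


instance = {
    1: [(1, 2, 0), (1, 0)],
    2: [(2, 3, 4, 0), (2, 0)],
    3: [(3, 4, 0), (3, 2, 0)],
    4: [(4, 0)],
}
old = stabilize(instance)
new = stabilize(fail_link(instance, 4, 0))
for node, (was, now) in sorted(path_changes(old, new).items()):
    print(node, was, "->", now)
```

In this run node 1 changes its path even though neither its old nor its new path crosses the failed link, which is the kind of indirect effect that makes inferring the root cause from observed changes hard.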

Questions
• Can you infer cause and location?
  – If you observe routing changes at all nodes
  – If you observe only some of the nodes
• What if you make some assumptions?
  – E.g., policies based on business relationships
• Where would you place monitors?
  – Best locations to place n monitors
  – Minimum number of monitors you need
• What changes would you make to the routing protocol to make diagnosis easier?

Next Time: Hot-Potato Routing
• Two papers
  – “Dynamics of Hot-Potato Routing in IP Networks”
  – “TIE Breaking: Tunable Interdomain Egress Selection”
• NANOG video
  – Covering material in the first paper
• In honor of spring break
  – No written reviews
• Talk with me about your course project
  – … by Thursday, March 24
  – Final written report due Tuesday, May 10