Internet Routing Instability. Craig Labovitz, G. Robert Malan, Farnam Jahanian. Appeared: SIGCOMM '97. Presenters: Supranamaya Ranjan, Mohammed Ahamed
Internet Structure • Many small ISPs at the lowest level • A small number of big ISPs at the core
The Core of the Internet [Figure: Sprint, Verio, UUNet, and rice.edu] • Routing at the core is done using BGP • Intra-domain routing can use RIP, OSPF, etc.
BGP Overview [Figure: prefixes such as 92.x.x, 100.x.x, 128.42.x.x, and 196.29.x.x propagating among Sprint, Verio, and UUNet]
BGP Overview (contd.) • Path-vector protocol • Similar to distance-vector routing • Loop detection done using the AS_PATH field • [Figure: routers R1 and R2 in a peering session over TCP] • Full routing table exchanged at session start • Updates sent incrementally afterwards
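The AS_PATH loop check can be sketched in a few lines. This is a minimal illustration, assuming a route is just a prefix plus an AS_PATH tuple; the `Route` type, the helper names, and the private AS numbers (65001 to 65004) are hypothetical, and real BGP speakers carry many more attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical minimal route: a prefix and its AS_PATH."""
    prefix: str
    as_path: tuple  # leftmost entry is the AS that advertised the route to us


def accept_route(local_as: int, route: Route) -> bool:
    """Path-vector loop detection: a route whose AS_PATH already contains
    our own AS number has passed through us before, so it is discarded."""
    return local_as not in route.as_path


def readvertise(local_as: int, route: Route) -> Route:
    """Prepend our AS number before passing the route to an external peer."""
    return Route(route.prefix, (local_as,) + route.as_path)


# AS 65001 hears 128.42.0.0/16 via AS 65002 and passes it on with its own AS prepended.
heard = Route("128.42.0.0/16", (65002, 65003))
sent = readvertise(65001, heard)
# If that announcement ever comes back to AS 65001, the loop check rejects it.
looped = Route("128.42.0.0/16", (65004,) + sent.as_path)
```

Because each AS prepends itself, a loop shows up as the receiver's own AS number inside the path, so no global view of the topology is needed to detect it.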
Key Point: The volume of BGP messages exchanged is abnormally high • Most messages are redundant or unnecessary and do not correspond to any topology or policy changes
Consequence: Instability • Normal data packets are handled by dedicated forwarding hardware • BGP packet processing consumes router CPU time • Severe CPU processing overhead can take the router offline • Route Flap Storm [Figure: routers A, B, and C]: - Router A temporarily fails - When A comes back, B and C send it full routing tables - B and C then fail in turn: a cascading effect • How do we avoid or lessen the impact of these problems?
Route Dampening • A router stops accepting frequent route updates for a destination, since they may signal that the network has erratic connectivity • Increment a counter for the destination each time its route changes • When the counter exceeds a threshold, stop accepting updates • Decrement the counter over time • Problem: future legitimate announcements are accepted only after a delay
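The counter scheme above can be sketched as follows. Rather than a plain counter, this sketch uses the exponentially decaying penalty with separate suppress and reuse thresholds used by RFC 2439 style flap damping; the class name and all parameter values (1000 per flap, suppress above 2000, reuse below 750, 15-minute half-life) are illustrative assumptions, not vendor defaults.

```python
import math


class FlapDamper:
    """Hypothetical per-prefix flap damping with an exponentially decaying penalty."""

    def __init__(self, penalty_per_flap=1000.0, suppress_at=2000.0,
                 reuse_below=750.0, half_life_s=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_at = suppress_at
        self.reuse_below = reuse_below
        self.decay_rate = math.log(2) / half_life_s
        self.state = {}  # prefix -> (penalty, time of last update, suppressed?)

    def _decayed(self, prefix, now):
        """Decay the penalty to time `now`; lift suppression once it falls below reuse."""
        penalty, last, suppressed = self.state.get(prefix, (0.0, now, False))
        penalty *= math.exp(-self.decay_rate * (now - last))
        if suppressed and penalty < self.reuse_below:
            suppressed = False
        self.state[prefix] = (penalty, now, suppressed)
        return penalty, suppressed

    def record_flap(self, prefix, now):
        """Add a penalty for a route change; suppress the prefix if over threshold."""
        penalty, suppressed = self._decayed(prefix, now)
        penalty += self.penalty_per_flap
        self.state[prefix] = (penalty, now, suppressed or penalty > self.suppress_at)

    def is_suppressed(self, prefix, now):
        return self._decayed(prefix, now)[1]


damper = FlapDamper()
for t in (0, 1, 2):  # three route changes within a few seconds
    damper.record_flap("192.0.2.0/24", t)
```

After the third flap the prefix is suppressed, and it stays suppressed until the penalty decays below the reuse threshold (here roughly three half-lives later), even if the route has long since stabilized. That lag is the delay the slide lists as the problem.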
Prefix Aggregation / Supernetting • A core router advertises a less specific network prefix • Reduces the size of routing tables exchanged • Problems: prefix aggregation is not effective because - Internet addresses are largely assigned non-hierarchically - Domains are not renumbered when changing ISPs - 25% of prefixes are multi-homed - Multi-homed prefixes need to be exposed at the core
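Python's standard `ipaddress.collapse_addresses` makes the aggregation idea, and its limits, concrete. This sketch assumes a provider announces the collapsed aggregates of all its customer prefixes and, in addition, each multi-homed prefix on its own; the `routes_to_announce` helper and the documentation-range prefixes are hypothetical.

```python
import ipaddress


def routes_to_announce(prefixes, multihomed=()):
    """Collapse customer prefixes into the fewest covering aggregates, then add
    each multi-homed prefix separately so the core can still see it."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    aggregates = list(ipaddress.collapse_addresses(nets))
    exposed = [ipaddress.ip_network(p) for p in multihomed]
    return aggregates + exposed


customers = [
    "198.51.100.0/25", "198.51.100.128/25",  # adjacent halves: collapse into one /24
    "203.0.113.0/25", "203.0.113.128/25",    # adjacent halves: collapse into one /24
    "192.0.2.0/24",                          # unrelated block: cannot be merged
]
routes = routes_to_announce(customers, multihomed=["203.0.113.128/25"])
```

Five customer prefixes shrink to three aggregates, but the multi-homed /25 still has to be announced next to its covering /24, and the unrelated 192.0.2.0/24 block gains nothing from aggregation.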
Route Servers • Without a route server: O(N) peering sessions per router • With a route server: 1 peering session per router • In spite of all these measures, the BGP message overhead is unexpectedly high
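The session arithmetic is easy to check. This assumes a full mesh of bilateral peerings among N routers, compared with a single route server that every router peers with; the helper names are hypothetical.

```python
def full_mesh_sessions(n: int) -> tuple:
    """(sessions per router, total sessions) when every router peers with every other."""
    return n - 1, n * (n - 1) // 2


def route_server_sessions(n: int) -> tuple:
    """(sessions per router, total sessions) when every router peers only with one route server."""
    return 1, n


# With 60 peers, roughly the size of the exchange point in the study:
print(full_mesh_sessions(60))     # (59, 1770)
print(route_server_sessions(60))  # (1, 60)
```

The total drops from quadratic to linear in N, which is why route servers were expected to cut overhead; the measurements show the message volume stayed high anyway.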
Evaluation Methodology • Data from the route server at the MAE-East (Washington, D.C.) peering point • Peering point for more than 60 major ISPs • Nine-month log • Time-series analysis of message exchange events
Observation: Lots of redundant updates • Duplicate route withdrawals:
ISP | Withdrawals | Unique | Ratio
A | 23,276 | 4,344 | ≈5.4
F | 86,417 | 12,435 | ≈6.9
I | 2,479,023 | 14,112 | ≈175.7
• One reason: stateless BGP; no state about previous withdrawals is maintained
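A stateful sender avoids these duplicates by remembering, per peer, what it last advertised for each prefix (RFC 4271 calls this per-peer outbound state the Adj-RIB-Out). The sketch below is a minimal, hypothetical version for a single peer; the class and method names are illustrative, and attributes are reduced to an opaque value compared for equality.

```python
class PeerOutboundState:
    """What we last told one peer about each prefix (hypothetical, single peer)."""

    def __init__(self):
        self.advertised = {}  # prefix -> attributes last announced to this peer

    def should_withdraw(self, prefix) -> bool:
        """Send a WITHDRAW only if the peer currently holds a route from us."""
        if prefix not in self.advertised:
            return False  # never announced or already withdrawn; a stateless router would send anyway
        del self.advertised[prefix]
        return True

    def should_announce(self, prefix, attrs) -> bool:
        """Send an announcement only if it differs from the last one sent."""
        if self.advertised.get(prefix) == attrs:
            return False
        self.advertised[prefix] = attrs
        return True


out = PeerOutboundState()
decisions = [
    out.should_announce("192.0.2.0/24", (65001, 65002)),
    out.should_withdraw("192.0.2.0/24"),
    out.should_withdraw("192.0.2.0/24"),     # duplicate withdrawal: suppressed
    out.should_withdraw("198.51.100.0/24"),  # never announced: suppressed
]
```

The cost is memory proportional to the routes sent to each peer, which is the trade-off stateless implementations avoided at the price of the redundant withdrawals counted above.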
Observation: Instability Proportional to Activity • After removing duplicate messages, instability density varies with time of day [Figure: instability density vs. time of day (6:00, 10:00 AM, 12:00, 18:00, 24:00), with regions of fewer messages near 24:00 and 6:00 and a marked ISP infrastructure upgrade]
Evidence from Fine-Grained Structure [Figure: number of instability events over time and its power spectral density vs. frequency (1/hour), with peaks at 7-day and 24-hour periods] • Conjecture: BGP packets compete with data packets during periods of high bandwidth activity
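A periodogram is one simple way to expose daily and weekly cycles in a series of hourly event counts. The sketch below computes it with a direct discrete Fourier transform in pure Python; the function names and the synthetic two-week series (a 24-hour cycle plus a weaker 168-hour cycle) are assumptions for illustration, not data from the study.

```python
import cmath
import math


def periodogram(counts):
    """Return {frequency in cycles per sample: power} for k = 1 .. N//2,
    using a direct O(N^2) DFT of the mean-removed series."""
    n = len(counts)
    mean = sum(counts) / n
    x = [c - mean for c in counts]
    power = {}
    for k in range(1, n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power[k / n] = abs(s) ** 2 / n
    return power


def strongest_periods_hours(hourly_counts, top=2):
    """Periods (in hours) of the `top` strongest spectral peaks."""
    power = periodogram(hourly_counts)
    best = sorted(power, key=power.get, reverse=True)[:top]
    return [1 / f for f in best]


# Synthetic two weeks of hourly counts: a daily cycle plus a weaker weekly cycle.
counts = [100 + 50 * math.sin(2 * math.pi * h / 24) + 30 * math.sin(2 * math.pi * h / 168)
          for h in range(14 * 24)]
print(strongest_periods_hours(counts))  # periods close to 24 and 168 hours
```

The window here spans a whole number of both periods, so each cycle falls exactly on one frequency bin; real traces would call for windowing or averaging (as in a proper power spectral density estimate) to limit spectral leakage.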
Observation: Instability and Size Uncorrelated [Figure: per-ISP proportion of AADiff, WADiff, and WADup announcements vs. proportion of the routing table] • ISPs serving more network prefixes do not necessarily contribute more to instability
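The categories in this figure come from the paper's taxonomy of updates for a given prefix and peer: AADiff (a still-reachable route is replaced by a different one), AADup (re-announced with no change), WADiff (withdrawn, then re-announced with a different route), WADup (withdrawn, then re-announced unchanged), and WWDup (withdrawn again while already withdrawn). A minimal classifier is sketched below; judging "same" versus "different" by comparing an (AS_PATH, next hop) tuple is an assumption, as are the class name and sample values.

```python
class UpdateClassifier:
    """Labels each update in a (peer, prefix) stream with the taxonomy's categories."""

    def __init__(self):
        self.last_attrs = {}  # key -> attributes of the most recent announcement
        self.reachable = {}   # key -> True while the route is currently announced

    def update(self, key, kind, attrs=None):
        """kind is "A" (announce, with attrs) or "W" (withdraw)."""
        was_reachable = self.reachable.get(key, False)
        prev = self.last_attrs.get(key)
        if kind == "W":
            self.reachable[key] = False
            return "W" if was_reachable else "WWDup"
        self.reachable[key] = True
        self.last_attrs[key] = attrs
        if prev is None:
            return "new"  # first announcement seen for this key
        if was_reachable:
            return "AADup" if attrs == prev else "AADiff"
        return "WADup" if attrs == prev else "WADiff"


c = UpdateClassifier()
key = ("peerA", "192.0.2.0/24")
p1 = ((65001, 65002), "10.0.0.1")  # hypothetical (AS_PATH, next hop)
p2 = ((65001, 65003), "10.0.0.1")
labels = [c.update(key, "A", p1), c.update(key, "A", p1), c.update(key, "A", p2),
          c.update(key, "W"), c.update(key, "W"), c.update(key, "A", p2),
          c.update(key, "W"), c.update(key, "A", p1)]
```

The stream above walks through every category once, ending with labels new, AADup, AADiff, W, WWDup, WADup, W, WADiff.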
Observation: Instability Distributed over Routes [Figure: cumulative proportion vs. number of announcements per prefix+AS, with the median and 75% levels marked] • 20% to 90% of routes change 10 times or less • No single route contributes significantly to instability
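These per-route statistics can be computed from an update log by counting announcements per (prefix, AS) key. This sketch assumes each log entry is a hypothetical (prefix, peer AS) pair and uses the slide's threshold of 10 announcements; the helper name and the sample log are made up.

```python
from collections import Counter
from statistics import median


def announcement_spread(log, limit=10):
    """log: one (prefix, peer AS) pair per announcement seen.
    Returns (fraction of pairs announced at most `limit` times, median count per pair)."""
    per_pair = list(Counter(log).values())
    if not per_pair:
        return 0.0, 0
    share = sum(1 for c in per_pair if c <= limit) / len(per_pair)
    return share, median(per_pair)


log = ([("192.0.2.0/24", 65001)] * 3
       + [("198.51.100.0/24", 65002)] * 8
       + [("203.0.113.0/24", 65001)] * 25)
print(announcement_spread(log))  # (0.6666666666666666, 8)
```

Sorting the per-pair counts and plotting their running share gives the cumulative curve in the figure.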
Observation: Synchronized Updates [Figure: inter-arrival time distribution for AADiffs, with spikes at 30 s and 1 minute] • Inter-arrival times of updates show periodicity at 30 s and 1 minute • Some routers collect updates and send them once every 30 s • Possible reasons: - Routers become synchronized - Misconfigured interaction between border and internal routers?
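A toy timer model shows how batching alone produces 30-second and 1-minute spikes: if a router holds updates and flushes them only on 30-second boundaries, every gap between transmissions is a multiple of 30 s, and an empty window turns into a 60-second gap. This is a hypothetical model for illustration (BGP's MinRouteAdvertisementInterval, which RFC 4271 suggests setting to 30 s on external sessions, is a related but separate mechanism); the function names and arrival times are made up.

```python
import math
from collections import Counter


def batched_send_times(arrival_times, interval=30.0):
    """Hypothetical router that queues updates and flushes them at the next
    multiple of `interval` seconds; updates in the same window share one flush."""
    return sorted({math.ceil(t / interval) * interval for t in arrival_times})


def inter_arrival_histogram(send_times, bin_s=1.0):
    """Count gaps between consecutive transmissions, rounded to `bin_s` seconds."""
    gaps = [b - a for a, b in zip(send_times, send_times[1:])]
    return Counter(round(g / bin_s) * bin_s for g in gaps)


arrivals = [1, 5, 29, 31, 62, 65, 100, 170]  # irregular update arrivals (seconds)
sends = batched_send_times(arrivals)
print(inter_arrival_histogram(sends))  # Counter({30.0: 3, 60.0: 1})
```

Irregular input times come out as gaps of exactly 30 s and 60 s, the same shape as the spikes in the measured distribution.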
End-to-End Perspective • Chinoy, “Dynamics of Internet Routing Information” (SIGCOMM '93); measurements on NSFNET showed: - Processing and forwarding latency of a BGP update is 3 orders of magnitude higher than the latency of forwarding data packets - This can lead to packet drops during the intervening period • Paxson, “End-to-End Routing Behavior in the Internet” (SIGCOMM '96): - Routing loops can introduce loops into other routers' routing tables - An end-to-end route changes every 1.5 hours on average
End-to-End Perspective (Paxson)
Pathology type | Probability in 1995 | Probability in 1996
Long-lived routing loops | ~0.14% | same
Short-lived routing loops | ~0.065% | same
Outage > 30 s | 0.96% | 2.2%
Total | 1.5% | 3.4%
Summary and Conclusions • Redundant routing information flows in the core • Instability is distributed across autonomous systems • Possible reasons for instability: - Stateless BGP updates - Misconfigured routers - Synchronization - Clocks driving the links not synchronized (link “flaps”)
Follow-up Work & Impact • “Origins of Internet Routing Instability” (1999) • Migration from stateless to stateful BGP decreased duplicate withdrawals by an order of magnitude • But duplicate announcements (AADup) doubled • Reason: non-transitive attribute filtering not implemented - BGP specification: “never propagate non-transitive attributes” - AS_PATH is a transitive attribute - MED (Multi-Exit Discriminator) is NOT transitive
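Export-time handling of a non-transitive attribute can be sketched as follows. RFC 4271 states that a MED received from one neighboring AS must not be propagated to other neighboring ASes, while AS_PATH travels onward with the local AS prepended. The dictionary keys, helper names, and AS numbers below are hypothetical. Comparing the filtered result with the last update sent shows how a change in nothing but the MED should produce no outgoing update rather than a duplicate announcement.

```python
def export_to_ebgp(received: dict, local_as: int, own_med=None) -> dict:
    """Attributes to send to an external neighbor: drop any received MED,
    prepend our AS to AS_PATH, and optionally attach our own MED."""
    out = {k: v for k, v in received.items() if k != "MED"}
    out["AS_PATH"] = (local_as,) + tuple(received.get("AS_PATH", ()))
    if own_med is not None:
        out["MED"] = own_med  # an AS may still set its own MED toward a neighbor
    return out


def needs_update(last_sent: dict, candidate: dict) -> bool:
    """Only transmit if what the neighbor would see actually changed."""
    return last_sent != candidate


before = export_to_ebgp({"AS_PATH": (65002,), "ORIGIN": "IGP", "MED": 10}, 65001)
after = export_to_ebgp({"AS_PATH": (65002,), "ORIGIN": "IGP", "MED": 20}, 65001)
print(needs_update(before, after))  # False: only the stripped MED differed
```

A router that decides to re-announce because the incoming MED changed, but compares before filtering, would send `after` anyway; the neighbor would then receive an identical announcement, i.e. an AADup.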
Propagating MEDs Causes Oscillations