Protecting the BGP Routes to Top Level DNS Servers
USC/ISI: Xiaoliang Zhao, Dan Massey, Allison Mankin
AT&T: Randy Bush
UCLA: Lan Wang, Dan Pei, Lixia Zhang
UC Davis: Felix Wu
NANOG-25, June 11, 2002, Toronto

Routing To Top Level Servers
- Critical points of failure near root.
  – 13 root and 13 gTLD servers
  – 25 BGP routes (2 share a prefix)
- Fault/attack near root could have disproportionately large impact.
  – 13 bogus routes to deny service.
  – 1 bogus route to provide bogus DNS.
- Scale helps contain risk in lower tree.
  – Many millions of DNS servers.
[Figure: DNS name tree; node labels: Root; net, org, com; cairn, nanog, ietf, bogus; isi, ops]

Example DNS Routing Problem
- Invalid BGP routes exist in everyone’s table.
  – These can include routes to root/gTLD servers
  – One example observed on 4/16/01:
[Figure labels: "originates route to 192.26.92/24"; "ISPs announce new path: 3 lasted 20 minutes, 1 lasted 3 hours"; "Internet"; "rrc00 monitor"; "c.gtld-servers.net 192.26.92.30"]

A Simple Filter
- Current BGP provides dynamic routes
  – Explore the opposite extreme...
- Select a single static route to each server.
  – Apply AS path filters to block all other announcements.
    • Also filter against more specifics.
- Route changes on a frequency of months, if at all.
  – Change in IP address, origin AS, or transit policy.
  – Adjust route only after off-line verification

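The static filter described on this slide can be sketched in a few lines of Python. This is an illustration only, not router configuration and not the authors' code: the `TRUSTED` table and `accept` function are hypothetical names, the prefix is the one from the 4/16/01 example, and the AS path uses made-up private AS numbers.

```python
import ipaddress

# Hypothetical table: one trusted AS path per protected prefix.
# The AS numbers are placeholders from the private range, not real paths.
TRUSTED = {
    ipaddress.ip_network("192.26.92.0/24"): (64512, 64513, 64514),
}

def accept(prefix: str, as_path: tuple) -> bool:
    """Return True if the announcement passes the static filter."""
    net = ipaddress.ip_network(prefix)
    for protected, trusted_path in TRUSTED.items():
        if net == protected:
            # Only the single verified path is allowed for this prefix.
            return tuple(as_path) == trusted_path
        if net.version == protected.version and net.subnet_of(protected):
            # Block more-specific announcements inside a protected prefix.
            return False
    # Not a protected prefix: this filter has no opinion.
    return True
```

For example, `accept("192.26.92.0/24", (64512, 64513, 64514))` passes, while the same prefix with any other path, or a /25 inside it, is rejected.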
Why This Works: Theory
- Scale is limited to a small number of routes.
  – No exponential growth in top level DNS servers.
- Loss of a server is tolerable, invalid server is not.
  – Resolvers detect and time-out unreachable servers.
    • Provided surviving servers handle load, cost is some delay.
- Expect predictable properties and stable routes.
  – Servers don’t change without non-trivial effort.
  – Servers located in highly available locations.

Why This Works: Data
- Analysis based on BGP updates from RIPE.
  – Archive of BGP updates sent by each peer.
  – 9 ISPs from US, Europe, and Japan.
  – February 2001 - April 2002
- Some data collection notes
  – Used only peers that exchange full routing tables
    • Otherwise some route changes are hidden by policies
  – Adjusted data to discount multi-hop effect.
    • Multi-hop peering session resets don’t reflect ISP ops.

Simple Filter - Impact on Reachability
[Figure: ISP 1 (US/Tier 1)]

How Static Are The Routes?
- 3 changes in route to “A” over 14 months.
- 2 (valid) changes in the origin AS
  – 5/19/01 origin AS changed from 6245 to 11840
  – 6/4/01 origin AS changed from 11840 to 19836
- 1 change in transit AS routing policy
  – 11/8/01 (*, 10913, *) -> (*, 10913, *)
  – Could have built filter to allow this...

What Routes Are Lost?
- Results from 3/1/01 until 5/19/01 AS change.
  – Reduced reachability to “A” from 99.997% to 99.904%
- 18 events when trusted route was withdrawn
  – 2 resulted in no route available (28 secs, 103 secs)
  – 8 instances of a back-up route lasting over 3 minutes
  – Longest lasting back-up advertised for 15 minutes
- Similar results for other time periods and servers.

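To put the two reachability percentages in absolute terms, the short calculation below converts them into unreachable time. It is a back-of-the-envelope sketch that assumes the measurement window covers whole days from 3/1/01 to 5/19/01 (79 days); the slide does not give exact start and end times, so the figures are approximate.

```python
from datetime import date

# Assumed window: whole days from 3/1/01 up to the 5/19/01 origin AS change.
window_seconds = (date(2001, 5, 19) - date(2001, 3, 1)).days * 86400

def unreachable_seconds(reachability_percent: float) -> float:
    """Time the server was unreachable for a given reachability percentage."""
    return (1 - reachability_percent / 100) * window_seconds

without_filter = unreachable_seconds(99.997)  # dynamic BGP routes
with_filter = unreachable_seconds(99.904)     # single static route
```

Under that assumption, the simple filter raises unreachable time from roughly 3.4 minutes to roughly 109 minutes over the 79 days.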
Example of Filtered Routes
[Figure: AS paths toward the server; labels: 1239, 10913, *, 19836, 701; annotations: "No route at 16:08:30" and "With filter no route at 16:06:32"]

Simple Filter - Worst Case In Study
[Figure: ISP 3 (Europe)]
ISP 3 used one main route and a small number of consistent back-up routes.

Toward a More Balanced Approach
- Required infrequent updates to the filter.
  – Especially useful to automate infrequent tasks.
    • Natural tendency to forget task or forget how to do task
- More paths improve robustness
  – Simple filter allowed only 1 path.
  – ISP 3’s reachability can be improved if filter allows two routes…
- Strike a balance between allowing dynamic changes and restricting to trusted paths.

Our Adaptive Filter
- Slow down the route dynamics and add validation.
  – Apply hysteresis before accepting new paths
  – Add options for validating new paths:
    • Believe route based purely on hysteresis
    • Probabilistic query/response testing against known data.
    • Trigger off-line checking (did origin AS really change?)
- Algorithm details in upcoming paper http://fniisc.nge.isi.edu

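One possible reading of the "probabilistic query/response testing" option is sketched below: with some probability, send queries over the candidate route and compare the answers with records already known to be correct. This is an assumption about how such a check could work, not the authors' implementation. `check_route`, `query_fn`, and `known_answers` are hypothetical names, and `query_fn` stands in for a real DNS lookup made via the route under test.

```python
import random

def check_route(query_fn, known_answers, p_check, rng=random):
    """Probabilistically test a route against known query/response pairs.

    query_fn:      hypothetical callable, name -> answer obtained via the route
    known_answers: dict of name -> answer previously verified as correct
    p_check:       probability of running the test at all
    Returns "skipped", "ok", or "mismatch".
    """
    if rng.random() >= p_check:
        return "skipped"  # no test this round
    for name, expected in known_answers.items():
        if query_fn(name) != expected:
            return "mismatch"  # route leads to a server giving wrong data
    return "ok"

# Known data taken from the 4/16/01 example slide.
KNOWN = {"c.gtld-servers.net": "192.26.92.30"}
```

A "mismatch" result would be the point at which to reject the path or trigger the off-line check.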
Impacts on Reachability (Adaptive Filter)
[Figure: ISP 1; panels: gTLD servers, Root servers]

Impacts on Reachability (Adaptive Filter)
[Figure: ISP 3; panels: gTLD servers, Root servers]

Conclusions
- Routing faults can affect top level DNS servers.
  – Faults were observed in the current infrastructure.
  – Potential large scale denial of service.
- Solution is to make these routes less dynamic
  – Relies on unique properties of top level servers.
  – Lose some robustness to failure
  – Gain protection against invalid routes.

Discussion
- Merit of the problem
  – Lots of concern over “securing” BGP and DNS
  – Routes to DNS servers are interesting special case
- Do less dynamic routes make sense?
  – Only applies to this unique scenario
  – Our data shows trade-off is effective
    • Very interested in access to data for counter example…

You had to ask… algorithm detail
- Path usage: Uk(p) = Tk(p)/T, where T = time period and Tk(p) = time path p was advertised during period k
- Adjust filter at end of time period T
  – Smooth with exponentially weighted moving average: U(p) = (1 - a)*U(p) + a*Uk(p)
  – Allowable routes have Uk(p) > Umin or U(p) > Umin
  – Validate all new routes and check old routes with probability Pv
- Allow interim addition during T if Tk(p) > Tr
- Parameters used in this presentation: T = 1 week, Tr = 1 hour, Umin = 10%, a = 0.25, Pv = 0.1

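The end-of-period update above can be sketched as follows, using the parameter values from this slide. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, a path never seen before is assumed to start with U(p) = 0, and the validation step (Pv) is left out.

```python
T = 7 * 24 * 3600  # period length T: 1 week, in seconds
TR = 3600          # interim threshold Tr: 1 hour, in seconds
U_MIN = 0.10       # minimum usage Umin
A = 0.25           # smoothing weight a

def end_of_period(smoothed, advertised):
    """Update U(p) for every path and return the set of allowed paths.

    smoothed:   dict path -> U(p) carried over from earlier periods
    advertised: dict path -> Tk(p), seconds advertised in this period
    """
    new_smoothed = {}
    allowed = set()
    for path in set(smoothed) | set(advertised):
        uk = advertised.get(path, 0.0) / T            # Uk(p) = Tk(p)/T
        prev = smoothed.get(path, 0.0)                # assumption: new paths start at 0
        u = (1 - A) * prev + A * uk                   # U(p) = (1-a)*U(p) + a*Uk(p)
        new_smoothed[path] = u
        if uk > U_MIN or u > U_MIN:
            allowed.add(path)
    return new_smoothed, allowed

def interim_ok(tk_so_far):
    """Interim addition during the period: allowed once Tk(p) > Tr."""
    return tk_so_far > TR
```

With these values, a new path advertised for two days of the week is allowed at the end of the period (Uk is about 0.29), while a path seen for only one hour is not; a long-used path that disappears for a week stays allowed as long as its smoothed U(p) remains above Umin.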