Impact of Configuration Errors on DNS Robustness
V. Pappas*, Z. Xu*, S. Lu*, D. Massey**, A. Terzis***, L. Zhang*
* UCLA, ** Colorado State, *** Johns Hopkins
Motivation
• DNS: part of the Internet core infrastructure
  – Applications: web, e-mail, E.164, CDNs, …
• DNS: considered a very reliable system
  – Works almost always
• Question: is DNS a robust system?
  – User-perceived robustness
  – System robustness
  – Are they the same?
Motivation
Short answer:
“Microsoft's websites were offline for up to 23 hours -- the most dramatic snafu to date on the Internet -- because of an equipment misconfiguration” -- Wired News, Jan 2001
  – Thousands or even millions of users affected
  – All due to a single DNS configuration error
Related Work
• Traffic & implementation error studies:
  – Danzig et al. [SIGCOMM 92]: bugs
  – CAIDA: traffic & bugs
• Performance studies:
  – Jung et al. [IMW 01]: caching
  – Cohen et al. [SAINT 01]: proactive caching
  – Liston et al. [IMW 02]: diversity
• Server availability:
  – To appear [OSDI 04, IMC 04]
Our Work: Study DNS Robustness
• Classify DNS operational errors:
  – Study known errors
  – Identify new types of errors
• Measure their pervasiveness
• Quantify their impact on DNS:
  – Availability
  – Performance
Outline
• DNS Overview
• Measurement Methodology
• DNS Configuration Errors
  – Example Cases
  – Measurement Results
• Discussion & Summary
Background
[Figure: DNS name tree with the labels net, com, uk, ca, jp, foo, bar, buz, bar1, bar2, bar3; the zone bar.foo.com. is highlighted]
Zone: bar.foo.com.
• Occupies a continuous subspace of the name tree
• Served by the same name servers
Name servers:
  bar.foo.com.      NS  ns1.bar.foo.com.
  bar.foo.com.      NS  ns2.bar.foo.com.
  bar.foo.com.      NS  ns3.bar.foo.com.
Resource records:
  bar.foo.com.      MX  mail.bar.foo.com.
  www.bar.foo.com.  A   10.10.10
[Figure: resolving www.bar.foo.com. The client asks its caching server, which queries each zone in turn:
  root zone → referral: com NS RRs, com A RRs
  com zone  → referral: foo NS RRs, foo A RRs
  foo zone  → referral: bar NS RRs, bar A RRs
  bar zone  → answer: www.bar.foo.com A 10.10.10]
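The referral chain in the figure can be sketched as a walk over a toy in-memory zone table. This is only an illustration: the `ZONES` table, its field names, and the `resolve` function are hypothetical, and the address is the placeholder value shown on the slide, not a real one.

```python
# Toy model of iterative resolution: each zone either answers the query
# or refers the caching server to a child zone. All data is illustrative.
ZONES = {
    ".": {"delegations": ["com."], "records": {}},
    "com.": {"delegations": ["foo.com."], "records": {}},
    "foo.com.": {"delegations": ["bar.foo.com."], "records": {}},
    "bar.foo.com.": {
        "delegations": [],
        # Placeholder address copied from the slide.
        "records": {"www.bar.foo.com.": "10.10.10"},
    },
}


def resolve(name, zones=ZONES):
    """Follow referrals from the root until a zone holds the record.

    Returns (answer, path), where path lists the zones queried in order.
    """
    zone = "."
    path = []
    while True:
        path.append(zone)
        data = zones[zone]
        if name in data["records"]:
            return data["records"][name], path
        # A referral is the delegated child zone that contains the name.
        referral = next(
            (
                child
                for child in data["delegations"]
                if name == child or name.endswith("." + child)
            ),
            None,
        )
        if referral is None:
            return None, path  # neither an answer nor a referral
        zone = referral


answer, path = resolve("www.bar.foo.com.")
print(answer, path)
```

Each entry in `path` corresponds to one round trip from the caching server, which is why a broken referral anywhere along the chain delays or blocks the final answer.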
Infrastructure RRs
• NS resource record:
  – Provides the names of a zone’s authoritative servers
  – Stored both at the parent and at the child zone
• A resource record:
  – Associated with an NS resource record
  – Stored at the parent zone (glue A record)
The same records appear at the parent zone (com) and at the child zone (foo.com):
  foo.com.      NS  ns1.foo.com.
  foo.com.      NS  ns2.foo.com.
  foo.com.      NS  ns3.foo.com.
  ns1.foo.com.  A   1.1
  ns2.foo.com.  A   2.2
  ns3.foo.com.  A   3.3
What Affects DNS Availability
• Name servers:
  – Software failures
  – Network failures
  – Scheduled maintenance tasks
• Infrastructure resource records:
  – Availability of these records
  – Configuration errors (focus of our work)
Classification of Measured Errors
• Inconsistency: the configuration of infrastructure RRs does not correspond to the actual authoritative name servers.
  – Lame Delegation
  – Delegation Inconsistency
• Dependency: more than one name server shares a common point of failure.
  – Diminished Redundancy
  – Cyclic Dependency
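As a minimal sketch of the inconsistency class, the NS set advertised at the parent can be compared with the NS set held at the child zone. The function name `delegation_report` and the sample server names are hypothetical, and the sketch assumes both NS sets have already been fetched by some other means.

```python
def delegation_report(parent_ns, child_ns):
    """Compare the NS set at the parent with the NS set at the child zone.

    Names are compared case-insensitively and without the trailing dot.
    """
    parent = {name.lower().rstrip(".") for name in parent_ns}
    child = {name.lower().rstrip(".") for name in child_ns}
    return {
        "consistent": parent == child,
        "only_at_parent": sorted(parent - child),
        "only_at_child": sorted(child - parent),
    }


# Illustrative input: the parent still lists a server the child dropped.
report = delegation_report(
    ["ns1.foo.com.", "ns2.foo.com.", "ns3.foo.com."],
    ["NS1.foo.com.", "ns2.foo.com."],
)
print(report)
```

A server that appears only at the parent is a candidate for a lame delegation, but confirming that requires querying the server itself.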
What is Measured?
• Frequency of configuration errors:
  – System parameters: TLDs, DNS level, zone size (i.e. the number of delegations)
• Impact on availability:
  – Number of servers lost due to these errors
  – Zone’s availability: probability of resolving a name
• Impact on performance:
  – Total time to resolve a query
    • Starting at the time the query is issued
    • Finishing at the time the final answer arrives
Measurement Methodology
• Error frequency and availability impact:
  – 3 sets of active measurements
    • Random set of 50K zones
    • 20K zones that allow zone transfers
    • 500 popular zones
• Performance impact:
  – 2 sets of passive measurements: 1-week DNS packet traces
Lame Delegation
  foo.com.    NS  A.foo.com.
  foo.com.    NS  B.foo.com.
  A.foo.com.  A   1.1
  B.foo.com.  A   2.2
1) Non-existing server: 3 seconds performance penalty
2) DNS error code: 1 RTT performance penalty
3) Useless referral: 1 RTT performance penalty
4) Non-authoritative answer (cached)
[Figure: the foo zone with its servers A.foo.com and B.foo.com]
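The four cases above can be sketched as a classifier over a simplified response. The `Response` fields and the `classify` function are hypothetical; the sketch assumes the query asked for data of the zone itself, so a properly configured server would set the authoritative (AA) flag. The penalties are the ones listed on the slide.

```python
from dataclasses import dataclass
from typing import Optional

NOERROR = 0  # standard DNS response code for "no error"


@dataclass
class Response:
    rcode: int           # DNS response code
    authoritative: bool  # AA flag in the DNS header
    answers: int         # RRs in the answer section
    referral_ns: int     # NS RRs in the authority section


def classify(resp: Optional[Response]) -> str:
    """Map a response (None means the query timed out) to a lame case."""
    if resp is None:
        return "non-existing server"
    if resp.authoritative:
        return "not lame"
    if resp.rcode != NOERROR:
        return "DNS error code"
    if resp.answers > 0:
        return "non-authoritative answer"
    if resp.referral_ns > 0:
        return "useless referral"
    return "unclassified"


# Performance penalties as listed on the slide (none is given for case 4).
PENALTY = {
    "non-existing server": "3 seconds",
    "DNS error code": "1 RTT",
    "useless referral": "1 RTT",
}

case = classify(Response(rcode=0, authoritative=False, answers=0, referral_ns=2))
print(case, PENALTY.get(case))
```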
Lame Delegation Results
[Figures: measurement plots; surviving annotations: 50%, 0.06 sec, 0.4 sec, 3 sec]
Lame Delegation Results
• Error frequency:
  – 15% of the zones
  – 8% of the 500 most popular zones
  – Independent of the zone’s size; varies a lot per TLD
• Impact:
  – 70% of the zones with errors lose half or more of their authoritative servers
  – 8% of the queries experience increased response times (up to an order of magnitude) due to lame delegation
Diminished Server Redundancy
  foo.com.    NS  A.foo.com.
  foo.com.    NS  B.foo.com.
  A.foo.com.  A   1.1
  B.foo.com.  A   2.2
A) Network level: servers belong to the same subnet
B) Autonomous system level: servers belong to the same AS
C) Geographic location level: servers belong to the same city
[Figure: the foo zone with its servers A.foo.com and B.foo.com]
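A minimal sketch of the three redundancy levels, assuming each server's AS number and city are already known from some external source. The `redundancy` function, the sample addresses (documentation ranges), and the private-use AS numbers are all illustrative.

```python
import ipaddress


def redundancy(servers):
    """Count distinct /24 subnets, ASes, and cities among a zone's servers.

    servers: list of (ipv4_address, as_number, city) tuples.
    A count of 1 at any level means the servers share that point of failure.
    """
    subnets = {
        ipaddress.ip_network(f"{ip}/24", strict=False) for ip, _, _ in servers
    }
    return {
        "subnets": len(subnets),
        "ases": len({asn for _, asn, _ in servers}),
        "cities": len({city for _, _, city in servers}),
    }


# Both servers sit in the same /24, the same AS, and the same city.
weak = redundancy([
    ("192.0.2.10", 64512, "Los Angeles"),
    ("192.0.2.20", 64512, "Los Angeles"),
])
# Servers spread over different subnets, ASes, and cities.
strong = redundancy([
    ("192.0.2.10", 64512, "Los Angeles"),
    ("198.51.100.7", 64513, "Fort Collins"),
    ("203.0.113.9", 64514, "Baltimore"),
])
print(weak, strong)
```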
Diminished Server Redundancy Results
• Error frequency:
  – 45% of all zones have all servers in the same /24 subnet
  – 75% of all zones have servers in the same AS
  – Large & popular zones: better AS and geographic diversity
• Impact:
  – Less than 99.9% availability: all servers in the same /24 subnet
  – More than 99.99% availability: 3 servers in different ASes or different cities
Cyclic Zone Dependency (1)
  foo.com.    NS  A.foo.com.
  foo.com.    NS  B.foo.com.
  A.foo.com.  A   1.1
  B.foo.com.  A   2.2   (glue A RR missing)
• The glue A RR for B.foo.com is missing
• B.foo.com depends on A.foo.com
• If A.foo.com is unavailable, then B.foo.com is unavailable too
[Figure: the foo zone with its servers A.foo.com and B.foo.com]
Cyclic Zone Dependency (2)
  foo.com.    NS  A.foo.com.
  foo.com.    NS  B.bar.com.
  A.foo.com.  A   1.1

  bar.com.    NS  A.bar.com.
  bar.com.    NS  B.foo.com.
  A.bar.com.  A   2.2
• The B servers depend on the A servers
• If A.foo.com and A.bar.com are unavailable, the addresses of the B servers are unresolvable
• The foo.com zone seems correctly configured
• The combination of the foo.com and bar.com zones is wrongly configured
[Figure: the foo zone with servers A.foo.com and B.bar.com; the bar zone with servers A.bar.com and B.foo.com]
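The two-zone example can be sketched as a fixed-point computation: a server is usable if it is up and its address can be learned, either from glue or from an already usable server of the zone that holds its address record. The tables `ZONES` and `HOME_ZONE` and the function `usable_servers` are hypothetical and only encode the foo.com/bar.com example above.

```python
# zone -> list of (server name, whether the delegation carries a glue A RR)
ZONES = {
    "foo.com.": [("A.foo.com.", True), ("B.bar.com.", False)],
    "bar.com.": [("A.bar.com.", True), ("B.foo.com.", False)],
}
# The zone that holds each server's address record.
HOME_ZONE = {
    "A.foo.com.": "foo.com.",
    "B.foo.com.": "foo.com.",
    "A.bar.com.": "bar.com.",
    "B.bar.com.": "bar.com.",
}


def usable_servers(down):
    """Return the servers that are up and whose address can be learned."""
    usable = set()
    changed = True
    while changed:  # iterate until no new server becomes usable
        changed = False
        for servers in ZONES.values():
            for name, has_glue in servers:
                if name in usable or name in down:
                    continue
                home = HOME_ZONE[name]
                # Glue gives the address directly; otherwise some usable
                # server of the home zone must answer for it.
                if has_glue or any(s in usable for s, _ in ZONES[home]):
                    usable.add(name)
                    changed = True
    return usable


print(usable_servers(set()))
print(usable_servers({"A.foo.com.", "A.bar.com."}))
```

With no failures all four servers are usable; with both A servers down the result is empty, because each B server can only be resolved through the other zone.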
Cyclic Zone Dependency Results
• Error frequency:
  – 2% of the zones
  – None of the 500 most popular zones
• Impact:
  – 90% of the zones with cyclic dependency errors lose 25% or more of their servers
  – 2 or 4 zones are involved in most errors
Discussion: User-Perceived != System Robustness
• User-perceived robustness:
  – Data replication: only one server is needed
  – Data caching: temporarily masks infrastructure failures
  – Popular zones: fewer configuration errors
• System robustness:
  – Fewer available servers: due to inconsistency errors
  – Fewer redundant servers: due to dependency errors
Discussion: Why so many errors?
• Superficially, errors are due to operators:
  – Unaware of these errors
  – Lack of coordination
    • Parent-child zones, hosting of secondary servers
• Fundamentally, errors are due to protocol design:
  – Lack of mechanisms to handle these errors
    • Proactively or reactively
  – Design choices that embrace some of them:
    • Name servers are identified by names
    • Glue NS & A records are necessary to set up the DNS tree
Summary
• DNS operational errors are widespread
• DNS operational errors affect availability:
  – 50% of the servers lost
  – Less than 99.9% availability
• DNS operational errors affect performance:
  – 1 or even 2 orders of magnitude
• DNS system robustness is lower than user perception
  – Due to protocol design, not just operator errors
Ongoing Work
• Reactive mechanisms:
  – DNS Troubleshooting [NetTs 04]
• Proactive mechanisms:
  – Enhancing DNS replication & caching
Thank You!!!