RON: Resilient Overlay Networks
David Andersen, Hari Balakrishnan, Frans Kaashoek, Robert Morris
MIT Laboratory for Computer Science
http://nms.lcs.mit.edu/ron/
Fault-tolerant Networking
[Figure: nodes A, B, C, and D attached to a network]
Any-to-any communication, routing around failures
The Internet
[Figure: Autonomous Systems (ASes) of different sizes (mom-and-pop ISP, transit, big ISP, "really-big ISP everyone’s afraid of") connected by peering links and BGP-4]
• Scalability via aggressive aggregation and information hiding
• Commercial reality via peering & transit relationships
How Robust is Internet Routing?
Paxson 95-97
• 3.3% of all routes had serious problems
Labovitz 97-00
• 10% of routes available < 95% of the time
• 65% of routes available < 99.9% of the time
• 3-min minimum detection+recovery time; often 15 mins
• 40% of outages took 30+ mins to repair
Chandra 01
• 5% of faults last more than 2.75 hours
1. Slow outage detection and recovery
2. Inability to detect badly performing paths
3. Inability to efficiently leverage redundant paths
4. Inability to perform application-specific routing
5. Inability to express sophisticated routing policy
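For a sense of scale, the availability thresholds above can be converted into downtime per year. This is only an arithmetic sketch: the function name is illustrative and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours per year a route is unavailable at the given availability."""
    return (1.0 - availability_pct / 100.0) * HOURS_PER_YEAR


# The two thresholds quoted from Labovitz 97-00.
for pct in (95.0, 99.9):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% available -> {hours:.1f} hours down per year")
```

A route that is available less than 95% of the time is therefore down for more than about 18 days a year, while the 99.9% threshold corresponds to under 9 hours.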
Our Goal
To improve communication availability for small groups by at least a factor of 10
• Many applications
– Collaboration and conferencing
– Virtual Private Networks (VPNs) across public Internet
– Overlay Internet Service
RON: Routing Using Overlays
• Cooperating end-systems in different routing domains can conspire to do better than scalable wide-area protocols
[Figure: two layers. Upper: reliability via path monitoring and re-routing. Lower: scalable BGP-based IP routing substrate]
• Types of failures
– Outages: Configuration/operational errors, backhoes, etc.
– Performance failures: Severe congestion, denial-of-service attacks, etc.
RON Design
[Figure: nodes in different routing domains (ASes), each running the RON library: Conduit, Forwarder, Prober, Router, Performance Database, application-specific routing tables, policy routing module]
Link-state routing protocol, disseminates info using RON!
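The routing idea can be illustrated with a small path-selection sketch: given measured loss rates between overlay nodes, compare the direct path against every path through a single intermediate node. This is not the RON library's code. The node names, the loss table, the independence assumption for the two legs, and the choice of loss rate as the only metric are all assumptions made for the example.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical probe results: loss[(src, dst)] = measured loss rate in [0, 1].
LossTable = Dict[Tuple[str, str], float]


def best_path(loss: LossTable, nodes: Iterable[str],
              src: str, dst: str) -> Tuple[Optional[str], float]:
    """Return (intermediate, loss_rate) for the lowest-loss path.

    Considers the direct path and every path through one intermediate
    node. `intermediate` is None when the direct path is best.
    Unmeasured links are treated as fully lossy (1.0).
    """
    best_via, best_loss = None, loss.get((src, dst), 1.0)
    for hop in nodes:
        if hop in (src, dst):
            continue
        leg1 = loss.get((src, hop), 1.0)
        leg2 = loss.get((hop, dst), 1.0)
        # Assumption: the two legs drop packets independently.
        combined = 1.0 - (1.0 - leg1) * (1.0 - leg2)
        if combined < best_loss:
            best_via, best_loss = hop, combined
    return best_via, best_loss


nodes = ["A", "B", "C"]
loss = {
    ("A", "B"): 1.0,   # direct path is down
    ("A", "C"): 0.01,
    ("C", "B"): 0.02,
}
via, rate = best_path(loss, nodes, "A", "B")
print(via, round(rate, 4))
```

Here the direct A-to-B path is completely lossy, so the sketch picks the detour through C.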
Many Research Questions
• Does the RON approach work at all?
• Each RON is small in size, no more than 50 or 100 nodes
– How fast can failure detection & recovery happen?
• Policy routing
– Doesn’t RON violate AUPs and other policies?
• Routing behavior
– Can stable routing be achieved?
– Implementing efficient multi-criteria routing
• Is it safe to deploy a large number of (small) interacting RONs on the Internet?
RON Deployment (19 sites)
[Map labels: "To vu.nl, lulea.se, ucl.uk"; "To kaist.kr, .ve"]
Sites: .com (ca), dsl (or), cci (ut), aros (ut), utah.edu, .com (tx), cmu (pa), dsl (nc), nyu, cornell, cable (ma), cisco (ma), mit, vu.nl, lulea.se, ucl.uk, kaist.kr, univ-in-venezuela
RON Experiments
• Measure loss, latency, and throughput with and without RON
• 13 hosts in the US and Europe
• 3 days of measurements from data collected in March 2001
• 30-minute average loss rates
– A 30 minute outage is very serious!
• Note: Experiments done with “No-Internet2-for-commercial-use” policy
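The 30-minute average loss rate can be computed by bucketing probe results into fixed windows. The sketch below shows one way to do that. The probe record format (a timestamp in seconds plus a received flag) and the function name are assumptions for illustration, not the format used in the experiments.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 30 * 60  # 30-minute averaging window


def windowed_loss_rates(probes: Iterable[Tuple[float, bool]]) -> Dict[int, float]:
    """Map each window start time (seconds) to its average loss rate.

    `probes` holds (timestamp_seconds, received) pairs for one path.
    Windows with no probes are omitted from the result.
    """
    sent = defaultdict(int)
    lost = defaultdict(int)
    for timestamp, received in probes:
        start = int(timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        sent[start] += 1
        if not received:
            lost[start] += 1
    return {start: lost[start] / sent[start] for start in sorted(sent)}


# Hypothetical probe log: four probes in the first window, two in the second.
probes = [(0, True), (600, False), (1200, True), (1799, True),
          (1800, False), (2400, False)]
rates = windowed_loss_rates(probes)
print(rates)
```

In this made-up log the second window loses every probe, which is the kind of 30-minute outage the slide calls very serious.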
RON greatly improves loss-rate
[Figure: plot of 30-min average loss rate on Internet against 30-min average loss rate with RON; 13,000 samples]
RON loss rate never more than 30%
An order-of-magnitude fewer failures
30-minute average loss rates

Loss Rate    10%   20%   30%   50%   80%   100%
RON Better   479   127    32    20    14    10
No Change     57     4     0     0
RON Worse     47    15     0     0

• 6,825 “path hours” represented here
• 12 “path hours” of essentially complete outage
• 76 “path hours” of TCP outage
• RON routed around all of these!
• One indirection hop provides almost all the benefit!
Resilience Against DoS Attacks
Conclusion
• Improved availability of Internet communication paths using small overlays
– Layered above scalable IP substrate
– RON provides a set of libraries and programs to facilitate this application-specific routing
• Experimental data suggest that this approach works
– Over 10X availability
– Outage detection and recovery in about 15 seconds
– Able to route around certain denial-of-service attacks
• Many interesting questions remain…
http://nms.lcs.mit.edu/ron/