Reliable Multicast for Time-Critical Systems
Mahesh Balakrishnan, Ken Birman
Cornell University
Mission-Critical Datacenters
- COTS Datacenters
  - Online e-tailers, search engines, corporate applications
  - Web-services
- Mission-Critical Apps
  - Need: Scalability, Availability, Fault-Tolerance … Timeliness!
The Time-Critical Datacenter
- Migrating time-critical applications to commodity datacenters…
- … conversely, providing datacenter web-services with time-critical performance.
What’s a Time-Critical System?
- Not ‘real time’, but ‘real fast’!
- Financial calculators, military command and control… air traffic control (ATC)
- … foobooks.com!
- Technology Gap: Real-Time focuses on determinism, scale-up architectures
The French ATC System: Mid to Late 90’s
- Teams of 3-5 air traffic controllers on a cluster of desktop consoles
- 50-200 of these console clusters in an air traffic control center
- Why study the French ATC?
ATC Subsystems
- Radar Image
- Weather Alert
- Track Updates to Flight Plans
- Console to Console State Updates
- System Management and Monitoring
- ATC center to center Updates
- Multicast ubiquitous…
Two Kinds of Multicast
- Virtually Synchronous Multicast: very reliable, not particularly fast
- Unreliable Multicast: very fast, not particularly reliable
- Nothing in between!
Two Kinds of Subsystems
- Category 1: Complete reliability (virtual synchrony), e.g. routing decisions
- Category 2: Careful application design + natural hardware properties + management policies, e.g. radar
Multicast in the French ATC
- Engineering Lessons:
  - Structure application to tolerate partial failures
  - Exploit natural hardware properties
- Can we generalize to modern systems?
- Research Direction: Time-Critical Reliability
  - Can we design communication primitives that encapsulate these lessons?
Anatomy of a Cloned Service
Services
- An Amazon web page is constructed by 100s of co-operating services*
- Multicast is used for:
  - Updating Cloned Services
  - Publish-Subscribe / Eventing
  - Datacenter Management/Monitoring
* Werner Vogels, CTO of amazon.com, at SOSP 2005
Multicast in the Datacenter
- A node is in many multicast groups:
  - One for each service it hosts
  - One for each topic it subscribes to
  - One or more administration groups
Large Numbers of Overlapping Groups!
Service Semantics
- Data Store Services: stale data can result in overselling / underselling, i.e. loss of real-world dollars
- Cache Services: updated periodically by back-end data stores
The Challenge
- Datacenter blades are failure-prone:
  - Crash failures
  - Byzantine behavior
  - Bursty packet loss: end-host kernels drop packets when subjected to traffic spikes.
A New Reliability Model
- Rapid delivery is more important than perfect reliability
- Probabilistic Timeliness
- Graceful Degradation
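One way to read "probabilistic timeliness" is as a statement about the fraction of multicasts delivered within a deadline, rather than a hard bound on every packet. The sketch below computes that fraction from a list of delivery latencies. The function name `fraction_within` and the sample values are hypothetical illustrations, not measurements from the talk; `None` stands for a packet that was never delivered.

```python
from typing import Optional, Sequence

def fraction_within(latencies_ms: Sequence[Optional[float]],
                    deadline_ms: float) -> float:
    """Fraction of sent packets delivered within deadline_ms.

    A latency of None means the packet was never delivered, so it
    counts against the fraction (graceful degradation, not failure).
    """
    if not latencies_ms:
        raise ValueError("need at least one sample")
    on_time = sum(1 for lat in latencies_ms
                  if lat is not None and lat <= deadline_ms)
    return on_time / len(latencies_ms)

# Made-up sample: six fast deliveries, one slow one, one packet lost.
samples = [12.0, 18.0, 25.0, 40.0, 90.0, None, 15.0, 22.0]
on_time_fraction = fraction_within(samples, deadline_ms=50.0)
```

Evaluating the same function over a range of deadlines gives an empirical delivery-latency distribution, which is the kind of curve an application designer could use to pick a timeout.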
Wanted: a multicast primitive that
1. Scales to large numbers of arbitrarily overlapping multicast groups
2. Delivers multicasts quickly
3. Tolerates datacenter failure modes: bursty packet loss, node failures
4. Offers probabilistic properties
5. ‘Gives up’ on lost data after a threshold period
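The last requirement above (‘gives up’ on lost data after a threshold period) can be sketched as a receiver-side tracker that records when each gap in the sequence numbers was detected and abandons it once a deadline passes. The slides do not specify the mechanism, so this is only an illustration: the class `LossTracker`, the parameter `give_up_after`, and per-sender sequence numbers are all assumptions.

```python
class LossTracker:
    """Tracks missing packets from one sender and abandons stale gaps."""

    def __init__(self, give_up_after: float):
        self.give_up_after = give_up_after    # seconds; hypothetical threshold
        self.missing: dict[int, float] = {}   # seq -> time the gap was detected
        self.next_expected = 0

    def on_packet(self, seq: int, now: float) -> None:
        """Record an arrival; skipped sequence numbers become missing."""
        if seq >= self.next_expected:
            for gap in range(self.next_expected, seq):
                self.missing[gap] = now
            self.next_expected = seq + 1
        else:
            # A late or recovered packet fills an earlier gap.
            self.missing.pop(seq, None)

    def expire(self, now: float) -> list[int]:
        """Give up on packets missing for at least the threshold."""
        dead = sorted(s for s, t in self.missing.items()
                      if now - t >= self.give_up_after)
        for s in dead:
            del self.missing[s]
        return dead

tracker = LossTracker(give_up_after=0.1)
tracker.on_packet(0, now=0.00)
tracker.on_packet(3, now=0.01)        # packets 1 and 2 are now missing
tracker.on_packet(1, now=0.05)        # packet 1 is recovered in time
abandoned = tracker.expire(now=0.20)  # packet 2 is abandoned
```

Reporting abandoned sequence numbers to the application, instead of blocking on them, is what lets delivery stay fast while reliability degrades gracefully.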
Ricochet: Lateral Error Correction
- Receivers exchange error correction XORs of multicast traffic
- Works very well with multiple groups: scales up to a thousand groups per node
- Probabilistic Timeliness: probability distribution of delivery latencies
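The XOR exchange on this slide can be illustrated with a minimal sketch: one receiver XORs together packets it has received and sends the result to a peer, which can rebuild a single packet it missed. The sketch assumes equal-length payloads and integer packet ids; the names `xor_bytes`, `make_repair`, and `recover` are hypothetical and are not taken from the Ricochet software.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def make_repair(packets: dict[int, bytes]) -> tuple[frozenset[int], bytes]:
    """Build a repair packet: the ids it covers plus the XOR of their payloads."""
    if not packets:
        raise ValueError("need at least one packet")
    return frozenset(packets), reduce(xor_bytes, packets.values())

def recover(repair: tuple[frozenset[int], bytes], received: dict[int, bytes]):
    """Rebuild the one covered packet that is missing from `received`.

    Returns (packet_id, payload), or None when zero or more than one
    covered packet is missing, since one XOR can repair only one loss.
    """
    ids, payload = repair
    missing = [i for i in ids if i not in received]
    if len(missing) != 1:
        return None
    for i in ids:
        if i in received:
            payload = xor_bytes(payload, received[i])
    return missing[0], payload

# Receiver A got all three packets and sends a repair to receiver B.
receiver_a = {1: b"AAAA", 2: b"BBBB", 3: b"CCCC"}
repair = make_repair(receiver_a)

# Receiver B lost packet 2 and rebuilds it from the repair.
receiver_b = {1: b"AAAA", 3: b"CCCC"}
recovered = recover(repair, receiver_b)
```

Because the repair comes from a peer receiver and not the sender, a packet dropped at one end-host can be rebuilt without a retransmission round trip to the source.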
Predictive Total Ordering (Plato)
- Delivers messages to applications with no ordering delay in most cases
- Orders messages only if there is a high probability of out-of-order delivery across different nodes
- Probabilistic Timeliness: probability distribution of ordered delivery latency
Performance
- SRM takes seconds to recover lost packets
- Ricochet recovers almost all packets within ~70 milliseconds
Conclusion
- Move from R/T (real-time) to T/C (time-critical) yields huge benefits!
  - Ricochet is faster… slashes latency… scalable…
  - A clean delivery delay curve is a powerful design tool, replacing traditional hard (but conservative) limits
- We’re open for business:
  - Software and detailed paper available for download
  - Give it a try… tell us what you think!
  - www.cs.cornell.edu/projects/quicksilver/ricochet.html