Total Order Broadcast and Multicast Algorithms: Taxonomy and Survey
(Paper by X. Défago, A. Schiper, and P. Urbán)
ACM Computing Surveys, Vol. 36, No. 4, Dec. 2004, pp. 372-421
Aida Omerovic
4 March 2008
Seminar on Dependable and Adaptive Distributed Systems

Outline
• Background
• Problem specification
• Classes of ordering mechanisms
• Failure related concepts
• Fault tolerance
• Discussion

Background
• Total order broadcast and multicast algorithms
  – Both synchronous and asynchronous system models
• Lack of a roadmap for using the algorithms
• Lack of generality in existing comparisons

Notions, terms…
• Broadcast (messages are sent to all processes) vs. multicast (messages are sent to a subset of processes)
• Closed vs. open groups (whether the sender must belong to the destination group)
• Single vs. multiple groups (disjoint or overlapping): total order must be ensured where groups intersect
• Dynamic groups: processes join and leave at runtime
• Partitionable groups: groups may split into subgroups, handled through primary-partition membership or partitionable membership

Motivation
• Concurrency and global control in distributed systems
• Total order broadcast: a group communication primitive
• Ensures that messages sent to a set of processes are delivered by all those processes in the same order
• Important in: clock synchronisation, active replication, distributed shared memory, distributed mutual exclusion, cooperative writing, replicated databases, performance…

Main contributions
• Classification w.r.t. ordering mechanisms
  – The characteristic with the strongest influence on behaviour
  – Definition of five classes of ordering mechanisms
• Survey of approx. 60 published total order broadcast algorithms
• Study of their properties and behaviour

A correct process
Def.: a correct process never exhibits any of the following faulty behaviours:
• Crash failures (the process stops performing any activity)
• Omission failures (the process omits performing some actions)
• Timing failures (violation of the system's timing assumptions); apply only to synchronous systems
• Byzantine failures (arbitrary faulty behaviour)

The problem specification
The total order broadcast problem is specified with two primitives:
• TO-broadcast(m): for any message m and any run, executed at most once!
• TO-deliver(m)
Properties of total order broadcast:
1. Validity: if a correct process p TO-broadcasts m, then p eventually TO-delivers m.
2. Uniform agreement: if a process TO-delivers m, then all correct processes eventually TO-deliver m.
3. Uniform integrity: every process TO-delivers m at most once, and only if m was previously TO-broadcast by its sender.
4. Uniform total order: if processes p and q both TO-deliver m and m', then p TO-delivers m before m' iff q TO-delivers m before m'.
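
The two safety properties (3 and 4) can be checked mechanically on recorded delivery histories. The Python sketch below is only an illustration, not taken from the paper: the function names are hypothetical, and it assumes each process's history is a list of message ids in the order it TO-delivered them (histories of processes that later crashed included, since the properties are uniform).

```python
from itertools import combinations


def satisfies_uniform_integrity(history, broadcast):
    """Each message is delivered at most once, and only if it was TO-broadcast."""
    return len(history) == len(set(history)) and set(history) <= set(broadcast)


def satisfies_uniform_total_order(histories):
    """histories maps a process name to the list of message ids it
    TO-delivered, in delivery order (crashed processes included)."""
    # Position of each message in each process's delivery sequence.
    positions = {p: {m: i for i, m in enumerate(h)} for p, h in histories.items()}
    for p, q in combinations(histories, 2):
        # Only messages delivered by both p and q are constrained.
        common = positions[p].keys() & positions[q].keys()
        for m1, m2 in combinations(common, 2):
            p_order = positions[p][m1] < positions[p][m2]
            q_order = positions[q][m1] < positions[q][m2]
            if p_order != q_order:
                return False
    return True


histories = {"p": ["a", "b", "c"], "q": ["a", "b"], "r": ["b", "a"]}
print(satisfies_uniform_total_order(histories))  # False: r disagrees on a/b
print(satisfies_uniform_total_order({"p": ["a", "b", "c"], "q": ["a", "c"]}))  # True
print(satisfies_uniform_integrity(["a", "a"], {"a"}))  # False: delivered twice
```

Note that the second history pair passes even though q never delivers b: uniform total order only constrains messages delivered by both processes.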

The problem specification cont.
Properties 1, 2 and 3 satisfied ⇒ "reliable broadcast".
• Properties 1 and 2: liveness properties (at any point in a run, whatever has happened so far, the property can still hold eventually).
• Properties 3 and 4: safety properties (once the property does not hold, it never will).
• Properties 2 and 4: uniform (they apply to both correct and faulty processes). Costly.
  – Algorithms tolerant to Byzantine failures cannot guarantee any of the uniform properties above.
• Nonuniform variants of 2 and 4: apply only to correct processes, with no restriction on the faulty ones. Voting can be a measure.

The problem specification cont.
Alternative: uniform properties are those enforced by honest processes, correct or not. (An honest process behaves according to its specification.)
An issue: contamination. A faulty process in an inconsistent state "legally" TO-broadcasts a message before crashing, thus contaminating the correct processes. Note: such a run satisfies even the strongest specification so far.
Contamination is disallowed by:
• "gap-free uniform total order" (no gaps in the delivery sequence)
• "prefix order" (the history of one process is a prefix of the history of the other)
However, contamination cannot be avoided in case of arbitrary failures (e.g. correct delivery by a faulty process).
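
Prefix order is easy to state as a check over delivery histories. The sketch below is hypothetical (names and the list-of-ids representation are assumptions, not from the paper); its example shows a history with a gap, the kind of inconsistent state that leads to contamination, which prefix order rejects.

```python
from itertools import combinations


def is_prefix(a, b):
    """True if sequence a is a prefix of sequence b."""
    return len(a) <= len(b) and list(b[:len(a)]) == list(a)


def satisfies_prefix_order(histories):
    """histories maps a process to its TO-delivery sequence. Prefix order
    holds if, for every pair, one history is a prefix of the other."""
    seqs = list(histories.values())
    return all(is_prefix(a, b) or is_prefix(b, a) for a, b in combinations(seqs, 2))


# q skips "b" (a gap) yet agrees with p on the relative order of a and c,
# so uniform total order alone accepts it; prefix order rejects it.
print(satisfies_prefix_order({"p": ["a", "b", "c"], "q": ["a", "c"]}))  # False
print(satisfies_prefix_order({"p": ["a", "b", "c"], "q": ["a", "b"]}))  # True
```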

The problem specification cont.
Other ordering properties include:
• FIFO order: messages from the same sender are delivered in the order in which they were sent (not guaranteed by total order).
• Causal order: m precedes m' if the sending event of m precedes the sending event of m'. Generally: if the broadcast of m precedes the broadcast of m', correct processes deliver m before m'.
Note: these two properties further restrict the total order property with conditions related to the SENDERS.
Causal order ⇔ FIFO order + local order (local order: if a process delivers m before broadcasting m', no correct process delivers m' unless it has previously delivered m).
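
To see why FIFO order is not implied by total order, the following hypothetical checker (names and data layout are assumptions) tests one process's delivery sequence against each sender's send order. A sequence that every process delivers identically can still fail it.

```python
def satisfies_fifo(sent, delivered):
    """sent maps each sender to the message ids it broadcast, in send order;
    delivered is one process's delivery sequence. FIFO order requires that a
    later message from a sender is delivered only after its earlier ones."""
    pos = {m: i for i, m in enumerate(delivered)}
    for msgs in sent.values():
        # Checking consecutive pairs is enough: the condition is transitive.
        for earlier, later in zip(msgs, msgs[1:]):
            if later in pos and (earlier not in pos or pos[earlier] > pos[later]):
                return False
    return True


sent = {"p": ["m1", "m2"], "q": ["n1"]}
# If every process delivered this same sequence, total order would hold,
# but p's messages are delivered in reverse send order: FIFO is violated.
print(satisfies_fifo(sent, ["m2", "n1", "m1"]))  # False
print(satisfies_fifo(sent, ["n1", "m1", "m2"]))  # True
```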

Classes of ordering mechanisms
… according to how the ordering (e.g. timestamp, sequence number) is performed and by whom (type of role).
Process roles: sender, destination, sequencer.
Five classes of total order broadcast algorithms:
• Fixed sequencer (role: sequencer)
• Moving sequencer (role: sequencer; token)
• Privilege-based (role: sender; token)
• Communication history (role: sender; timestamp)
• Destinations agreement (role: destination; timestamp)
Another distinction is between time-free and time-based (physical time) ordering.
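
The simplest class, fixed sequencer, can be sketched in a few lines. This is a failure-free, single-process simulation under stated assumptions: the classes `Sequencer` and `Destination` and their methods are hypothetical, and message transport is not modelled. One process stamps every message with a consecutive sequence number; destinations buffer out-of-order arrivals and TO-deliver strictly in sequence order.

```python
from dataclasses import dataclass, field


@dataclass
class Sequencer:
    """The single fixed sequencer: stamps each message with the next number."""
    next_seq: int = 0

    def order(self, msg):
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg


@dataclass
class Destination:
    """Buffers sequenced messages and TO-delivers them in sequence order."""
    expected: int = 0
    pending: dict = field(default_factory=dict)
    delivered: list = field(default_factory=list)

    def receive(self, seq, msg):
        self.pending[seq] = msg
        # Deliver every message whose turn has come, filling gaps in order.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1


seqr = Sequencer()
stamped = [seqr.order(m) for m in ["x", "y", "z"]]
d1, d2 = Destination(), Destination()
for s, m in stamped:            # d1 receives in order
    d1.receive(s, m)
for s, m in reversed(stamped):  # d2 receives in reverse (network reordering)
    d2.receive(s, m)
print(d1.delivered, d2.delivered)  # ['x', 'y', 'z'] ['x', 'y', 'z']
```

Both destinations end up with the same delivery order regardless of arrival order; what this sketch leaves out is precisely what happens when the sequencer crashes.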

Classes of ordering mechanisms cont. None of the five mechanisms is fault tolerant by itself!

Failure related conceptual issues
• Synchronous system: a system with known upper bounds on process speed and on communication delay.
• Asynchronous system: neither of these two parameters is bounded.
• Timed asynchronous model: the asynchronous model extended with a notion of physical time and the assumption that "most of the messages are likely to reach their destination within a delay δ".

Failure related conceptual issues cont.
• Consensus has no deterministic solution in an asynchronous system if even a single process can crash.
• Total order broadcast and consensus can be transformed into each other, so the impossibility also holds for total order broadcast!
• Solution: extend the asynchronous system with oracles. An oracle provides information that processes can use to guide their choices.
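
The direction of the transformation that carries the impossibility over is "consensus from total order broadcast": every process TO-broadcasts its proposal and decides the value of the first message it TO-delivers. Total order guarantees that all processes see the same first message. The sketch below is a failure-free illustration only; `SimulatedTOB` is a hypothetical stand-in (a single shared log) for a real total order broadcast service.

```python
class SimulatedTOB:
    """Hypothetical stand-in for a total order broadcast service (no failures):
    a single shared log, so every process TO-delivers the same sequence."""

    def __init__(self):
        self._log = []

    def to_broadcast(self, sender, value):
        self._log.append((sender, value))

    def to_deliver(self, process):
        # Every process sees the same delivery sequence, whoever asks.
        return list(self._log)


def consensus_via_tob(proposals, tob):
    """Each process TO-broadcasts its proposal and decides the value of the
    first message it TO-delivers; total order makes all decisions equal."""
    for process, value in proposals.items():
        tob.to_broadcast(process, value)
    return {p: tob.to_deliver(p)[0][1] for p in proposals}


print(consensus_via_tob({"p1": 3, "p2": 7, "p3": 5}, SimulatedTOB()))
# {'p1': 3, 'p2': 3, 'p3': 3}
```

Agreement follows from total order, and validity from the fact that only proposed values are broadcast. Hence a deterministic total order broadcast algorithm for asynchronous systems with crashes would also solve consensus, which is impossible.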

Failure related conceptual issues cont.
Process controlled crash: the ability to artificially force the crash of a process. Useful for crashing incorrect or suspected processes.
However, a fault-tolerant algorithm can only tolerate the crash of a bounded number of processes. Provoked and genuine failures both count against this bound, so provoking failures degrades the actual fault tolerance of the system.

Fault tolerance mechanisms
The main fault-tolerance mechanisms the algorithms rely on:
• Failure detection
  – Formalized by completeness (prevents blocking) and accuracy (prevents algorithms from running forever without solving the problem)
• Group membership service (manages the membership of groups of processes)
  – Provides consistent failure notification
• Resilient communication pattern (avoids any potentially blocking pattern)
• Message stability (at least one process is correct…)
• Consensus
• Mechanisms for lossy channels (tokens, acknowledgements…)
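
One common way to make message stability concrete, assumed here rather than taken from the slides: call a message k-stable once k distinct processes are known to have received it. If at most f processes may crash, an (f + 1)-stable message has reached at least one correct process. The `StabilityTracker` class below is a hypothetical helper illustrating this bookkeeping, with acknowledgements as its only input.

```python
class StabilityTracker:
    """Hypothetical helper: records which processes are known to have received
    each message. With at most f crashes, a message received by f + 1
    distinct processes has reached at least one correct process."""

    def __init__(self, f):
        self.f = f  # assumed upper bound on the number of crashed processes
        self.received_by = {}

    def ack(self, msg_id, process):
        # A set, so repeated acknowledgements from one process count once.
        self.received_by.setdefault(msg_id, set()).add(process)

    def is_stable(self, msg_id):
        return len(self.received_by.get(msg_id, ())) >= self.f + 1


t = StabilityTracker(f=1)
t.ack("m", "p1")
t.ack("m", "p1")
print(t.is_stable("m"))  # False: only one distinct receiver
t.ack("m", "p2")
print(t.is_stable("m"))  # True: f + 1 = 2 receivers
```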

Conclusion
• Problem specification
• Five classes of total order broadcast algorithms
• Failure related concepts
• Fault tolerance mechanisms
• The paper also offers a survey of approx. 60 algorithms

Discussion topics
• Adaptability of the algorithms (e.g. total order multicast in dynamic, partitionable groups)
• Synchrony and timeliness
• Performance of the different algorithms
• Fairness of the different algorithms (e.g. privilege-based)
• Suitability of the algorithms for open vs. closed groups (e.g. in privilege-based algorithms, processes have to know of each other)
• Is this approach comprehensive and adequate?
• Relevant issues not yet covered?
• How does this approach relate to some earlier seminar topics? Can the principles be adopted elsewhere?

That’s it, folks!