Role of Group Communication in BS Architecture
(or: Which platform are we going to use?)

Alberto Bartoli, University of Trieste
ADAPT IST-2001-37126

[Figure: map labels "Bologna" and "300 Kms"]

Group Communication

- Group Communication (GC)
- Suite of communication and membership primitives
- Very useful for implementing replication algorithms
- In particular, in the presence of failures (host, network)
- BS will certainly use some form of replication
- BS will certainly use some form of GC

Options

- JavaGroups
  - Used in JBoss clustering extensions
  - Implemented in Java (stack of layers)
  - Reliable Broadcast
- Spread
  - Used in a variety of environments
  - Implemented in C (Java interface available)
  - Uniform Broadcast
- JBora
  - Used in my lab only…
  - Thin Java layer on top of Spread (much more powerful)
  - Novel idea (?): "Primary Uniform" Broadcast (much simpler to use)

Replication

- Action A executed upon receiving a multicast
  - Execute a method on a local object
  - Update the serialized state of a local bean
  - Commit a transaction
  - ...
- "Frequent" (informal) requirement: if a process executes A and then crashes, then A must also be executed by all processes that do not crash ("actions must not be lost")

[Figure: message m]

Reliable broadcast (JavaGroups)

Either all correct processes deliver a message or none of them does.
No guarantee on processes that are not "correct"!

[Figure labels: "Failure", "Membership change"]

Executing A might not be safe!

Uniform broadcast (Spread)

If a process delivers a message, then all correct processes deliver that message.

[Figure: three cases marked "NO"]

Executing A is always safe!
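The difference between the two guarantees can be checked on a small execution trace. The sketch below is illustrative only: the trace format (a map from process name to the set of messages it delivered) and the function names are assumptions made for this example, not part of JavaGroups or Spread.

```python
def all_messages(delivered):
    """Every message delivered by at least one process."""
    return set().union(*delivered.values())


def reliable_agreement(delivered, correct):
    """Either all correct processes deliver a message or none of them does."""
    for m in all_messages(delivered):
        holders = {p for p in correct if m in delivered[p]}
        if holders and holders != set(correct):
            return False
    return True


def uniform_agreement(delivered, correct):
    """If any process delivers a message, every correct process delivers it."""
    return all(m in delivered[p]
               for m in all_messages(delivered)
               for p in correct)


# Hypothetical run: p1 delivers m, executes A, then crashes;
# p2 and p3 (the correct processes) never deliver m.
trace = {"p1": {"m"}, "p2": set(), "p3": set()}
correct = {"p2", "p3"}

print(reliable_agreement(trace, correct))  # True: the guarantee is not violated
print(uniform_agreement(trace, correct))   # False: action A was lost
```

The run satisfies reliable broadcast and still loses the action executed by p1, which is why executing A upon delivery is safe only under the uniform guarantee.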

In practice

- "Cross-your-fingers" reliability (don't know or don't care)
- "Real" reliability
  - Uniform broadcast
  - Reliable broadcast + additional measures
    - Replicated databases: if a replica commits a transaction not committed by other replicas, undo the transaction later (Bettina)
    - Replicated services: whenever a replica crashes, surviving replicas fetch from all clients the last reply they have received (Karamanolis, Magee, IEEE TSE)
    - Replicated data: wait for an explicit response from every available replica (myself, Ozalp, JPDC) (JBoss SFSB clustering)
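As an illustration of the last additional measure (wait for an explicit response from every available replica), here is a minimal sketch. The class and method names are hypothetical. This is neither the JPDC algorithm nor the JBoss implementation, and it ignores most of the failure patterns a real solution has to handle.

```python
class PendingUpdate:
    """Tracks which replicas in the current view have acknowledged an update."""

    def __init__(self, view):
        self.view = set(view)
        self.acked = set()

    def on_ack(self, replica):
        # Acknowledgements from replicas outside the view are ignored.
        if replica in self.view:
            self.acked.add(replica)

    def on_view_change(self, new_view):
        # Replicas that left the view are no longer waited for.
        self.view = set(new_view)
        self.acked &= self.view

    def safe_to_execute(self):
        # Safe only once every available replica has answered.
        return self.view <= self.acked


update = PendingUpdate(view={"r1", "r2", "r3"})
update.on_ack("r1")
update.on_ack("r2")
print(update.safe_to_execute())      # False: r3 has not answered yet
update.on_view_change({"r1", "r2"})  # r3 crashed and was excluded
print(update.safe_to_execute())      # True
```

Even this toy version needs explicit view-change handling above the GC layer, which is the kind of ad hoc work the measure implies.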

"Real reliability"

Reliable broadcast + additional measures (JavaGroups):
- Each additional measure is ad hoc
- Each additional measure is complex (many failure patterns to consider and to cope with)
Lot of work above the GC layer

Uniform broadcast (Spread):
- No additional measures
- Systematic approach, complexity within the GC layer
Very little work above the GC layer

Uniform broadcast: Hmmm…

If the network can partition, you need additional measures again!

"The view is about to split; GC can't tell whether the processes that are about to leave have received the messages that follow"

[Figure labels: "All correct processes will receive this"; "In-doubt message: can't tell who will receive this!"]

Again the same problem!

JBora

Very simple reasoning for the "common case":
processes that leave the primary view deliver a prefix of the sequence of messages in the primary view.

[Figure: message sequences 1 to 11 shown for the Non-Primary View and for the Primary View]

Executing A is always safe! No need for additional measures!
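The prefix property is easy to state as a check. The sketch below is not JBora code; the function name and the two example sequences are made up for illustration.

```python
def is_prefix(delivered, primary_sequence):
    """True if `delivered` is an initial segment of `primary_sequence`."""
    n = len(delivered)
    return (n <= len(primary_sequence)
            and list(primary_sequence[:n]) == list(delivered))


primary = list(range(1, 12))  # messages 1..11 delivered in the primary view

# Hypothetical process that left the primary view after message 7.
print(is_prefix([1, 2, 3, 4, 5, 6, 7], primary))  # True

# A gap (message 3 missing) would violate the property.
print(is_prefix([1, 2, 4], primary))              # False
```

Because a leaving process can only be behind the primary view and never holds a message the primary view lacks, an action executed upon delivery is never lost from the primary view's point of view.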

So what?

- JavaGroups
- Spread
- JBora

Scenario 1

We want to rely (almost) completely on (some snapshot of) JBoss clustering.

My suggestion:
- Forget about uniform multicast (Spread, JBora)
- Stick with JavaGroups

Preliminary WP 1 Meeting (Bologna, Trieste):
- Encapsulating Spread within JavaGroups
- Encapsulating JBora within JavaGroups
…too complex, dubious advantages (see meeting slides for details)

Scenario 2

We don't want to run behind JBoss clustering (write our own clustering features).

My suggestion:
- We are not interested in uniform multicast: use JavaGroups
- We are interested in uniform multicast: use JBora

My opinion

- If we use JavaGroups
  - I will ask to restructure WP 1 (Task 1.3 "Support for group communication")
  - Month 18: "Dear reviewer, Trieste has led Task 1.3. We did almost nothing."
- If we decide to write our own clustering features
  - I don't see any single reason why we should eliminate uniform multicast from the beginning

Experiments: "Throughput under stress"

- Each sender injects 1000 msg/sec (bursty)
- All details available in a separate document (4 PIII 800 MHz, Windows 2000, Ethernet 100 MB)
- Important findings about JavaGroups (configured as in JBoss clustering):
  - Processes may start missing messages and this occurs silently (no failure notification whatsoever)
  - You cannot start / recover multiple processes simultaneously (they do not discover each other)
- Does not seem very "reliable" (at least, when stressed)

A few numbers...

Very preliminary...

                           1 sender      2 senders   1 sender      2 senders
                           (500 Byte)                (5 KBytes)
Total Uniform   Spread     640           1254        323           592
Total Uniform   JBora      576           871         323           359

FIFO Reliable, JavaGroups: 150 ! 561 496 165 275 Failed ! [values as extracted; column alignment lost]

Recall:
- Spread, JBora: Message throughput = Operation throughput
- JavaGroups: Message throughput < Operation throughput (N responses for each multicast)

Appendix

Uniform broadcast: How is it implemented?

[Figure labels: "m", "deliver"]

- Messages within the GC layers for one uniform broadcast
- Uniform broadcast delivered only upon the second broadcast ack
- In practice, many optimizations:
  - The white messages are not separate messages, but fields of other messages required anyway
- Costly, but not as much as it seems
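The general idea of delaying delivery until enough acknowledgements have arrived can be sketched with the textbook "all-ack" rule: a process delivers m only once it knows that every member of the view holds a copy. This is a simplification under stated assumptions, not the actual Spread protocol shown in the figure; the class name and callback are hypothetical, and failures and view changes are ignored.

```python
class UniformReceiver:
    """Delivers a message only when every view member is known to hold it."""

    def __init__(self, pid, view):
        self.pid = pid
        self.view = frozenset(view)
        self.seen_by = {}     # message -> set of processes known to hold it
        self.delivered = []

    def on_copy(self, msg, from_pid):
        """Record that `from_pid` holds `msg` (data message or acknowledgement)."""
        holders = self.seen_by.setdefault(msg, {self.pid})
        holders.add(from_pid)
        # Deliver once, and only when all members are known holders.
        if self.view <= holders and msg not in self.delivered:
            self.delivered.append(msg)


r = UniformReceiver("p2", view=["p1", "p2", "p3"])
r.on_copy("m", "p1")   # data message from the sender
print(r.delivered)     # []: p3 is not yet known to hold m
r.on_copy("m", "p3")   # acknowledgement from p3
print(r.delivered)     # ['m']
```

The extra round before delivery is the cost referred to above; piggybacking the acknowledgements on messages that are sent anyway is what keeps it lower than it seems.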

JBoss Clustering (I)

[Figure labels: "m", "done"]

- Messages from the application layer for one operation

My belief:
- Less efficient than uniform multicast
- The application injects N one-to-one messages into the system

JBoss Clustering (II)

[Figure labels: "m", "done"]

- Devising all possible failure patterns and coping with them correctly is very, very complex
- Difficult to achieve full confidence in the algorithm and its implementation
- I know from our JPDC work that coping with view changes here is VERY complex
- Does JBoss really handle all cases correctly?

Transitional Views: Why can't they be avoided?

- Suppose a network failure during the protocol
- GCLayer may end up with one side of the partition that does not know whether the other side has received the message
- Two approaches:
  - GCLayer waits until the partition recovers (not feasible)
  - GCLayer notifies the application of the new view after a warning
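From the application's side, the second approach could look roughly like the sketch below: messages delivered between the warning and the new view are kept apart as "in doubt". The callback names are hypothetical and do not correspond to the Spread or JBora API.

```python
class ViewAwareApp:
    """Separates messages delivered normally from those delivered after a warning."""

    def __init__(self):
        self.members = set()
        self.in_transition = False
        self.safe = []       # delivered with the usual guarantee
        self.in_doubt = []   # delivered after the warning, before the new view

    def on_message(self, msg):
        if self.in_transition:
            self.in_doubt.append(msg)
        else:
            self.safe.append(msg)

    def on_warning(self):
        # The GC layer signals that the view is about to change.
        self.in_transition = True

    def on_new_view(self, members):
        self.members = set(members)
        self.in_transition = False


app = ViewAwareApp()
app.on_message("m1")
app.on_warning()
app.on_message("m2")         # cannot tell whether the leaving side received it
app.on_new_view({"p1", "p2"})
app.on_message("m3")
print(app.safe)      # ['m1', 'm3']
print(app.in_doubt)  # ['m2']
```

What the application then does with the in-doubt messages is exactly the additional measure that the partition case brings back.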