Decentralized Message Ordering for Publish/Subscribe Systems
Cristian Lumezanu, Neil Spring, Bobby Bhattacharjee
Ordering?
[Figure: publishers P1 and P2 send messages m1 and m2 to subscribers A, B, C, D; some subscribers see m1 < m2, others see m2 < m1]
Subscribers may observe an ambiguous order of messages
Decentralized Message Ordering for Publish/Subscribe Systems, Middleware 2006
Applications
§ Network games
  § Subscribers = players
  § Messages = events in the region of the game world to which the player belongs
  § Common events must be seen in the same order for consistency
§ Messaging
  § Chat rooms, buddy lists
  § Example of messages in a chat room:
    Alice: “Who wants to go to Sydney?”
    Bob: “I do”
    Connor: “Who wants to go to Melbourne?”
    Diane: “I am going”
  § Depending on the order in which the messages are seen, either of two readings is possible:
    Bob goes to Sydney, Diane goes to Melbourne
    Diane goes to Sydney, Bob goes to Melbourne
Naive solution
[Figure: publishers P1 and P2 and subscribers A, B, C, D]
Naive solution
[Figure: publishers P1 and P2 send every message through a single sequencer, which forwards it to subscribers A, B, C, D]
§ Not scalable
§ Central point of failure
Distribute the task of ordering to many sequencers
Our solution
[Figure: publishers P1 and P2 send messages through a sequencer network to subscribers A, B, C, D]
Scalable | Practical
Groups
§ GROUP: all subscribers with the same subscription
§ Order among messages is enforced across groups
[Figure: groups G0 and G1 over subscribers A, B, C, D, E, F, G; two message streams, m0, m1, … and m0', m1', …, are addressed to the groups]
RULE 1: A sequencer (ingress-only sequencer) is associated with each group and establishes order among all messages addressed to the group except for…
Double Overlapped Groups
§ DOUBLE OVERLAPPED GROUPS: groups that have at least two subscribers in common
§ Receivers may make inconsistent decisions about message order when they belong to double overlaps
[Figure: groups G0 and G1 over subscribers A, B, C, D, E, F, G, with message streams m0, m1, … and m0', m1', …]
  D: m0 < m0' < m1'
  E: m0 < m1 < m0' < m1'
RULE 2: A sequencer is associated with each double overlap
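The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `double_overlaps` and the membership of G0 and G1 are assumptions chosen so that D and E fall in the overlap, as the slide's example orderings suggest.

```python
from itertools import combinations

def double_overlaps(groups):
    """Return {(g_a, g_b): common_members} for every pair of groups
    that shares at least two subscribers (a "double overlap")."""
    overlaps = {}
    for name_a, name_b in combinations(sorted(groups), 2):
        common = groups[name_a] & groups[name_b]
        if len(common) >= 2:
            overlaps[(name_a, name_b)] = common
    return overlaps

# Hypothetical memberships (not given explicitly on the slide).
groups = {
    "G0": {"A", "B", "C", "D", "E"},
    "G1": {"D", "E", "F", "G"},
}
overlaps = double_overlaps(groups)
print(overlaps)
```

Under Rule 2, each key of the returned dictionary would get its own sequencer.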
Sequencing scheme
SEQUENCING NETWORK
§ A sequencer is created for each double overlap between groups and for each group that has no double overlaps
MESSAGE TRANSMISSION
§ Messages traverse the sequencing network and receive sequence numbers from all sequencers associated with the destination group
MESSAGE RECEPTION
§ Subscribers order messages unambiguously according to the sequence numbers
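The construction and transmission steps above might look roughly like the following sketch. It assumes a simplified model in which a message is stamped atomically by every sequencer of its destination group; the names `build_sequencers`, `SequencingNetwork`, and `stamp`, and the extra group G3, are hypothetical. Groups G0, G1, G2 are the ones used on the "Sequencing Network: Operation" slide.

```python
from itertools import combinations

def build_sequencers(groups):
    """One sequencer per double overlap, plus one per group with no double overlap.
    Returns {group_name: [sequencer_id, ...]}."""
    sequencers_of = {name: [] for name in groups}
    for a, b in combinations(sorted(groups), 2):
        if len(groups[a] & groups[b]) >= 2:      # double overlap
            sequencers_of[a].append((a, b))
            sequencers_of[b].append((a, b))
    for name, seqs in sequencers_of.items():
        if not seqs:                             # group without double overlaps
            seqs.append((name,))
    return sequencers_of

class SequencingNetwork:
    def __init__(self, groups):
        self.sequencers_of = build_sequencers(groups)
        self.counters = {}                       # sequencer_id -> last number issued

    def stamp(self, group):
        """Collect one sequence number from each sequencer of the destination group."""
        stamps = {}
        for seq in self.sequencers_of[group]:
            self.counters[seq] = self.counters.get(seq, 0) + 1
            stamps[seq] = self.counters[seq]
        return stamps

groups = {
    "G0": {"A", "B", "C"},
    "G1": {"B", "C"},
    "G2": {"A", "B", "D"},
    "G3": {"E", "F"},                            # hypothetical group with no overlaps
}
net = SequencingNetwork(groups)
first = net.stamp("G0")    # numbered by both overlap sequencers of G0
second = net.stamp("G1")   # numbered only by the G0/G1 overlap sequencer
third = net.stamp("G3")    # numbered by G3's own sequencer
```

G1 and G2 share only B, so no sequencer is created for that pair, which matches the key insight that only groups with two or more common members need ordering.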
Sequencing Network: Construction
[Figure: groups G0, G1, G2, G3 with members drawn from subscribers A to E, and a sequencing network of sequencers Q0, Q1, Q2 with exits to G0, G1, G2, G3]
Properties
1. All members of the same group see the common messages in the same order
2. All destinations can make an immediate decision of whether to deliver or buffer arriving messages
Sequencing Network: Operation
G0 = {A, B, C}
G1 = {B, C}
G2 = {A, B, D}
[Figure: messages m0, m1, m2 pass through sequencers Q0 and Q1 on the way to G0 and G1; each message carries a header with columns m#, Q0, Q1 that the sequencers fill in with sequence numbers]
When a message arrives, the receiver checks the sequence numbers assigned by the relevant sequencers and decides whether to deliver or buffer the message
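The deliver-or-buffer decision can be sketched as follows. This is a simplified model, not the protocol from the talk: it assumes the receiver (subscriber B here) belongs to every group served by the sequencers it tracks, so it sees every number each of them issues. The class name `Receiver`, the message identifiers, and the stamp values are illustrative assumptions; the group-local sequence numbers listed on the "Sequencer state" backup slide are not modeled.

```python
class Receiver:
    def __init__(self, sequencers):
        self.expected = {seq: 1 for seq in sequencers}  # next number per sequencer
        self.buffer = []                                # messages that arrived early
        self.delivered = []                             # delivery order

    def _ready(self, stamps):
        # Deliverable only if every stamp is the next one expected.
        return all(self.expected[seq] == num for seq, num in stamps.items())

    def receive(self, msg_id, stamps):
        self.buffer.append((msg_id, stamps))
        progress = True
        while progress:                                 # drain everything now deliverable
            progress = False
            for entry in list(self.buffer):
                if self._ready(entry[1]):
                    self.buffer.remove(entry)
                    for seq in entry[1]:
                        self.expected[seq] += 1
                    self.delivered.append(entry[0])
                    progress = True

# B is in G0, G1 and G2, so it tracks both sequencers.
b = Receiver(["Q0", "Q1"])
b.receive("m2", {"Q1": 2})            # to G2: arrives early, buffered
b.receive("m1", {"Q0": 2})            # to G1: arrives early, buffered
b.receive("m0", {"Q0": 1, "Q1": 1})   # to G0: unblocks the other two
print(b.delivered)
```

In this run m1 and m2 are addressed to groups that share only one subscriber, so their relative order is not constrained; only their order with respect to m0 is.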
Sequencing Network: Conditions
[Figure: sequencers Q0, Q1, Q2]
Conditions
C1: A single path must connect the sequencers associated with each group
C2: The undirected sequencing graph must be loop free
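Condition C2 is a standard acyclicity check on an undirected graph. A minimal sketch using union-find is shown below; the function name `is_loop_free` and the example edge lists are assumptions for illustration, not taken from the talk.

```python
def is_loop_free(nodes, edges):
    """Return True if the undirected graph (nodes, edges) contains no cycle."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:      # u and v already connected: this edge closes a loop
            return False
        parent[root_u] = root_v
    return True

sequencers = ["Q0", "Q1", "Q2"]
chain = [("Q0", "Q1"), ("Q1", "Q2")]
triangle = chain + [("Q2", "Q0")]
print(is_loop_free(sequencers, chain))      # True
print(is_loop_free(sequencers, triangle))   # False
```

A triangle of this kind could arise, for example, if three groups pairwise double-overlap and the two sequencers of each group were linked directly.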
Loop-free sequencing network
G0 = {A, B, D}
G1 = {A, B, C}
G2 = {B, C, D}
[Figure: messages m0, m1, m2 addressed to G0, G1, G2 collect sequence numbers from sequencers Q0, Q1, Q2 in a header with columns m#, Q0, Q1, Q2]
B: m0 < m1 < m2 < m0
AMBIGUOUS
Loop-free sequencing network
G0 = {A, B, D}
G1 = {A, B, C}
G2 = {B, C, D}
[Figure: messages m0, m1, m2 addressed to G0, G1, G2 collect sequence numbers from sequencers Q0, Q1, Q2 in a header with columns m#, Q0, Q1, Q2]
B: m2 < m0 < m1
UNAMBIGUOUS
Results
QUESTIONS
§ What is the delay penalty incurred by the sequencing network?
§ How many sequence numbers does each message receive?
EXPERIMENT SETUP
§ Packet-level simulator over a 10,000-node topology
§ End-hosts arranged into similar-sized clusters distributed uniformly at random through the topology
§ Each host belongs to zero or more groups
§ The size of groups is generated from a Zipf distribution
§ Sequencers are assigned to physical nodes using a heuristic that minimizes the distance between sequencers on the same path
Latency Stretch
§ Ratio between the time taken for a message to traverse the sequencing network and the time taken using the direct unicast path
§ Expresses the delay penalty of an individual node when unambiguous delivery is required
§ Worst-case results, since shortest unicast paths are rarely followed in publish/subscribe systems
[Plot: latency stretch; annotation: sub-linear growth]
How is the increase in delay distributed?
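The metric defined above is a simple ratio. The sketch below computes it for two made-up sender/destination pairs; the function name `latency_stretch` and all delay values are hypothetical and are not results from the talk.

```python
def latency_stretch(sequenced_delay_ms, unicast_delay_ms):
    """Delay through the sequencing network divided by direct unicast delay."""
    if unicast_delay_ms <= 0:
        raise ValueError("unicast delay must be positive")
    return sequenced_delay_ms / unicast_delay_ms

# Same 20 ms detour through the sequencers, two different direct distances.
near_pair = latency_stretch(22.0, 2.0)     # sender and destination close together
far_pair = latency_stretch(120.0, 100.0)   # sender and destination far apart
print(near_pair, far_pair)
```

With a fixed detour, the pair that is close together shows a much larger ratio, which is why a ratio-based metric penalizes nearby pairs most.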
Distribution of latency increase
The highest ratios correspond to pairs in which sender and destination are very close to each other
Sequencers on a Path
§ How many sequence numbers must a message collect?
§ Vector timestamp approaches
  § Sender belongs to the destination group
  § Append to a message information about the last message received from all the other members of the group, for each group
  § O(n × g) information [n nodes, g groups]
§ Our approach
  § Appends to a message information for each sequencer traversed
  § O(g)
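To make the two bounds concrete, the following back-of-the-envelope sketch counts header entries under each approach. The function names and the numbers (128 nodes, 8 groups, a 3-sequencer path) are assumptions chosen for illustration only.

```python
def vector_timestamp_entries(n_nodes, n_groups):
    """Upper bound for vector timestamps: one entry per node, for each group."""
    return n_nodes * n_groups

def sequencer_entries(path):
    """Sequencing approach: one entry per sequencer the message traversed."""
    return len(path)

vt = vector_timestamp_entries(128, 8)
sq = sequencer_entries(["Q0", "Q1", "Q2"])
print(vt, sq)
```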
Sequencers on a Path
The number of sequencers on a path is less than half of the total number of nodes that participate
Conclusions and Future Work
CONCLUSIONS
§ Method for ordering messages in a publish/subscribe system
§ Practical and scalable
§ Key insight: only messages to groups with two or more common members must be ordered
FUTURE WORK
§ Scheme for optimizing the sequencing network and the placement of sequencers on physical nodes
§ Dynamic behavior
§ Different models for group membership
Thank You!
Backup slides
Sequencer state
State maintained by a sequencer
§ Sequence number
§ Group-local sequence number
§ Forwarding table
§ Reverse-path table
§ Output retransmission buffer
§ Buffer for messages from previous sequencers
Placing sequencers
Co-locating sequencers on the same physical node
1. Place on the same physical node any sequencers whose corresponding overlaps have a subset relationship between them
2. Co-locate sequencers whose overlaps do not have a subset relationship but share at least one node
Mapping co-located sequencers (a sequencing node) to physical machines
1. If no sequencing node associated with a group has been mapped, map one at random
2. If there are sequencing nodes already mapped to a physical node, pick the closest unassigned sequencing node on the path associated with the group and map it to a neighboring physical node
Stress of a sequencing node: ratio between the number of groups for which it has to forward messages and the total number of groups
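The stress metric defined above can be computed directly. In this sketch the function name `stress`, the set of forwarded groups, and the total of 8 groups are hypothetical values, not measurements from the talk.

```python
def stress(forwarded_groups, total_groups):
    """Fraction of all groups whose messages this sequencing node must forward."""
    if total_groups <= 0:
        raise ValueError("total_groups must be positive")
    return len(set(forwarded_groups)) / total_groups

node_stress = stress({"G0", "G1"}, 8)
print(node_stress)
```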