Distributed Synchronization outline q Introduction q Causality and
Distributed Synchronization: outline q Introduction q Causality and time o Lamport timestamps o Vector timestamps o Causal communication q Snapshots This presentation is based on the book: “Distributed operating-systems & algorithms” by Randy Chow and Theodore Johnson Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 1
Distributed systems q A distributed system is a collection of independent computational nodes, communicating over a network, that is abstracted as a single coherent system o Grid computing o Cloud computing (“infrastructure as a service”, “software as a service”) o Peer-to-peer computing o Sensor networks o … q A distributed operating system allows sharing of resources and coordination of distributed computation in a transparent manner (a), (b) – a distributed system (c) – a multiprocessor Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 2
Distributed synchronization q Underlies distributed operating systems and algorithms q Processes communicate via message passing (no shared memory) q Inter-node coordination in distributed systems challenged by o o o Lack of global state Lack of global clock Communication links may fail Messages may be delayed or lost Nodes may fail q Distributed synchronization supports correct coordination in distributed systems o May no longer use shared-memory based locks and semaphores Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 3
Distributed Synchronization: outline q Introduction q Causality and time o Lamport timestamps o Vector timestamps o Causal communication q Snapshots Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 4
Distributed computation model q Events o Sending a message o Receiving a message o Timeout, internal interrupt q Processors send control messages to each other o send(destination, action; parameters) q Processes may declare that they are waiting for events: o Wait for A 1, A 2, …. , An A 1(source; parameters) code to handle A 1. . An(source; parameters) code to handle An Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 5
Causality and events ordering q A distributed system has no global state nor global clock no global order on all events may be determined q Each processor knows total order on events occurring in it q There is a causal relation between the sending of a message and its receipt Lamport's happened-before relation H 1. e 1 <p e 2 e 1 <H e 2 (events within same processor are ordered) 2. e 1 <m e 2 e 1 <H e 2 (each message m is sent before it is received) 3. e 1 <H e 2 AND e 2 <H e 3 e 1 <H e 3 (transitivity) Leslie Lamport (1978): “Time, clocks, and the ordering of events in a distributed system” Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 6
Causality and events ordering (cont'd) P 1 P 3 Time P 2 e 1 e 2 e 3 e 4 e 5 e 6 e 7 e 8 e 1 <H e 7 ? Yes. e 5 <H e 7 ? No. e 1 <H e 3 ? Yes. e 1 <H e 8 ? Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan Yes. 7
Lamport's timestamp algorithm 1 Initially my_TS=0 2 Upon event e, 3 if e is the receipt of message m 4 my_TS=max(m. TS, my_TS) 5 my_TS++ 6 e. TS=my_TS 7 If e is the sending of message m 8 m. TS=my_TS To create a total order, ties are broken by process ID Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 8
Lamport's timestamps (cont'd) P 1 1. 1 P 2 e 1 e 2 e 3 2. 1 1. 2 2. 2 e 4 e 5 3. 2 e 6 3. 1 Time P 3 1. 3 e 7 e 8 Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 4. 3 9
Distributed Synchronization: outline q Introduction q Causality and time o Lamport timestamps o Vector timestamps o Causal communication q Snapshots Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 10
Vector timestamps - motivation q Lamport's timestamps define a total order o e 1 <H e 2 e 1. TS < e 2. TS o However, e 1. TS < e 2. TS e 1 <H e 2 does not hold, in general. (concurrent events ordered arbitrarily) Definition: Message m 1 casually precedes message m 2 (written as m 1 <c m 2) if s(m 1) <H s(m 2) (sending m 1 happens before sending m 2) Definition: causality violation occurs if m 1 <c m 2 but r(m 2) <p r(m 1). In other words, m 1 is sent to processor p before m 2 but is received after it. q Lamport's timestamps do not allow to detect (hence nor prevent) causality violations. Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 11
Causality violation – an example P 1 P 2 P 3 Migrate O to P 2 Where is O? On p 2 M 1 M 3 Where is O? I don’t know ERROR! Causality violation between… M 1 and M 3. Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 12
Vector timestamps 1 Initially my_VT=[0, …, 0] 2 3 4 5 6 7 8 9 Upon event e, if e is the receipt of message m for i=1 to M my_VT[i]=max(m. VT[i], my_VT[i]) My_VT[self]++ e. VT=my_VT if e is the sending of message m m. VT=my_VT e 1. VT ≤V e 2. VT if and only if e 1. VT[i] ≤ e 2. VT[i], for every i=1, …, M e 1. VT <V e 2. VT if and only if e 1. VT ≤V e 2. VT and e 1. VT ≠ e 2. VT For vector timestamps it does hold that: e 1 <VT e 2 Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan e 1 <H e 2 13
An example of a vector timestamp P 1 P 2 P 3 P 4 1 1 2 2 3 4 4 5 5 e 2 3 3 4 6 e. VT=(3, 6, 4, 2) Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 14
Comparison of vector timestamps P 1 P 2 P 3 P 4 1 1 2 2 3 4 4 e 1 2 5 5 3 e 3 3 4 6 e 2 e 1. VT=(5, 4, 1, 3) e 2. VT=(3, 6, 4, 2) Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan e 3. VT=(0, 0, 1, 3) 15
VTs can be used to detect causality violations P 1 P 2 P 3 (1, 0, 0) Migrate O to P 2 (0, 0, 1) Where is O? (2, 0, 1) On p 2 M 2 (3, 0, 1) M 1 (3, 1, 3) I don’t know M 3 (3, 2, 3) (3, 0, 2) Where is O? (3, 0, 3) ERROR! (3, 2, 4) (3, 3, 3) Causality violation between… M 1 and M 3. Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 16
Distributed Synchronization: outline q Introduction q Causality and time o Lamport timestamps o Vector timestamps o Causal communication q Snapshots Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 17
Preventing causality violations A processor cannot control the order in which it receives messages… But it may control the order in which they are delivered to applications Application Deliver in FIFO order Message arrival order FIFO ordering Assign timestamps Message passing x 1 x 2 x 4 x 5 x 3 Deliver x 1 Deliver x 2 Buffer x 4 Buffer Deliever x 3 x 4 x 5 Protocol for FIFO message delivery (as in, e. g. , TCP) Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 18
Preventing causality violations (cont'd) q Senders attach a timestamp to each message q Destination delays the delivery of out-of-order messages q Hold back a message m until we are assured that no message m’ <H m will be delivered o For every other process p, maintain the earliest timestamp of a message m that may be delivered from p o Do not deliver a message if an earlier message may still be delivered from another process q Algorithm assumes multicast communication Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 19
Algorithm for preventing causality violations 1 earliest[1. . M] initially [ <1, 0, … 0>, <0, 1, …, 0>, …, <0, 0, …, 1> ] 2 blocked[1…M] initially [ {}, …, {} ] 3 4 5 6 7 8 9 10 11 12 13 14 15 For each processor p, the earliest timestamp with which a message from p may still be delivered Upon the receipt of message m from processor p Delivery_list = {} If (blocked[p] is empty) earliest[p]=m. timestamp Add m to the tail of blocked[p] While ( k such that blocked[k] is non-empty AND i {1, …, M} (except k and Self) not_earlier(earliest[i], earliest[k]) remove the message at the head of blocked[k], put it in delivery_list if blocked[k] is non-empty earliest[k] m'. timestamp, where m' is at the head of blocked[k] else increment the k'th element of earliest[k] End While Deliver the messages in delivery_list, in causal order Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 20
Algorithm for preventing causality violations 1 earliest[1. . M] initially [ <1, 0, … 0>, <0, 1, …, 0>, …, <0, 0, …, 1> ] 2 blocked[1…M] initially [ {}, …, {} ] 3 4 5 6 7 8 9 10 11 12 13 14 15 Upon the receipt of message m from processor p For each process p, Delivery_list = {} the messages from p that were received but If (blocked[p] is empty) were not delivered yet earliest[p]=m. timestamp Add m to the tail of blocked[p] While ( k such that blocked[k] is non-empty AND i {1, …, M} (except k and Self) not_earlier(earliest[i], earliest[k]) remove the message at the head of blocked[k], put it in delivery_list if blocked[k] is non-empty earliest[k] m'. timestamp, where m' is at the head of blocked[k] else increment the k'th element of earliest[k] End While Deliver the messages in delivery_list, in causal order Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 21
Algorithm for preventing causality violations 1 earliest[1. . M] initially [ <1, 0, … 0>, <0, 1, …, 0>, …, <0, 0, …, 1> ] 2 blocked[1…M] initially [ {}, …, {} ] 3 4 5 6 7 8 9 10 11 12 13 14 15 Upon the receipt of message m from processor p Delivery_list = {} If (blocked[p] is empty) List of messages to be earliest[p]=m. timestamp delivered as a result of Add m to the tail of blocked[p] m’s receipt While ( k such that blocked[k] is non-empty AND i {1, …, M} (except k and Self) not_earlier(earliest[i], earliest[k]) remove the message at the head of blocked[k], put it in delivery_list if blocked[k] is non-empty earliest[k] m'. timestamp, where m' is at the head of blocked[k] else increment the k'th element of earliest[k] End While Deliver the messages in delivery_list, in causal order Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 22
Algorithm for preventing causality violations 1 earliest[1. . M] initially [ <1, 0, … 0>, <0, 1, …, 0>, …, <0, 0, …, 1> ] 2 blocked[1…M] initially [ {}, …, {} ] If no blocked 3 4 5 6 7 8 9 10 11 12 13 14 15 messages from p, update earliest[p] Upon the receipt of message m from processor p Delivery_list = {} If (blocked[p] is empty) earliest[p]=m. timestamp Add m to the tail of blocked[p] While ( k such that blocked[k] is non-empty AND i {1, …, M} (except k and Self) not_earlier(earliest[i], earliest[k]) remove the message at the head of blocked[k], put it in delivery_list if blocked[k] is non-empty earliest[k] m'. timestamp, where m' is at the head of blocked[k] else increment the k'th element of earliest[k] End While Deliver the messages in delivery_list, in causal order Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 23
Algorithm for preventing causality violations 1 earliest[1. . M] initially [ <1, 0, … 0>, <0, 1, …, 0>, …, <0, 0, …, 1> ] 2 blocked[1…M] initially [ {}, …, {} ] 3 4 5 6 7 8 9 10 11 12 13 14 15 Upon the receipt of message m from processor p m is now the most Delivery_list = {} recent message from p If (blocked[p] is empty) not yet delivered earliest[p]=m. timestamp Add m to the tail of blocked[p] While ( k such that blocked[k] is non-empty AND i {1, …, M} (except k and Self) not_earlier(earliest[i], earliest[k]) remove the message at the head of blocked[k], put it in delivery_list if blocked[k] is non-empty earliest[k] m'. timestamp, where m' is at the head of blocked[k] else increment the k'th element of earliest[k] End While Deliver the messages in delivery_list, in causal order Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 24
Algorithm for preventing causality violations 1 earliest[1. . M] initially [ <1, 0, … 0>, <0, 1, …, 0>, …, <0, 0, …, 1> ] 2 blocked[1…M] initially [ {}, …, {} ] 3 4 5 6 7 8 9 10 11 12 13 14 15 Upon the receipt of message m from processor p Delivery_list = {} If there is a process k with delayed messages for which it is now safe If (blocked[p] is empty) to deliver its earliest message… earliest[p]=m. timestamp Add m to the tail of blocked[p] While ( k such that blocked[k] is non-empty AND i {1, …, M} (except k and Self) not_earlier(earliest[i], earliest[k]) remove the message at the head of blocked[k], put it in delivery_list if blocked[k] is non-empty earliest[k] m'. timestamp, where m' is at the head of blocked[k] else increment the k'th element of earliest[k] End While Deliver the messages in delivery_list, in causal order Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 25
Algorithm for preventing causality violations 1 earliest[1. . M] initially [ <1, 0, … 0>, <0, 1, …, 0>, …, <0, 0, …, 1> ] 2 blocked[1…M] initially [ {}, …, {} ] 3 4 5 6 7 8 9 10 11 12 13 14 15 Upon the receipt of message m from processor p Remove k'th earliest message from Delivery_list = {} blocked queue, make sure it is If (blocked[p] is empty) delivered earliest[p]=m. timestamp Add m to the tail of blocked[p] While ( k such that blocked[k] is non-empty AND i {1, …, M} (except k and Self) not_earlier(earliest[i], earliest[k]) remove the message at the head of blocked[k], put it in delivery_list if blocked[k] is non-empty earliest[k] m'. timestamp, where m' is at the head of blocked[k] else increment the k'th element of earliest[k] End While Deliver the messages in delivery_list, in causal order Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 26
Algorithm for preventing causality violations 1 earliest[1. . M] initially [ <1, 0, … 0>, <0, 1, …, 0>, …, <0, 0, …, 1> ] 2 blocked[1…M] initially [ {}, …, {} ] 3 4 5 6 7 8 9 10 11 12 13 14 15 Upon the receipt of message m from processor p Delivery_list = {} If there additional blocked If (blocked[p] is empty) messages of k, update earliest[k] to be the timestamp of the earliest such earliest[p]=m. timestamp message Add m to the tail of blocked[p] While ( k such that blocked[k] is non-empty AND i {1, …, M} (except k and Self) not_earlier(earliest[i], earliest[k]) remove the message at the head of blocked[k], put it in delivery_list if blocked[k] is non-empty earliest[k] m'. timestamp, where m' is at the head of blocked[k] else increment the k'th element of earliest[k] End While Deliver the messages in delivery_list, in causal order Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 27
Algorithm for preventing causality violations 1 earliest[1. . M] initially [ <1, 0, … 0>, <0, 1, …, 0>, …, <0, 0, …, 1> ] 2 blocked[1…M] initially [ {}, …, {} ] 3 4 5 6 7 8 9 10 11 12 13 14 15 Upon the receipt of message m from processor p Delivery_list = {} Otherwise the earliest message that If (blocked[p] is empty) may be delivered from k would have earliest[p]=m. timestamp previous timestamp with the k'th component incremented Add m to the tail of blocked[p] While ( k such that blocked[k] is non-empty AND i {1, …, M} (except k and Self) not_earlier(earliest[i], earliest[k]) remove the message at the head of blocked[k], put it in delivery_list if blocked[k] is non-empty earliest[k] m'. timestamp, where m' is at the head of blocked[k] else increment the k'th element of earliest[k] End While Deliver the messages in delivery_list, in causal order Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 28
Algorithm for preventing causality violations 1 earliest[1. . M] initially [ <1, 0, … 0>, <0, 1, …, 0>, …, <0, 0, …, 1> ] 2 blocked[1…M] initially [ {}, …, {} ] 3 4 5 6 7 8 9 10 11 12 13 14 15 Upon the receipt of message m from processor p Delivery_list = {} Finally, deliver set of messages that If (blocked[p] is empty) will not cause causality violation (if there any). earliest[p]=m. timestamp Add m to the tail of blocked[p] While ( k such that blocked[k] is non-empty AND i {1, …, M} (except k and Self) not_earlier(earliest[i], earliest[k]) remove the message at the head of blocked[k], put it in delivery_list if blocked[k] is non-empty earliest[k] m'. timestamp, where m' is at the head of blocked[k] else increment the k'th element of earliest[k] End While Deliver the messages in delivery_list, in causal order Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 29
Execution of the algorithm as multicast p 3 state P 1 P 2 (0, 1, 0) P 3 earliest[1] (1, 0, 0) earliest[2] earliest[3] (0, 1, 0) (0, 0, 1) m 1 Upon receipt of m 2 (1, 1, 0) m 2 (0, 1, 0) (0, 0, 1) (1, 1, 0) Upon receipt of m 1 m 2 (1, 1, 0) m 2 (0, 1, 0) m 1 (0, 0, 1) deliver m 1 (1, 1, 0) m 2 (0, 2, 0) (0, 0, 1) deliver m 2 (2, 1, 0) (0, 2, 0) (0, 0, 1) Since the algorithm is “interested” only in causal order of sending events, vector timestamp is incremented only upon send events. Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 30
Distributed Synchronization: outline q Introduction q Causality and time o Lamport timestamps o Vector timestamps o Causal communication q Snapshots Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 31
Snapshots motivation – phantom deadlock q Assume we would like to implement a distributed deadlock-detector q We record process states to check if there is a waits-for-cycle P 1 r 2 P 2 request r 1 OK p 1 request 1 2 p 2 r 2 Observed waits-for graph OK release request r 1 3 OK request p 1 4 p 2 r 2 Actual waits-for graph Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 32
What is a snapshot? q Global system state: o S=(s 1, s 2, …, s. M) – local processor states o The contents Li, j =(m 1, m 2, …, mk) of each communication channel Ci, j (channels assumed to be FIFO) These are messages sent but not yet received q Global state must be consistent o If we observe in state si that pi received message m from pk, then in observation sk , k must have sent m. o Each Li, j must contain exactly the set of messages sent by pi but not yet received by pj, as reflected by si, sj. • Snapshot state much be consistent, one that might have existed during the computation. • Observations must be mutually concurrent that is, no observation casually precedes another observation (a consistent cut) Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 33
Snapshot algorithm – informal description q Upon joining the algorithm, a process records its local state q The process that initiates the snapshot sends snapshot tokens to its neighbors (before sending any other messages) o Neighbors send them to their neighbors – broadcast q Upon receiving a snapshot token: o a process records its state prior to sending/receiving additional messages o Must then send tokens to all its other neighbors q How shall we record sets Lp, q? o q receives token from p and that is the first time q receives token: Lp, q={ } o q receives token from p but q received token before: Lp, q={all messages received by q from p since q received token} Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 34
Snapshot algorithm – data structures q The algorithm supports multiple ongoing snapshots – one per process q Different snapshots from same process distinguished by version number Per process variables integer my_version initially 0 integer current_snap[1. . M] initially [0, …, 0] integer tokens_received[1. . M] processor_state S[1…M] channel_state [1. . M] Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan The version number of my current snapshot 35
Snapshot algorithm – data structures q The algorithm supports multiple ongoing snapshots – one per process q Different snapshots from same process distinguished by version number Per process variables integer my_version initially 0 integer current_snap[1. . M] initially [0, …, 0] integer tokens_received[1. . M] processor_state S[1…M] channel_state [1. . M] Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan current_snap[r] contains version number of snapshot initiated by processor r 36
Snapshot algorithm – data structures q The algorithm supports multiple ongoing snapshots – one per process q Different snapshots from same process distinguished by version number Per process variables integer my_version initially 0 integer current_snap[1. . M] initially [0, …, 0] integer tokens_received[1. . M] processor_state S[1…M] channel_state [1. . M] Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan tokens_received[r] contains the number of tokens received for the snapshot initiated by processor r 37
Snapshot algorithm – data structures q The algorithm supports multiple ongoing snapshots – one per process q Different snapshots from same process distinguished by version number Per process variables integer my_version initially 0 integer current_snap[1. . M] initially [0, …, 0] integer tokens_received[1. . M] processor_state S[1…M] channel_state [1. . M] Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan processor_state[r] contains the state recorded for the snapshot of processor r 38
Snapshot algorithm – data structures q The algorithm supports multiple ongoing snapshots – one per process q Different snapshots from same process distinguished by version number Per process variables integer my_version initially 0 integer current_snap[1. . M] initially [0, …, 0] integer tokens_received[1. . M] processor_state S[1…M] channel_state [1. . M] Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan channel_state[r][q] records the state of channel from q to current processor for the snapshot initiated by processor r 39
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN, my_version) 4 tokens_received[self]=0 TOKEN(q; r, version): 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 40
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN, my_version) 4 tokens_received[self]=0 TOKEN(q; r, version): 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 41
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Increment version number of this snapshot, record my local state Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN, my_version) 4 tokens_received[self]=0 TOKEN(q; r, version): 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 42
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Send snapshot-token on all outgoing channels, initialize number of received tokens for my snapshot to 0 Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r, version): 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 43
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Upon receipt from q of TOKEN for snapshot (r, version) Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r, version): 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 44
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token If this is first token for snapshot Snapshot request: (r, version) 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r, version): 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 45
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: Record local state for r'th snapshot 1 my_version++, current_snap[self]=my_version. Update version number of r'th snapshot 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r, version): 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 46
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 Set of messages on channel from q is empty Send token on all outgoing channels TOKEN(q; r, version): Initialize number of received tokens to 1 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 47
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 Not the first token for (r, version) TOKEN(q; r, version): 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 48
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 Yet another token for snapshot (r, version) TOKEN(q; r, version): 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 49
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 These messages are the state of the channel from q for snapshot (r, version) TOKEN(q; r, version): 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 50
Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self] my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 If all tokens of snapshot (r, version) arrived, TOKEN(q; r, version): snapshot computation is over 5. If current_snap[r] < version 6. S[r] my state 7. current_snap[r]=version 8. L[r][q] empty, send token(r, version) on each outgoing channel 9. tokens_received[r] 1 10. else 11. tokens_received[r]++ 12. L[r][q] all messages received from q since first receiving token(r, version) 13. if tokens_received = #incoming channels, local snapshot for (r, version) is finished Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 51
Sample execution of the snapshot algorithm q A token passing system q Each processor receives, in its turn, a privilege token, uses it to perform privileged operations and then passes it on q Assume processor p did not receive the token for a long duration so it invokes the snapshot algorithm to find out why p q Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 52
Sample execution of the snapshot algorithm 1. p requests a snapshot p t q State(p)={} 2. q sends privilege token p t q State(p)={} 3. Snapshot token arrives at q p State(p)={} 4. Snapshot token arrives at p p State(p)={} Lq, p={ } Observed state: p q t State(q)={}, Lp, q={} q Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 53
- Slides: 53