Consensus
Ali Ghodsi, UC Berkeley/KTH, alig(at)cs.berkeley.edu
Consensus
- In consensus, the nodes propose values
  - They all have to agree on one of these values
- Solving consensus is key to solving many problems in distributed computing
  - Total order broadcast (aka atomic broadcast)
  - Atomic commit (databases)
  - Terminating reliable broadcast
5/19/2021, Ali Ghodsi, alig(at)cs.berkeley.edu
Consensus Properties
- C1. Validity
  - Any value decided is a value proposed
- C2. Agreement
  - No two correct nodes decide differently
- C3. Termination
  - Every correct node eventually decides
- C4. Integrity
  - A node decides at most once
Sample Execution
[Figure: p1: propose(0), decide(0). p2: propose(1), decide(1), then crash. p3: propose(0), decide(0).]
- Does it satisfy consensus? Yes
  - p2 crashes, so it is not a correct node, and agreement only constrains correct nodes
Uniform Consensus Properties
- C1. Validity
  - Any value decided is a value proposed
- C2'. Uniform Agreement
  - No two nodes decide differently
- C3. Termination
  - Every correct node eventually decides
- C4. Integrity
  - No node decides twice
Sample Execution
[Figure: p1: propose(0), decide(0). p2: propose(1), decide(1), then crash. p3: propose(0), decide(0).]
- Does it satisfy uniform consensus? No
  - p2 decides 1 before crashing, while p1 and p3 decide 0
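The two sample executions can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `NodeTrace` record and the checker functions are hypothetical names, and "eventually decides" is approximated as "has decided by the end of the finite trace".

```python
from dataclasses import dataclass

@dataclass
class NodeTrace:
    """Hypothetical record of one node's behavior in a finite execution."""
    proposed: int
    decisions: list   # values decided, in order
    correct: bool     # False if the node crashes at some point

def validity(traces):
    # C1: any value decided is a value proposed
    proposed = {t.proposed for t in traces.values()}
    return all(d in proposed for t in traces.values() for d in t.decisions)

def agreement(traces):
    # C2: no two correct nodes decide differently
    return len({d for t in traces.values() if t.correct for d in t.decisions}) <= 1

def uniform_agreement(traces):
    # C2': no two nodes (correct or not) decide differently
    return len({d for t in traces.values() for d in t.decisions}) <= 1

def termination(traces):
    # C3, approximated on a finite trace: every correct node has decided
    return all(t.decisions for t in traces.values() if t.correct)

def integrity(traces):
    # C4: a node decides at most once
    return all(len(t.decisions) <= 1 for t in traces.values())

def satisfies_consensus(traces):
    return all(p(traces) for p in (validity, agreement, termination, integrity))

def satisfies_uniform_consensus(traces):
    return all(p(traces) for p in (validity, uniform_agreement, termination, integrity))

# The sample execution from the slides: p2 decides 1 and then crashes.
sample = {
    "p1": NodeTrace(proposed=0, decisions=[0], correct=True),
    "p2": NodeTrace(proposed=1, decisions=[1], correct=False),
    "p3": NodeTrace(proposed=0, decisions=[0], correct=True),
}
```

The only difference between the two checks is whether crashed nodes are filtered out before comparing decisions.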
Consensus Interface
- Events
  - Request: <c, Propose | v>
  - Indication: <c, Decide | v>
- Properties:
  - C1, C2, C3, C4
Hierarchical Consensus
- Use perfect failure detector (P) and best-effort broadcast (BEB)
- Each node stores its proposal in proposal
  - Possible to adopt another proposal by changing proposal
  - Store identity of last adopted proposer id in lastprop
- Loop through rounds 1 to N
  - In round i
    - node i is leader and
      - broadcasts proposal v, and decides proposal v
    - other nodes
      - adopt i's proposal v and remember lastprop i, or
      - detect crash of i
Hierarchical Consensus Idea
- Basic idea of hierarchical consensus
  - There must be a first correct leader p
    - p decides its value v and bcasts v
    - BEB ensures all correct nodes get v
      - Every correct node adopts v
      - Future rounds will only propose v
Problem with orphan messages…
[Figure: execution over rounds 1 to 3.
 p1: propose(a), decide(a); proposal := a, lastprop := 0.
 p2: propose(b), decide(b); proposal := b, lastprop := 0.
 p3: propose(c); proposal := c, lastprop := 0; then proposal := b, lastprop := 2; then proposal := a, lastprop := 1; decide(a).]
- Only adopt from node i if i > lastprop?
Invariant to avoid orphans
- Leader in round r might crash,
  - but much later affect some node in round > r
- Invariant
  - adopt if proposer p is ranked higher than lastprop
  - otherwise p has crashed and should be ignored
Execution without failure…
[Figure: execution over rounds 1 to 3.
 p1: propose(a), decide(a); proposal := a, lastprop := 0.
 p2: propose(b); proposal := b, lastprop := 0; then proposal := a, lastprop := 1; decide(a).
 p3: propose(c); proposal := c, lastprop := 0; then proposal := a, lastprop := 1; then proposal := a, lastprop := 2; decide(a).]
Execution with failure…
[Figure: execution over rounds 1 to 3.
 p1: propose(a), decide(a); proposal := a, lastprop := 0.
 p2: propose(b), decide(b); proposal := b, lastprop := 0.
 p3: propose(c); proposal := c, lastprop := 0; then proposal := a, lastprop := 1; then proposal := b, lastprop := 2; decide(b).]
- Uniform consensus? No
Hierarchical Consensus Impl. (1)
- Implements: Consensus (c)
- Uses:
  - BestEffortBroadcast (beb)
  - PerfectFailureDetector (P)
- upon event <Init> do
    detected := ∅; round := 1;
    proposal := ⊥; lastprop := 0             { last adopted proposal and last adopted proposer id }
    for i = 1 to N do
      broadcast[i] := delivered[i] := false
- upon event <crash | pi> do
    detected := detected ∪ { rank(pi) }
Hierarchical Consensus Impl. (2)
- upon event <c.Propose | v> do
    { set node's initial proposal, unless it has already adopted another node's }
    if proposal = ⊥ then proposal := v
- upon round = rank(self) and broadcast[round] = false and proposal ≠ ⊥ do
    { if I am leader; trigger once per round; trigger if I have proposal }
    broadcast[round] := true
    trigger <c.Decide | proposal>                          { permanently decide }
    trigger <beb.Broadcast | (DECIDED, round, proposal)>
- upon event <beb.Deliver | pi, (DECIDED, r, v)> do
    { invariant: only adopt "newer" than what you have }
    if r > lastprop then proposal := v; lastprop := r
    delivered[r] := true
- upon delivered[round] or round ∈ detected do
    round := round + 1                                     { next round if deliver or crash }
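The event handlers above can be exercised with a small round-by-round simulation. This is a simplified sketch under loud assumptions, not the slides' implementation: rounds are run in lockstep, a leader can crash only during its own round after deciding, and the hypothetical `partial_delivery` argument lists which ranks still receive the leader's DECIDED message before it crashes. Because delivery is never delayed here, the `r > lastprop` guard always passes; it is kept only to mirror the pseudocode.

```python
def hierarchical_consensus(proposals, partial_delivery=None):
    """Simulate hierarchical consensus in lockstep rounds.

    proposals: initial values, index 0 belongs to rank 1.
    partial_delivery: hypothetical crash schedule, rank -> set of ranks that
        receive that leader's broadcast before the leader crashes.
    Returns (decisions by rank, set of crashed ranks).
    """
    partial_delivery = partial_delivery or {}
    n = len(proposals)
    proposal = {rank: v for rank, v in enumerate(proposals, start=1)}
    lastprop = {rank: 0 for rank in proposal}
    decisions = {}
    crashed = set()

    for rnd in range(1, n + 1):
        leader = rnd
        # The leader decides its current proposal, then broadcasts it.
        decisions[leader] = proposal[leader]
        if leader in partial_delivery:
            receivers = set(partial_delivery[leader])
            crashed.add(leader)          # leader crashes mid-broadcast
        else:
            receivers = set(proposal)    # BEB reaches everyone
        for node in receivers - crashed:
            if rnd > lastprop[node]:     # only adopt "newer" proposals
                proposal[node] = decisions[leader]
                lastprop[node] = rnd
        # Nodes that received nothing move on because P reports the crash.
    return decisions, crashed

# Failure-free run: everyone decides p1's value.
ok_decisions, ok_crashed = hierarchical_consensus(["a", "b", "c"])

# p1 decides "a", reaches only p3, then crashes; p2 later overrides with "b".
bad_decisions, bad_crashed = hierarchical_consensus(["a", "b", "c"], {1: {3}})
```

In the second run the correct nodes p2 and p3 agree on "b" while the crashed p1 decided "a", which is why the algorithm gives regular but not uniform agreement.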
Correctness
- Validity
  - Always decide own proposal or adopted value
- Integrity
  - Rounds increase monotonically
  - A node only decides once, in the round in which it is leader
- Termination
  - Every correct node makes it to the round it is leader in
    - If some leader fails, completeness of P ensures progress
    - If leader correct, validity of BEB ensures delivery
Correctness (2)
- Agreement
  - No two correct nodes decide differently
  - Take correct leader with minimum id i
    - By termination it will decide v
    - It will BEB v
      - Every correct node gets v and adopts it
      - No older proposals can override the adoption
      - All future proposals and decisions will be v
- How many failures can it tolerate? [d]
  - N-1
Formalism and notation important…
  xi := proposal
  for r := 1 to N do
    if r = p then
      forall j do send <val, xi, r> to j;
      decide xi
    if receive <val, x', r> from r then
      xi := x';
  end
- Control-oriented vs event-based notation
  - receive<> is false iff FD detects Pr as failed
How about uniform consensus?
Uniform Consensus with P
- Move decision to the end
  xi := input
  for r := 1 to N do
    if r = p then
      forall j do send <val, xi, r> to j;
    if receive <val, x', r> from r then
      xi := x';
  end
  decide xi
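To see why moving the decision to the end gives uniform agreement, the control-oriented loop can be simulated in lockstep. This is a sketch under stated assumptions, not the slides' code: a leader may crash only during its own round, and the hypothetical `partial_delivery` argument names the ranks that still receive its value before the crash.

```python
def uniform_consensus_with_p(inputs, partial_delivery=None):
    """Simulate the N-round loop; nodes decide only after the last round.

    inputs: initial values, index 0 belongs to rank 1.
    partial_delivery: hypothetical crash schedule, rank -> set of ranks that
        receive that leader's value before the leader crashes.
    Returns the decisions of the nodes that survive all N rounds.
    """
    partial_delivery = partial_delivery or {}
    n = len(inputs)
    x = {rank: v for rank, v in enumerate(inputs, start=1)}
    crashed = set()

    for r in range(1, n + 1):
        if r in partial_delivery:
            receivers = set(partial_delivery[r])
            crashed.add(r)               # leader r crashes while sending
        else:
            receivers = set(x)
        for node in receivers - crashed:
            x[node] = x[r]               # adopt the round-r leader's value
        # Everyone else leaves the round once P reports leader r as crashed.

    # Decision is taken only here, after all N rounds.
    return {node: value for node, value in x.items() if node not in crashed}

no_failure = uniform_consensus_with_p(["a", "b", "c"])
with_failure = uniform_consensus_with_p(["a", "b", "c"], {1: {3}})
```

With the same crash schedule that broke uniform agreement for the early-deciding variant, p1 now crashes before deciding anything, so no node ever decides a value other than "b".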
Possible with weaker FD than P?
Same algorithm, just use S!
- Recall, Strong Detector (S)
  - Strong Completeness
    - Eventually every failure is detected
  - Weak Accuracy
    - There exists a correct node which is never suspected by any other node
  - Roughly, like P, but accuracy w.r.t. one node
Correctness
- Validity
  - Always decide own proposal or adopted value
- Integrity
  - Rounds increase monotonically
  - A node only decides once, at the end
- Termination
  - Every correct node makes it to the last round
    - If some leader fails, completeness of S ensures progress
    - If leader correct, validity of BEB ensures delivery
Correctness (2)
- Uniform Agreement
  - No two nodes decide differently
  - Take an "accurate" correct leader with id i
    - By weak accuracy (S) & termination such a node exists
    - It will BEB v
      - Every correct node gets v and sets xi = v
      - xi is v in subsequent rounds, final decision is v by all
- NB: the control-oriented code ensures proposals are adopted in monotonically increasing order!
Tolerance of Eventuality (1/3)
- Eventually perfect detector cannot solve consensus with resilience t ≥ n/2
- Proof by contradiction (specific case):
  - Assume it is possible, and assume N=10 and t=5
  - The ◊P detector initially tolerates any behavior
- [Figure: first execution]
  - Green nodes correct
  - Blue nodes crashed
  - Detectors behave perfectly
  - Consensus is 0 at time t0
Tolerance of Eventuality (2/3)
- Eventually perfect detector cannot solve consensus with resilience t ≥ n/2
- Proof by contradiction:
  - Assume it is possible, and assume N=10 and t=5
  - The ◊P detector initially tolerates any behavior
- [Figure: second execution]
  - Blue nodes correct
  - Green nodes crashed
  - Detectors behave perfectly
  - Consensus is 1 at time t1
Tolerance of Eventuality (3/3)
- Eventually perfect detector cannot solve consensus with resilience t ≥ n/2
- Proof by contradiction:
  - Assume it is possible, and assume N=10 and t=5
  - The ◊P detector initially tolerates any behavior
- [Figure: combined execution, no node crashes]
  - For t0 time, green nodes suspect blue nodes are dead
    - Green nodes decide 0
    - Thereafter detectors behave perfectly
  - For t1 time, blue nodes suspect green nodes are dead
    - Blue nodes decide 1
    - Thereafter detectors behave perfectly
Proof technique
- Referred to as partitioning argument
- How to formalize it? [d]
  - Time doesn't exist
  - Reason on prefix of executions
    - Schedule only contains events of green nodes…
    - Schedule only contains events of blue nodes…
    - Combine two schedules…
Consensus possible with weaker FD?
- Yes, we'll solve it for ◊S
  - Weaker than P
  - We'll show binary consensus
- Recall, Eventually Strong Detector (◊S)
  - Strong Completeness
    - Eventually every failure is detected
  - Eventual Weak Accuracy
    - Eventually there exists a correct node which is never suspected by any other node
  - Roughly, like ◊P, but accuracy w.r.t. one node
Rotating Coordinator for ◊S
- For the eventually strong detector
  - The trivial rotating coordinator will not work
  - Why?
    - "Eventually" might be after the first N rounds
- Basic idea (rotating coordinator for ◊S)
  - Rotate forever
  - Eventually all nodes correct w.r.t. 1 coordinator
    - Everyone adopts coordinator's value
- Problem
  - How do we know when to decide?
Idea for termination
- Bound the number of failures
  - Less than a third can fail (f < n/3)
- Similar to rotating coordinator for ◊S:
  1) Everyone sends vote to coordinator C
  2) C picks majority vote V, and broadcasts V
  3) Every node gets the broadcast and changes its vote to V
  4) Change coordinator C and goto 1)
Consensus: Rotating Coordinator for ◊S
  xi := input
  r := 0
  while true do
  begin
    r := r+1
    c := (r mod N)+1                               { rotate to coordinator c }
    send <value, xi, r> to pc                      { all send value to coord }
Consensus: Rotating Coordinator for ◊S
  xi := input
  r := 0
  while true do
  begin
    r := r+1
    c := (r mod N)+1                               { rotate to coordinator c }
    send <value, xi, r> to pc                      { all send value to coord }
    if i == c then                                 { coord only }
    begin
      msgs[0] := 0; msgs[1] := 0;                  { reset 0 and 1 counter }
      for x := 1 to N-f do
      begin
        receive <value, V, R> from q               { receive N-f msgs }
        msgs[V]++;                                 { increase relevant counter }
      end
      if msgs[0] > msgs[1] then v := 0 else v := 1 { choose majority value }
      forall j do send <outcome, v, r> to pj       { send v to all }
    end
  end
Consensus: Rotating Coordinator for ◊S
  xi := input
  r := 0
  while true do
  begin
    r := r+1
    c := (r mod N)+1                               { rotate to coordinator c }
    send <value, xi, r> to pc                      { all send value to coord }
    if i == c then                                 { coord only }
    begin
      msgs[0] := 0; msgs[1] := 0;                  { reset 0 and 1 counter }
      for x := 1 to N-f do
      begin
        receive <value, V, R> from q               { receive N-f msgs }
        msgs[V]++;                                 { increase relevant counter }
      end
      if msgs[0] > msgs[1] then v := 0 else v := 1 { choose majority value }
      forall j do send <outcome, v, r> to pj       { send v to all }
    end
    if collect <outcome, v, r> from pc then        { collect value from coord }
    begin
      xi := v                                      { adopt v }
    end
  end
Termination Detection
- Majority Claim
  - If at least N-f nodes vote V in a round r
    - Every leader will see a majority for V in all future rounds > r
- Proof
  - At most f nodes don't vote V
  - Since f < n/3, we have n–f > n–n/3 = 2n/3
  - Then n/3 < (n–f)/2
  - Then f < (n–f)/2 (because f < n/3)
    - Less than half of any n-f nodes do not vote V
    - More than half of any n-f nodes vote V
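The counting argument can be sanity-checked by brute force. The sketch below is an illustration only, with a hypothetical function name: it assumes at least n-f nodes vote V, lets the coordinator collect any n-f votes, and tries every possible number of non-V votes among them.

```python
def coordinator_always_sees_majority(n, f):
    """Return True if every possible set of n-f collected votes has a strict
    majority for V, given that at most f nodes in total do not vote V."""
    collected = n - f
    if collected <= 0:
        return False
    # At most f of the collected votes can be for the other value.
    for others in range(0, min(f, collected) + 1):
        votes_for_v = collected - others
        if votes_for_v <= others:      # no strict majority for V
            return False
    return True

# The claim should hold exactly when f < n/3, i.e. 3f < n.
claim_matches_bound = all(
    coordinator_always_sees_majority(n, f) == (3 * f < n)
    for n in range(1, 60)
    for f in range(0, n)
)
```

For example, with n = 4 and f = 1 the coordinator collects 3 votes of which at most 1 is not V, so V wins 2 to 1; with n = 3 and f = 1 it collects 2 votes and a 1 to 1 tie is possible.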
Enforcing Decision
- Coordinator checks if all N-f voted same
  - Broadcast that information
- If coordinator says all N-f voted same
  - Decide for that value!
Consensus: Rotating Coordinator for ◊S
  xi := input
  r := 0
  undecided := 1
  while true do
  begin
    r := r+1
    c := (r mod N)+1                               { rotate to coordinator c }
    send <value, xi, r> to pc                      { all send value to coord }
    if i == c then                                 { coord only }
    begin
      msgs[0] := 0; msgs[1] := 0;                  { reset 0 and 1 counter }
      for x := 1 to N-f do
      begin
        receive <value, V, R> from q               { receive N-f msgs }
        msgs[V]++;                                 { increase relevant counter }
      end
      if msgs[0] > msgs[1] then v := 0 else v := 1         { choose majority value }
      if msgs[0] == 0 or msgs[1] == 0 then d := 1 else d := 0   { all N-f same? }
      forall j do send <outcome, d, v, r> to pj    { send d and v to all }
    end
    if collect <outcome, d, v, r> from pc then     { collect value from coord }
    begin
      xi := v                                      { change input to v }
      if d and undecided then
        begin decide(v); undecided := 0; end       { decide if d is true, at most once }
    end
  end
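The coordinator's step and the decision rule can be tried out in a failure-free, lockstep simulation. This is a sketch under assumptions that are not in the slides: no node crashes or is suspected, nodes are indexed from 0, and the N-f votes that reach the coordinator first are taken to be those of the coordinator and the nodes following it in index order.

```python
def coordinator_outcome(votes):
    """Coordinator step: count binary votes, pick the majority value v,
    and set d when all collected votes are identical."""
    msgs = [0, 0]
    for vote in votes:
        msgs[vote] += 1
    v = 0 if msgs[0] > msgs[1] else 1
    d = msgs[0] == 0 or msgs[1] == 0
    return d, v

def rotating_coordinator(inputs, f):
    """Failure-free run of the rotating coordinator for binary inputs.
    Returns (decisions per node, number of rounds used)."""
    n = len(inputs)
    x = list(inputs)
    decided = [None] * n
    r = 0
    while any(value is None for value in decided):
        r += 1
        c = r % n                                   # 0-based coordinator index
        # Assumption: these N-f votes happen to arrive first.
        collected = [x[(c + k) % n] for k in range(n - f)]
        d, v = coordinator_outcome(collected)
        for i in range(n):
            x[i] = v                                # everyone adopts v
            if d and decided[i] is None:
                decided[i] = v                      # decide at most once
    return decided, r

decisions, rounds = rotating_coordinator([0, 1, 1, 0], f=1)
```

In this run the first coordinator sees a mixed set of votes and only imposes the majority value; the second coordinator then sees N-f identical votes, sets d, and everyone decides.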
Correctness
- Termination:
  - Eventually some q will not be falsely detected
    - Eventually q is coordinator
    - Everyone sends vote to coordinator q (majority)
    - Everyone collects q's vote (completeness)
    - Everyone adopts V
  - From now all alive nodes will vote V
  - Next time q is coordinator, d=1
  - Everyone decides
  - So all alive nodes will vote the same
- Why did we have the complex majority claim? [d]
  - To rule out situation where N-f vote 0, and f vote 1, but later everyone adopts 1
Correctness (2)
- Agreement:
  - Decide V happens after majority of N-f vote V
  - Majority claim ensures all leaders will see majority for V
  - Only V can be proposed from then on
  - Only V can be decided
- Integrity & Validity by design…
Consensus in fail-silent?
- We knew Consensus impossible in asynchronous systems
  - FLP impossibility from last lecture
- We have now solved Consensus for
  - Synchrony using P
  - Partial synchrony using ◊S
The End of This Lecture…
Terminating Reliable Broadcast (TRB)
Need for stronger RB
- In a chat application
  - clients don't know when or if a message will be delivered
- But in some applications that use RB
  - Some server uses RB and clients await delivery
  - How long should clients wait for delivery?
  - TRB provides the solution
Terminating Reliable Broadcast
- Intuition
  - TRB is reliable broadcast in which
    - Sender broadcasts M
    - Receivers await delivery of M
  - All nodes either deliver M or "abort"
  - "Abort" indicated by special <SF> message
    - Sender Faulty
TRB Interface (1)
- Module:
  - Name: TerminatingReliableBroadcast (trb)
- Events
  - Request: <trb.Broadcast | src, m>
    - Called by all nodes. If src ≠ self then m = nil
  - Indication: <trb.Deliver | src, m>
    - m may be <SF> (sender faulty) if src crashes
- Property:
  - TRB1-TRB4
TRB Interface (2)
- Termination:
  - Every correct node eventually delivers one message
- Validity:
  - If correct src sends m, then src will deliver m
- Uniform agreement:
  - If any node delivers m, then every correct node eventually delivers m
- Integrity (no creation):
  - If a node delivers m, then either m = <SF> or m was broadcast by src
Consensus Based TRB
- Src RB broadcasts m
  - Deliver <SF> if src is suspected by P
- Caveat
  - If src crashes,
    - Some get m before detecting the crash
    - Some detect the crash before getting m (no agreement)
- Intuitive idea
  - Src BEB broadcasts m
  - Nodes propose (consensus) whichever comes first:
    - Crash suspicion of src (<SF>)
    - BEB delivery from src (M)
  - Deliver consensus decision
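The "propose whichever comes first" rule can be sketched as follows. This is an illustration under assumptions, not the slides' implementation: events are modeled as hypothetical `(kind, node, payload)` tuples, `SF` is a plain string sentinel, and `consensus` is any stand-in function that returns one of the proposed values.

```python
SF = "<SF>"  # hypothetical sentinel for the "sender faulty" message

def trb_proposal(src, events):
    """Return what a node proposes to consensus: the first relevant event.

    events: this node's local event sequence, as (kind, node, payload) tuples
        where kind is "deliver" (BEB delivery) or "crash" (P suspects node).
    """
    for kind, node, payload in events:
        if node != src:
            continue
        if kind == "crash":
            return SF          # suspected src before getting m
        if kind == "deliver":
            return payload     # got m before suspecting src
    return None                # nothing happened yet

def trb_deliver(proposals, consensus):
    """Every node delivers the single value chosen by consensus."""
    decision = consensus(list(proposals.values()))
    return {node: decision for node in proposals}

# src p1 crashes: p2 got m first, p3 detected the crash first.
p2_events = [("deliver", "p1", "m"), ("crash", "p1", None)]
p3_events = [("crash", "p1", None)]
proposals = {"p2": trb_proposal("p1", p2_events),
             "p3": trb_proposal("p1", p3_events)}

# Stand-in for a consensus instance: any proposed value is a valid decision.
delivered = trb_deliver(proposals, consensus=lambda values: values[0])
```

The nodes propose different values, but because they all deliver the consensus decision, they deliver the same thing, either m or <SF>.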
Consensus Based TRB (2)
- Intuitive correctness
  - If src correct, everyone gets m, and consensus decides m
- Termination:
  - Completeness of P and validity of BEB ensure a propose
  - Termination of consensus ensures a delivery
- Validity:
  - Assume a correct src sends m
  - All nodes get m (BEB validity) before suspecting src (P accuracy)
  - All propose m
  - All decide m (Consensus termination and validity)
- Uniform agreement:
  - By agreement of consensus
- Integrity (no creation):
  - Validity of consensus and no creation of BEB ensure <SF> or m is delivered
Hardness of TRB
- Can we implement TRB in asynchronous networks? [d]
  - No, Consensus is reducible to TRB
  - i.e. Consensus ≼ TRB
- Given TRB, implement Consensus
  - Each node TRB its proposal
  - Save delivered values in a vector
  - Decide using a deterministic function
    - E.g. median, majority, or first non-<SF> msg
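The reduction can be sketched with the "first non-<SF> message" rule. This is a minimal illustration with hypothetical names: `delivered` stands for the vector of TRB-delivered values, indexed by sender in a fixed order that all nodes share, and `SF` is a string sentinel.

```python
SF = "<SF>"  # hypothetical sentinel for the "sender faulty" message

def decide_from_trb(delivered):
    """Decide the first TRB-delivered value that is not <SF>.

    delivered: one entry per sender, in the same fixed sender order at every
        node; each entry is that sender's proposal or SF.
    """
    for value in delivered:
        if value != SF:
            return value
    return None  # only reachable if every sender was faulty

# TRB agreement gives every node the same vector, so the same decision.
vector_at_p2 = [SF, "b", "c"]
vector_at_p3 = [SF, "b", "c"]
decision_p2 = decide_from_trb(vector_at_p2)
decision_p3 = decide_from_trb(vector_at_p3)
```

Agreement follows because the function is deterministic and TRB hands every node the same vector; validity follows because every non-<SF> entry was proposed by some node.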
Hardness of TRB (2)
- Can we implement TRB in eventually synchronous systems (with ◊P)? [d]
  - No, P is reducible to TRB
  - i.e. P ≼ TRB, since TRB ≼ P we have TRB ≃ P
- Given TRB, implement P
  - Each node TRB heartbeats all the time
  - If ever receive <SF> for a node, suspect it
Hardness of TRB (3)
- Accuracy
  - TRB guarantees:
    - If src is correct, then all correct nodes will deliver m (validity and agreement)
  - Contrapositive
    - If any correct node doesn't deliver m, src has crashed
    - <SF> delivery implies src is dead
- Completeness
  - If source crashes, eventually <SF> will be delivered (integrity)
TRB requires synchrony!