Sequence Consensus and Multi-Paxos
Seif Haridi, Niklas Ekström
KTH Royal Institute of Technology
haridi(at)kth.se, neks(at)kth.se

Motivation
  • We wish to implement a replicated state machine (RSM) or total order broadcast
  • Processes need to agree on the sequence of commands (or messages) to execute
  • The standard approach is to use Paxos for single-value consensus as a starting point
18 September 2021, N. Ekström and S. Haridi

Consensus Properties
  • Validity
      ◦ Only proposed values may be decided
  • Uniform Agreement
      ◦ No two processes decide different values
  • Integrity
      ◦ Each process can decide at most one value
  • Termination
      ◦ Every correct process eventually decides a value

First Trial!
  • Each process
      ◦ Waits until all commands are known
      ◦ Orders the commands into a sequence
      ◦ Proposes its sequence
      ◦ Executes commands once a sequence is decided
  • The approach has a serious drawback:
      ◦ Cannot execute until all commands are known
  • We would like to agree on a growing sequence of commands

Second Trial!
  • Consensus is agreement about a single value
  • Let us use Paxos
  • Organize the algorithm in rounds
  • At round i
      ◦ A proposer selects a command C not decided at previous rounds 1..i-1
      ◦ Propose C for slot i
      ◦ Wait until Decide Cd
      ◦ Add Cd to the set of decided commands
      ◦ Move to the next round

Problems with Second Trial!
  • At round i
      ◦ A proposer selects a command C not decided at previous rounds 1..i-1
      ◦ Propose C and wait for Decide Cd
      ◦ Add Cd to the set of decided commands
      ◦ Move to the next round
  • This algorithm is sequential!
      ◦ In order to select a command at round i, proposers have to agree on the sequence of commands C1 … Ci-1
      ◦ Not easy to pipeline proposals
          ▪ The same proposal C might end up decided in different slots

Problems with Second Trial!
  • Not easy to pipeline proposals
      ◦ p2 takes over and continues to propose
      ◦ p2 does not know that p1 proposed c at slot 3
  • Two proposers p1, p2
  • l1 is a learner
[Figure: timeline for p1, p2 and l1 with the events prop(c1, 1), prop(c2, 2), prop(c, 3), decide(c1, 1), decide(c2, 2), prop(b, 3), prop(b4, 4), prop(c, 5), decide(c, 3), decide(c, 5); command c ends up decided in both slot 3 and slot 5]
2/20/2012

What is the problem?
  • We need to agree on each command
      ◦ Handled well by Paxos
  • We also need to agree on the sequence of commands
      ◦ A mismatch with the consensus specification
  • We would like to agree on a growing sequence of commands

Consensus Mismatch
  • The Integrity property says that a process can decide at most one value
      ◦ "Cannot change one's mind"
  • But we don't want to change what's been decided before
      ◦ Just extend with more information
  • This is allowed by Sequence Consensus
      ◦ Can decide again if the new decided sequence is an extension of the previously decided sequence

Sequence Consensus Properties
  • Validity
      ◦ If process p decides v then v is a sequence of proposed commands without duplicates
  • Uniform Agreement
      ◦ If process p decides u and process q decides v then one is a prefix of the other
  • Integrity
      ◦ If process p decides u and later decides v then u is a prefix of v
  • Termination
      ◦ If command C is proposed then eventually every correct process decides a sequence containing C
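
The three safety properties on this slide can be checked mechanically over a recorded run. The following is a minimal sketch, not part of the original deck: the names `is_prefix` and `check_decisions`, and the representation of a run as a dictionary from process to its list of decided sequences, are assumptions made for illustration. Termination is a liveness property and is not checked.

```python
def is_prefix(u, v):
    """Return True if sequence u is a prefix of sequence v."""
    return len(u) <= len(v) and list(v[:len(u)]) == list(u)


def check_decisions(decisions, proposed):
    """Check Validity, Uniform Agreement and Integrity on a recorded run.

    decisions: dict mapping a process name to the list of sequences it
               decided, in the order it decided them (hypothetical format).
    proposed:  set of all commands that were proposed.
    """
    all_decided = [seq for seqs in decisions.values() for seq in seqs]

    # Validity: only proposed commands, and no duplicates within a sequence.
    for seq in all_decided:
        if len(set(seq)) != len(seq) or not set(seq) <= set(proposed):
            return False

    # Uniform Agreement: any two decided sequences are prefix-related.
    for u in all_decided:
        for v in all_decided:
            if not (is_prefix(u, v) or is_prefix(v, u)):
                return False

    # Integrity: each process only ever extends what it decided before.
    for seqs in decisions.values():
        for earlier, later in zip(seqs, seqs[1:]):
            if not is_prefix(earlier, later):
                return False
    return True
```

For example, a run where p decides <a> and then <a, b> while q decides <a, b, c> passes, whereas a run where p decides <a, b> and q decides <a, c> violates Uniform Agreement.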

Sequence Consensus
  • Event Interface
      ◦ propose(C)
          ▪ Request event where C is a command
      ◦ decide(CS)
          ▪ Indication event where CS is a command sequence
  • Abortable Sequence Consensus adds
      ◦ abort
          ▪ Indication event

Multi-Paxos

Roadmap: From Paxos to Multi-Paxos
  • Make the minimal modifications to Paxos to obtain a correct Multi-Paxos algorithm
  • Then add optimizations to make the algorithm efficient

Initial State for Paxos
  • Proposer
      ◦ nc := 0    Proposer's current round number
      ◦ vc := ⊥    Proposer's current value
  • Acceptor
      ◦ np := 0    Promise not to accept in lower rounds
      ◦ na := 0    Round number in which value accepted
      ◦ va := ⊥    Accepted value
  • Learner
      ◦ vd := ⊥    Decided value

Paxos Algorithm

Proposer
  On Propose(C):
      nc := unique proposal number
      S := ∅, acks := 0
      Send ⟨Prepare, nc⟩ to acceptors
  On ⟨Promise, n, n', v'⟩ s.t. n = nc:
      Add (n', v') to S (multiset union)
      If |S| = ⌈(N+1)/2⌉:
          (_, vc) := max(S)            // adopt vc
          vc := vc if vc ≠ ⊥ else C
          Send ⟨Accept, nc, vc⟩ to acceptors
  On ⟨Accepted, n⟩ s.t. n = nc:
      acks := acks + 1
      If acks = ⌈(N+1)/2⌉:
          Send ⟨Decide, vc⟩ to learners
  On ⟨Nack, n⟩ s.t. n = nc:
      Trigger Abort()
      nc := 0

Acceptor
  On ⟨Prepare, n⟩:
      If np < n:
          np := n
          Send ⟨Promise, n, na, va⟩ to prop.
      Else: Send ⟨Nack, n⟩ to prop.
  On ⟨Accept, n, v⟩:
      If np ≤ n:
          np := n
          (na, va) := (n, v)
          Send ⟨Accepted, n⟩ to prop.
      Else: Send ⟨Nack, n⟩ to prop.

Learner
  On ⟨Decide, v⟩:
      If vd = ⊥:
          vd := v
          Trigger Decide(vd)

From Paxos to Multi-Paxos
  • Values are sequences
      ◦ ⊥ is the empty sequence (⊥ = <>)
  • We make two changes:
      ◦ After adopting the value (seq) with the highest proposal number, the proposer is allowed to extend the sequence with nonduplicate command(s)
      ◦ A learner that receives ⟨Decide, v⟩ will decide v if v is longer than the previously decided sequence

Multi-Paxos Algorithm

Proposer
  On Propose(C):
      nc := unique proposal number
      S := ∅, acks := 0
      Send ⟨Prepare, nc⟩ to acceptors
  On ⟨Promise, n, n', v'⟩ s.t. n = nc:
      Add (n', v') to S
      If |S| = ⌈(N+1)/2⌉:
          (_, vc) := max(S)            // adopt vc
          vc := vc if C in vc else vc ++ C      // replaces: vc := vc if vc ≠ ⊥ else C
          Send ⟨Accept, nc, vc⟩ to acceptors
  On ⟨Accepted, n⟩ s.t. n = nc:
      acks := acks + 1
      If acks = ⌈(N+1)/2⌉:
          Send ⟨Decide, vc⟩ to learners

Acceptor
  On ⟨Prepare, n⟩:
      If np < n:
          np := n
          Send ⟨Promise, n, na, va⟩ to prop.
  On ⟨Accept, n, v⟩:
      If np ≤ n:
          np := n
          (na, va) := (n, v)
          Send ⟨Accepted, n⟩ to proposer

Learner
  On ⟨Decide, v⟩:
      If |vd| < |v|:                   // replaces: If vd = ⊥
          vd := v
          Trigger Decide(vd)
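
The two changes relative to single-value Paxos (extend the adopted sequence, and let a learner decide again on a longer sequence) can be sketched in a few lines. This is an illustrative sketch only: `adopt_and_extend`, the `Learner` class and the representation of sequences as Python lists are assumptions, and message passing is left out entirely.

```python
def adopt_and_extend(promises, command):
    """Proposer step after a majority of Promise messages has arrived.

    promises: list of (na, va) pairs, one per promising acceptor.
    Returns the sequence to send in the Accept message.
    """
    # Adopt the sequence accepted in the highest round. With one proposal
    # per round (as on this slide), equal rounds carry equal sequences.
    _, adopted = max(promises, key=lambda p: p[0])
    adopted = list(adopted)
    # Extend with the new command unless it is already in the sequence.
    return adopted if command in adopted else adopted + [command]


class Learner:
    """Decides again whenever a strictly longer sequence arrives."""

    def __init__(self):
        self.vd = []          # the empty sequence plays the role of "undecided"
        self.indications = [] # every Decide(vd) indication triggered so far

    def on_decide(self, v):
        if len(self.vd) < len(v):
            self.vd = list(v)
            self.indications.append(list(self.vd))
```

A late or reordered Decide carrying a shorter sequence is simply ignored by the learner, which is what keeps the decided sequence growing monotonically.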

Where to go from here
  • Correctness?
      ◦ Follow the steps of Lamport
      ◦ Correctness is modeled after the single-value Paxos correctness proof
  • Efficiency?
      ◦ Every proposal takes two round-trips
      ◦ Proposals are not pipelined
      ◦ Sequences are sent back and forth
      ◦ Decide carries sequences

Prepare phase and Accept phase

Proposer
  On Propose(C):
      nc := unique proposal number
      S := ∅, acks := 0
      Send ⟨Prepare, nc⟩ to acceptors
  On ⟨Promise, n, n', v'⟩ s.t. n = nc:
      Add (n', v') to S
      If |S| = ⌈(N+1)/2⌉:
          (_, vc) := max(S)            // adopt vc
          vc := vc if C in vc else vc ++ C      // replaces: vc := vc if vc ≠ ⊥ else C
          Send ⟨Accept, nc, vc⟩ to acceptors
  On ⟨Accepted, n⟩ s.t. n = nc:
      acks := acks + 1
      If acks = ⌈(N+1)/2⌉:
          Send ⟨Decide, vc⟩ to learners

Acceptor
  On ⟨Prepare, n⟩:
      If np < n:
          np := n
          Send ⟨Promise, n, na, va⟩ to prop.
  On ⟨Accept, n, v⟩:
      If np ≤ n:
          np := n
          (na, va) := (n, v)
          Send ⟨Accepted, n⟩ to proposer

Learner
  On ⟨Decide, v⟩:
      If |vd| < |v|:                   // replaces: If vd = ⊥
          vd := v
          Trigger Decide(vd)

Correctness of Multi-Paxos

Correctness
  • How do we know that the algorithm is correct?
  • Build on the proof structure for Paxos

Paxos Invariants
  • P2c. For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either (a) no acceptor in S has accepted any proposal numbered less than n, or (b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S
  • P2b. If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v
  • P2a. If a proposal with value v is chosen, then every higher-numbered proposal accepted by any acceptor has value v
  • P2. If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v

Chosen Sequence v
  • Let va[q, n] be the sequence accepted by acceptor q at round n
  • A sequence v is chosen at round n
      ◦ if there exists a quorum Q of acceptors at round n such that v is a prefix of va[q, n], for every acceptor q in Q
  • A sequence v is chosen
      ◦ if v is chosen at round n, for some round n

Chosen Sequences
  • When a request arrives from a client:
      ◦ At round 3 the chosen sequences are <>, <c1, c2>, <c1, c2, c3>
[Figure: grid of the commands (c1, c2, c3, a, d, e) accepted per acceptor in rounds 1 to 7]

Paxos to Multi-Paxos Invariants
  • P2 (Paxos). If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v
  • P2 (Multi-Paxos). If a proposal with seq v is chosen, then every higher-numbered proposal that is chosen has v as a prefix

Paxos to Multi-Paxos Invariants
  • P2a (Paxos). If a proposal with value v is chosen, then every higher-numbered proposal accepted by any acceptor has value v
  • P2a (Multi-Paxos). If a proposal with seq v is chosen, then every higher-numbered proposal accepted by any acceptor has v as a prefix

Paxos to Multi-Paxos Invariants
  • P2b (Paxos). If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v
  • P2b (Multi-Paxos). If a proposal with seq v is chosen, then every higher-numbered proposal issued by any proposer has v as a prefix

Multi-Paxos Invariants
  • Initially, the empty sequence is chosen in round n = 0
  • P2c. For any v and n, if a proposal with seq v and number n is issued, then there is a majority set S of acceptors such that seq v is an extension of the sequence of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S
  • Example: the highest-numbered proposal accepted before round 4 is <c1, c2, c3>
      ◦ It is OK to issue <c1, c2, c3, a> at 4, or <c1, c2, c3, d, e> at 5
[Figure: grid of the commands accepted by acceptors q1, q2, q3 in rounds 1 to 7]

Multi-Paxos Invariants
  • Initially, the empty sequence is chosen in round n = 0
  • P2c. For any v and n, if a proposal with seq v and number n is issued, then there is a set S consisting of a majority of acceptors such that seq v is an extension of the sequence of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S
  • P2b. If a proposal with seq v is chosen, then every higher-numbered proposal issued by any proposer has v as a prefix
  • P2a. If a proposal with seq v is chosen, then every higher-numbered proposal accepted by any acceptor has v as a prefix
  • P2. If a proposal with seq v is chosen, then every higher-numbered proposal that is chosen has v as a prefix

Correctness
  • If sequence u (resp. v) is decided then it was chosen in some round n (resp. n')
  • P2: if u is chosen in n and v is chosen in n', and n ≤ n', then u ⊑ v (u is a prefix of v)
  • Properties:
      ◦ Agreement: For any two seqs decided, one is a prefix of the other, since either n ≤ n' or n' ≤ n
      ◦ Integrity: If process p decides u and then v, then |u| < |v| and u is a prefix of v (see Decide)
      ◦ Validity: Only sequences of proposed nonduplicate commands are decided (see Promise)

Optimizations

Performance
  • At this point, the algorithm is not very efficient
      ◦ No pipelining of proposals
      ◦ Every proposal requires two round-trips
      ◦ Entire sequences are sent back and forth
      ◦ vc, va and vd are mostly redundant
  • We add optimizations to fix these

Optimizations
  • The optimizations are correctness-preserving transformations
  • Each optimization will be described according to the following steps:
      1. Optimization opportunity
      2. Proposed changes
      3. Arguments why the proposed changes maintain correctness

Remove Prepare and Pipeline Accept

Assumptions
  • The algorithm is optimized for the case when a single proposer runs for a longer period of time (leader)
      ◦ Thus, it will not be aborted for a while
      ◦ Must guarantee safety if aborted

Solution outline
  • Current Multi-Paxos is inefficient:
      ◦ With multiple concurrent proposers, conflicts and restarts are likely (higher load → more conflicts)
      ◦ 2 rounds of messages for each value chosen (Prepare, Accept)
  • Solution: pick a leader
      ◦ Most of the time one leader acts as Proposer
      ◦ After the first Prepare, only perform Accepts until aborted

Prepare Once, Pipeline Accept
  • Benefit:
      ◦ Proposer does prepare(n) before the first accept(n, v), but after that only one round-trip is needed to decide an extension of value v, as long as the round is not aborted
      ◦ Allows multiple outstanding accept requests (pipelining)
          ▪ Lower propose-to-decide latency for multiple proposals

Optimization 1, Opportunity
  • After the first Prepare
      ◦ Allow issuing and accepting multiple proposals in round n
  • We now have multiple values (v's) issued in the same round n
  • Acceptor accepts longer sequences in the same round n as long as n ≥ np (acceptor's promise)

Accepts in round n, Proposer behavior
  • A proposer issues multiple proposals in round n
      ◦ (n, v1), (n, v2), ...
      ◦ Proposer guarantees that v1 ⊑ v2 ⊑ ... (each sequence is a prefix of the next)
  • Doesn't have to wait for one proposal to be chosen before the next is issued
  • Continues until aborted

Accepts in round n, Acceptor behavior
  • We order proposals in the following way:
      ◦ (n, v) < (n', v') iff n < n' or (n = n' and |v| < |v'|)
  • An acceptor extends its accepted sequence when it receives a new proposal
      ◦ As long as it is a higher proposal
  • Accepted messages include accepted values
      ◦ Since there are multiple outstanding accept requests

  On ⟨Accept, n, v⟩:
      If np ≤ n:
          np := n
          (na, va) := max((na, va), (n, v))
          Send ⟨Accepted, n, va⟩ to prop.
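
The proposal ordering on this slide maps directly onto tuple comparison of (round, length). Below is a minimal sketch under stated assumptions: the `Acceptor` class, its method name, and returning the reply as a tuple (or `None` when the request is blocked by a higher promise) are illustrative choices, not the authors' code.

```python
class Acceptor:
    """Acceptor that may accept several growing sequences in one round."""

    def __init__(self):
        self.np = 0   # highest round promised
        self.na = 0   # round of the accepted sequence
        self.va = []  # accepted sequence

    def on_accept(self, n, v):
        """Handle an Accept(n, v); return the Accepted reply or None."""
        if self.np <= n:
            self.np = n
            # (n, v) < (n', v') iff n < n' or (n = n' and |v| < |v'|),
            # which is exactly lexicographic order on (round, length).
            if (self.na, len(self.va)) < (n, len(v)):
                self.na, self.va = n, list(v)
            return ("Accepted", n, list(self.va))
        return None  # blocked by a promise made in a higher round
```

Note that a proposal from a higher round replaces the accepted sequence even if it is shorter; within one round, a late, shorter proposal leaves the accepted sequence unchanged.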

Sequence chosen in round n
  • Sequence v is chosen in round n if acceptors in a majority set have accepted (in round n) sequences having v as a prefix
  • Example: q1, q2 and q3 are acceptors
      ◦ q1 has accepted v3, q2 has accepted v1, q3 has accepted v2
      ◦ v2 and all its prefixes (including v1) are chosen
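
The definition of "chosen in round n" can be turned into a small predicate and tried on the q1, q2, q3 example. This is a sketch for illustration: the function names and the dictionary from acceptor to the sequence it accepted in the round (or `None`) are assumptions, and a quorum is taken to be a strict majority.

```python
def is_prefix(u, v):
    """Return True if sequence u is a prefix of sequence v."""
    return len(u) <= len(v) and list(v[:len(u)]) == list(u)


def is_chosen_in_round(accepted_in_round, v):
    """accepted_in_round: dict acceptor -> sequence accepted in round n,
    or None if that acceptor accepted nothing in round n."""
    supporters = [
        q for q, seq in accepted_in_round.items()
        if seq is not None and is_prefix(v, seq)
    ]
    # A strict majority of all acceptors must have accepted an extension of v.
    return len(supporters) > len(accepted_in_round) // 2


# The example from the slide, with v1 a prefix of v2 a prefix of v3.
v1, v2, v3 = ["c1"], ["c1", "c2"], ["c1", "c2", "c3"]
accepted = {"q1": v3, "q2": v1, "q3": v2}
```

With these values, v2 is chosen because q1 and q3 both hold an extension of it, while v3 is held by q1 alone and is therefore not chosen yet.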

Proposer behavior upon Accepted
  • Proposer maintains in a[q] the length of the longest sequence accepted by acceptor q
  • Sequence v is supported
      ◦ if for a majority of acceptors q: a[q] ≥ |v|
  • If v is supported then v is chosen
  • If v is longer than the previous sequence
      ◦ v is decided and learners are notified

Proposer behavior upon Accepted
  • Each proposer p maintains lc:
      ◦ lc is the length of the longest sequence that p knows is chosen (initially 0)

  On ⟨Accepted, n, v⟩ from q, n = nc:
      a[q] := max(a[q], |v|)
      If lc < |v| and v is supported:
          lc := |v|
          Send ⟨Decide, v⟩ to learners
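
The bookkeeping with a[q] and lc can be sketched as follows. The `Leader` class, its `sent` list (standing in for messages sent to learners) and the use of a strict majority for "supported" are assumptions made for this illustration.

```python
class Leader:
    """Proposer-side handling of Accepted messages in round nc."""

    def __init__(self, acceptors, nc):
        self.nc = nc
        self.a = {q: 0 for q in acceptors}  # longest accepted length per acceptor
        self.lc = 0                         # length of longest known chosen sequence
        self.sent = []                      # Decide messages "sent" to learners

    def supported(self, length):
        """True if a majority of acceptors accepted at least `length` commands."""
        count = sum(1 for l in self.a.values() if l >= length)
        return count > len(self.a) // 2

    def on_accepted(self, n, v, q):
        if n != self.nc:
            return  # reply belongs to another round
        self.a[q] = max(self.a[q], len(v))
        if self.lc < len(v) and self.supported(len(v)):
            self.lc = len(v)
            self.sent.append(("Decide", list(v)))
```

Counting lengths is enough here because all sequences accepted within one round are prefixes of one another, so "accepted at least |v| commands" implies "accepted an extension of v".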

Proposer behavior, first Accept
  On Propose(C):
      nc := unique proposal number
      S := ∅, a := [0]^N, lc := 0
      Send ⟨Prepare, nc⟩ to acceptors
  On ⟨Promise, n, n', v'⟩ s.t. n = nc:
      Add (n', v') to S
      If |S| = ⌈(N+1)/2⌉:              // proposer is in sync with a majority of acceptors
          (_, vc) := max(S)
          vc := vc if C in vc else vc ++ C
          Send ⟨Accept, nc, vc⟩ to acceptors

Proposer behavior, further Accepts
  On Propose(C):
      vc := vc if C in vc else vc ++ C
      Send ⟨Accept, nc, vc⟩ to acceptors

Remove Prepare, pipeline Accepts

Proposer
  On Propose(C):
      If nc ≠ 0 then skip prepare phase
      nc := unique proposal number
      S := ∅, a := [0]^N, lc := 0
      Send ⟨Prepare, nc⟩ to acceptors
  On ⟨Promise, n, n', v'⟩ s.t. n = nc:
      Add (n', v') to S
      If |S| = ⌈(N+1)/2⌉:
          (_, vc) := max(S)
          vc := vc if C in vc else vc ++ C
          Send ⟨Accept, nc, vc⟩ to acceptors
  On ⟨Accepted, n, v⟩ from q, n = nc:
      a[q] := max(a[q], |v|)
      If lc < |v| and v is supported:
          lc := |v|
          Send ⟨Decide, v⟩ to learners

Replica
  On ⟨Prepare, n⟩:
      If np < n:
          np := n
          Send ⟨Promise, n, na, va⟩ to prop.
  On ⟨Accept, n, v⟩:
      If np ≤ n:
          np := n
          (na, va) := max((na, va), (n, v))
          Send ⟨Accepted, n, va⟩ to prop.
  On ⟨Decide, v⟩:
      If |vd| < |v|:
          vd := v
          Trigger Decide(vd)

Correctness: Remove Prepare, Pipeline Accepts

Correctness
  • We must guarantee that:
      ◦ If proposal (n, v) is chosen, then for every higher proposal (n', v') that is chosen, v ⊑ v' (v is a prefix of v')
  • We have two cases:
      ◦ n = n': only successively longer sequences can become chosen within a round, since acceptors accept growing sequences
      ◦ n < n': the prepare phase guarantees that all chosen sequences in round n will be adopted in round n', and no new sequences can be chosen in round n after that

Optimization 2: Avoid sending sequences

Avoid sequences
  • So far, messages contain entire sequences
  • But sequences can be very large
      ◦ E.g., if the RSM is used to maintain a transaction log in a database, then they can easily be gigabytes+
      ◦ Therefore the algorithm is not practical (yet)
  • With the next optimizations we will eliminate sending redundant information

Assumptions so far
  • A1: Optimized for the case when a single proposer runs for a longer period of time (leader)
  • A2: Each process acts in all roles
      ◦ Proposers and acceptors are learners
      ◦ Proposers and acceptors know what is decided (vd)

Optimization 2a (trim Promise)
  • Proposer p sends a Prepare message to acceptor q, which responds with a Promise msg
      ◦ Let v = va of acceptor q, v' = vd of proposer p
  • The Promise message contains the entire sequence va
  • But p knows that the sequence that it will eventually adopt is an extension of vd
  • Changes:
      ◦ Prepare message includes l = |vd|
      ◦ Promise message includes suffix(va, l) instead of va
      ◦ Proposer reconstructs the adopted sequence
[Diagram: messages between p and q: prepare, promise, accepted, decide]

Trim Promise, Proposer p also Learner
  On Propose(C):
      If nc ≠ 0 then skip prepare phase
      nc := unique proposal number
      S := ∅, a := [0]^N, lc := 0, vc := vd
      Send ⟨Prepare, nc, |vd|⟩ to acceptors
  On ⟨Promise, n, n', v⟩ s.t. n = nc:
      Add (n', v) to S
      If |S| = ⌈(N+1)/2⌉:
          (_, vs) := max(S)
          vc := vc ++ vs
          vc := vc if C in vc else vc ++ C
          Send ⟨Accept, nc, vc⟩ to acceptors
[Diagram: messages between p and q: prepare, promise, accept, accepted, decide]

Trim Promise, Acceptor q
  On ⟨Prepare, n, l⟩:                  // l is the length of the decided sequence at p
      If np < n:
          np := n
          Send ⟨Promise, n, na, suffix(va, l)⟩ to prop.
[Diagram: messages between p and q: prepare, promise, accepted, decide]
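
The trimmed Promise exchange can be sketched as a pair of functions: the acceptor returns only the part of va beyond offset l, and the proposer glues the best suffix back onto its own vd. The function names, the dictionary used for acceptor state and the tie-break by suffix length within a round are assumptions for illustration.

```python
def suffix(seq, l):
    """Return the part of seq after its first l elements."""
    return list(seq[l:])


def on_prepare(state, n, l):
    """Acceptor side: state is a dict with keys 'np', 'na', 'va'.

    l is the length of the sequence already decided at the proposer."""
    if state["np"] < n:
        state["np"] = n
        return ("Promise", n, state["na"], suffix(state["va"], l))
    return None  # already promised round n or higher


def adopt(vd, promises):
    """Proposer side: rebuild the adopted sequence from trimmed promises.

    promises: list of (na, suffix) pairs from a majority of acceptors."""
    # Highest round first, then the longest suffix within that round.
    _, vs = max(promises, key=lambda p: (p[0], len(p[1])))
    return list(vd) + list(vs)
```

Since every suffix starts at the same offset l, comparing suffix lengths within a round gives the same winner as comparing the full accepted sequences would.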

Optimization 2b trim Accepted
  • At proposer p
      On Propose(C) or On ⟨Promise, …⟩:
          vc := vc if C in vc else vc ++ C
          Send ⟨Accept, nc, vc⟩ to acceptors
  • At acceptor q
      On ⟨Accept, n, v⟩:
          If np ≤ n:
              np := n
              (na, va) := max((na, va), (n, v))
              Send ⟨Accepted, n, va⟩ to prop.
  • For round n: after q has accepted a proposal sent by p, it must be the case that va,q ⊑ vc,p (q's va is a prefix of p's vc)
[Diagram: messages between p and q: prepare, promise, accepted, decide]

Optimization 2b trim Accepted
  • Avoid sending the sequence in the Accepted msg from acceptor q to proposer p
      ◦ Let va,q = va of acceptor q, vc,p = vc of proposer p
  • After q has accepted a proposal sent by p, it must be the case that va,q ⊑ vc,p (va,q is a prefix of vc,p)
      ◦ Replace va,q with l = |va,q| in the Accepted msg
      ◦ The proposer can recreate va,q from vc,p and l
      ◦ "l is supported" means "prefix(vc,p, l) is supported"

Optimization 2b trim Accepted
  • At proposer p
      On Propose(C) or On ⟨Promise, …⟩:
          vc := vc if C in vc else vc ++ C
          Send ⟨Accept, nc, vc⟩ to acceptors
  • At acceptor q
      On ⟨Accept, n, v⟩:
          If np ≤ n:
              np := n
              (na, va) := max((na, va), (n, v))
              Send ⟨Accepted, n, |va|⟩ to prop.
  • For round n: after q has accepted a proposal sent by p, it must be the case that va,q ⊑ vc,p (q's va is a prefix of p's vc)
[Diagram: messages between p and q: prepare, promise, accepted, decide]

Optimization 2b trim Accepted
  • At proposer p
      On Propose(C) or On ⟨Promise, …⟩:
          vc := vc if C in vc else vc ++ C
          Send ⟨Accept, nc, vc⟩ to acceptors
      On ⟨Accepted, n, l⟩ from q, n = nc:       // l is the length of a prefix of vc that is accepted by q
          a[q] := max(a[q], l)
          If lc < l and prefix(vc, l) is supported:
              lc := l
              Send ⟨Decide, prefix(vc, l)⟩ to learners
[Diagram: messages between p and q: prepare, promise, accepted, decide]

Optimizations up to trim Accepted

Proposer
  On Propose(C):
      If nc ≠ 0 then skip prepare phase
      nc := unique proposal number
      S := ∅, a := [0]^N, lc := 0, vc := vd
      Send ⟨Prepare, nc, |vd|⟩ to acceptors
  On ⟨Promise, n, n', v'⟩ s.t. n = nc:
      Add (n', v') to S
      If |S| = ⌈(N+1)/2⌉:
          (_, vs) := max(S)
          vc := vc ++ vs
          vc := vc if C in vc else vc ++ C
          Send ⟨Accept, nc, vc⟩ to acceptors
  On ⟨Accepted, n, l⟩ from q, n = nc:
      a[q] := max(a[q], l)
      If lc < l and l is supported:
          lc := l
          Send ⟨Decide, prefix(vc, l)⟩ to learners

Replica
  On ⟨Prepare, n, l⟩:
      If np < n:
          np := n
          Send ⟨Promise, n, na, suffix(va, l)⟩ to prop.
  On ⟨Accept, n, v⟩:
      If np ≤ n:
          np := n
          (na, va) := max((na, va), (n, v))
          Send ⟨Accepted, n, |va|⟩ to prop.
  On ⟨Decide, v⟩:
      If |vd| < |v|:
          vd := v
          Trigger Decide(vd)

Assumptions, Trim Decide
  • A1: The algorithm is optimized for the case when a single proposer runs for a longer period of time (leader)
  • A2: Each process acts in all roles
      ◦ Proposer, acceptor and learner
  • A3: FIFO Perfect Links (new assumption)
[Diagram: messages between p and q: prepare, promise, accepted, decide]

The FIFO link assumption
  • We assume FIFO Perfect Links (FPL)
      ◦ This is important for incremental accepts
      ◦ No performance penalties
          ▪ Out-of-order commands have to be buffered before decision
      ◦ Not too strong an assumption in practice
          ▪ In the Fail-Silent model you get FPL from PL (Perfect Link) by adding sequence numbers
          ▪ ZooKeeper makes this assumption too
          ▪ If we implement Perfect Links on top of TCP then FIFO is more or less already provided
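
Getting FIFO delivery from a perfect link by adding sequence numbers amounts to buffering on the receiving side. A minimal sketch for a single sender, assuming the sender numbers its messages 0, 1, 2, ...; the class and method names are illustrative only.

```python
class FifoReceiver:
    """Receiving end of a FIFO link built on an unordered perfect link.

    One instance handles the messages of one sender."""

    def __init__(self):
        self.next_seq = 0   # sequence number expected next
        self.pending = {}   # out-of-order messages, keyed by sequence number

    def on_receive(self, seq, msg):
        """Called by the underlying link; returns messages to deliver, in order."""
        self.pending[seq] = msg
        delivered = []
        # Deliver as long as the next expected message is available.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered
```

A message that arrives early is held back until every earlier message from the same sender has been delivered, which is the buffering the slide refers to.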

Optimization 2c trim Decide
  • Avoid sending the entire sequence in Decide msgs from proposer p to acceptor q (also learner)
  • If p sends Decide with seq vc to q then p has previously sent Accept(nc, vc) to q
      ◦ Msgs arrive in the same order due to the FIFO assumption
          ▪ ⟨Accept, nc, vc⟩ arrives at acceptors before ⟨Decide, …⟩
      ◦ However, ⟨Accept, nc, vc⟩ may be blocked by np
          ▪ So the Decide msg must also contain proposal number nc
      ◦ Value v in Decide is replaced with l = |v|, and q reconstructs v = prefix(va, l)
[Diagram: messages between p and q: prepare, promise, accepted, decide]

Optimization 2c trim Decide
  On ⟨Accepted, n, l⟩ from q, n = nc:
      a[q] := max(a[q], l)
      If lc < l and prefix(vc, l) is supported:
          lc := l
          Send ⟨Decide, nc, lc⟩ to learners     // decide on prefix(vc, lc) at round nc
  On ⟨Decide, n, l⟩:
      If np = n and |vd| < l:
          vd := prefix(va, l)
          Trigger Decide(vd)
[Diagram: messages between p and q: prepare, promise, accept, accepted, decide]
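
On the receiving side, the trimmed Decide handler only needs the round number and a length. The sketch below is illustrative: the `Replica` class and its `indications` log are assumptions, and it presumes the replica has already accepted at least l commands in round n, which the FIFO-link argument on the previous slides is meant to guarantee.

```python
class Replica:
    """Acceptor that is also a learner, handling the trimmed Decide(n, l)."""

    def __init__(self, np, va):
        self.np = np            # highest round promised
        self.va = list(va)      # accepted sequence
        self.vd = []            # decided sequence
        self.indications = []   # every Decide(vd) indication triggered so far

    def on_decide(self, n, l):
        # np = n guards against a Decide whose matching Accept was blocked
        # because this replica has already promised a higher round.
        if self.np == n and len(self.vd) < l:
            self.vd = self.va[:l]  # reconstruct v = prefix(va, l)
            self.indications.append(list(self.vd))
```

A Decide from a stale round, or one whose length does not exceed what is already decided, leaves the replica unchanged.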

Optimizations after trim Decide

Proposer
  On Propose(C):
      If nc ≠ 0 then skip prepare phase
      nc := unique proposal number
      S := ∅, a := [0]^N, lc := 0, vc := vd
      Send ⟨Prepare, nc, |vd|⟩ to acceptors
  On ⟨Promise, n, n', v'⟩ s.t. n = nc:
      Add (n', v') to S
      If |S| = ⌈(N+1)/2⌉:
          (_, vs) := max(S)
          vc := vc ++ vs
          vc := vc if C in vc else vc ++ C
          Send ⟨Accept, nc, vc⟩ to acceptors
  On ⟨Accepted, n, l⟩ from q, n = nc:
      a[q] := max(a[q], l)
      If lc < l and l is supported:
          lc := l
          Send ⟨Decide, nc, lc⟩ to learners

Replica
  On ⟨Prepare, n, l⟩:
      If np < n:
          np := n
          Send ⟨Promise, n, na, suffix(va, l)⟩ to prop.
  On ⟨Accept, n, v⟩:
      If np ≤ n:
          np := n
          (na, va) := max((na, va), (n, v))
          Send ⟨Accepted, n, |va|⟩ to prop.
  On ⟨Decide, n, l⟩:
      If np = n and |vd| < l:
          vd := prefix(va, l)
          Trigger Decide(vd)

Incremental Accepts

First Accept
  On Propose(C):
      If nc ≠ 0 then skip prepare phase
      nc := unique proposal number
      S := ∅, a := [0]^N, lc := 0, vc := vd
      Send ⟨Prepare, nc, |vd|⟩ to acceptors
  On ⟨Promise, n, n', v⟩ s.t. n = nc:
      Add (n', v) to S
      If |S| = ⌈(N+1)/2⌉:
          (_, vs) := max(S)
          vc := vc ++ vs
          vc := vc if C in vc else vc ++ C
          Send ⟨Accept, nc, vc⟩ to acceptors    // goal: avoid sending the whole sequence vc
[Diagram: messages between p and q: prepare, promise, accepted, decide]

Trim Accept
  • Accept msg contains a suffix vs and an offset offs, from which q can recreate the sequence
      ◦ q truncates its va to the offset, and appends the suffix
      ◦ va := prefix(va, offs) ++ vs
  • p keeps track of the length l for which the prefix of q's accepted sequence is identical to the prefix of vc
      ◦ Stored in s[q] at p
  • Distinguish between the first Accept msg in round n and subsequent Accept msgs in n

First Accept
  • Proposer p has to acquire knowledge from each acceptor qi
      ◦ qi's va, and
      ◦ qi's vd
  • Invariant: |va| ≥ |vd| at qi

  On ⟨Prepare, n, l⟩:                  // l = |vd| at p
      If np < n:
          np := n
          Send ⟨Promise, n, na, suffix(va, l), |vd|⟩ to prop.   // suffix(va, l): accepted suffix that p doesn't know
[Diagram: messages between p and q: prepare, promise, accept, accepted, decide]

First Accept
  On ⟨Prepare, n, l⟩:                  // l = |vd| at p
      If np < n:
          np := n
          Send ⟨Promise, n, na, suffix(va, l), |vd|⟩ to prop.   // suffix(va, l): accepted suffix that p doesn't know
  On ⟨Promise, n, n', v', l⟩ from q, n = nc:
      Add (n', v') to S; s[q] := l
      If |S| = ⌈(N+1)/2⌉:
          (_, vs) := max(S)
          vc := vc ++ vs
          vc := vc if C in vc else vc ++ C
          For each q s.t. s[q] ≠ ⊥:
              l := s[q]                // acceptor q has prefix(vc, l)
              s[q] := |vc|             // update knowledge about q
              Send ⟨Accept, nc, suffix(vc, l), l⟩ to q
[Diagram: messages between p and q: prepare, promise, accepted, decide]

First Accept
• Promise contains l = |vd| of q, and a suffix from q that the proposer p does not know
• Proposer selects the max suffix and adds it to its vd, plus any additional commands
• What about acceptors that the proposer did not receive a promise from?
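How the proposer might build vc once a majority of promises has arrived can be sketched as follows. The slides only write max(S); the ordering used here (highest accepted round n’ first, then the longer suffix) is an assumption, as are the function name choose_sequence and the representation of S as a list of (n’, suffix) tuples.

```python
def choose_sequence(vd, promises, command):
    """Build the proposer's sequence vc from its decided prefix vd.

    promises: list of (n_accepted, suffix) pairs, one per Promise received.
    Assumption: max(S) orders by accepted round, then by suffix length.
    """
    _, vs = max(promises, key=lambda p: (p[0], len(p[1])))
    vc = vd + vs                 # vc := vc ++ vs, starting from vc = vd
    if command not in vc:        # vc := vc if C in vc else vc ++ C
        vc = vc + [command]
    return vc


example = choose_sequence(["a"], [(1, ["b"]), (2, ["c", "d"]), (0, [])], "e")
```

Because every acceptor computes its suffix relative to the same l = |vd| of p, the suffixes in S are directly comparable.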

Other Acceptors
• On ⟨Prepare, n, l⟩:    // l = p’s |vd|
    If np < n:
      np := n
      Send ⟨Promise, n, na, suffix(va, l), |vd|⟩ to prop    // suffix(va, l): accepted suffix that p does not know
• On ⟨Promise, n, n’, v’, l⟩ from q, n = nc:
    Add (n’, v’) to S ; s[q] := l
    If |S| = ⌈(N+1)/2⌉:
      …
    Else If |S| > ⌈(N+1)/2⌉:
      Send ⟨Accept, nc, suffix(vc, l), l⟩ to q    // acceptor q has prefix(vc, l)
      s[q] := |vc|
      If lc ≠ 0:                                  // a decision is already made
        Send ⟨Decide, nc, lc⟩ to q
[Diagram: messages between p and q: prepare, promise, accepted, decide]

First Accept, Other acceptors
• Proposer p waits until it receives a promise msg from q before sending the first Accept message to q
    ◦ Promise synchronizes p’s knowledge of q
• p may not send a decide msg or subsequent accept msgs to q until the first Accept msg is sent to q
• If some seq has been chosen before p received the promise from q, then p must send a decide msg to q after the first Accept
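The ordering constraint for a late promise can be illustrated with a small Python sketch. The function name on_late_promise and the tuple encoding of messages are hypothetical; the sketch only shows which messages p would emit, in order, for an acceptor q whose Promise arrives after a majority was already collected.

```python
def on_late_promise(nc, vc, lc, l):
    """Messages p sends to q, in order, when q's Promise arrives late.

    nc: current round, vc: proposer's sequence, lc: decided length (0 if none),
    l: the |vd| that q reported in its Promise.
    """
    msgs = [("Accept", nc, vc[l:], l)]     # first Accept: suffix(vc, l), offset l
    if lc != 0:                            # a decision is already made
        msgs.append(("Decide", nc, lc))    # must follow the first Accept
    return msgs


late = on_late_promise(3, ["a", "b", "c"], 2, 1)
```

Returning the messages as an ordered list makes the "Decide only after the first Accept" rule explicit; over FIFO links q then processes them in the same order.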

Optimization 2d: Trim Accept after first
• Subsequent Accept messages:
    ◦ Let m1 = ⟨Accept, n, v1⟩ and m2 = ⟨Accept, n, v2⟩, where m1 is sent before m2 from p to q
    ◦ p knows that at the time when q processes m2, q will either have accepted v1 or have blocked round n
        Holds because of FIFO links
    ◦ Therefore p will send vs = suffix(v2, |v1|) and offs = |v1| instead of v2
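A minimal Python sketch of the per-acceptor bookkeeping behind this optimization follows. The class name AcceptTrimmer and its methods are hypothetical; s plays the role of s[q] from the slides, and acceptors absent from s correspond to s[q] = ⊥ (no Promise received yet).

```python
class AcceptTrimmer:
    """Proposer-side state for sending only the unsent suffix of vc."""

    def __init__(self):
        self.vc = []   # proposer's current sequence
        self.s = {}    # s[q]: length of the prefix of vc that q already has

    def synced(self, q, l):
        """Record that q is known to hold prefix(vc, l), e.g. after its Promise."""
        self.s[q] = l

    def propose(self, command):
        """Append command (if new) and return one trimmed Accept per known q."""
        if command not in self.vc:
            self.vc.append(command)
        out = {}
        for q, l in list(self.s.items()):
            out[q] = (self.vc[l:], l)     # (vs, offs) = (suffix(vc, l), l)
            self.s[q] = len(self.vc)      # FIFO: q will have this when the next msg arrives
        return out


t = AcceptTrimmer()
t.vc = ["a"]
t.synced("q1", 0)
first = t.propose("b")
second = t.propose("c")
```

The second call sends a single command with offset 2, which is exactly vs = suffix(v2, |v1|) and offs = |v1|.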

Accept after first
• On Propose(C):
    vc := vc if C in vc else vc ++ C
    For each q s.t. s[q] ≠ ⊥:
      l := s[q] ; s[q] := |vc|
      Send ⟨Accept, nc, suffix(vc, l), l⟩ to q
• On ⟨Accept, n, vs, offs⟩:
    If np = n:    // cannot be < due to FIFO
      na := n
      If offs < |va|: va := prefix(va, offs)
      va := va ++ vs
      Send ⟨Accepted, n, |va|⟩ to prop.
[Diagram: messages between p and q: prepare, promise, accepted, decide]

Optimization 2d

Proposer
• On Propose(C):
    If nc ≠ 0 then skip prepare phase
    nc := unique proposal number
    S := ∅, a := [0]^N, s := [⊥]^N, lc := 0, vc := vd
    Send ⟨Prepare, nc, |vd|⟩ to acceptors
• On ⟨Promise, n, n’, v’, l⟩ from q, n = nc:
    Add (n’, v’) to S ; s[q] := l
    If |S| = ⌈(N+1)/2⌉:
      (_, vs) := max(S)
      vc := vc ++ vs
      vc := vc if C in vc else vc ++ C
      For each q s.t. s[q] ≠ ⊥:
        l := s[q] ; s[q] := |vc|
        Send ⟨Accept, nc, suffix(vc, l), l⟩ to q
    Else If |S| > ⌈(N+1)/2⌉:
      Send ⟨Accept, nc, suffix(vc, l), l⟩ to q
      s[q] := |vc|
      If lc ≠ 0: Send ⟨Decide, nc, lc⟩ to q
• On ⟨Accepted, n, l⟩ from q, n = nc:
    …

Replica
• On ⟨Prepare, n, l⟩:
    If np < n:
      np := n
      Send ⟨Promise, n, na, suffix(va, l), |vd|⟩ to prop.
• On ⟨Accept, n, vs, offs⟩:
    If np = n:    // cannot be < due to FIFO
      na := n
      If offs < |va|: va := prefix(va, offs)
      va := va ++ vs
      Send ⟨Accepted, n, |va|⟩ to prop.
• On ⟨Decide, n, l⟩:
    If np = n and |vd| < l:
      vd := prefix(va, l)
      Trigger Decide(vd)

Optimization 3a: vd as prefix of va
• Each replica stores both va and vd, even though they are highly redundant
• We saw that, because of FIFO links, it always holds that vd is a prefix of va
• Therefore sequence vd can be replaced with an integer ld, such that vd = prefix(va, ld)

Optimization 3a

Proposer
• On Propose(C):
    If nc ≠ 0 then skip prepare phase
    nc := unique proposal number
    S := ∅, a := [0]^N, s := [⊥]^N, lc := 0, vc := prefix(va, ld)
    Send ⟨Prepare, nc, ld⟩ to acceptors
• On ⟨Promise, n, n’, v’, l⟩ from q, n = nc:
    Add (n’, v’) to S ; s[q] := l
    If |S| = ⌈(N+1)/2⌉:
      (_, vs) := max(S)
      vc := vc ++ vs
      vc := vc if C in vc else vc ++ C
      For each q s.t. s[q] ≠ ⊥:
        l := s[q] ; s[q] := |vc|
        Send ⟨Accept, nc, suffix(vc, l), l⟩ to q
    Else If |S| > ⌈(N+1)/2⌉:
      Send ⟨Accept, nc, suffix(vc, l), l⟩ to q
      s[q] := |vc|
      If lc ≠ 0: Send ⟨Decide, nc, lc⟩ to q
• On ⟨Accepted, n, l⟩ from q, n = nc:
    …

Replica
• On ⟨Prepare, n, l⟩:
    If np < n:
      np := n
      Send ⟨Promise, n, na, suffix(va, l), ld⟩ to prop.
• On ⟨Accept, n, vs, offs⟩:
    If np = n:
      na := n
      If offs < |va|: va := prefix(va, offs)
      va := va ++ vs
      Send ⟨Accepted, n, |va|⟩ to prop.
• On ⟨Decide, n, l⟩:
    If np = n and ld < l:
      ld := l
      Trigger Decide(prefix(va, ld))

Optimization 3b
• It is possible to remove the need to store the sequences vc and va separately
• By updating the local replica directly, instead of sending messages to itself, it is possible to merge vc into va
• We do not show the details; they are left as a (fun) exercise for the student

Deliver One Command At A Time
• Currently every decided sequence is handed to the application in its entirety
• It probably makes more sense to change the API and decide one command at a time
• On ⟨Decide, n, l⟩:
    If np = n:
      While ld < l:
        Trigger Decide(va[ld])
        ld := ld + 1
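The per-command delivery loop can be sketched in Python as below. The Replica class and its delivered list (standing in for Trigger Decide) are illustrative assumptions; like the pseudocode, the sketch relies on va already containing at least l commands when Decide arrives.

```python
class Replica:
    """Replica state reduced to what the Decide handler needs."""

    def __init__(self, np, va):
        self.np = np            # promised round
        self.va = list(va)      # accepted sequence
        self.ld = 0             # decided length: vd = prefix(va, ld)
        self.delivered = []     # stand-in for Trigger Decide(command)

    def on_decide(self, n, l):
        if self.np != n:        # ignore decisions from other rounds
            return
        while self.ld < l:      # deliver each newly decided command once
            self.delivered.append(self.va[self.ld])
            self.ld += 1


r = Replica(np=1, va=["a", "b", "c"])
r.on_decide(1, 2)   # delivers "a" and "b"
r.on_decide(1, 2)   # duplicate Decide: nothing new is delivered
r.on_decide(1, 3)   # delivers "c"
```

Because ld only moves forward, a repeated or stale Decide never delivers a command twice.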

Final algorithm

Proposer
• On Propose(C):
    If nc ≠ 0 then skip prepare phase
    nc := unique proposal number
    S := ∅, a := [0]^N, s := [⊥]^N, lc := 0, vc := prefix(va, ld)
    Send ⟨Prepare, nc, ld⟩ to acceptors
• On ⟨Promise, n, n’, v’, l⟩ from q, n = nc:
    Add (n’, v’) to S ; s[q] := l
    If |S| = ⌈(N+1)/2⌉:
      (_, vs) := max(S)
      vc := vc ++ vs
      vc := vc if C in vc else vc ++ C
      For each q s.t. s[q] ≠ ⊥:
        l := s[q] ; s[q] := |vc|
        Send ⟨Accept, nc, suffix(vc, l), l⟩ to q
    Else If |S| > ⌈(N+1)/2⌉:
      Send ⟨Accept, nc, suffix(vc, l), l⟩ to q
      s[q] := |vc|
      If lc ≠ 0: Send ⟨Decide, nc, lc⟩ to q
• On ⟨Accepted, n, l⟩ from q, n = nc:
    …

Replica
• On ⟨Prepare, n, l⟩:
    If np < n:
      np := n
      Send ⟨Promise, n, na, suffix(va, l), ld⟩ to prop.
• On ⟨Accept, n, vs, offs⟩:
    If np = n:
      na := n
      If offs < |va|: va := prefix(va, offs)
      va := va ++ vs
      Send ⟨Accepted, n, |va|⟩ to prop.
• On ⟨Decide, n, l⟩:
    If np = n and ld < l:
      ld := l
      Trigger Decide(prefix(va, ld))