Paxos Made Simple
Jinghe Zhang

Introduction
• Locks are the easiest way to manage concurrency:
  • mutexes and semaphores;
  • read and write locks.
• In a distributed system:
  • there is no master for issuing locks;
  • processes can fail.

Problem
• How can we reach consensus (data consistency) in a distributed system that tolerates non-malicious failures?

Requirements
• Safety
  • Only a value that has been proposed may be chosen.
  • Only a single value is chosen.
  • A node never learns that a value has been chosen unless it actually has been.
• Liveness
  • Some proposed value is eventually chosen.
  • If a value has been chosen, a node can eventually learn the value.
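To make the safety requirements concrete, here is a minimal Python sketch that checks them against one recorded run. The trace format (the set of proposed values plus the values that were chosen and learned) and the function name check_safety are hypothetical, chosen only for illustration; the liveness requirements cannot be checked this way, since they talk about what happens eventually.

```python
from typing import Iterable, Set


def check_safety(proposed: Set[str], chosen: Iterable[str], learned: Iterable[str]) -> bool:
    """Check the three safety requirements against one recorded run (hypothetical trace format)."""
    chosen_set = set(chosen)
    learned_set = set(learned)
    only_proposed = chosen_set <= proposed  # only a value that has been proposed may be chosen
    single_value = len(chosen_set) <= 1     # only a single value is chosen
    learned_ok = learned_set <= chosen_set  # nodes learn only values that were actually chosen
    return only_proposed and single_value and learned_ok
```

For example, check_safety({"a", "b"}, ["a"], ["a"]) holds, while a run in which both "a" and "b" are chosen violates the second requirement.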

Paxos Properties
• Paxos is an asynchronous consensus algorithm.
• Asynchronous networks:
  • No common clocks or shared notion of time (local ideas of time are fine, but different processes may have very different “clocks”).
  • No way to know how long a message will take to get from A to B.

Paxos Properties
• Paxos is guaranteed safe.
• Consensus is a stable property: once reached, it is never violated; the agreed value is not changed.

Paxos Properties
• Paxos is not guaranteed live.
  • Consensus is reached if “a large enough subnetwork... is non-faulty for a long enough time.”
  • Otherwise Paxos might never terminate.

Paxos Algorithm
• Key assumptions:
  • The set of processes that run Paxos is known a priori.
  • Processes suffer crash failures.
  • All processes have Greek names (but translate as “Fred”, “Cynthia”, “Nancy”…).

Paxos Algorithm
• 3 roles: proposer, acceptor, learner.
  • A node can act in more than one role (usually all 3).
• 2 phases:
  • Phase 1: prepare request and response.
  • Phase 2: accept request and response.

Phase 1: (prepare request)
(1) A proposer chooses a new proposal number n and sends a prepare request (“prepare”, n) to a majority of acceptors, asking:
  (a) Can I make a proposal with number n?
  (b) If yes, do you suggest some value for my proposal?
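Step (1) only works if every proposer can always find a new number that no other proposer uses; the paper handles this by giving different proposers disjoint sets of numbers. A minimal Python sketch of one such scheme, assuming proposers are numbered 0 to P-1 and proposer i only uses numbers congruent to i modulo P (the class ProposalNumbers and its at_least parameter are hypothetical):

```python
class ProposalNumbers:
    """Hands out increasing proposal numbers for one proposer (hypothetical helper).

    Assumption: proposer ids are 0..num_proposers-1 and proposer i only ever uses
    numbers n with n % num_proposers == i, so two proposers never share a number.
    """

    def __init__(self, proposer_id: int, num_proposers: int) -> None:
        self.proposer_id = proposer_id
        self.num_proposers = num_proposers
        self.round = 0

    def next(self, at_least: int = 0) -> int:
        """Return a fresh number above this proposer's previous one and above `at_least`."""
        while True:
            self.round += 1
            n = self.round * self.num_proposers + self.proposer_id
            if n > at_least:
                return n
```

A proposer that learns of a higher number m, for example because its request was ignored, can call next(at_least=m) to jump past it.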

Phase 1: (prepare request)
(2) If an acceptor receives a prepare request (“prepare”, n) with n greater than that of any prepare request it has already responded to, it sends out (“ack”, n, n’, v’) or (“ack”, n, ⊥, ⊥):
  (a) it responds with a promise not to accept any more proposals numbered less than n;
  (b) it suggests the value v’ of the highest-numbered proposal n’ that it has accepted, if any, else ⊥.
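Step (2) from the acceptor's side, as a hedged Python sketch. It assumes in-memory state (the paper requires an acceptor to remember its promise and accepted proposal even across failures, so a real one would use stable storage), writes ⊥ as None, and silently ignores prepare requests it must not answer; Acceptor and on_prepare are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Ack = Tuple[str, int, Optional[int], Optional[str]]


@dataclass
class Acceptor:
    promised: int = -1                 # highest prepare number responded to (-1: none yet)
    accepted_n: Optional[int] = None   # number of the highest-numbered accepted proposal
    accepted_v: Optional[str] = None   # value of that proposal

    def on_prepare(self, n: int) -> Optional[Ack]:
        """Handle ("prepare", n); return an ack, or None to ignore the request."""
        if n <= self.promised:
            return None                # already answered a prepare with a number >= n
        self.promised = n              # promise: never accept a proposal numbered < n
        # ("ack", n, None, None) plays the role of ("ack", n, ⊥, ⊥) when nothing was accepted
        return ("ack", n, self.accepted_n, self.accepted_v)
```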

Phase 2: (accept request)
(3) If the proposer receives responses from a majority of the acceptors, then it can issue an accept request (“accept”, n, v) with number n and value v:
  (a) n is the number that appears in the prepare request;
  (b) v is the value of the highest-numbered proposal among the responses, or any value the proposer likes if the responses reported no proposals.
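The proposer's side of step (3), sketched in Python with acks encoded as tuples ("ack", n, n’, v’) and None standing for ⊥. choose_value and my_value are hypothetical names, and the sketch assumes each ack in the list comes from a different acceptor:

```python
from typing import List, Optional, Tuple

Ack = Tuple[str, int, Optional[int], Optional[str]]


def choose_value(acks: List[Ack], num_acceptors: int, my_value: str) -> Optional[str]:
    """Pick v for ("accept", n, v) from phase-1 acks, or None if there is no majority yet."""
    if len(acks) <= num_acceptors // 2:
        return None                       # need responses from a strict majority
    reported = [(k, v) for (_, _, k, v) in acks if k is not None]
    if not reported:
        return my_value                   # nobody accepted anything: free choice
    _, v = max(reported, key=lambda kv: kv[0])  # value of the highest-numbered proposal
    return v
```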

Phase 2: (accept request)
(4) If the acceptor receives an accept request (“accept”, n, v), it accepts the proposal unless it has already responded to a prepare request having a number greater than n.
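Step (4) as a minimal Python sketch (AcceptorState and on_accept are hypothetical names). Besides the rule on the slide, it raises its promise to n when it accepts; this is a common strengthening that stays safe, because an acceptor may always ignore a request, and it keeps accepted_n equal to the highest-numbered proposal accepted so far.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcceptorState:
    promised: int = -1                 # highest prepare number this acceptor responded to
    accepted_n: Optional[int] = None   # highest-numbered proposal accepted so far
    accepted_v: Optional[str] = None

    def on_accept(self, n: int, v: str) -> bool:
        """Handle ("accept", n, v); return True if the proposal is accepted."""
        if n < self.promised:
            return False               # it already responded to a prepare numbered above n
        self.promised = n              # strengthening: also refuse later requests below n
        self.accepted_n, self.accepted_v = n, v
        return True
```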

Learning the decision
• Obvious algorithm: whenever an acceptor accepts a proposal, it sends (“accept”, n, v) to all learners.
• With no Byzantine failures, fewer messages are needed: acceptors inform a distinguished learner, and the distinguished learner broadcasts the result.
  • The learner receives (“accept”, n, v) from a majority of acceptors, decides v, and sends (“decide”, v) to all other learners.
  • Learners that receive (“decide”, v) decide v.
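A learner that waits for a majority can be sketched in Python as follows; Learner and on_accepted are hypothetical names, and acceptors are identified by arbitrary strings. Votes are counted per (n, v) pair, because a value is chosen only when a majority accepts the same proposal, and a set is used so that duplicate messages are not counted twice. A distinguished learner would additionally send (“decide”, v) once decided is set; that broadcast is not shown.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple


class Learner:
    """Decides v once a majority of acceptors report ("accept", n, v) for the same n."""

    def __init__(self, num_acceptors: int) -> None:
        self.num_acceptors = num_acceptors
        self.votes: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
        self.decided: Optional[str] = None

    def on_accepted(self, acceptor_id: str, n: int, v: str) -> Optional[str]:
        self.votes[(n, v)].add(acceptor_id)        # a set ignores duplicate messages
        if len(self.votes[(n, v)]) > self.num_acceptors // 2:
            self.decided = v                       # a majority accepted proposal (n, v)
        return self.decided
```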

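In Well-Behaved Runs
[Figure: message flow between proposer 1, acceptors 1…n, and learners 1…n]
• Proposer 1 sends (“prepare”, 1) to acceptors 1…n.
• Each acceptor replies (“ack”, 1, ⊥, ⊥).
• Proposer 1 sends (“accept”, 1, v1) to the acceptors.
• Each acceptor sends (“accept”, 1, v1) to learners 1…n.
• The learners decide v1.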
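The well-behaved run can be replayed in a few lines of Python. This is a hedged, single-process sketch that assumes no failures and reliable delivery; Acceptor and run_once are hypothetical names, the acceptor also raises its promise when it accepts (a common, safe strengthening of step (4)), and the learners are collapsed into the majority count at the end.

```python
from typing import List, Optional, Tuple


class Acceptor:
    def __init__(self) -> None:
        self.promised = -1
        self.accepted: Optional[Tuple[int, str]] = None  # (n, v) or None for ⊥

    def prepare(self, n: int) -> Optional[Tuple[str, int, Optional[Tuple[int, str]]]]:
        if n <= self.promised:
            return None
        self.promised = n
        return ("ack", n, self.accepted)

    def accept(self, n: int, v: str) -> bool:
        if n < self.promised:
            return False
        self.promised = n
        self.accepted = (n, v)
        return True


def run_once(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """One proposer, no failures: return the decided value, or None if no majority."""
    majority = len(acceptors) // 2 + 1
    acks = [a.prepare(n) for a in acceptors]             # phase 1: ("prepare", n)
    acks = [ack for ack in acks if ack is not None]
    if len(acks) < majority:
        return None
    prior = [ack[2] for ack in acks if ack[2] is not None]
    v = max(prior)[1] if prior else value                # highest-numbered accepted value wins
    accepted = sum(a.accept(n, v) for a in acceptors)    # phase 2: ("accept", n, v)
    return v if accepted >= majority else None           # learners decide v
```

Running it a second time with a higher number and a different candidate value still returns v1, which is the stable property from the earlier slide in miniature.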

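Paxos is safe…
• Intuition:
  • If a proposal with value v is decided, then every higher-numbered proposal issued by any proposer has value v.
  • Example: a majority of acceptors accept (n, v), so v is decided. The next prepare request, with proposal number n+1, must be acknowledged by a majority, and any two majorities share at least one acceptor. That acceptor reports (n, v), the highest-numbered accepted proposal, so proposal n+1 also carries v. (What if the next number is n+k?)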
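The intuition rests on the fact that any two majorities of the same acceptor set share at least one acceptor. A brute-force Python check for small sets (the helper names quorums and every_pair_intersects are hypothetical) makes this concrete, and also shows why half of the acceptors is not enough.

```python
from itertools import combinations
from typing import List, Sequence, Set


def quorums(nodes: Sequence[int], size: int) -> List[Set[int]]:
    """Every subset of `nodes` with at least `size` members."""
    return [set(c) for k in range(size, len(nodes) + 1) for c in combinations(nodes, k)]


def every_pair_intersects(nodes: Sequence[int], size: int) -> bool:
    """True if any two quorums of at least `size` nodes share at least one node."""
    qs = quorums(nodes, size)
    return all(a & b for a in qs for b in qs)
```

With 4 acceptors and quorums of 2, the sets {0, 1} and {2, 3} are disjoint, so two proposals could each gather a "quorum" without either seeing the other; with a strict majority (3 of 4, or 3 of 5) every pair overlaps.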

Safety (proof)
• Suppose (n, v) is the earliest proposal that passed (was accepted by a majority). If there is none, safety holds.
• Let (n’, v’) be the earliest proposal issued after (n, v) with a different value v’ ≠ v.
• Issuing (n’, v’) required acks from a majority of acceptors. Any two majorities intersect, so some acceptor both accepted (n, v) and acked prepare n’; it reports a proposal with proposal number k ≥ n.
• Since the proposer chose v’, it must have received a response (“ack”, n’, j, v’) to its prepare request, with n < j < n’ (j ≠ n, because proposal n carries v).
• Consider (j, v’): it was issued after (n, v), has value v’ ≠ v, and j < n’. This contradicts the choice of (n’, v’) as the earliest such proposal.
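The proof can be complemented (not replaced) by a randomized test. The Python sketch below, with hypothetical names and parameters, lets three proposers interleave their phases in a random order, delivers each message with probability 0.7, and records every value that some proposal got accepted by a majority. If the rules above are followed, that set should never contain more than one value. Acceptors here also raise their promise when they accept, as in the earlier sketches.

```python
import random
from typing import List, Optional, Set, Tuple


def random_run(seed: int, num_acceptors: int = 3, num_proposers: int = 3,
               steps: int = 200, delivery: float = 0.7) -> Set[str]:
    """Interleave proposers at random and drop messages at random.

    Returns every value that some proposal got accepted by a majority of acceptors.
    """
    rng = random.Random(seed)
    promised = [-1] * num_acceptors
    accepted: List[Optional[Tuple[int, str]]] = [None] * num_acceptors
    majority = num_acceptors // 2 + 1
    rounds = [0] * num_proposers
    ready: List[Optional[Tuple[int, str]]] = [None] * num_proposers  # (n, v) awaiting phase 2
    chosen: Set[str] = set()

    def reached() -> List[int]:
        """Acceptors a broadcast reaches; the message to every other acceptor is lost."""
        return [i for i in range(num_acceptors) if rng.random() < delivery]

    for _ in range(steps):
        p = rng.randrange(num_proposers)
        if ready[p] is None:                        # phase 1: ("prepare", n)
            rounds[p] += 1
            n = rounds[p] * num_proposers + p       # disjoint numbers per proposer
            acks = []
            for i in reached():
                if n > promised[i]:
                    promised[i] = n
                    acks.append(accepted[i])
            if len(acks) >= majority:
                prior = [a for a in acks if a is not None]
                ready[p] = (n, max(prior)[1] if prior else f"value-{p}")
        else:                                       # phase 2: ("accept", n, v)
            n, v = ready[p]
            ready[p] = None
            votes = 0
            for i in reached():
                if n >= promised[i]:
                    promised[i], accepted[i] = n, (n, v)
                    votes += 1
            if votes >= majority:
                chosen.add(v)
    return chosen
```

Such a test only samples schedules; it is evidence for the claim, not a proof of it.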

Liveness
• Fischer-Lynch-Paterson (1985)
  • No deterministic protocol can guarantee consensus in an asynchronous communication system if even a single process may fail.
  • Intuition: a “failed” process may just be slow, and can rise from the dead at exactly the wrong time.

Liveness
• FLP tells us that a deterministic asynchronous system in which processes may fail cannot guarantee to agree on anything with both accuracy (safety) and liveness!
• Liveness requires that agents are free to accept different values in subsequent rounds.
• But safety requires that once some round succeeds, no subsequent round can change it.

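Liveness (cont.)
• The paper gives a scenario with 2 proposers in which no decision is ever made: each proposer completes phase 1 with a higher number, which causes the other proposer's accept request to be rejected, and so on forever.
• As the paper points out, selecting a distinguished proposer (“leader election”) solves the problem, as long as it can communicate with a majority of acceptors.
• But Paxos doesn’t block in case of a lost message:
  • Phase 1 can start with a new, higher proposal number even if previous attempts never ended.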
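The two-proposer scenario can be replayed deterministically. In this hedged Python sketch (first_chosen_turn is a hypothetical name), no messages are lost and proposers take turns; on each turn a proposer first sends its accept request, then starts phase 1 with a fresh, higher number that every acceptor promises. With two alternating proposers every accept request arrives just after the other proposer's higher prepare, so nothing is ever chosen; with a single distinguished proposer a value is chosen on its second turn.

```python
from typing import Dict, Sequence


def first_chosen_turn(proposers: Sequence[int], turns: int, num_acceptors: int = 3) -> int:
    """Round-robin schedule with no lost messages. On each turn the current proposer
    first sends ("accept", m, v) for the number m it prepared on its previous turn,
    then starts phase 1 with a fresh, higher number that every acceptor promises.
    Returns the first turn on which a majority accepts a proposal, or -1 if none does."""
    promised = [-1] * num_acceptors
    majority = num_acceptors // 2 + 1
    next_n = 0
    prepared: Dict[int, int] = {}  # proposer -> number whose phase 1 has succeeded
    for turn in range(turns):
        p = proposers[turn % len(proposers)]
        if p in prepared:                                   # phase 2
            m = prepared.pop(p)
            if sum(m >= pr for pr in promised) >= majority:
                return turn
        next_n += 1                                         # phase 1 preempts everyone
        promised = [next_n] * num_acceptors
        prepared[p] = next_n
    return -1
```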

Applications
• Chubby lock service.
• Petal: distributed virtual disks.
• Frangipani: a scalable distributed file system.

Summary
• Consensus is “impossible”.
  • But this doesn’t turn out to be a big obstacle.
  • We can achieve consensus with probability one in many situations.
  • Paxos is an example of a consensus protocol, and a very simple one.

If you are interested...
• Lamport, Leslie (May 1998). "The Part-Time Parliament". ACM Transactions on Computer Systems 16 (2): 133–169. (http://research.microsoft.com/users/lamport/pubs/lamport-paxos.pdf)

Questions? Jenkins, if I want another yes-man, I’ll build one!