Paxos Commit Jim Gray Leslie Lamport Microsoft Research

Preview of a paper in preparation
Presented at Microsoft Research Techfest, 3 March 2004, Redmond, WA
Article: MSR-TR-2003-96, Consensus on Transaction Commit
http://research.microsoft.com/research/pubs/view.aspx?tr_id=701
11/5/2020

Commit is Common
• Marriage ceremony: Do you? I do. I now pronounce you…
• Theater: Ready on the set? Ready! Action!
• Contract law: Offer, Signature, Deal / lawsuit

The Common Picture
[Diagram: the director asks each actor "Ready?", each actor answers "Ready", and the director then calls "Action!"]

All or Nothing: If any actor says no, the deal is off.
[Diagram: the director asks each actor "Ready?"; if any actor answers "No!" or times out, the director tells every actor "No deal!"]

The Database Version
• TM: Transaction Manager (the director)
• RM: Resource Manager (the actors)
[Diagram: the client sends Commit to the TM; the TM asks each RM "Ready?", each RM answers "Ready", and the TM sends Commit.]

Two Phase Commit
• N Resource Managers (RMs)
• Want all RMs to commit or all abort.
• Coordinated by Transaction Manager (TM)
• TM sends Prepare, Commit-Abort
• RM responds Prepared, Aborted
• 3N+1 messages
• N+1 stable writes
• Delay: 4 messages, 2 stable writes
• Blocking: if TM fails, Commit-Abort stalls
[Diagrams: message sequence (RequestCommit, Prepare, Prepared, Commit) and state diagrams for the Resource Manager and the Transaction Manager (states: working, prepared, committed, aborted).]
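The all-or-nothing rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the names `Vote`, `two_phase_commit`, and `message_count` are hypothetical, a timeout is modelled as `None`, and the breakdown of the 3N+1 messages in the comment is one reading of the diagram.

```python
from enum import Enum


class Vote(Enum):
    PREPARED = "prepared"
    ABORTED = "aborted"


def two_phase_commit(votes):
    """Return the TM's decision from the RMs' responses.

    `votes` maps an RM name to a Vote, or to None if that RM timed out.
    The TM commits only if every RM answered Prepared.
    """
    if votes and all(v is Vote.PREPARED for v in votes.values()):
        return "commit"
    return "abort"


def message_count(n):
    """Messages for N RMs, as stated on the slide: 3N+1.

    Assumed breakdown: 1 RequestCommit, then N Prepare, N Prepared,
    and N Commit-Abort messages.
    """
    return 3 * n + 1
```

For example, `two_phase_commit({"rm1": Vote.PREPARED, "rm2": None})` returns `"abort"`, because a timeout counts as a "no". The sketch does not model the blocking case: it assumes the TM stays up long enough to decide.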

The Problem With 2PC
• Atomicity – all or nothing
• Consistency – does right thing
• Isolation – no concurrency anomalies
• Durability / Reliability – state survives failures
• Availability: always up. 2PC blocks if TM fails.

Problem Statement
• ACID Transactions make error handling easy.
• One fault can make 2-Phase Commit block.
• Goal: ACID and Available. Non-blocking despite F faults.

Fault-Tolerant Two Phase Commit
If the 2PC Transaction Manager (TM) fails, the transaction blocks.
Solution: add a “spare” transaction manager (non-blocking commit, 3-phase commit).
[Diagram: a client, two TMs, and two RMs exchanging RequestCommit, Prepare, and Prepared messages.]

Fault-Tolerant Two Phase Commit
But… what if…? The complexity is a mess.
[Diagram: with two TMs, both commit and abort messages are sent among the client, TMs, and RMs. Inconsistent! Now what?]

Fault Tolerant 2PC
• Several workarounds proposed in database community:
• Often called "3-phase" or "non-blocking" commit.
• None with complete algorithm and correctness proof.

“Reaching Agreement in the Presence of Faults”
Shostak, Pease, & Lamport, JACM, 1980
• 25 years of theory
• Now called the Consensus problem
• N processes want to agree on a value, even if F of them have failed.

Consensus
The consensus box:
• Collects proposed values
• Picks one proposed value
• Remembers it forever
[Diagram: clients propose X and W to the consensus box; the box reports "W Chosen" to each of them.]
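The three properties of the consensus box can be illustrated with a toy, single-process stand-in. This is a sketch of the interface only, with a hypothetical class name `ConsensusBox`; it is not fault-tolerant, and it arbitrarily picks the first value to arrive, whereas a real consensus algorithm may pick any proposed value.

```python
class ConsensusBox:
    """Toy consensus box: collects proposals, picks one, never changes it."""

    _UNSET = object()  # sentinel meaning "nothing chosen yet"

    def __init__(self):
        self.proposals = []                 # every value ever proposed
        self._chosen = ConsensusBox._UNSET

    def propose(self, value):
        """Record a proposal and return the chosen value."""
        self.proposals.append(value)
        if self._chosen is ConsensusBox._UNSET:
            self._chosen = value            # assumption: first arrival wins
        return self._chosen

    def chosen(self):
        """Return the chosen value, or None if nothing was proposed yet."""
        if self._chosen is ConsensusBox._UNSET:
            return None
        return self._chosen
```

Once `propose("W")` has returned `"W"`, a later `propose("X")` also returns `"W"`: every proposer learns the same value.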

Consensus for Commit: The Obvious Approach
• Get consensus on TM’s decision.
• TM just learns consensus value.
• TM is “stateless”
[Diagram: the client sends Request Commit to a TM; the TM sends Prepare to the RMs and collects Prepared; the TMs propose Commit to the consensus box, which reports "Commit Chosen".]

Consensus for Commit: The Paxos Commit Approach
• Get consensus on each RM’s choice.
• TM just combines consensus values.
• TM is “stateless”
[Diagram: each RM proposes its own value ("RM1 Prepared", "RM2 Prepared") to its own consensus box; the TMs learn "RM1 Prepared Chosen" and "RM2 Prepared Chosen" and then send Commit.]
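The "TM just combines consensus values" step might look like the following sketch. It assumes one consensus instance per RM whose chosen value is either "Prepared" or "Aborted"; the function name `combine` and the `"undecided"` result are hypothetical labels for illustration.

```python
PREPARED, ABORTED = "Prepared", "Aborted"


def combine(chosen):
    """Combine per-RM consensus outcomes into a transaction outcome.

    `chosen` maps an RM name to the value chosen by that RM's consensus
    instance, or to None if that instance has not chosen anything yet.
    """
    values = list(chosen.values())
    if any(v == ABORTED for v in values):
        return "abort"        # one Aborted is enough to abort
    if values and all(v == PREPARED for v in values):
        return "commit"       # commit needs every RM Prepared
    return "undecided"        # still waiting on some instance
```

Because the outcome is a pure function of the chosen values, any TM that learns them computes the same answer, which is one way to read the slide's claim that the TM is "stateless".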

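The Obvious Approach vs. Paxos Commit
Paxos Commit has one fewer message delay.
[Diagram: in the obvious approach the RMs send Prepared to the TM, which then proposes the outcome; in Paxos Commit each RM proposes "RM1 Prepared" / "RM2 Prepared" directly, and Commit follows once both are chosen.]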

Consensus in Action
• The normal (failure-free) case
• Two message delays
• Can optimize
[Diagram: the RM proposes "RM Prepared" to the acceptors inside the consensus box; each acceptor sends a "Vote RM Prepared" message to the TMs, which learn "RM Prepared Chosen".]

Consensus in Action
As long as a majority of acceptors are working, the TM can always learn what was chosen, or get Aborted chosen if nothing has been chosen yet.
[Diagram: RM, TM, and the acceptors inside the consensus box.]
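The "majority of acceptors" condition is simple arithmetic. The sketch below assumes the 2F+1 acceptor count that the deck gives on the Paxos Commit slide; the function names are hypothetical.

```python
def majority(n):
    """Smallest number of acceptors that forms a majority of n."""
    return n // 2 + 1


def can_make_progress(working, f):
    """True if enough of the 2F+1 acceptors are working for a TM to learn
    the chosen value (or to get Aborted chosen)."""
    total = 2 * f + 1
    return working >= majority(total)
```

With 2F+1 acceptors a majority is exactly F+1, so up to F acceptors can fail without blocking the TM. For F=1, two working acceptors out of three are enough and one is not.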

The Complete Algorithm
• Subtle.
• More weird cases than most people imagine.
• Proved correct.

Paxos Commit
• N RMs
• 2F+1 acceptors (~2F+1 TMs)
• If F+1 acceptors see all RMs prepared, then transaction committed.
• 2F(N+1) + 3N + 1 messages, 5 message delays, 2 stable write delays.
[Diagram: message flow among Client, TM, RM 1…N, and Acceptors 0…2F: request commit, prepared, all prepared, commit.]
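The commit condition on this slide can be written as a small check. It is a simplified sketch of the failure-free path only: ballots and recovery are not modelled, and the function name and the set-per-acceptor representation are assumptions made for illustration.

```python
def is_committed(acceptor_views, rms, f):
    """Check the slide's rule: committed once F+1 acceptors have seen
    every RM prepared.

    `acceptor_views` holds one set per acceptor, listing the RMs that
    acceptor has recorded as prepared.
    """
    all_rms = set(rms)
    full_views = sum(1 for seen in acceptor_views if all_rms <= set(seen))
    return full_views >= f + 1
```

With F=1 there are three acceptors, and two of them recording both RMs as prepared is enough, even if the third has seen only one RM or nothing at all.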

Two-Phase Commit vs. Paxos Commit (tolerates F faults)
• Two-Phase Commit: 3N+1 messages; N+1 stable writes; 4 message delays; 2 stable-write delays
• Paxos Commit: 3N + 2F(N+1) + 1 messages; N+2F+1 stable writes; 5 message delays; 2 stable-write delays
• Same algorithm when F=0 and TM = Acceptor
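The cost formulas can be checked numerically. The sketch below just encodes the message and stable-write counts from the slides in two hypothetical helper functions; message delays are left out because the slide's F=0 equivalence also depends on co-locating the TM with the acceptor.

```python
def two_phase_commit_cost(n):
    """Costs of 2PC for N resource managers, as given on the slides."""
    return {"messages": 3 * n + 1, "stable_writes": n + 1}


def paxos_commit_cost(n, f):
    """Costs of Paxos Commit for N RMs tolerating F faults, per the slides."""
    return {
        "messages": 3 * n + 2 * f * (n + 1) + 1,
        "stable_writes": n + 2 * f + 1,
    }
```

With F=0 both functions return the same counts for every N, matching "same algorithm when F=0". For N=5 and F=1, Paxos Commit uses 28 messages and 8 stable writes against 16 and 6 for Two-Phase Commit.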

Summary
• Commit is common
• Two Phase commit is good but… it is the un-availability protocol
• Paxos Commit is non-blocking if there are at most F faults.
• When F=0 (no fault-tolerance), Paxos Commit == 2PC

Paxos Consensus
• Group has a leader known to all – leader election is a subroutine
• Process proposes a value v to leader.
• Leader sends proposal (phase 2) (ballot, value) to all acceptors
• Acceptors respond with: max(ballot, value) they have seen
• If leader gets no higher ballot, and gets at least F+1 responses, then leader can announce (ballot, value)
• Full protocol is 3-phase
• Phase 1: Leader starts new ballot
• Phase 2: Leader proposes value
• Phase 3: If value accepted by F+1 then value is accepted. If not, leader tries to get majority value accepted.
• 6F+4 messages, 2F+1 stable writes
• 4 message delays and 2 stable write delays
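The ballot mechanics can be sketched as a single-value Paxos round in Python. This follows the textbook form of the algorithm rather than the slide's exact phase numbering, runs in one process with direct method calls instead of messages, and does not model stable storage or leader election; `Acceptor` and `run_ballot` are hypothetical names.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1     # highest ballot this acceptor has promised
        self.accepted = None   # (ballot, value) last accepted, if any

    def on_prepare(self, ballot):
        """New ballot: promise to ignore lower ballots, report last accepted."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def on_accept(self, ballot, value):
        """Proposal: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def run_ballot(acceptors, ballot, value, f):
    """Run one ballot as leader; return the chosen value, or None on failure."""
    quorum = f + 1
    replies = [a.on_prepare(ballot) for a in acceptors]
    promises = [accepted for ok, accepted in replies if ok]
    if len(promises) < quorum:
        return None
    earlier = [acc for acc in promises if acc is not None]
    if earlier:
        # Must re-propose the value from the highest earlier ballot.
        value = max(earlier, key=lambda bv: bv[0])[1]
    votes = sum(a.on_accept(ballot, value) for a in acceptors)
    return value if votes >= quorum else None
```

With three acceptors and F=1, a first ballot that gets "Prepared" chosen forces any later leader to re-propose "Prepared", even if that leader wanted "Aborted". That is the property the consensus box needs: once chosen, a value is remembered forever.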

Using Consensus
Have a consensus for each RM.
[Diagram: the client sends RequestCommit to the TMs; the TMs send Prepare to the RMs; the RMs report Prepared through the consensus box; the TMs send Commit.]

[Diagram: an RM proposes X and a TM proposes W to the consensus box; the box reports "X Chosen" to the RM and to the TMs.]

Paxos Commit (success case)
[Diagram: message sequence Request Commit, Prepared, All Prepared, Commit among the Resource Managers, Acceptors, and Leader; state diagrams with the states working, prepared, committed, and aborted, and an AllPrepared transition.]

Consensus
• The distributed systems theory community has thought about this a lot.
• They call it Consensus: N processes want to agree on a value
• Want to tolerate F faults
  – Tolerate F processes stopping
  – Tolerate F messages delayed or lost
• If there are at most F faults in a window, then consensus is achieved.
• Byzantine faults need 3F+1 “acceptors”
• Benign faults need 2F+1 “acceptors”
• Stalls but stays safe if there are more than F faults