Practical Byzantine Fault Tolerance
Written by: Miguel Castro and Barbara Liskov
Presented by: Leo Waugh

Agenda
• Background
• The Algorithm
• Experiments
• Conclusion
• My Evaluation
• Discussion

Background (Motivation)
• Previous Algorithms Not Feasible
  – Inefficient / Extreme Overhead
  – Assume Synchrony
• Malicious Attacks
• Software Errors

Background (Assumptions)
• Independent Node Failures
• Compromised Nodes Can’t Forge Messages
• Nodes Cannot Be Delayed Forever
• Network Issues Exist
  – Loss
  – Delay
  – Duplicate Messages
  – Out-of-Order Messages

Background (Model)
• Asynchronous Distributed System
• 3f+1 Nodes Connected by Network
• Byzantine Failure Model (arbitrary failure)
[Diagram: Clients, Requests, Answers, System, Nodes, Primary]
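The 3f+1 sizing rule on this slide can be made concrete with a few helper functions. This is a minimal sketch; the function names are illustrative and are not taken from the paper. The 2f+1 quorum size is the one the paper uses for agreement among replicas.

```python
def min_replicas(f: int) -> int:
    """Smallest group size that tolerates f Byzantine nodes (3f+1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f+1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


def quorum_size(f: int) -> int:
    """Number of matching replicas needed for agreement (2f+1)."""
    return 2 * f + 1


if __name__ == "__main__":
    for n in (4, 7, 10):
        f = max_faulty(n)
        print(f"n={n}: tolerates f={f}, quorum={quorum_size(f)}")
```

Adding replicas only helps in steps of three: 5 or 6 nodes still tolerate a single faulty node, the same as 4.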

Background (Properties)
• Safety
• Guaranteed Liveness
• Not Theoretical, but Practical
• Resiliency Optimal
• Atomic Multicast
  – All Do or Do Not
  – All Agree on Order
[Figure: View; Write 3 and Write 4 delivered to nodes X, Y, Z (X: 3; Y: 3, 4; Z: 4, 3)]

The Algorithm
Four Step Algorithm
1) Client Sends Request to Primary
2) Primary Multicasts Request to Replicas
3) Replicas Execute and Reply to Client
4) Client Waits for an Answer
  – If it gets f+1 identical answers, it is done
  – If slow response, client multicasts the request to all replicas

The Algorithm (Step 1)
• Client Sends Request to Primary
  – <REQUEST, operation, timestamp, client>signature
[Diagram: Client, Primary, Worker Nodes]

The Algorithm (Step 2a)
• Primary Multicasts Request to Replicas
  – <<PRE-PREPARE, view, sequence number, digest>signature, client’s request message>
[Diagram: Client, Primary, Worker Nodes]
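The pre-prepare message carries a digest of the client's request instead of signing the request body again. The sketch below shows how a primary might build that message and how a replica might check it. The class names, the "|" field encoding, and the use of SHA-256 are assumptions made for illustration. Signatures are left out entirely, and the check covers only the view and the digest, not everything a real replica would verify.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    operation: str
    timestamp: int
    client: str

    def encode(self) -> bytes:
        # Hypothetical wire encoding, for illustration only.
        return f"{self.operation}|{self.timestamp}|{self.client}".encode()


@dataclass(frozen=True)
class PrePrepare:
    view: int
    seq: int
    digest: str


def request_digest(request: Request) -> str:
    """Digest of the request (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(request.encode()).hexdigest()


def make_pre_prepare(view: int, seq: int, request: Request) -> PrePrepare:
    """Primary assigns a sequence number and binds it to the request digest."""
    return PrePrepare(view, seq, request_digest(request))


def accepts_pre_prepare(pp: PrePrepare, request: Request, current_view: int) -> bool:
    """Replica-side check: same view, and the digest matches the request."""
    return pp.view == current_view and pp.digest == request_digest(request)


if __name__ == "__main__":
    req = Request("write x=1", 42, "client-a")
    pp = make_pre_prepare(view=0, seq=1, request=req)
    print(accepts_pre_prepare(pp, req, current_view=0))
```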

The Algorithm (Step 2b)
• Replicas That Accept the Pre-Prepare Multicast Prepare
  – <PREPARE, view, sequence number, message digest, replica ID>signature
[Diagram: Client, Primary, Worker Nodes]

The Algorithm (Step 2c)
• When Prepared, Replicas Multicast Commit
  – <COMMIT, view, sequence number, digest of message m, replica ID>signature
[Diagram: Client, Primary, Worker Nodes]
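In the paper, a request counts as prepared once a replica holds the pre-prepare plus 2f matching prepares from distinct replicas, and as committed locally once it also holds 2f+1 matching commits. The following is a minimal sketch of those two counting rules. The `PhaseMessage` type and the function names are hypothetical, and signature checks are omitted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PhaseMessage:
    kind: str      # "PREPARE" or "COMMIT"
    view: int
    seq: int
    digest: str
    replica: int


def count_matching(messages: Iterable[PhaseMessage], kind: str,
                   view: int, seq: int, digest: str) -> int:
    """Count distinct senders of messages that match (view, seq, digest)."""
    senders = {m.replica for m in messages
               if m.kind == kind and (m.view, m.seq, m.digest) == (view, seq, digest)}
    return len(senders)


def is_prepared(messages, view, seq, digest, f, has_pre_prepare=True) -> bool:
    """Pre-prepare in the log plus 2f matching prepares."""
    return has_pre_prepare and count_matching(messages, "PREPARE", view, seq, digest) >= 2 * f


def is_committed_local(messages, view, seq, digest, f, has_pre_prepare=True) -> bool:
    """Prepared, plus 2f+1 matching commits (the replica's own may be included)."""
    return (is_prepared(messages, view, seq, digest, f, has_pre_prepare)
            and count_matching(messages, "COMMIT", view, seq, digest) >= 2 * f + 1)
```

Counting distinct senders rather than messages matters: a faulty replica that repeats the same prepare must not be able to fill the quorum by itself.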

The Algorithm (Step 3)
• When Replicas are Committed, Execute & Reply
  – <REPLY, view, timestamp, client, replica ID, result>signature
[Diagram: Client, Primary, Worker Nodes]
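Requests can commit out of order, but the paper requires each replica to execute them in sequence-number order so that all replicas pass through the same states. One way to sketch that rule is a small buffer that holds committed requests until every lower sequence number has been executed. The `Executor` class and its fields are hypothetical names for this illustration.

```python
class Executor:
    """Executes committed operations strictly in sequence-number order."""

    def __init__(self) -> None:
        self.last_executed = 0   # highest sequence number executed so far
        self.pending = {}        # seq -> operation, committed but not yet executed
        self.history = []        # operations in execution order

    def on_committed(self, seq: int, operation: str) -> list:
        """Record a committed request and execute everything that is now in order."""
        if seq > self.last_executed:
            self.pending[seq] = operation
        executed = []
        while self.last_executed + 1 in self.pending:
            self.last_executed += 1
            executed.append(self.pending.pop(self.last_executed))
        self.history.extend(executed)
        return executed


if __name__ == "__main__":
    ex = Executor()
    print(ex.on_committed(2, "write y"))  # held back until seq 1 commits
    print(ex.on_committed(1, "write x"))  # releases both, in order
```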

The Algorithm (Step 4)
• Client Node Checks Replies
  – f+1 Identical Results Indicates Completion
  – If Slow Response, Multicast Request to All Replicas
[Diagram: Client, Primary, Worker Nodes]
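The f+1 rule works because at most f replicas are faulty, so any f+1 identical replies include at least one from a correct replica. A minimal sketch of the client-side check follows. The function name and the `(replica_id, result)` tuple format are assumptions, and reply signatures are not verified here.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def accepted_result(replies: Iterable[Tuple[int, str]], f: int) -> Optional[str]:
    """Return the first result reported by f+1 distinct replicas, else None."""
    voters = defaultdict(set)  # result -> set of replica IDs that reported it
    for replica_id, result in replies:
        voters[result].add(replica_id)
        if len(voters[result]) >= f + 1:
            return result
    return None


if __name__ == "__main__":
    replies = [(0, "ok"), (1, "corrupt"), (2, "ok")]
    print(accepted_result(replies, f=1))
```

If the function returns None after a timeout, the client would fall back to multicasting the request to all replicas, as the slide describes.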

The Algorithm (other factors)
• Garbage Collection
  – Must store log for view changes
  – Sets Stable Checkpoints on agreement
  – Needs at least 1 Stable Checkpoint
• View Changes
  – When Primary Crashes or Exhibits Errors
  – New Primary Chosen from Replicas
  – Operation Continues with new View
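Both mechanisms reduce to small rules in the paper: the primary of view v is replica v mod n, and a checkpoint becomes stable once 2f+1 replicas report the same state digest for the same sequence number. The sketch below illustrates those rules along with log truncation. The function names and the `(replica, seq, digest)` tuple layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def primary_for_view(view: int, n: int) -> int:
    """Replica ID of the primary in the given view (round-robin over n replicas)."""
    return view % n


def is_stable_checkpoint(proofs: Iterable[Tuple[int, int, str]],
                         seq: int, state_digest: str, f: int) -> bool:
    """True once 2f+1 distinct replicas vouch for (seq, state_digest)."""
    senders = {replica for replica, s, d in proofs if s == seq and d == state_digest}
    return len(senders) >= 2 * f + 1


def truncate_log(log: Dict[int, str], stable_seq: int) -> Dict[int, str]:
    """Drop log entries at or below the stable checkpoint."""
    return {seq: entry for seq, entry in log.items() if seq > stable_seq}


if __name__ == "__main__":
    print([primary_for_view(v, 4) for v in range(6)])
    proofs = [(0, 100, "s"), (1, 100, "s"), (3, 100, "s")]
    if is_stable_checkpoint(proofs, 100, "s", f=1):
        print(truncate_log({99: "a", 100: "b", 101: "c"}, 100))
```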

The Algorithm (optimizations)
• Cryptography
  – Digital Signature Reduction
  – Message Authentication Codes
• Reducing Communication
  – Client receives one result
  – Replicas receive one message
  – Read-only requests Pre-executed
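Replacing signatures with message authentication codes means a sender attaches one MAC per receiver, each computed with a key shared only with that receiver. The sketch below uses Python's standard `hmac` module to show the idea. The key setup, the function names, and the choice of HMAC-SHA256 are assumptions of this example, not the construction used in the paper.

```python
import hashlib
import hmac
from typing import Dict


def mac(key: bytes, message: bytes) -> bytes:
    """MAC of the message under a pairwise session key (HMAC-SHA256 here)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def make_authenticator(session_keys: Dict[int, bytes], message: bytes) -> Dict[int, bytes]:
    """One MAC per receiving replica, keyed by replica ID."""
    return {replica: mac(key, message) for replica, key in session_keys.items()}


def verify(replica: int, key: bytes, message: bytes,
           authenticator: Dict[int, bytes]) -> bool:
    """Each replica checks only its own entry, using a constant-time comparison."""
    tag = authenticator.get(replica)
    return tag is not None and hmac.compare_digest(tag, mac(key, message))


if __name__ == "__main__":
    keys = {1: b"key-for-replica-1", 2: b"key-for-replica-2"}
    auth = make_authenticator(keys, b"PREPARE,0,1,digest")
    print(verify(1, keys[1], b"PREPARE,0,1,digest", auth))
```

A MAC is much cheaper to compute than a public-key signature, but unlike a signature it cannot be used to prove to a third party who sent the message.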

Experiments (Setup)
• Byzantine File System

Experiments (Micro)
• Micro Benchmark
  – Service Independent Library Evaluation
  – Represents Worst Case Overhead

Argument / Result (KB)   Replicated Read-write   Replicated Read-only   Without Replication
0/0                      3.35 (309%)             1.62 (98%)             0.82
4/0                      14.19 (207%)            6.98 (51%)             4.62
0/4                      8.01 (72%)              5.94 (27%)             4.66

Latency in milliseconds; percentages are overhead relative to the unreplicated case.
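The percentages in the micro benchmark table are relative overheads against the unreplicated column, which can be checked directly from the raw numbers. The short script below recomputes them; the function and variable names are illustrative, and the data is copied from the table.

```python
def overhead_percent(replicated: float, unreplicated: float) -> int:
    """Relative overhead of the replicated run, rounded to a whole percent."""
    return round(100 * (replicated - unreplicated) / unreplicated)


# arg/result size (KB) -> (replicated read-write, replicated read-only, unreplicated)
MICRO = {
    "0/0": (3.35, 1.62, 0.82),
    "4/0": (14.19, 6.98, 4.62),
    "0/4": (8.01, 5.94, 4.66),
}

if __name__ == "__main__":
    for size, (read_write, read_only, base) in MICRO.items():
        print(size,
              f"read-write {overhead_percent(read_write, base)}%",
              f"read-only {overhead_percent(read_only, base)}%")
```

The relative overhead is largest for the 0/0 case because the fixed cost of the protocol messages dominates when the operation itself carries no payload.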

Experiments (Andrew)

Phase   BFS Strict       BFS r/o lookup   NFS-std
1       0.55 (-69%)      0.47 (-73%)      1.75
2       9.24 (-2%)       7.91 (-16%)      9.46
3       7.24 (35%)       6.45 (20%)       5.36
4       8.77 (32%)       7.87 (19%)       6.60
5       38.69 (-2%)      38.38 (-2%)      39.35
total   64.48 (3%)       61.07 (-2%)      62.52

Elapsed time in seconds; percentages are relative to NFS-std.

Conclusion
• New State Machine Replication Algorithm
  – Tolerates Byzantine Faults
  – Practical / Efficient
• Byzantine File System
  – Implements Real Services
  – Only 3% Slower than Standard Non-Replicated NFS (Andrew benchmark total)

My Evaluation
• Contents
  – Too Many Details for Paper Size
  – Very Thorough in Discussion
• Presentation
  – Information Out of Order
• Wider Context
  – Offers Strong Fault Tolerance
  – Price is Lowest Yet

Questions?
• Background
• The Algorithm
• Experiments
• Conclusion
• My Evaluation
• Questions