A World of (Im)Possibilities
Nancy Lynch Celebration: Sixty and Beyond
Hagit Attiya, Technion
Jennifer Welch, Texas A&M University
PODC/Concur 2008
Introduction
- One of the main themes of Nancy's work has been proving lower bounds and impossibility results for problems that arise in distributed computing.
- Overview of some of Nancy's results
  - lesser-known results, hidden gems closer to our hearts
- Emphasize their meaning and implications
  - how they influenced the development of the field and of distributed systems
  - concentrating on their positive impact
Best-Known Example: FLP
Impossibility of asynchronous fault-tolerant consensus [Fischer, Lynch, Paterson]
Motivated work on
- strengthening models of computation
  - partially synchronous models [Dwork, Lynch, Stockmeyer]
  - unreliable failure detectors [Chandra, Toueg]
- weakening the problem definition
  - k-set agreement [Chaudhuri]
  - renaming [Attiya et al.]
  - condition-based approaches [Raynal, Rajsbaum et al.]
FLP: Impact
- Related practical problems:
  - transaction commit
  - leader election
  - atomic broadcast
  - maintaining consistent replicated data
- The wait-free hierarchy (classifies concurrent abstract data types) [Herlihy]
- Attempts to solve k-set agreement and renaming led to the application of topology in distributed computing. [Chaudhuri] [Borowsky, Gafni] [Saks, Zaharoglou] [Herlihy, Shavit]
2nd Example: Brewer's Conjecture
[Brewer, PODC 2000 invited talk]
A web service cannot provide all three guarantees:
- Consistency
- Availability
- Partition-tolerance
What Does This Mean?
[Gilbert, Lynch, SIGACT News 2002]
A web service cannot provide all three guarantees:
- Consistency: atomicity of (read/write) operations
- Availability: a request by a nonfaulty client gets a response
- Partition-tolerance: even when lost messages split the network into two partitioned components
Proof Idea
Adapted from [Attiya, Bar-Noy, Dolev]
[Figure: executions of p0 and p1 with messages between them lost (a partition). In one execution p0 writes 1 and p1 reads 0; another execution in which p0 writes 1 and p1 reads 0 looks the same to p1.]
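The partition argument above can be made concrete with a toy model. The sketch below is an illustration only, not the Gilbert-Lynch formalism; the `Replica` class and its fields are hypothetical names. Two replicas of a single register stay consistent only by exchanging messages; once a partition drops those messages, a replica that remains available must answer from local state and can return a stale value.

```python
# Toy illustration of the CAP trade-off (hypothetical model): two replicas
# of one register that can stay consistent only by exchanging messages.

class Replica:
    def __init__(self):
        self.value = 0        # local copy of the register
        self.peer = None      # the other replica (set after construction)
        self.partitioned = False

    def write(self, v):
        self.value = v
        # Propagate to the peer -- lost if the network is partitioned.
        if not self.partitioned:
            self.peer.value = v

    def read(self):
        # To remain "available", answer from local state even when partitioned.
        return self.value

a, b = Replica(), Replica()
a.peer, b.peer = b, a

a.write(1)
print(b.read())   # 1: consistent while the network is connected

a.partitioned = b.partitioned = True   # the network partitions
a.write(2)
print(b.read())   # 1: b stays available but returns a stale value
```

Dropping availability instead (b refuses to answer while partitioned) or dropping partition-tolerance (assume messages always arrive) are the other two corners of the trade-off.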
Brewer's Conjecture: Implications
- Traditional database services maintain consistency and fail to provide availability in the face of partitions.
=> Relax the consistency guarantees of the web service:
  - sometimes miss values or return stale data (Internet queries) [PIER: Huebsch, Hellerstein, Lanham, Loo, Shenker, Stoica]
  - allow partitions to evolve separately, and build mechanisms to cope when this happens (stream processing) [Medusa: Balazinska, Balakrishnan, Stonebraker]
=> Sacrifice availability, but not often (stream processing)... [BOREALIS: Balazinska, Balakrishnan, Madden, Stonebraker]
=> Assume a mechanism to guard against partitions... [CQ: Shah, Hellerstein, Brewer]
3rd Example: Best-Case Cost of Fault-Tolerant Algorithms
- Does making an algorithm fault-tolerant incur a cost even when the system is well-behaved?
- Previous investigation focused on the synchronous case:
  - early-stopping algorithms for consensus: 2 rounds vs. 1 round for a non-fault-tolerant algorithm [Dolev, Reischuk, Strong] [Dwork, Moses] [Moses, Tuttle]
  - non-blocking commit: twice as many rounds as for blocking commit [Dwork, Skeen]
- What about the asynchronous case?
Are Wait-Free Algorithms Fast? [Attiya, Lynch, Shavit]
- Studies the best-case complexity of an algorithm:
  - when there are no failures, although the algorithm can tolerate any number of crashes (is wait-free)
  - when the execution is synchronized, although the algorithm also works in asynchronous executions
- Complexity measure of interest is running time
  - time is measured in synchronized rounds
- Problem of interest is approximate agreement
Wait-Free Algorithms Are Not Fast
- A non-fault-tolerant algorithm takes O(1) time:
  - one process writes its input and the rest read it
  - achieves perfect agreement (ε = 0)
- Prove an Ω(log n) time lower bound for wait-free approximate agreement
=> So there are problems for which being wait-free in the asynchronous model imposes more than constant additional cost, even when failures do not occur.
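The O(1) non-fault-tolerant algorithm from the slide can be sketched directly; the names (`shared`, `writer`, `reader`) are illustrative, not from the paper. One designated process writes its input to a shared register and every other process reads it, so all decide the same value. The catch, which is why this algorithm is not wait-free, is that the readers block forever if the writer crashes before writing.

```python
# Sketch of the O(1) non-fault-tolerant algorithm: a designated process
# writes its input to a shared register; all other processes read it.
# Perfect agreement (eps = 0), but not wait-free: readers block if the
# writer crashes before writing.

import threading

shared = {}                      # models one shared read/write register
written = threading.Event()      # signals that the write has happened

def writer(my_input):
    shared["value"] = my_input   # write the input to the register
    written.set()

def reader(decisions, i):
    written.wait()               # blocks until the writer has written
    decisions[i] = shared["value"]

decisions = {}
threads = [threading.Thread(target=writer, args=(42,))]
threads += [threading.Thread(target=reader, args=(decisions, i)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(decisions)   # every reader decides 42
```

The Ω(log n) lower bound says that no wait-free algorithm can match this constant running time, even in failure-free synchronized executions.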
Proof Idea
[Figures over two slides: a chain of executions with inputs 0 in which a process that takes no steps cannot influence the decision, so fewer than n processes decide 0 within fewer than log n rounds; in the extended figure, a process with input 1 decides 1.]
The Best-Case Cost of Fault-Tolerance
- Formalizes the idea of "designing for the normal / common case" and shows its cost [Lampson, "Hints for Computer System Design"]
- The idea of accommodating the worst case while measuring the best / normal / common case has become standard:
  - message cost of consensus in failure-free runs [Halpern, Hadzilacos]
  - contention-free step complexity [Alur, Taubenfeld]
  - obstruction-free step complexity [Ellen, Luchangco, Moir, Shavit]
Interleaving Algorithms
- Also an approximate agreement algorithm matching the Ω(log n) time lower bound
- Interleaves two algorithms:
  - one guarantees fault-tolerance
  - the other guarantees best-case time complexity
  - results are coordinated using a "virtual" two-process approximate agreement algorithm
- Similar applications of interleaving, especially in randomized consensus [Saks, Shavit, Woll] [Aspnes, Attiya, Censor]
  - e.g., this morning's session
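The structural idea of interleaving can be sketched generically. This is not the actual [Attiya, Lynch, Shavit] construction (which also reconciles the two outputs via a virtual two-process approximate agreement step); the functions `fast` and `safe` are hypothetical stand-ins for the two component algorithms.

```python
# Generic sketch of interleaving: alternate one step of a fast algorithm
# with one step of a fault-tolerant ("safe") algorithm, and return the
# result of whichever terminates first. Each component is modeled as a
# generator that yields None per step until it yields a decision.

def interleave(fast_steps, safe_steps):
    while True:
        for g in (fast_steps, safe_steps):   # one step of each, alternately
            v = next(g)
            if v is not None:
                return v

def fast():                  # terminates quickly when the run is well-behaved
    yield None
    yield "fast-decision"

def safe():                  # always terminates, but may take many steps
    for _ in range(100):
        yield None
    yield "safe-decision"

print(interleave(fast(), safe()))   # fast-decision
```

In a well-behaved (synchronized, failure-free) run the fast component finishes first, giving the good best-case time; in an adversarial run the safe component still guarantees termination.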
Application: Replicated Storage [Yu and Vahdat]
- Emulates a shared memory
- Replication-based implementation of wide-area data-access services
  - needs automatic regeneration of failed replicas and reconfiguration of groups
- Probabilistic guarantee: reads may return stale values with a small probability
- Optimizes for the best case:
  - failure-free reconfiguration is quick and cheap
  - failure-induced reconfiguration calls a consensus protocol [Saks, Shavit, Woll] for replicas to agree on the next configuration
4th Example: Clock Synchronization
- In a distributed system with n nodes that experiences variable message delays, how closely can the nodes' clocks be synchronized?
Clock Synchronization Lower Bound [Lundelius, Lynch]
- No algorithm can synchronize n clocks closer than (1 - 1/n)u
  - for a clique with the same message-delay uncertainty u on all links (u = max delay - min delay)
  - even with no failures and no clock drift
- Proof introduced the shifting technique
[Figure: shifting p0 backwards by u swaps which direction has delay d and which has delay d - u, yet the two executions are indistinguishable.]
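For the two-process case the shifting argument can be written out in a few lines; the bound (1 - 1/n)u specializes to u/2. This is a sketch of the standard argument, not a verbatim reproduction of the paper's proof:

```latex
% Two-process case of the [Lundelius, Lynch] bound: (1 - 1/n)u = u/2.
% Delays lie in [d-u, d]. Suppose an algorithm guarantees skew at most
% \epsilon, i.e. |C_1 - C_0| \le \epsilon in every execution.
%
% Execution \alpha:  delay(p_0 \to p_1) = d,    delay(p_1 \to p_0) = d-u.
% Shift all events of p_1 earlier by u to obtain \alpha':
%                    delay(p_0 \to p_1) = d-u,  delay(p_1 \to p_0) = d.
% \alpha' is also legal, and indistinguishable from \alpha to both nodes.
% The shift moves p_1's adjusted clock by u relative to p_0's:
%     (C_1 - C_0)_{\alpha'} = (C_1 - C_0)_{\alpha} + u.
% Both differences have absolute value at most \epsilon, hence
%     2\epsilon \ge u  \quad\Longrightarrow\quad \epsilon \ge u/2.
```

The general (1 - 1/n)u bound comes from applying the same shift around the whole clique.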
What About Other Topologies? [Halpern, Megiddo, Munshi]
- Arbitrary topologies and nonuniform uncertainties
- The adversary's optimal strategy is to maximize a certain quantity
  - involving neighboring nodes' initial clock values and the delays between them
  - subject to constraints on message uncertainty
- The bound is expressed as a linear program and solved using optimization techniques
  - the shifting notion is captured in the linear program
  - not in closed form, except for a few special cases
=> Bound is tight
What About Closed-Form Bounds? [Biaz, Welch]
- If uncertainties are symmetric (the same in both directions of a link), then the lower bound is diam/2, where diam is the diameter of the graph w.r.t. uncertainties
[Figure: a weighted example graph on nodes a-f with diam = 9.]
Shifting Equivalent Clique
- An arbitrary topology G with arbitrary uncertainties is equivalent to a clique G' on the same nodes, where the uncertainty between any two nodes is the length of the shortest path between them in G (w.r.t. uncertainties) [Halpern, Megiddo, Munshi]
[Figure: the example graph on nodes a-f and its equivalent clique with shortest-path uncertainties.]
- Shifting a carefully chosen execution on the clique, for two nodes diam apart, gives the diam/2 lower bound.
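The equivalent-clique construction is just all-pairs shortest paths over the uncertainties. A minimal sketch using Floyd-Warshall is below; the graph is hypothetical (the slide's own a-f example is not fully recoverable from the source), but the quantities computed are the ones the slides discuss: clique uncertainty = shortest-path distance, lower bound diam/2, upper bound radius.

```python
# Equivalent-clique uncertainties via Floyd-Warshall all-pairs shortest
# paths. The 4-node graph below is hypothetical, chosen only to
# illustrate the construction.

INF = float("inf")

def clique_uncertainties(n, edges):
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, u in edges:                   # undirected links, uncertainty u
        dist[i][j] = dist[j][i] = min(dist[i][j], u)
    for k in range(n):                      # Floyd-Warshall relaxation
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical graph: a-b (2), b-c (3), c-d (1), and a direct a-c link (7)
edges = [(0, 1, 2), (1, 2, 3), (2, 3, 1), (0, 2, 7)]
dist = clique_uncertainties(4, edges)

ecc = [max(row) for row in dist]            # eccentricity of each node
diam, radius = max(ecc), min(ecc)
print(diam, radius)   # 6 4 -> lower bound diam/2 = 3.0, upper bound radius = 4
```

Note that the direct a-c link (uncertainty 7) is bypassed by the cheaper path a-b-c (uncertainty 5), which is exactly why the clique construction uses shortest paths rather than the raw link uncertainties.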
What About Upper Bounds?
- For an arbitrary graph with arbitrary uncertainties, the radius is an upper bound [Halpern, Megiddo, Munshi]
- Since radius ≤ diam, this is within a factor of 2 of the lower bound
[Figure: the example graph with diam = 9 and radius = 5.]
- Tight and almost-tight closed-form upper bounds for some specific common topologies with uniform uncertainties [Biaz, Welch]
External Clock Synchronization [Attiya, Hay, Welch]
- What about external synchronization, when some clocks have outside time sources?
  - previous results are for internal synchronization
- The tight bound on how close a node's clock can get to the source time is half the shortest-path distance (w.r.t. uncertainties) from the node to a source
[Figure: the example graph with a and d as sources; the resulting bounds are b: 3/2, c: 1/2, e: 3/2, f: 5/2.]
Optimal Synchronization Per Execution
- Given the information collected in a specific execution by some algorithm, find the tightest possible synchronization
  - internal synchronization, offline algorithm [Attiya, Herzberg, Rajsbaum]
  - external synchronization, online algorithm [Patt-Shamir, Rajsbaum]
  - extended to handle clock drift [Ostrovsky, Patt-Shamir]
Gradient Clock Synchronization [Fan, Lynch]
=> The clock skew between any pair of nodes should be a function of the distance between them
[Figure: in the example graph, the clocks of a and d need not be as tightly synchronized as those of a and b.]
Gradient Clock Synchronization (cont.)
- Motivated by problems in sensor networks or, more generally, large-scale networks, where nodes in the same locality need to be more tightly synchronized
  - data fusion
  - target tracking
(image: http://www.mikalac.com/missile.html)
Gradient Clock Synch Lower Bound
- The closest that two nodes' clocks can get (in the worst case) is Ω(log D / log log D)
  - D is the diameter of the network
  - global influence
- Algorithms requiring a fixed maximum skew for nearby nodes may not scale well
  - e.g., TDMA
(image: http://www.dsna-dti.aviation-civile.gouv.fr/actualities/revuesgb/revue64gb/64pgarticle2gb/telecom_c2gb.html)
Gradient Clock Synch Lower Bound: Assumption 1
- Nonzero clock drift: (hardware) clocks can run fast or slow, within known bounds
[Figure: hardware clock time vs. real time, with max slope ≤ 1 + ρ and min slope ≥ (1 + ρ)^-1.]
Gradient Clock Synch Lower Bound: Assumption 2
- The algorithm must ensure that (logical) clocks always increase at some minimum positive rate
[Figure: logical clock time vs. real time, with min slope bounded below by a positive constant.]
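The two assumptions can be simulated in a few lines; the parameter values (`rho`, `min_rate`, the step size) are hypothetical, chosen only to make the constraints visible.

```python
# Simulation of the two lower-bound assumptions (hypothetical parameters):
# assumption 1 -- a hardware clock whose rate drifts within
#                 [1/(1+rho), 1+rho] of real time;
# assumption 2 -- a logical clock that must advance at least at rate
#                 min_rate (here it simply follows the hardware clock,
#                 which already exceeds min_rate).

import random

rho = 0.01          # drift bound (assumption 1)
min_rate = 0.5      # minimum logical-clock rate (assumption 2)

random.seed(0)
hw, logical = 0.0, 0.0
dt = 0.01                                        # real-time step
for _ in range(1000):
    rate = random.uniform(1 / (1 + rho), 1 + rho)
    hw += rate * dt                              # hardware clock with drift
    logical += max(min_rate * dt, rate * dt)     # never slower than min_rate

real_time = 1000 * dt
assert abs(hw - real_time) <= rho * real_time + 1e-9   # drift bound holds
assert logical >= min_rate * real_time - 1e-9          # assumption 2 holds
print(round(hw, 3), round(logical, 3))
```

The lower-bound construction exploits exactly this pair of constraints: the adversary manipulates drift within the permitted band while the algorithm is forced to keep logical clocks moving forward.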
Gradient Clock Synch LB: Simple Case
[Figure: a chain p1, p2, p3, ..., pn]
- Consider a simple algorithm in which the clock value of p1 is periodically propagated down the chain
- Can construct an execution in which p(n-1)'s new clock value is larger than pn's old clock value by an amount depending on D:
  - carefully choose message delays
  - manipulate clock drift rates
  - cause nodes to suddenly jump to higher values without synchronizing with their neighbors
=> The insight in the paper is generalizing this to any algorithm
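The staleness at the far end of the chain is easy to quantify in the simple propagation algorithm. The helper below (`propagate`, a hypothetical name; uniform per-hop delay is an assumption) shows that by the time p1's value reaches pn, p1's own clock has advanced by an amount linear in the chain length, which is the raw material the lower-bound construction refines with delay and drift manipulation.

```python
# Sketch of the chain scenario: p1 sends its clock value (0.0 at time 0)
# down the chain; each hop adds delay_per_hop of real time. When pn
# finally adopts the value, p1's own clock has advanced by rate * total
# delay, so the instantaneous skew between p1 and pn grows with the
# chain length (i.e., with the diameter D).

def propagate(n, delay_per_hop, rate=1.0):
    t = 0.0
    sent_value = 0.0              # p1's clock value at sending time
    for _ in range(1, n):         # n - 1 hops down the chain
        t += delay_per_hop
    return rate * t - sent_value  # skew between p1 and pn at adoption time

print(propagate(n=10, delay_per_hop=1.0))   # 9.0: skew linear in chain length
```

The actual lower bound is subtler: it shows that even algorithms cleverer than plain propagation cannot keep neighboring nodes closer than Ω(log D / log log D) in the worst case.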
Is the Lower Bound Tight?
- Recall the lower bound is Ω(log D / log log D)
- Several pre-existing algorithms achieve O(D)
- The upper bound was then improved to O(√D) [Locher, Wattenhofer]
- Recently the upper bound was improved to O(log D) [Lenzen, Locher, Wattenhofer]
- Still a small gap; can the lower bound be improved?
How Long Can a Large Difference Last?
- In the simple diffusion algorithm on the chain, a large difference between p(n-1) and pn lasts only while a message is in transit
- Perhaps difficulties could be avoided by keeping track of the "generation" of a clock value and only comparing apples with apples (clocks of the same generation)?
  - but this could be complicated
And There's a Lot More...
- Lower bounds on space for mutual exclusion [Burns, Lynch]
- Lower bound on the number of messages for leader election in synchronous rings [Frederickson, Lynch]
- Impossibility results for the data link layer and connection management [Fekete, Lynch, Mansour, Spinelli] [Kleinberg, Attiya, Lynch]
- Lower bound on time for consensus in partially synchronous models [Attiya, Dwork, Lynch, Stockmeyer]
- Lower bound on time for synchronous k-set agreement [Chaudhuri, Herlihy, Lynch, Tuttle]
- Tradeoff between safety and liveness for randomized coordinated attack [Varghese, Lynch]
- Impossibility of boosting fault tolerance [Attie, Guerraoui, Kouznetsov, Lynch, Rajsbaum]
- ...
Final Observations
- Strive to make the results relevant:
  - natural problems
  - practical architectural assumptions
  - realistic performance measures (for lower bounds)
- Crisp arguments (ingenious but clear):
  - easy to understand and verify
  - simple to extend, leading to follow-ups
Take-Home Message
- Impossibility results help the development of the area
- Understanding inherent limits guides efforts in the appropriate directions
- And setting boundaries is good for everyone...
Thanks for your attention
Thank you, Nancy!