Distributed Algorithms (22903)
The wait-free hierarchy and the universality of consensus
Lecturer: Danny Hendler
This presentation is based on the book “Distributed Computing” by Hagit Attiya & Jennifer Welch

Formally: the Consensus Object
- Supports a single operation: decide
- Each process pi calls decide with some input vi from some domain. decide returns a value from the same domain.
- The following requirements must be met:
  - Agreement: In any execution E, all decide operations must return the same value.
  - Validity: The value returned by the operations must equal one of the inputs.

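The two requirements can be read as predicates over one finished execution. The following is a minimal sketch, not part of the original slides; the function name `check_consensus` and the list-based encoding of inputs and outputs are assumptions made for illustration, and values are assumed to be hashable.

```python
def check_consensus(inputs, outputs):
    """Return (agreement, validity) for one finished execution.

    inputs[i] is the value process i passed to decide, and outputs[i]
    is the value its decide call returned (hypothetical encoding).
    """
    # Agreement: all decide operations returned the same value.
    agreement = len(set(outputs)) <= 1
    # Validity: every returned value is the input of some process.
    validity = all(out in inputs for out in outputs)
    return agreement, validity
```

For example, `check_consensus([3, 7], [3, 7])` reports an agreement violation, while `check_consensus([3, 7], [5, 5])` reports a validity violation.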
Wait-free consensus can be solved easily by compare&swap
compare&swap(b, old, new)
  atomically
    v := read from b
    if (v = old) { b := new; return success }
    else return failure
Motorola 680x0, IBM 370, Sun SPARC, 80x86, MIPS, PowerPC, DEC Alpha
How?

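One possible answer to "How?" is sketched below: every process tries to swap its input into an initially unset register, and then returns whatever the register holds. This is an illustrative sketch rather than the lecture's own code. Python has no hardware compare&swap, so the hypothetical `CASRegister` class emulates the atomic step with a lock; the names `CASRegister`, `CASConsensus` and `_UNSET` are assumptions.

```python
import threading

_UNSET = object()  # stands for the initial "no decision yet" value (assumption)


class CASRegister:
    """Emulated compare&swap register; a lock stands in for hardware atomicity."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, old, new):
        with self._lock:  # the read-compare-write below is one atomic step
            if self._value is old:
                self._value = new
                return True   # success
            return False      # failure

    def read(self):
        with self._lock:
            return self._value


class CASConsensus:
    """One-shot consensus object usable by any number of processes."""

    def __init__(self):
        self._decision = CASRegister(_UNSET)

    def decide(self, v):
        # Only the first compare&swap succeeds; every later one fails
        # and leaves the winner's value in place.
        self._decision.compare_and_swap(_UNSET, v)
        return self._decision.read()
```

Each call to `decide` performs a constant number of steps regardless of what other processes do, which is what makes the sketch wait-free (under the assumption that the emulated compare&swap is a single atomic step).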
Would this consensus algorithm from reads/writes work?
Initially decision = null
Decide(v) ; code for pi, i=0,1
  if (decision = null)
    decision := v
    return v
  else
    return decision

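It does not: the test and the write are separate steps, so both processes can pass the test before either writes. The sketch below replays such an interleaving deterministically. It is an illustration only; the generator-based scheduler and the names `decide` and `run` are assumptions, and each `yield` marks a point where the scheduler may switch processes.

```python
def decide(shared, v):
    """The register-based Decide(v), split into separately scheduled steps."""
    seen = shared["decision"]          # read the register
    yield
    if seen is None:
        shared["decision"] = v         # write the register
        yield
        return v
    return shared["decision"]          # read the register again


def run(schedule, inputs):
    """Step the processes in the order given by schedule (a list of process ids)."""
    shared = {"decision": None}
    procs = [decide(shared, v) for v in inputs]
    results = {}
    for pid in schedule:
        if pid in results:
            continue                   # this process has already returned
        try:
            next(procs[pid])
        except StopIteration as stop:
            results[pid] = stop.value  # value returned by decide
    return results
```

With the alternating schedule `run([0, 1, 0, 1, 0, 1], ["a", "b"])` both processes read null, both write, and each returns its own input, which violates agreement. With the sequential schedule `run([0, 0, 0, 1, 1], ["a", "b"])` both return "a".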
A proof that wait-free consensus for 2 or more processes cannot be solved by registers.
A FIFO queue
Supports 2 operations:
• q.enqueue(x) – returns ack
• q.dequeue – returns the first item in the queue, or empty if the queue is empty.

FIFO queue + registers can implement 2-process consensus
Initially Q=<0> and Prefer[i]=null, i=0,1
Decide(v) ; code for pi, i=0,1
  Prefer[i] := v
  qval := Q.deq()
  if (qval = 0) then return v
  else return Prefer[1-i]
Since registers alone cannot solve wait-free 2-process consensus, there is no wait-free implementation of a FIFO queue shared by 2 or more processes from registers.

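The queue-based algorithm can be sketched as follows. This is a hedged illustration: the `FIFOQueue` class emulates an atomic queue with a lock, and the names `FIFOQueue`, `QueueConsensus2` and `EMPTY` are assumptions. The process that dequeues the initial 0 wins; the other one finds the queue empty and adopts the winner's preference, which the winner wrote before dequeuing.

```python
import threading
from collections import deque

EMPTY = object()  # returned by dequeue when the queue is empty (assumption)


class FIFOQueue:
    """Emulated atomic FIFO queue; a lock makes each operation one atomic step."""

    def __init__(self, items=()):
        self._items = deque(items)
        self._lock = threading.Lock()

    def enqueue(self, x):
        with self._lock:
            self._items.append(x)
        return "ack"

    def dequeue(self):
        with self._lock:
            return self._items.popleft() if self._items else EMPTY


class QueueConsensus2:
    """One-shot consensus for exactly two processes with ids 0 and 1."""

    def __init__(self):
        self.q = FIFOQueue([0])        # initially Q = <0>
        self.prefer = [None, None]     # Prefer[0], Prefer[1]

    def decide(self, i, v):
        self.prefer[i] = v             # announce own input first
        if self.q.dequeue() == 0:      # dequeued the initial item: winner
            return v
        return self.prefer[1 - i]      # loser adopts the winner's input
```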
A proof that wait-free consensus for 3 or more processes cannot be solved by FIFO queue (+ registers)
The wait-free hierarchy
We say that object type X solves wait-free n-process consensus if there exists a wait-free consensus algorithm for n processes using only shared objects of type X and registers.
The consensus number of object type X is n, denoted CN(X)=n, if n is the largest integer for which X solves wait-free n-process consensus. It is defined to be infinity if X solves consensus for every n.
Lemma: If CN(X)=m and CN(Y)=n>m, then there is no wait-free implementation of Y from instances of X and registers in a system with more than m processes.

The wait-free hierarchy (cont’d)
Object type                        Consensus number
registers                          1
FIFO queue, stack, test-and-set    2
…                                  …
compare-and-swap                   ∞

The universality of consensus
An object is universal if, together with registers, it can implement any other object in a wait-free manner.
We will show that any object X with consensus number n is universal in a system with n or fewer processes.
An algorithm is lock-free if it guarantees that some operation terminates after some finite total number of steps performed by processes.
The lock-freedom progress property is weaker than wait-freedom.

Universal constructions
Given the sequential specification of any object, implement a linearizable wait-free concurrent version of it:
• A lock-free construction using CAS
• A lock-free construction using consensus
• A wait-free construction using consensus
• A bounded-memory wait-free construction using consensus

A lock-free universal algorithm using CAS
Each operation is represented by a shared record of type opr.
typedef opr structure {
  inv        ; the operation invocation, including its parameters
  new-state  ; the new state of the object, after applying the operation
  response   ; the response of the operation
  before     ; a pointer to the record of the previous operation on the object
}
[Figure: Head points to the most recent opr record; each record (inv, new-state, response, before) points to its predecessor through before.]

A lock-free universal algorithm using CAS (cont’d)
[Figure: Head points to the latest record; the chain ends at the anchor record, whose new-state is init.]
Initially Head points to the anchor record. Head.new-state is initialized with the implemented object’s initial state.
When inv occurs
  point := new opr, point.inv := inv
  repeat
    h := Head
    point.new-state, point.response := apply(inv, h.new-state)
  until compare&swap(Head, h, point) = success
  return point.response

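The retry loop can be sketched as below. This is an illustration under stated assumptions, not the lecture's implementation: compare&swap on Head is emulated with a lock, the object state is treated as immutable, and `apply(inv, state)` is a hypothetical caller-supplied function returning `(new_state, response)`. Setting `before` inside the loop is also an assumption; the slide's loop does not show it.

```python
import threading


class Opr:
    """One operation record: inv, new-state, response, before."""

    def __init__(self, inv=None, new_state=None):
        self.inv = inv
        self.new_state = new_state
        self.response = None
        self.before = None


class LockFreeUniversal:
    def __init__(self, init_state, apply):
        self._apply = apply                      # apply(inv, state) -> (new_state, response)
        self._head = Opr(new_state=init_state)   # the anchor record
        self._cas_lock = threading.Lock()        # emulates atomic compare&swap

    def _cas_head(self, old, new):
        with self._cas_lock:
            if self._head is old:
                self._head = new
                return True
            return False

    def invoke(self, inv):
        point = Opr(inv)
        while True:                              # repeat ... until the CAS succeeds
            h = self._head
            point.new_state, point.response = self._apply(inv, h.new_state)
            point.before = h                     # assumption: link to the previous record
            if self._cas_head(h, point):
                return point.response
```

A failed compare&swap means some other operation was threaded in the meantime, so the system as a whole makes progress even though an individual process may retry indefinitely. A fetch-and-increment counter, for instance, can be obtained with `apply = lambda inv, s: (s + 1, s)`.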
A lock-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq        ; the operation’s sequential number (register)
  inv        ; the operation invocation, including its parameters (register)
  new-state  ; the new state of the object, after applying the operation (register)
  response   ; the response of the operation, including its return value (register)
  after      ; a pointer to the next record (consensus object)
}
[Figure: Head entries point to records (seq, inv, new-state, response, after); the chain starts at the anchor record with seq=1, inv=null, new-state=init, response=null.]

A lock-free universal algorithm using consensus (cont’d)
[Figure: Head entries point to records (seq, inv, new-state, response, after); the chain starts at the anchor record with seq=1, inv=null, new-state=init, response=null.]
Initially all Head entries point to the anchor record.
When inv occurs ; code for pi
  point := new opr, point.inv := inv
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i] := Head[j]
  repeat
    win := decide(Head[i].after, point) ; try to thread your operation
    win.seq := Head[i].seq+1
    <win.new-state, win.response> := apply(win.inv, Head[i].new-state)
    Head[i] := win ; point to the following record
  until win = point
  return point.response

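A sketch of this construction follows. It is illustrative only: the `Consensus` class is a lock-based stand-in for a real consensus object, the state is assumed immutable, `apply(inv, state)` is a hypothetical function returning `(new_state, response)`, and the process id `i` is passed explicitly to `invoke`.

```python
import threading


class Consensus:
    """Stand-in consensus object: the first proposed value wins (lock-emulated)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def decide(self, v):
        with self._lock:
            if self._value is None:
                self._value = v
            return self._value


class Opr:
    def __init__(self, inv=None):
        self.seq = 0
        self.inv = inv
        self.new_state = None
        self.response = None
        self.after = Consensus()       # decides which record comes next


class LockFreeUniversal:
    def __init__(self, n, init_state, apply):
        anchor = Opr()
        anchor.seq = 1
        anchor.new_state = init_state
        self.n = n
        self.apply = apply             # apply(inv, state) -> (new_state, response)
        self.head = [anchor] * n       # Head[0..n-1]

    def invoke(self, i, inv):
        point = Opr(inv)
        for j in range(self.n):        # find a record with the maximum sequence number
            if self.head[j].seq > self.head[i].seq:
                self.head[i] = self.head[j]
        while True:
            cur = self.head[i]
            win = cur.after.decide(point)            # try to thread own operation
            win.seq = cur.seq + 1
            win.new_state, win.response = self.apply(win.inv, cur.new_state)
            self.head[i] = win                       # advance to the following record
            if win is point:
                return point.response
```

Several processes may fill in the same winning record, but since `apply` is deterministic they all write the same values.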
A wait-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq        ; the operation’s sequential number (register)
  inv        ; the operation invocation, including its parameters (register)
  new-state  ; the new state of the object, after applying the operation (register)
  response   ; the response of the operation, including its return value (register)
  after      ; a pointer to the next record (consensus object)
}
We add a helping mechanism:
[Figure: Announce entries point to records (seq, inv, new-state, response, after).]
When performing the operation with sequence number j, try to help process (j mod n).

A wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs ; code for pi
  Announce[i] := new opr, Announce[i].inv := inv, Announce[i].seq := 0
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i] := Head[j]
  while Announce[i].seq = 0 do
    priority := (Head[i].seq+1) mod n ; ID of process with priority
    if Announce[priority].seq = 0 ; if help is needed
      then point := Announce[priority] ; help the other process
      else point := Announce[i] ; perform own operation
    win := decide(Head[i].after, point)
    <win.new-state, win.response> := apply(win.inv, Head[i].new-state)
    win.seq := Head[i].seq+1
    Head[i] := win
  return Announce[i].response

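The helping mechanism can be sketched as follows. As with any such illustration, the details are assumptions rather than the lecture's code: `Consensus` is a lock-based stand-in for a consensus object, the state is assumed immutable, `apply(inv, state)` is a hypothetical function returning `(new_state, response)`, and the process id `i` is passed explicitly. Note that `seq` is written last, so a non-zero `seq` tells the announcing process that its response is already in place.

```python
import threading


class Consensus:
    """Stand-in consensus object: the first proposed value wins (lock-emulated)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def decide(self, v):
        with self._lock:
            if self._value is None:
                self._value = v
            return self._value


class Opr:
    def __init__(self, inv=None):
        self.seq = 0                   # 0 means "not threaded yet"
        self.inv = inv
        self.new_state = None
        self.response = None
        self.after = Consensus()


class WaitFreeUniversal:
    def __init__(self, n, init_state, apply):
        anchor = Opr()
        anchor.seq = 1
        anchor.new_state = init_state
        self.n = n
        self.apply = apply             # apply(inv, state) -> (new_state, response)
        self.head = [anchor] * n
        self.announce = [anchor] * n

    def invoke(self, i, inv):
        self.announce[i] = Opr(inv)    # announce the operation; its seq is 0
        for j in range(self.n):        # find a record with the maximum sequence number
            if self.head[j].seq > self.head[i].seq:
                self.head[i] = self.head[j]
        while self.announce[i].seq == 0:
            cur = self.head[i]
            priority = (cur.seq + 1) % self.n        # ID of process with priority
            helped = self.announce[priority]
            # Help the prioritized process if it still needs it, else go for own record.
            point = helped if helped.seq == 0 else self.announce[i]
            win = cur.after.decide(point)
            win.new_state, win.response = self.apply(win.inv, cur.new_state)
            win.seq = cur.seq + 1      # written last: signals completion
            self.head[i] = win
        return self.announce[i].response
```

Because the priority rotates with the sequence number, an announced operation that stays pending eventually becomes the one every active process proposes, so it is threaded within a bounded number of positions.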
A proof that the universal algorithm using consensus is wait-free
A bounded-memory wait-free universal algorithm using consensus
What is the number of records needed by the algorithm? Unbounded!
The following algorithm uses a bounded number of records:
• Each process allocates records from its private pool
• A record is recycled once we’re sure it will not be referenced anymore
• We don’t need this mechanism if we use a language with a GC (such as Java)

A bounded-memory wait-free universal algorithm using consensus (cont’d)
When can we recycle record #k?
No process trying to thread record (k+n+1) or higher will access record k.
After all the processes that thread records k…k+n terminate, record k can be freed.
When process p finishes threading record m, it releases records m-1…m-n.
After record k is released by the operations threading records k+1…k+n, it can be recycled.

A bounded-memory wait-free universal algorithm using consensus: data structures
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq        ; the operation’s sequential number (register)
  inv        ; the operation invocation, including its parameters (register)
  new-state  ; the new state of the object, after applying the operation (register)
  response   ; the response of the operation, including its return value (register)
  after      ; a pointer to the next record (consensus object)
  before     ; a pointer to the previous record
  released[1..n] ; initially true
}
[Figure: Head entries point to records (seq, inv, new-state, response, before, after, released); the chain starts at the anchor record.]

A bounded-memory wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs ; code for pi
  point := a free record from private pool, point.inv := inv, point.seq := 0
  for r := 1 to n do point.released[r] := false
  Announce[i] := point
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i] := Head[j]
  while Announce[i].seq = 0 do
    priority := (Head[i].seq+1) mod n ; ID of process with priority
    if Announce[priority].seq = 0 ; if help is needed
      then point := Announce[priority] ; help the other process
      else point := Announce[i] ; perform own operation
    win := decide(Head[i].after, point)
    <win.new-state, win.response> := apply(win.inv, Head[i].new-state)
    win.before := Head[i]
    win.seq := Head[i].seq+1
    Head[i] := win
  temp := Announce[i].before
  for r := 1 to n do
    if temp <> anchor then
      before-temp := temp.before, temp.released[r] := true, temp := before-temp
  return Announce[i].response

How many records are required by the algorithm?
Each incomplete operation may waste n distinct records.
There may be up to n incomplete operations.
At any point in time, there are up to n^2 non-recyclable records.
All non-recyclable records may belong to the same process!
Each pool should have O(n^2) records, so O(n^3) total records are needed.