Algorithms for Extracting Timeliness Graphs

Carole Delporte, LIAFA, Univ. D. Diderot
Stéphane Devismes, VERIMAG, Univ. J. Fourier
Hugues Fauconnier, LIAFA, Univ. D. Diderot
Mikel Larrea, University of the Basque Country

Delporte-Gallet et al., SIROCCO'2010

Goals (in partially synchronous distributed systems)
• How to determine the timeliness relations between processes?
  - That is, communication from p to q within a bounded delay
• Determine?
  - Eventually, all processes agree on the timeliness of some links

Why?
• For example:
  - (leader) There exists a process that communicates in a timely way with all others -> leader election
  - (tree) There exist timely paths from p to every other process -> routing
  - (ring) There exists at least one timely ring linking all correct processes -> ring overlay

Also…
• Timeliness is often used to determine correct processes (p receives timely messages from q => q is correct)
  - Leader -> Ω
  - Tree routing -> the source is correct (Ω)
  - Ring -> exactly all correct processes (◊P)
• (Failure detectors)

Context…
• Processes: timely
  - (bounds on the time to execute a step -> processes can accurately measure time)
• Some processes may crash
  - (correct / faulty)
• Communication: fully connected graph
• Communication: by messages
  - Reliable links (no message loss)

Timeliness
• The link (p, q) is timely:
  - There exists an unknown bound D such that any message sent by p at time t cannot be received by q after time t+D
  - (if (p, q) is not timely, the communication delays from p to q are unbounded)
  - ("there exists an unknown bound" <=> "eventually there exists an unknown bound")
  - (timeliness is a property defined with respect to a given run)
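The timeliness definition on this slide can be written out as a formula. The block below is a sketch, not the authors' notation: send(m) and recv(m) are hypothetical names for the send and receive times of a message m, and t_0 is a hypothetical stabilization time. The last line shows why a bound that holds only eventually also yields a bound that holds always, assuming only finitely many messages are sent before t_0 and each of them is received: D' is then a maximum over finitely many finite delays.

```latex
% send(m), recv(m), t_0 and D' are illustrative names, not taken from the slides.
\[
(p,q)\ \text{timely in } r
\iff
\exists D\ \ \forall m \text{ sent by } p \text{ to } q \text{ and received in } r:\quad
\mathrm{recv}(m) \le \mathrm{send}(m) + D
\]
\[
(p,q)\ \text{eventually timely in } r
\iff
\exists t_0\ \exists D\ \ \forall m \text{ with } \mathrm{send}(m) \ge t_0:\quad
\mathrm{recv}(m) \le \mathrm{send}(m) + D
\]
\[
D' = \max\Bigl(D,\ \max_{m\,:\,\mathrm{send}(m) < t_0} \bigl(\mathrm{recv}(m) - \mathrm{send}(m)\bigr)\Bigr)
\]
```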

Reminders
• In asynchronous systems, no hypothesis on link timeliness
• In synchronous systems, all links are timely
• Asynchronous <-> no consensus
• Synchronous <-> consensus
• Partially synchronous: some links are timely

Partially synchronous systems
• Examples:
  - There exists a process all of whose outgoing links are timely
  - There exists a time from which all links are timely
  - Remark: in both cases, consensus is possible
    (Ω can be implemented in the first case and ◊P in the second)

Timeliness
• The timeliness graph of a given run r: T(r) = <S, E>
  - Nodes: correct processes
  - Directed edges: (p, q) is an edge iff the link from p to q is timely in r

Basic tool: Watchdog
• q can test the link from p to q:
  - p regularly sends "ALIVE" on the link (p, q)
  - q arms a timer with timeout T; if it does not receive "ALIVE" from p within T time units, q blames (p, q) and increases T
• If the link (p, q) is timely (and p is correct), T eventually becomes large enough that q never blames (p, q) again
• If the link (p, q) is not timely (and q is correct), q blames (p, q) infinitely often
• Timely link <=> finite number of blamings
• (assumption: FIFO links)
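The watchdog can be illustrated with a small offline replay. This is a minimal sketch under stated assumptions, not the authors' code: the function name `count_blames`, the discrete time scale, the initial timeout of 1, and the "increase T by 1" policy are all hypothetical choices. The function takes the times at which q receives ALIVE messages and counts how often the timer expires before a given horizon.

```python
def count_blames(arrival_times, horizon, initial_timeout=1):
    """Replay q's watchdog on the link (p, q).

    arrival_times: times at which q receives ALIVE from p.
    Returns (number of blames up to `horizon`, final timeout T).
    """
    pending = sorted(arrival_times)
    timeout = initial_timeout
    blames = 0
    deadline = timeout          # the timer is armed at time 0
    i = 0
    while deadline <= horizon:
        if i < len(pending) and pending[i] <= deadline:
            # ALIVE arrived in time: re-arm the timer from the reception time.
            deadline = pending[i] + timeout
            i += 1
        else:
            # Timer expired: blame (p, q), increase T, and re-arm.
            blames += 1
            timeout += 1
            deadline += timeout
    return blames, timeout


# Timely link: p sends every time unit, every message is delayed by 3.
timely = [k + 3 for k in range(100)]
print(count_blames(timely, horizon=100))    # (1, 2): one blame, then T = 2 suffices

# Non-timely link: gaps between receptions grow without bound.
growing = [k * k for k in range(1, 200)]
print(count_blames(growing, horizon=100)[0], count_blames(growing, horizon=10000)[0])
```

On the timely trace the blame count stops growing once T is large enough, while on the trace with unbounded gaps it keeps growing as the horizon is extended, which matches the "timely link <=> finite number of blamings" property.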

Systems
• G = <S, E> is compatible with T(r) = <Sr, Er> if:
  - (1) S = Sr
  - (2) All edges of E are timely in T(r), i.e., E ⊆ Er
• A system X is defined by a set of timeliness graphs:
  - Let R(X) be the set of runs of X: r is in R(X) if there exists G in X that is compatible with T(r)
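The compatibility test and the membership of a run in R(X) translate directly into set operations. The sketch below is illustrative only: the `Graph` type and the function names `is_compatible` and `run_in_system` are hypothetical, and a system X is modeled as a plain collection of graphs.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

Edge = Tuple[str, str]


class Graph(NamedTuple):
    nodes: FrozenSet[str]
    edges: FrozenSet[Edge]


def graph(nodes, edges) -> Graph:
    """Build a hashable directed graph from any iterables of nodes and edges."""
    return Graph(frozenset(nodes), frozenset(edges))


def is_compatible(g: Graph, t: Graph) -> bool:
    """G is compatible with T(r): same nodes, and every edge of G is timely."""
    return g.nodes == t.nodes and g.edges <= t.edges


def run_in_system(system: Iterable[Graph], t: Graph) -> bool:
    """r is in R(X) iff some G in X is compatible with T(r)."""
    return any(is_compatible(g, t) for g in system)


# Timeliness graph of a hypothetical run: a, b, c correct; three timely links.
t_r = graph("abc", [("a", "b"), ("a", "c"), ("b", "c")])
star_a = graph("abc", [("a", "b"), ("a", "c")])
ring = graph("abc", [("a", "b"), ("b", "c"), ("c", "a")])

print(is_compatible(star_a, t_r))     # True
print(is_compatible(ring, t_r))       # False: (c, a) is not timely in this run
print(run_in_system([ring, star_a], t_r))  # True
```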

Some systems…
• ASYNC: G = <S, ∅>
• COMPLETE: all complete graphs
• STAR: all star graphs
• TREE: all out-trees
• RING: all rings
• SC: all strongly connected graphs
• PAIR: all cycles of two elements

Extraction
• Examples: we want (when it is possible)
  - To build a star
  - To build an (out-)tree
  - To build a ring
  - Moreover, we want:
    * Only timely links
    * All nodes must be (or almost be) correct processes

Almost?
• In the general case, it is not possible to ensure that all processes of the extracted graph are correct…
  - (We can only evaluate timeliness to find out whether a process is correct)
• However, we can evaluate:
  - whether G satisfies:
    * G contains at least all the correct processes
    * We don't know G[Correct], but…

Di-cut (directed cut)
• In the extracted graph, if no link outgoing from p is supposed to be timely (e.g., p is a sink), no process can determine whether p is correct… The same holds if all the links from p lead to faulty processes.
• (X, Y) is a dicut of G = <S, E> iff (X, Y) is a partition of S such that there is no (directed) link from Y to X
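The dicut definition can be checked mechanically on small graphs. The following sketch is illustrative: the function names are hypothetical, and the enumeration only considers partitions with two non-empty parts, which is an assumption the slide does not state.

```python
from itertools import combinations


def is_dicut(x, y, nodes, edges):
    """(X, Y) is a dicut: a partition of the nodes with no edge from Y to X."""
    x, y = set(x), set(y)
    is_partition = (x | y == set(nodes)) and not (x & y)
    return is_partition and not any(u in y and v in x for (u, v) in edges)


def dicuts(nodes, edges):
    """List all dicuts (X, Y) with X and Y both non-empty (assumption)."""
    nodes = sorted(nodes)
    found = []
    for k in range(1, len(nodes)):
        for x in combinations(nodes, k):
            y = [n for n in nodes if n not in x]
            if is_dicut(x, y, nodes, edges):
                found.append((set(x), set(y)))
    return found


star_edges = [("a", "b"), ("a", "c")]              # star centered at a
ring_edges = [("a", "b"), ("b", "c"), ("c", "a")]  # directed ring

print(dicuts("abc", star_edges))  # every dicut keeps the center a in X
print(dicuts("abc", ring_edges))  # []: a strongly connected graph has no dicut
```

In the star, the sinks b and c can each be cut off, which reflects the remark that no process can tell whether a sink is correct. The ring has no such cut.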

Almost?
• In the general case, it is not possible to ensure that all nodes of the extracted graph are correct…
• However, we can ensure that the extracted graph G satisfies:
  - G contains at least all the correct processes
  - Either G[Correct] is G, or (Correct, F) is a dicut of G, where F is a subset of faulty processes

Extraction
• Algorithm for extracting a graph from X
  - Each process p has a variable Gp; for every run r there exists G in X such that:
    * Convergence: for every correct process p, there exists a time t from which Gp = G
    * Compatibility: G[Correct(r)] is compatible with T(r)
    * Closure: G[Correct(r)] is a dicut reduction of G (or G itself)

Some results
• If G is extracted, (p, q) is an edge of G, and q is correct, then p is correct.
• If p0, …, pm is a path of the extracted graph such that p0 and pm are correct, then for 0 ≤ i < m, (pi, pi+1) is timely, and all pi are correct
  - (in particular, we obtain a route from p0 to pm that only contains timely links)
• If G is strongly connected, G[Correct] = G.

Main result
• If a family of graphs X is closed under dicut reduction (for every G in X and every dicut (A, B) of G, G[A] is in X), then we can always extract a graph from X.
• If every graph of X is strongly connected, then the extracted graph G satisfies G[Correct] = G
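The closure condition of the main result can be tested by brute force on a small universe of processes. The sketch below is illustrative and rests on assumptions: graphs are modeled as (nodes, edges) pairs of frozensets, only dicuts with two non-empty parts are considered, and the hypothetical `stars` helper takes STAR to contain every star over every non-empty subset of the universe (including the degenerate one-node star).

```python
from itertools import combinations


def induced(edges, keep):
    """Subgraph induced by the node set `keep`."""
    keep = frozenset(keep)
    return keep, frozenset((u, v) for (u, v) in edges if u in keep and v in keep)


def dicut_reductions(nodes, edges):
    """Yield G[A] for every dicut (A, B) of G with A and B non-empty."""
    nodes = sorted(nodes)
    for k in range(1, len(nodes)):
        for a in combinations(nodes, k):
            b = set(nodes) - set(a)
            if not any(u in b and v in a for (u, v) in edges):
                yield induced(edges, a)


def is_closed(family):
    """True iff every dicut reduction of every graph of the family is in it."""
    family = set(family)
    return all(r in family for (n, e) in family for r in dicut_reductions(n, e))


def stars(universe):
    """All stars <S, E>: S a non-empty subset, edges from one center to the rest."""
    out = set()
    for k in range(1, len(universe) + 1):
        for s in combinations(sorted(universe), k):
            for c in s:
                out.add((frozenset(s), frozenset((c, v) for v in s if v != c)))
    return out


print(is_closed(stars("abc")))   # True under the assumptions above

# A family holding a single star is not closed: cutting off a sink
# produces a smaller star that the family does not contain.
only_full_star = {(frozenset("abc"), frozenset([("a", "b"), ("a", "c")]))}
print(is_closed(only_full_star))  # False
```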

Example
• In STAR, we extract a star graph whose center is a correct process (Ω)
• In TREE, we extract an out-tree whose root is a correct process p0 and such that, for every correct process q, there exists a tree path from p0 to q that contains only correct processes and timely links
• In RING, we extract a ring among all correct processes that contains only timely links
• (In contrast, for PAIR, there is no extraction algorithm)

Principles of the algorithm
• Watch and punish
• Regularly test (p, q):
  - (p, q) is timely <=> q blames (p, q) only a finite number of times
• For each blaming of (p, q), punish all G containing (p, q): increase the counter of G
• For each process p, punish all G that do not contain p
• (Reliably) broadcast the counters
• Choose the graph with the smallest counter value
  - Any graph whose links are all timely and that contains all correct processes of the run is blamed only finitely often -> finite counter
  - Any graph that has at least one asynchronous link or misses some correct process is blamed infinitely often -> infinite counter
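The counter mechanism can be sketched as a centralized toy simulation. This is not the distributed algorithm of the paper: there is no broadcast, every process sees the same counters, and a non-timely link is simply blamed once per round, which is a strong simplification of "blamed infinitely often". All function names and the tie-breaking rule in `choose` are hypothetical.

```python
def star(nodes, center):
    """Star graph as (nodes, edges), with edges from the center to the others."""
    return (frozenset(nodes), frozenset((center, v) for v in nodes if v != center))


def punish_blame(counters, edge):
    """A blame on (p, q) punishes every candidate graph containing that edge."""
    for g in counters:
        if edge in g[1]:
            counters[g] += 1


def punish_missing(counters, p):
    """Process p punishes every candidate graph that does not contain p."""
    for g in counters:
        if p not in g[0]:
            counters[g] += 1


def choose(counters):
    """Smallest counter wins; ties are broken deterministically (assumption)."""
    return min(counters, key=lambda g: (counters[g], sorted(g[0]), sorted(g[1])))


def run_rounds(counters, correct, blamed_edges, rounds):
    for _ in range(rounds):
        for edge in blamed_edges:
            punish_blame(counters, edge)
        for p in correct:
            punish_missing(counters, p)


# Candidates: a few stars. In this run a, b, c are correct and only the
# links going out of a are timely, so every other link keeps being blamed.
candidates = [star("abc", "a"), star("abc", "b"), star("abc", "c"),
              star("ab", "a"), star("ab", "b")]
counters = {g: 0 for g in candidates}
blamed = [("b", "a"), ("b", "c"), ("c", "a"), ("c", "b")]
run_rounds(counters, correct="abc", blamed_edges=blamed, rounds=5)

print(choose(counters) == star("abc", "a"))  # True
```

Only the star centered at a keeps a counter of 0. The stars over {a, b} are punished by c in every round because they miss a correct process, and the stars centered at b or c are punished through their non-timely links.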

Moreover…
• Enhancement:
  - If every graph of X contains a spanning out-tree, eventually messages are sent only through the links of the extracted graph
  - Examples:
    * STAR, TREE, RING: O(n) links are used (instead of O(n²))

Conclusion and perspectives
• Timeliness <-> failures
  - Timeliness makes it possible to detect failures (the only way?)
  - Timeliness is useful (independently of failure detection)
• Algorithm complexity…
• Impossibility results
