
Distributed deadlock
May occur due to faulty design or resource sharing problems. [Sometimes prevention is more expensive than detection and recovery, so certain designs deliberately ignore deadlocks, particularly if they are rare.] Sometimes failures or perturbations can modify the system state and cause deadlock.
Major issues: detection, prevention, recovery.

Wait-for Graph (WFG)
• Represents who waits for whom.
• No single process can see the WFG.
• Review how the WFG is formed.

Another classification
• Resource deadlock [R1 AND R2 AND R3 …], also known as AND deadlock
• Communication deadlock [R1 OR R2 OR R3 …], also known as OR deadlock

Detection of resource deadlock
Notations
• w(j) = true (j is waiting)
• depend[j, i] = true ⇒ j ∈ succ^n(i) (n > 0)
• P(i, s, r) is a probe (i = initiator, s = sender, r = receiver)
[Figure: example WFG on nodes 1 to 4; initiator 4 sends the probe P(4, 4, 3).]

Detection of resource deadlock
Chandy-Misra-Haas algorithm
{Program for process k}
do P(i, s, k) received ∧ w[k] ∧ (k ≠ i) ∧ ¬depend[k, i] --> send P(i, k, j) to each successor j; depend[k, i] := true
[] P(i, s, k) received ∧ w[k] ∧ (k = i) --> process k is deadlocked
od
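
The probe rules above can be tried out in a small single-machine simulation. This is only a sketch under stated assumptions: the function name `cmh_detects_deadlock`, the dictionary format for the WFG, and the FIFO queue standing in for the network are all hypothetical and not part of the slides.

```python
from collections import deque

def cmh_detects_deadlock(successors, waiting, initiator):
    """Simulate edge-chasing probes P(i, s, k) started by `initiator`.

    successors[k]: processes that k waits for (hypothetical input format)
    waiting:       set of processes k for which w[k] is true
    Returns True if a probe comes back to the initiator.
    """
    if initiator not in waiting:
        return False
    depend = set()  # processes k with depend[k, initiator] = true
    # The initiator sends P(i, i, j) to each of its successors j.
    queue = deque((initiator, j) for j in successors.get(initiator, ()))
    while queue:
        sender, k = queue.popleft()
        if k not in waiting:
            continue  # a process that is not waiting discards the probe
        if k == initiator:
            return True  # the probe returned: the initiator is on a cycle
        if k not in depend:
            depend.add(k)  # forward the probe only once per process
            for j in successors.get(k, ()):
                queue.append((k, j))
    return False

# Cycle 4 -> 3 -> 2 -> 1 -> 4, with every process waiting.
wfg = {4: [3], 3: [2], 2: [1], 1: [4]}
print(cmh_detects_deadlock(wfg, {1, 2, 3, 4}, initiator=4))
```

Because each waiting process forwards the probe at most once, at most one probe crosses each edge, which is where the O(|E|) message count comes from.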

Observations
To detect deadlock, the initiator must be in a cycle.
Message complexity = O(|E|) (edge-chasing algorithm), where E = set of edges.

Communication deadlock
The subgraph of the WFG consisting of black nodes and black edges has a resource deadlock as well as a communication deadlock. However, if we add node 5 and the red edge (4, 5), then the communication deadlock will disappear: under the OR model, node 4 can proceed as soon as any one of its successors responds.

Detection of communication deadlock
A process ignores a probe if it is not waiting for any process. Otherwise:
• First probe --> mark the sender as parent; forward the probe to successors
• Not the first probe --> send ack to that sender
• Ack received from every successor --> send ack to the parent
Communication deadlock is detected if the initiator receives ack. Has many similarities with Dijkstra-Scholten’s termination detection algorithm.
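
A minimal simulation of these probe and ack rules might look as follows. It is a sketch, not the slide author's code: `or_deadlock_detected` and the message tuples are hypothetical names, every waiting process is assumed to have at least one successor, and the initiator is assumed to declare deadlock only after acks arrive from all of its successors.

```python
from collections import deque

def or_deadlock_detected(successors, waiting, initiator):
    """Diffusing-computation style detection for the OR model (sketch)."""
    if initiator not in waiting:
        return False
    parent = {initiator: None}  # a process appears here once it has seen a probe
    pending = {initiator: len(successors.get(initiator, ()))}  # acks still awaited
    queue = deque(("probe", initiator, j) for j in successors.get(initiator, ()))
    while queue:
        kind, sender, k = queue.popleft()  # k is the receiver of the message
        if kind == "probe":
            if k not in waiting:
                continue  # ignored: no ack will ever be sent back
            if k in parent:
                queue.append(("ack", k, sender))  # not the first probe
            else:
                parent[k] = sender  # first probe: the sender becomes the parent
                pending[k] = len(successors.get(k, ()))
                for j in successors.get(k, ()):
                    queue.append(("probe", k, j))
        else:  # an ack arrived at k
            pending[k] -= 1
            if pending[k] == 0:
                if k == initiator:
                    return True  # the initiator collected all acks
                queue.append(("ack", k, parent[k]))
    return False

# All of 1, 2, 3 wait only on each other: communication deadlock.
print(or_deadlock_detected({1: [2, 3], 2: [3], 3: [1]}, {1, 2, 3}, 1))
```

Adding a successor that is not waiting (for example an edge from 3 to an active node 4) leaves one ack permanently missing, so the initiator never reports deadlock.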

Graph Algorithms
Why graph algorithms? It is not a “graph theory” course! Many problems in networks can be modeled as graph problems. Note that:
- The topology of a distributed system is a graph.
- Routing table computation uses the shortest path algorithm.
- Efficient broadcasting uses a spanning tree.
- The maxflow algorithm determines the maximum flow between a pair of nodes in a graph.

Routing
• Shortest path routing
• Distance vector routing
• Link state routing
• Routing in sensor networks
• Routing in peer-to-peer networks

Internet routing
[Figure: four autonomous systems, AS0 to AS3.]
Each AS is under a common administration.
Intra-AS vs. Inter-AS routing
The Shortest Path First routing algorithm is the basis for OSPF.

Routing revisited (borrowed from Cisco documentation, http://www.cisco.com)

Routing: Shortest Path
Most shortest path algorithms are adaptations of the classic Bellman-Ford algorithm, which computes shortest paths if there is no cycle of negative weight. Let D(j) = shortest distance of j from initiator 0. Thus D(0) = 0.
[Figure: example graph on nodes 0, j, k, m with path lengths w(0, m) and w(0, j) + w(j, k).]
The edge weights can represent latency, distance, or some other appropriate parameter such as power. Classical algorithms (Bellman-Ford, Dijkstra’s algorithm) are found in most algorithm books. What is the difference between an (ordinary) graph algorithm and a distributed graph algorithm?

Shortest path
Revisiting Bellman-Ford: basic idea
Consider a static topology. Process 0 sends (w(0, i), 0) to each neighbor i.
{program for process i}
do message = (S, k) ∧ S < D(i) --> if parent ≠ k --> parent := k fi; D(i) := S; send (D(i) + w(i, j), i) to each neighbor j ≠ parent
[] message = (S, k) ∧ S ≥ D(i) --> skip
od
Computes the shortest distance to all nodes from an initiator node. The parent pointers help the packets navigate to the initiator.
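
The message-driven program above can be simulated on one machine with a queue of in-flight messages. This is a sketch under assumptions: `async_bellman_ford`, the edge-dictionary input, and the FIFO delivery order are hypothetical, edges are treated as undirected, and weights are assumed positive so that the message queue eventually drains.

```python
from collections import deque
import math

def async_bellman_ford(weights, source=0):
    """weights[(i, j)] is the weight of the undirected edge between i and j."""
    neighbors = {}
    for (i, j), w in weights.items():
        neighbors.setdefault(i, {})[j] = w
        neighbors.setdefault(j, {})[i] = w
    D = {i: math.inf for i in neighbors}
    parent = {i: i for i in neighbors}
    D[source] = 0
    # The initiator sends (w(0, i), 0) to each neighbor i.
    queue = deque((j, w, source) for j, w in neighbors[source].items())
    while queue:
        i, S, k = queue.popleft()  # process i receives message (S, k)
        if S < D[i]:
            parent[i] = k
            D[i] = S
            for j, w in neighbors[i].items():
                if j != parent[i]:
                    queue.append((j, D[i] + w, i))
        # otherwise S >= D[i]: skip
    return D, parent

D, parent = async_bellman_ford({(0, 1): 4, (0, 2): 1, (2, 1): 2, (1, 3): 5})
print(D)       # shortest distances from node 0
print(parent)  # following parent pointers leads back to node 0
```

Unlike the textbook sequential version, no process ever sees the whole graph here: each one reacts only to the messages it receives, which is the essential difference the previous slide asks about.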

Shortest path
Chandy & Misra’s algorithm: basic idea
Consider a static topology. Process 0 sends (w(0, i), 0) to each neighbor i.
{for process i > 0}
do message = (S, k) ∧ S < D --> if parent ≠ k --> send ack to parent fi; parent := k; D := S; send (D + w(i, j), i) to each neighbor j ≠ parent; deficit := deficit + |N(i)| - 1
[] message = (S, k) ∧ S ≥ D --> send ack to sender
[] ack --> deficit := deficit - 1
[] deficit = 0 ∧ parent ≠ i --> send ack to parent
od
[Figure: example weighted graph rooted at node 0.]
Combines shortest path computation with termination detection. Termination is detected when the initiator receives an ack from each neighbor.

Shortest path
An important issue is: how well do such algorithms perform when the topology changes? No real network is static! Let us examine distance vector routing, which is an adaptation of the shortest path algorithm.

Distance Vector Routing
The distance vector D for each node i contains N elements D[i, 0], D[i, 1], D[i, 2], … Initialize these to ∞ (with D[i, i] = 0). {Here, D[i, j] denotes the distance from node i to node j.}
[Figure: example network whose edges all have weight 1.]
- Each node j periodically sends its distance vector to its immediate neighbors.
- Every neighbor i of j, after receiving the broadcasts from its neighbors, updates its distance vector as follows: ∀ k ≠ i: D[i, k] = min over neighbors j of (w[i, j] + D[j, k])
Used in RIP, IGRP, etc.
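
To make the update rule concrete, here is a small synchronous simulation in which every node recomputes its vector from its neighbors' vectors once per round. It is a sketch only: `dv_round`, `dv_converge`, and the nested-dictionary topology are hypothetical, and real protocols exchange vectors asynchronously.

```python
import math

def dv_round(D, w):
    """One round: each node i sets D[i][k] = min over neighbors j of w[i][j] + D[j][k]."""
    new = {}
    for i in D:
        new[i] = {}
        for k in D:
            if k == i:
                new[i][k] = 0
            else:
                new[i][k] = min((w[i][j] + D[j][k] for j in w[i]), default=math.inf)
    return new

def dv_converge(w):
    """Repeat rounds until no vector changes (a static topology is assumed)."""
    nodes = list(w)
    D = {i: {k: (0 if i == k else math.inf) for k in nodes} for i in nodes}
    for _ in range(len(nodes)):
        nxt = dv_round(D, w)
        if nxt == D:
            break
        D = nxt
    return D

# Chain 0 - 1 - 2 - 3 with unit edge weights.
chain = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
print(dv_converge(chain)[0])
```

Each round extends the known distances by one hop, so information about a node k hops away needs k rounds to arrive.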

Counting to infinity
Observe what can happen when the link (2, 3) fails.
Node 1 thinks d(1, 3) = 2
Node 2 thinks d(2, 3) = d(1, 3) + 1 = 3
Node 1 thinks d(1, 3) = d(2, 3) + 1 = 4
and so on. So it will take forever for the distances to stabilize. A partial remedy is the split horizon method, which prevents node 1 from sending the advertisement about d(1, 3) to node 2, since its first hop is node 2.
∀ k ≠ i: D[i, k] = min over neighbors j of (w[i, j] + D[j, k])
Suitable for smaller networks. A larger volume of data is disseminated, but to immediate neighbors only. Poor convergence property.
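
The exchange between nodes 1 and 2 after the failure can be replayed in a few lines. This sketch makes several assumptions that are not on the slide: the function name `count_to_infinity` is hypothetical, only the two nodes' distances to node 3 are tracked, node 2 always updates before node 1, and "infinity" is capped at 16 as RIP does.

```python
def count_to_infinity(rounds, split_horizon=False, infinity=16):
    """Return the (d(1, 3), d(2, 3)) pairs seen after the link (2, 3) fails."""
    d1 = 2          # node 1 reaches node 3 through node 2
    d2 = infinity   # node 2 has just lost its direct link to node 3
    history = []
    for _ in range(rounds):
        # With split horizon, node 1 does not advertise a route to 3 back to
        # node 2, because node 2 is its first hop on that route.
        advertised = infinity if split_horizon else d1
        d2 = min(infinity, advertised + 1)
        d1 = min(infinity, d2 + 1)
        history.append((d1, d2))
    return history

print(count_to_infinity(3))                      # distances keep climbing
print(count_to_infinity(1, split_horizon=True))  # both nodes learn 3 is unreachable
```

Without the cap the distances would grow without bound; with it, the two nodes still need several rounds before they agree that node 3 is unreachable.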

Link State Routing
Each node i periodically broadcasts the weights of all edges (i, j) incident on it (this is the link state) to all its neighbors. The mechanism for dissemination is flooding. This helps each node eventually compute the topology of the network and independently determine the shortest path to any destination. A smaller volume of data is disseminated, but over the entire network. Used in OSPF.
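
The two phases described here, flooding the link states and then computing shortest paths locally, can be sketched as below. The names `flood_link_states` and `dijkstra` and the adjacency-dictionary format are hypothetical; the sketch assumes a connected network with positive weights and ignores periodic re-broadcasts.

```python
import heapq

def flood_link_states(adj):
    """Return lsdb, where lsdb[n][origin] is origin's link state as known at n."""
    lsdb = {i: {i: dict(adj[i])} for i in adj}
    # Each node starts by sending its own link state to every neighbor.
    pending = [(i, j, i) for i in adj for j in adj[i]]  # (from, to, origin)
    while pending:
        frm, to, origin = pending.pop()
        if origin in lsdb[to]:
            continue  # already known: do not flood it again
        lsdb[to][origin] = dict(adj[origin])
        pending.extend((to, nxt, origin) for nxt in adj[to] if nxt != frm)
    return lsdb

def dijkstra(topology, source):
    """Shortest distances from source over a topology learned by flooding."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in topology.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

adj = {0: {1: 4, 2: 1}, 1: {0: 4, 2: 2, 3: 5}, 2: {0: 1, 1: 2}, 3: {1: 5}}
lsdb = flood_link_states(adj)
print(dijkstra(lsdb[3], 3))  # node 3 computes its routes independently
```

After flooding, every node holds the same topology, so each one can run the shortest path computation on its own without further messages.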

Link State Routing
• Each link state packet has a sequence number seq that determines the order in which the packets were generated.
• When a node crashes, all packets stored in it are lost. After it is repaired, new packets start with seq = 0, so these new packets may be discarded in favor of the old packets!
• The problem is resolved using a TTL (time-to-live) field, so that old packets eventually expire.
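
One way to picture the interplay of seq and TTL is a small table of stored packets. The slide does not say how the TTL is applied, so everything here is an assumption for illustration: the class `LinkStateDB`, its `receive` and `tick` methods, and the rule that an entry is dropped once its TTL counts down to zero.

```python
class LinkStateDB:
    """Stores the newest link state packet per origin (hypothetical sketch)."""

    def __init__(self):
        self.entries = {}  # origin -> (seq, ttl, links)

    def receive(self, origin, seq, ttl, links):
        """Accept a packet only if it is newer than the stored one."""
        stored = self.entries.get(origin)
        if stored is not None and seq <= stored[0]:
            return False  # looks older than what we have: discard
        self.entries[origin] = (seq, ttl, links)
        return True

    def tick(self):
        """Age every entry by one time unit and drop the expired ones."""
        aged = {}
        for origin, (seq, ttl, links) in self.entries.items():
            if ttl - 1 > 0:
                aged[origin] = (seq, ttl - 1, links)
        self.entries = aged

db = LinkStateDB()
db.receive("A", seq=7, ttl=2, links={"B": 1})
print(db.receive("A", seq=0, ttl=2, links={"B": 1}))  # rejected right after A restarts
db.tick()
db.tick()  # the old packet has now expired
print(db.receive("A", seq=0, ttl=2, links={"B": 1}))  # accepted
```

The restarted node's packets are ignored only until the stale entry ages out, which bounds how long the old state can block the new one.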