
Chapter 5: Tree Constructions
• Breadth-First Search (BFS)
  – layer-based, using Dijkstra’s algorithm
  – update-based, using the Bellman-Ford algorithm
• Distributed Depth-First Search (DFS)
• Minimum spanning trees (MST)
• Matroid problems
  – solutions using synchronous upcasting yield faster alternatives to MST

• Breadth-First Search tree construction
  – recall that the flooding algorithm can be used to construct a BFS tree in the synchronous model
  – lower bounds for BFS algorithms
    • message complexity = Ω(|edges|)
    • time complexity = Ω(diameter)
  – in the asynchronous model, the flooding algorithm may not construct a BFS tree
  – large messages are disallowed so that we can focus on time/message tradeoffs

• Layer-synchronized BFS construction (Dijkstra’s algorithm)
  – build the tree layer by layer
  – at each stage, add all vertices not yet in the tree that are adjacent to a vertex of an already-constructed layer
  – the root r0 initiates the construction by issuing phases, with one new layer constructed in each phase
  – at phase p + 1, assume the tree has already been constructed on p layers (denoted by Tp); then do the following:
    • r0 generates a Pulse message and broadcasts it on Tp
    • each vertex v in Tp, upon receiving Pulse, sends an exploration message Ex to all of its neighbors (except its parent)
    • each vertex w, upon receiving Ex for the first time, picks one sending neighbor v as its parent (parent(w) = v) and sends Ack to this parent
    • a vertex w that has already selected a parent replies to any further Ex message with an Ack that also carries parent(w)
    • each leaf v of Tp collects the Ack messages for its Ex messages; if an Ack arrives from a vertex w with parent(w) = v, then v adds w to its set child(v)

• Dijkstra’s algorithm (cont’d.)
  – algorithm description (cont’d.)
    • once a leaf v of Tp has received an Ack for each of its Ex messages, it upcasts an Ack to its parent in Tp; these Ack messages are then convergecast on Tp back to r0
    • once this convergecast terminates at r0, r0 may begin the next phase
  – termination detection
    • each Ack message carries an additional field, new (initially set to 0), which indicates whether any new vertices were added to the tree in the current phase
    • a vertex v sets new(v) = 1 if any new vertex has responded to its Ex message by joining the tree as its child
    • the OR of these new bits is convergecast on the tree
    • if a phase ends with r0 receiving new = 0 in the Ack message from each of its children, then the layer explored by the leaves in that phase is empty and the tree is complete
  – inefficiencies
    • certain Ex messages can be avoided
      – if only the left subtree of a node is unexplored, we still send Ex messages to the right subtree as well
    • some of the Ack messages can be omitted
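The phase structure described above can be traced with a small centralized simulation. This is only a sketch under stated assumptions: the function name `layered_bfs`, the adjacency-dictionary input, and the accounting (one time unit per message hop, only the deepest layer sends Ex) are illustrative choices, not part of the slides.

```python
def layered_bfs(adj, root):
    """Simulate layer-synchronized BFS from `root`.

    adj maps each vertex to a list of its neighbours (undirected graph).
    Returns (parent, messages, time). Assumption: only the vertices of the
    deepest layer built so far send Ex messages in a phase.
    """
    parent = {root: None}
    layer = [root]      # vertices of the deepest layer of the current tree
    depth = 0           # number of layers below the root built so far
    messages = 0
    time = 0
    while True:
        # Pulse broadcast down the tree and Ack convergecast back up:
        # one message per tree edge in each direction, `depth` hops each way.
        tree_edges = len(parent) - 1
        messages += 2 * tree_edges
        time += 2 * depth
        new_layer = []
        for v in layer:
            for w in adj[v]:
                if w == parent[v]:
                    continue
                messages += 2           # Ex from v to w, Ack from w to v
                if w not in parent:     # first Ex received by w: w joins the tree
                    parent[w] = v
                    new_layer.append(w)
        time += 2                       # exploration: Ex, then Ack
        if not new_layer:               # all new bits are 0: tree is complete
            break
        layer = new_layer
        depth += 1
    return parent, messages, time


if __name__ == "__main__":
    square = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    print(layered_bfs(square, 0))
```

In this sketch the final, empty phase is still paid for, which mirrors the termination detection rule: the root only learns that the tree is complete after one full phase that adds nothing.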

• Dijkstra’s algorithm (cont’d.)
  – complexities
    • (Lemma 5.2.1) after phase p is completed, the variables parent and child correspond to a legal BFS tree spanning Γp(r0), the p-neighborhood of r0
    • time = O(diam²(G))
    • message = O(n * diam(G) + |E|)
  – analysis
    • time
      – time(phase p) = 2p + 2: the broadcast and the convergecast take p time units each, and exploration takes two time units
      – summing over 1 ≤ p ≤ diam(G):
        » Σp time(phase p) = Σp (2p + 2) = O(diam²(G))

• Dijkstra’s algorithm (cont’d.)
  – analysis (cont’d.)
    • message
      – assume p ≥ 0; let Vp be the set of vertices of T at layer p; let Ep be the internal edges of Vp; and let Ep,p+1 be the edges connecting Vp to Vp+1
      – at phase p, exploration messages are sent only over Ep and Ep,p+1, and the edges of Tp are traversed twice, giving message(phase p) = O(n) + O(|Ep|) + O(|Ep,p+1|)
      – summing over 1 ≤ p ≤ diam(G):
        » Σp message(phase p) = Σp (O(n) + O(|Ep|) + O(|Ep,p+1|)) = O(n * diam(G) + |E|)
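For reference, the two sums in the analysis can be written out in full. Here D is shorthand for diam(G), introduced only for this derivation; the last step of the message bound relies on the sets Ep and Ep,p+1 being pairwise disjoint subsets of E, so their sizes add up to at most |E|.

```latex
\begin{align*}
\sum_{p=1}^{D} \mathrm{time}(\text{phase } p)
  &= \sum_{p=1}^{D} (2p + 2) = D(D+1) + 2D = D^2 + 3D = O(D^2), \\
\sum_{p=1}^{D} \mathrm{message}(\text{phase } p)
  &= \sum_{p=1}^{D} \bigl( O(n) + O(|E_p|) + O(|E_{p,p+1}|) \bigr) \\
  &= O(nD) + O\Bigl( \sum_{p} |E_p| + \sum_{p} |E_{p,p+1}| \Bigr)
   = O(nD + |E|).
\end{align*}
```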

• Update-based BFS construction (distributed Bellman-Ford algorithm)
  – a modification of the flooding algorithm that ensures a BFS tree is constructed in the asynchronous model
  – algorithm:
    • each vertex v keeps a variable L(v), its current estimate of its distance to the root (initially ∞, except L(r0) = 0 at the root)
    • as flooding progresses, each vertex v sends L(v) to each neighbor w along with the flooded message
    • if a vertex w receives L(v) from its neighbor v and L(v) + 1 < L(w), then w chooses v as its parent and sets L(w) = L(v) + 1
    • whenever this change occurs, w also informs all of its other neighbors of its new (shorter) path to the root
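The update rule can be exercised under an adversarial-looking schedule with a short simulation. This is a sketch, not the slides' own code: the function name `update_based_bfs`, the `seed` parameter, and the delivery model (any pending message may be delivered next, chosen pseudo-randomly, without even FIFO order per link) are assumptions made for illustration.

```python
import math
import random


def update_based_bfs(adj, root, seed=0):
    """Simulate the update-based (Bellman-Ford style) BFS construction.

    adj maps each vertex to a list of its neighbours (undirected graph).
    Returns (L, parent, messages).
    """
    rng = random.Random(seed)
    L = {v: math.inf for v in adj}
    parent = {v: None for v in adj}
    L[root] = 0
    # Each pending message is (sender, receiver, L(sender) at send time).
    in_flight = [(root, w, 0) for w in adj[root]]
    messages = len(in_flight)
    while in_flight:
        # Asynchrony: pick an arbitrary pending message to deliver next.
        v, w, lv = in_flight.pop(rng.randrange(len(in_flight)))
        if lv + 1 < L[w]:
            L[w] = lv + 1          # shorter path to the root found via v
            parent[w] = v
            for u in adj[w]:
                if u != v:         # inform all other neighbours
                    in_flight.append((w, u, L[w]))
                    messages += 1
    return L, parent, messages


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
    print(update_based_bfs(graph, 0, seed=1))
```

Whatever the delivery order, the labels only ever decrease and every improvement is announced, so the final labels are the true distances; what varies with the schedule is the number of messages.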

• Bellman-Ford distributed algorithm (cont’d.)
  – complexities
    – time = O(diam(G))
    – message = O(n * |edges|)
  – analysis
    • synchronous
      – complexities are the same as in the flooding algorithm; once a vertex changes L(v) from ∞, it never changes it again
    • asynchronous
      – time:
        » assume d ≥ 1
        » by d time units into the execution, each vertex v at distance d from the root has already received a message carrying the value d-1 from some neighbor
        » v then sets L(v) = d and chooses a parent w such that L(w) = d-1
        » induction on d gives O(diam(G))

• Bellman-Ford distributed algorithm (cont’d.)
  • asynchronous model analysis (cont’d.)
    – message:
      » for a vertex v, the first value it assigns to L(v) is at most n-1 (the length of the longest possible simple path in the network)
      » L(v) then changes at most n-2 more times, since each change strictly decreases it and it cannot drop below 1
      » each change to L(v) results in v sending a message on each of its incident edges
      » thus each v sends at most n * degree(v) messages
      » total messages = Σv n * degree(v) = O(n * |edges|)

• Distributed Depth-First Search
  – general overview
    • algorithm
      – begin at some source vertex, r0
      – upon reaching any vertex v
        » if v has unvisited neighbors, then visit one of them
        » otherwise, return to parent(v)
      – when the search must return from a vertex v with parent(v) = NULL, terminate, since v = r0
    • DFS defines a tree, with r0 as the root, which reaches all vertices in the graph
      – “back edges” = graph edges not in the tree
      – sequential time complexity = O(|edges|)

• Distributed DFS (cont’d.)
  – distributed version = token-based
    • the token traverses the graph in a depth-first manner using the algorithm described above
    • complexities
      – message = time = Θ(|edges|)
        » note that an edge need not be examined from both endpoints; when edge (v, w) is examined by v, w then knows that v has been visited
    • analysis
      – message:
        » lower bound of Ω(|edges|), since every edge must be explored
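A token-based traversal along these lines can be sketched as follows. The function name `token_dfs` and the bounce-back convention for an already-visited vertex are assumptions of this sketch; the graph is assumed to be simple, connected, and given as an adjacency dictionary.

```python
def token_dfs(adj, root):
    """Simulate a single-token distributed DFS; return (parent, messages).

    A visited vertex that receives the token over a non-tree edge bounces it
    straight back and remembers that the sender is visited, so each edge is
    examined from one endpoint only.
    """
    parent = {root: None}
    untried = {v: list(adj[v]) for v in adj}   # edges v has not examined yet
    messages = 0
    v = root                                   # current token holder
    while True:
        if untried[v]:
            w = untried[v].pop()
            if w == parent[v]:
                continue                       # never probe the parent edge
            messages += 1                      # token sent from v to w
            if w in parent:                    # w already visited: back edge
                messages += 1                  # w bounces the token back to v
                if v in untried[w]:
                    untried[w].remove(v)       # w now knows v is visited
            else:
                parent[w] = v                  # w is visited for the first time
                v = w
        elif parent[v] is None:
            return parent, messages            # token is back at the root: done
        else:
            messages += 1                      # return the token to the parent
            v = parent[v]


if __name__ == "__main__":
    square = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    print(token_dfs(square, 0))
```

In this sketch every edge carries exactly two messages (forward and return on a tree edge, probe and bounce on a back edge), which is one concrete way to see the Θ(|edges|) message and time cost of the plain token algorithm.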

• Distributed DFS (cont’d.)
  • analysis (cont’d.)
    – time:
      » ensure that vertices visited for the first time know which of their neighbors have and have not been visited, so that no unnecessary vertex explorations are made
      » algorithm: when a vertex v is first visited, freeze the DFS process; inform all neighbors of v that v has been visited; collect Ack messages from those neighbors; then resume the DFS process
      » additional time cost each time a vertex is first visited = O(1)
      » only edges of the DFS tree are traversed by the token
      » therefore, time complexity = O(n)
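The freeze-and-notify refinement can be checked with a small time-accounting sketch. The function name `dfs_with_visit_notices` and the cost model (one time unit per token hop, two time units for the parallel notify/Ack round) are assumptions chosen for illustration.

```python
def dfs_with_visit_notices(adj, root):
    """Simulate DFS in which each newly visited vertex notifies its neighbours.

    Returns (parent, time, messages). The graph is assumed connected and is
    given as an adjacency dictionary.
    """
    known_visited = {v: set() for v in adj}   # neighbours v knows are visited
    parent = {root: None}
    time = 0
    messages = 0

    def announce(v):
        # Freeze DFS, tell every neighbour that v is visited, collect the Acks.
        nonlocal time, messages
        for u in adj[v]:
            known_visited[u].add(v)
        messages += 2 * len(adj[v])           # one notice and one Ack per edge
        time += 2                             # both rounds run in parallel

    announce(root)
    v = root
    while True:
        nxt = next((u for u in adj[v] if u not in known_visited[v]), None)
        if nxt is not None:
            parent[nxt] = v                   # token moves down a new tree edge
            time += 1
            messages += 1
            v = nxt
            announce(v)
        elif parent[v] is None:
            return parent, time, messages     # back at the root with nothing left
        else:
            time += 1                         # token returns up a tree edge
            messages += 1
            v = parent[v]


if __name__ == "__main__":
    square = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    print(dfs_with_visit_notices(square, 0))
```

Under this cost model the total time is 2n for the announcements plus 2(n-1) token hops, that is 4n-2, while the message count stays proportional to the number of edges.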

• Minimum spanning trees (MST)
  – evaluate a spanning tree by its total weight
    • subgraph weight:
      – let G’ be a subgraph of the graph G with edge set E’ and weight function w( )
      – then w(G’) = Σ_{e ∈ E’} w(e)
    • an MST of the graph G is then defined as a spanning tree TM of G which minimizes w(TM)
  – MST problem
    • given a weighted graph G = (V, E, w), compute an MST for G
    • edge weights are assumed to be distinct, thus yielding a unique MST for G
      – if the weights are not distinct, distinct weights can be created using vertex identifiers
      – however, in anonymous networks without distinct edge weights or distinct vertex identifiers, no distributed algorithm exists for computing an MST with a bounded number of messages
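The weight of a subgraph and the identifier-based tie-breaking trick can be illustrated in a few lines. The helper names `subgraph_weight` and `make_distinct` are hypothetical, edges are assumed to be `(u, v, w)` triples of a simple graph, and vertex identifiers are assumed to be distinct and comparable.

```python
def subgraph_weight(edges):
    """Total weight of a subgraph given as a list of (u, v, w) edges."""
    return sum(w for _, _, w in edges)


def make_distinct(edges):
    """Replace each weight w by the tuple (w, smaller id, larger id).

    Tuples compare lexicographically, so ties between equal weights are broken
    by the endpoint identifiers and no two edges of a simple graph compare equal.
    """
    return [(u, v, (w, min(u, v), max(u, v))) for u, v, w in edges]


if __name__ == "__main__":
    triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
    print(subgraph_weight(triangle))
    print(sorted(w for _, _, w in make_distinct(triangle)))
```

The extended weights preserve the original order wherever the original weights already differ, so any MST under the new weights is also an MST under the old ones.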

• MST (cont’d.)
  – in the worst case, distributed MST construction requires
    • Ω(|E|) messages for weighted n-vertex graphs
    • Ω(n log n) messages for arbitrary n-vertex graphs
  – definitions
    • an MST fragment is a tree T in G for which there exists an MST TM of G such that T is a subtree of TM
      – edge e = (v, w) is an outgoing edge of fragment T if exactly one of v and w belongs to T
      – MWOE(T) = minimum-weight outgoing edge of fragment T
    • blue rule:
      – given a fragment T and e = MWOE(T), create T’ = T ∪ {e}
    • Lemma 5.5.6: T’ is a fragment as well
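The definitions of outgoing edge, MWOE, and the blue rule translate directly into code. This is a centralized sketch with hypothetical helper names (`mwoe`, `blue_rule`); a fragment is represented by its vertex set plus its edge set, and edges are `(u, v, w)` triples with distinct weights.

```python
def mwoe(edges, fragment_vertices):
    """Minimum-weight outgoing edge of a fragment, or None if there is none.

    An edge is outgoing if exactly one of its endpoints lies in the fragment.
    """
    outgoing = [e for e in edges
                if (e[0] in fragment_vertices) != (e[1] in fragment_vertices)]
    return min(outgoing, key=lambda e: e[2], default=None)


def blue_rule(edges, fragment_vertices, fragment_edges):
    """Apply the blue rule once: T' = T ∪ {MWOE(T)}."""
    e = mwoe(edges, fragment_vertices)
    if e is None:                      # the fragment already spans its component
        return fragment_vertices, fragment_edges
    u, v, _ = e
    return fragment_vertices | {u, v}, fragment_edges | {e}


if __name__ == "__main__":
    edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (1, 3, 5)]
    print(blue_rule(edges, {0}, set()))
```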

• MST (cont’d.)
  – Prim’s algorithm (distributed version)
    • works by repeatedly applying the blue rule: each resulting fragment T’ is extended by e’ = MWOE(T’), as above, until the fragment is the MST for G
    • works in both the asynchronous and synchronous models
    • algorithm
      – let vertex r0 be the source as well as the first fragment T
      – use pulse messages broadcast on the current fragment T to add MWOE(T) in synchronized phases
      – each vertex in T sends its own minimum-weight outgoing edge
      – these candidates are convergecast towards r0 (each vertex forwards the minimum it has seen)
      – the MWOE is then selected by r0 and broadcast on the tree
    • complexities
      – time = message = O(n²)
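The phase loop of the Prim-style construction can be sketched centrally as repeated blue-rule steps from r0. The function name `prim_blue_rule` and the simplified cost counter `tree_messages` (which counts only the pulse broadcast and the convergecast on the current fragment, two messages per tree edge per phase) are assumptions of this sketch.

```python
def prim_blue_rule(vertices, edges, root):
    """Grow a single fragment from `root` by repeatedly adding its MWOE.

    edges is a list of (u, v, w) triples with distinct weights.
    Returns (tree_edges, tree_messages).
    """
    in_tree = {root}
    tree_edges = []
    tree_messages = 0
    while len(in_tree) < len(vertices):
        # Pulse broadcast plus convergecast over the current fragment.
        tree_messages += 2 * (len(in_tree) - 1)
        outgoing = [e for e in edges
                    if (e[0] in in_tree) != (e[1] in in_tree)]
        if not outgoing:
            break                       # graph is disconnected
        u, v, w = min(outgoing, key=lambda e: e[2])
        tree_edges.append((u, v, w))    # blue rule: add MWOE(T) to T
        in_tree |= {u, v}
    return tree_edges, tree_messages


if __name__ == "__main__":
    edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (1, 3, 5)]
    print(prim_blue_rule(range(4), edges, 0))
```

With one vertex added per phase and a fragment of up to n vertices to sweep each time, the counter grows quadratically in n, which is where the sequential flavor of this approach shows.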

• MST (cont’d.)
  – synchronous GHS algorithm
    • Prim’s algorithm is still fairly sequential
    • GHS (a distributed version of Kruskal’s algorithm) is less sequential and thus more efficient
    • Kruskal’s algorithm
      – each vertex v is initially a fragment
      – at each step, the minimum-weight edge among the outgoing edges of all fragments is selected and added to the tree, thus merging the two fragments it touches
      – when a single fragment remains, it is the MST for G
        * sequential: n-1 steps are still needed
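For comparison, the sequential Kruskal procedure described above can be written with a union-find structure that tracks fragments. The function name `kruskal` and the `(u, v, w)` edge format are choices made for this sketch.

```python
def kruskal(vertices, edges):
    """Sequential Kruskal: scan edges by increasing weight, merging fragments.

    Returns the list of MST edges in the order they were added.
    """
    leader = {v: v for v in vertices}       # each vertex starts as a fragment

    def find(v):
        # Follow leader pointers to the fragment representative (path halving).
        while leader[v] != v:
            leader[v] = leader[leader[v]]
            v = leader[v]
        return v

    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge is outgoing for both fragments
            leader[ru] = rv                 # merge the two fragments
            mst.append((u, v, w))
    return mst


if __name__ == "__main__":
    edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (1, 3, 5)]
    print(kruskal(range(4), edges))
```

Each accepted edge is one merge step, so a connected graph on n vertices always takes n-1 such steps, one after another.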

• MST (cont’d.)
  • GHS algorithm overview
    – works in the synchronous model
    – vertices are partitioned into fragments, with each fragment Fi being a rooted tree
    – each fragment has an identifier (possibly the identifier of its root)
    – each vertex in a fragment knows its parent, its children and the identifier of the fragment
    – works in phases; each phase takes the fragment structure of the previous phase as input and outputs larger fragments
  • description of each phase:
    – all vertices of a fragment F cooperate to find MWOE(F)
    – this is carried out as in Prim’s algorithm; it is assumed that each vertex knows which of its edges are outgoing
    – a Request_to_merge message is sent over e = MWOE(F) to fragment F’, carrying F’s identifier
    – the two fragments then combine (possibly with several other fragments, if MWOE(F’) ≠ MWOE(F)) into a larger fragment

• MST (cont’d.)
  • description of each phase (cont’d.)
    – once connected, the two fragments (now one) proceed as follows
      » assume fragments F1 and F2, where e = MWOE(F1) = MWOE(F2)
      » assume e = (v1, v2), where v1 ∈ F1 and v2 ∈ F2
      » the root of the new fragment is whichever of the two vertices v1 and v2 has the higher identifier, say v1
      » the new root, v1, broadcasts a New_fragment message throughout the combined fragment F’, informing all vertices of its identifier (the new identifier of F’)
      » each vertex updates its fragment identifier and root entries and redirects its fragment edges so that its parent is the vertex which sent it the message, thus now “pointing” towards the new root of F’
      » each vertex then informs its neighbors of its new fragment identifier

• MST (cont’d.)
  • complexities
    – message = O(|E| log n)
    – time = O(n log n)
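The log n factor in both bounds comes from the number of phases, which a centralized simulation of the phase structure makes visible. This is a sketch under stated assumptions: the function name `ghs_phases` is hypothetical, edge weights are assumed distinct, and the new fragment identifier is simply taken as the larger of the two merged identifiers rather than the identifier of the higher endpoint of the connecting edge.

```python
def ghs_phases(vertices, edges):
    """Simulate the phases of a synchronous GHS-style MST construction.

    edges is a list of (u, v, w) triples with distinct weights.
    Returns (mst_edges, phases).
    """
    frag = {v: v for v in vertices}     # fragment identifier of every vertex
    mst = set()
    phases = 0
    while len(set(frag.values())) > 1:
        # Every fragment finds its minimum-weight outgoing edge.
        best = {}
        for e in edges:
            u, v, w = e
            if frag[u] == frag[v]:
                continue                # internal edge, not outgoing
            for f in (frag[u], frag[v]):
                if f not in best or w < best[f][2]:
                    best[f] = e
        if not best:
            break                       # graph is disconnected
        phases += 1
        # Merge along every selected edge (several fragments may chain up).
        for u, v, w in set(best.values()):
            a, b = frag[u], frag[v]
            if a == b:
                continue                # defensive; not expected with distinct weights
            mst.add((u, v, w))
            new_id = max(a, b)
            for x in frag:
                if frag[x] in (a, b):
                    frag[x] = new_id    # New_fragment broadcast, in effect
    return mst, phases


if __name__ == "__main__":
    path = [(0, 1, 1), (1, 2, 3), (2, 3, 2)]
    print(ghs_phases(range(4), path))
```

Every fragment merges with at least one other fragment in each phase, so the number of fragments at least halves and at most log2(n) phases are needed; multiplying by a per-phase cost of O(n) time and O(|E|) messages gives the stated bounds.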