Graph Algorithms
By: Shruti Aggrawal, Preeti M. Palkar


Table Of Contents
Graphs
    Euler’s Circuit Problem
    Graph Terminologies (Paths, Cycles, Subgraph, Connectivity, Spanning Tree, Forest)
Depth First Search
Breadth First Search
Minimum Cost Spanning Tree
    Kruskal’s Algorithm

Graphs
Used to model relationships between objects.
Graphs consist of nodes and edges.
The edges may or may not have direction.
If the edges have no direction, the graph is called an undirected graph.
If the edges have a direction, the graph is called a directed graph.

Examples of modeling with Graphs
Graph of a road network.
[Figure labels: Undirected Graph, Directed Graph]

Euler’s Circuit problem
Problem: Is it possible to start walking from anywhere in town and return to the starting point by crossing every bridge exactly once?
This is an undirected graph, so one can walk in either direction along an edge. Edges cannot repeat, but vertices can.
Degree of a node = number of edges connected to it. Example: degree of C = 5.
In an Euler circuit, every time you enter a node you must also leave it along a different, unused edge, so the edges at each node are used up in pairs. Hence every node must have even degree.
In this example the nodes do not all have even degree, so it is not possible to start walking from anywhere in town and return to the starting point by crossing every bridge exactly once.
A connected graph has an Euler circuit if and only if every node has even degree.
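The degree test above can be sketched in a few lines of Python. This is a minimal sketch, not the slides' own code: the function name `has_euler_circuit` and the bridge layout in `bridges` are assumptions. The layout is chosen only so that node C has degree 5, as in the example, and it uses parallel edges for double bridges.

```python
from collections import Counter, defaultdict

def has_euler_circuit(edges):
    """Return True if the undirected multigraph given as a list of
    (u, v) edges has an Euler circuit: every degree is even and all
    edges lie in one connected piece."""
    degree = Counter()
    adj = defaultdict(list)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].append(v)
        adj[v].append(u)

    # Every node must be entered and left the same number of times.
    if any(d % 2 for d in degree.values()):
        return False
    if not degree:
        return True

    # Connectivity check over the nodes that have edges.
    start = next(iter(degree))
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for w in adj[x]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(degree)

# Assumed bridge layout: C has degree 5; A, B and D have degree 3.
bridges = [("A", "C"), ("A", "C"), ("B", "C"), ("B", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]
# A simple square, where every node has degree 2.
square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

print(has_euler_circuit(bridges))  # False
print(has_euler_circuit(square))   # True
```

The connectivity check is included because even degrees alone are not enough when the edges fall into two separate pieces.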

Graph Terminologies
A graph consists of a set V of vertices (or nodes) and a set E of edges (or links).
A graph can be directed or undirected.
Edges in a directed graph are ordered pairs: the order of the two vertices matters. Example: (S, P) is an ordered pair because the edge starts at S and terminates at P; the edge is unidirectional.
Edges of an undirected graph are unordered pairs.
A multigraph is a graph that may have several edges between the same pair of vertices. Graphs that are not multigraphs are called simple graphs.

Graphs Terminologies
The degree d(v) of a vertex v is the number of edges incident to v. Example: d(A) = 3, d(D) = 2.
In a directed graph, the indegree of a vertex is the number of incoming edges and the outdegree is the number of outgoing edges. Example: the indegree of P is 2 and its outdegree is 1; the indegree of Q is 1 and its outdegree is 1.
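As a small illustration of these definitions, the sketch below counts degrees from an edge list. The figures are not available here, so both edge lists are hypothetical. They are chosen only to reproduce the numbers quoted above (d(A) = 3, d(D) = 2, and the in/out degrees of P and Q).

```python
from collections import Counter

def degrees(edges):
    """Degree of each vertex in an undirected graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def in_out_degrees(arcs):
    """Indegree and outdegree of each vertex in a directed graph.
    Each arc (u, v) leaves u and enters v."""
    indeg, outdeg = Counter(), Counter()
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg

# Hypothetical undirected graph with d(A) = 3 and d(D) = 2.
undirected = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")]
deg = degrees(undirected)

# Hypothetical directed graph in which P has indegree 2 and outdegree 1,
# and Q has indegree 1 and outdegree 1.
directed = [("S", "P"), ("Q", "P"), ("P", "R"), ("R", "Q")]
indeg, outdeg = in_out_degrees(directed)

print(deg["A"], deg["D"])        # 3 2
print(indeg["P"], outdeg["P"])   # 2 1
print(indeg["Q"], outdeg["Q"])   # 1 1
```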

Paths and Cycles
It is similar to the traveling salesman problem.

Subgraphs

Spanning tree
A spanning tree of a graph G is a subgraph of G that is a tree and contains all the vertices of G.

Connectivity
A graph is connected if there is a path from every vertex to every other vertex in the graph.
A forest is a graph that does not contain a cycle. A tree is a connected forest.
A spanning forest of an undirected graph G is a subgraph of G that is a forest and contains all the vertices of G.
If a graph G(V, E) is not connected, it can be partitioned in a unique way into a set of connected subgraphs called connected components. A connected component of G is a maximal connected subgraph: no other connected subgraph of G contains it.
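The partition into connected components can be computed by repeatedly exploring from a vertex that has not been reached yet. The following is a minimal sketch; the function name `connected_components` and the example graph are assumptions made for illustration.

```python
def connected_components(vertices, edges):
    """Split an undirected graph into its connected components.
    Returns a list of components, each a sorted list of vertices."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen = set()
    components = []
    for s in vertices:
        if s in seen:
            continue
        # Explore everything reachable from s; that set is one component.
        seen.add(s)
        component = []
        stack = [s]
        while stack:
            x = stack.pop()
            component.append(x)
            for w in adj[x]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(component))
    return components

# Hypothetical graph: a path A-B-C, a single edge D-E, and isolated F.
vertices = list("ABCDEF")
edges = [("A", "B"), ("B", "C"), ("D", "E")]
parts = connected_components(vertices, edges)
print(parts)  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```

A graph is connected exactly when this returns a single component.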

Forest

Graph Representations in an undirected graph
Matrix:
A ‘1’ indicates that a link is present between two nodes.
For an undirected graph the matrix is symmetric.
Each edge is counted twice (once in each direction).
For a graph of n nodes, the size of the matrix is n^2.
Maximum number of edges = n(n-1)/2: each of the n nodes can connect to n-1 other nodes, and dividing by 2 counts each link once.
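A short sketch of the matrix representation follows, assuming vertices are given as a list and edges as pairs. The function name `adjacency_matrix` and the four-node example are illustrative only. It also checks the symmetry and the double counting of edges.

```python
def adjacency_matrix(vertices, edges):
    """Build the n x n adjacency matrix of an undirected simple graph.
    Entry [i][j] is 1 when vertices i and j are linked, else 0."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        matrix[i][j] = 1  # each undirected edge is recorded twice,
        matrix[j][i] = 1  # which makes the matrix symmetric
    return matrix

# Hypothetical four-node graph.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")]
matrix = adjacency_matrix(vertices, edges)

n = len(vertices)
is_symmetric = all(matrix[i][j] == matrix[j][i]
                   for i in range(n) for j in range(n))
ones = sum(sum(row) for row in matrix)
max_edges = n * (n - 1) // 2

print(is_symmetric)  # True
print(ones)          # 8, i.e. twice the 4 edges
print(max_edges)     # 6
```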

Graph Representations in a directed graph
List:
The ‘0’s from the matrix representation are eliminated; only the links that are present are stored.
The rows have varying length.
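For comparison with the matrix, here is a minimal adjacency list sketch for a directed graph. The function name `adjacency_list` and the example arcs are hypothetical. Each row holds only the neighbors that exist, so the rows differ in length.

```python
def adjacency_list(vertices, arcs):
    """Build adjacency lists for a directed graph.
    adj[u] lists the vertices reachable from u by a single arc."""
    adj = {v: [] for v in vertices}
    for u, v in arcs:
        adj[u].append(v)
    return adj

# Hypothetical directed graph.
vertices = ["S", "P", "Q", "R"]
arcs = [("S", "P"), ("S", "R"), ("Q", "P"), ("P", "R"), ("R", "Q")]
adj = adjacency_list(vertices, arcs)

for v in vertices:
    print(v, "->", adj[v])

# Total storage is one entry per arc rather than n * n cells.
stored_entries = sum(len(row) for row in adj.values())
```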

Depth First Search (DFS)
The algorithm starts from a node, selects a node connected to it, then selects a node connected to this new node, and so on, until no new nodes can be reached. Then it backtracks to the most recent node and discovers any new nodes connected to it.
The data structure suited to this purpose (LIFO) is a stack.
Once the stack is empty the algorithm ends.

DFS
DFS is an aggressive algorithm.
It finds a spanning tree (of a connected graph).
It produces an ordering of the vertices automatically.
The ordering is not unique.
DFS can be written recursively, since recursion uses the call stack.
Application: connected components.

Algorithm for DFS

Implementing DFS
To draw a depth first search spanning tree, let's start with A. Put A in the stack. Next we can pick any of D, E, B; the order does not matter. In this example we pick B and put B in the stack.
After B we can pick either C or E. In this case we put C in the stack. D is selected next and put in the stack. After D there is just one other edge, to A. As A is already visited, we start to backtrack: we pop D and look at C.
From C there is one node left, which is E. Add E to the stack. From E there is no unvisited node, so we pop E and look at C again. As no node is left to explore, we pop C, and similarly pop B and A.
Finally, as the stack is empty, the algorithm comes to an end and we have the corresponding tree.
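The walkthrough above can be reproduced with an explicit stack. This is a sketch under assumptions. The figure is not available, so the adjacency lists in `adj` are reconstructed from the walkthrough (A is linked to B, D, E; B to C, E; C to D, E). The neighbor order is chosen so that the same choices are made, and the function name `dfs_tree` is illustrative.

```python
def dfs_tree(adj, start):
    """Depth first search with an explicit stack.
    The stack holds the current path from the start vertex.
    Returns the visiting order and the DFS spanning tree edges."""
    visited = {start}
    order = [start]
    tree_edges = []
    # Each stack entry keeps an iterator so that a vertex resumes
    # scanning its neighbors where it left off after backtracking.
    stack = [(start, iter(adj[start]))]
    while stack:
        x, neighbors = stack[-1]
        for w in neighbors:
            if w not in visited:
                visited.add(w)
                order.append(w)
                tree_edges.append((x, w))
                stack.append((w, iter(adj[w])))
                break
        else:
            # No unvisited neighbor left: backtrack.
            stack.pop()
    return order, tree_edges

# Assumed graph, reconstructed from the walkthrough.
adj = {
    "A": ["B", "D", "E"],
    "B": ["A", "C", "E"],
    "C": ["B", "D", "E"],
    "D": ["C", "A"],
    "E": ["A", "B", "C"],
}
order, tree_edges = dfs_tree(adj, "A")
print(order)       # ['A', 'B', 'C', 'D', 'E']
print(tree_edges)  # [('A', 'B'), ('B', 'C'), ('C', 'D'), ('C', 'E')]
```

A different neighbor order would give a different, equally valid DFS tree, which is why the ordering is not unique.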

Running time of DFS
|V| = n (number of vertices); |E| = O(n^2) (number of edges).
Running time with the adjacency list representation = O(|V| + |E|). We need the |V| term because there can be isolated nodes. Hence this is a linear time algorithm.
If the graph is very dense we can use the matrix representation. Running time with the adjacency matrix representation = O(n^2).

Breadth First Search (BFS)
Compared to DFS, breadth first search is a simple algorithm.
It is timid: it tries one edge at a time, completely exhausts the neighbors of a vertex, and only then moves on to the neighbors of the next vertex. It radiates out in waves in a balanced manner.
It is implemented using a queue.
Whatever is in the queue tells you what to explore next. Once the queue is empty the algorithm comes to an end.

Algorithm for BFS
Procedure BFS_Tree G(V, E)
Input: G = (V, E); Q is a queue, initially empty; x ← Q means remove the front item of the queue and denote it by x; initially all vertices are marked ’new’; L[x] is the adjacency list of x; T ← ∅.
Output: The BFS tree T.
mark v ’old’, for a start vertex v ∈ V;
insert(Q, v);
while Q is nonempty do
    x ← Q;
    for each vertex w in L[x] that is marked ’new’ do
        T ← T ∪ {(x, w)};
        mark w ’old’;
        insert(Q, w);
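The procedure translates almost line for line into Python. In this sketch `collections.deque` plays the role of Q, and a set of 'old' vertices replaces the marks. The function name `bfs_tree` and the example graph are assumptions made for illustration.

```python
from collections import deque

def bfs_tree(adj, start):
    """Breadth first search from start.
    adj[x] is the adjacency list L[x]. Returns the BFS tree T
    as a list of (parent, child) edges in discovery order."""
    old = {start}           # vertices no longer marked 'new'
    tree = []               # T, initially empty
    queue = deque([start])  # Q
    while queue:
        x = queue.popleft()         # x <- Q: remove the front item
        for w in adj[x]:
            if w not in old:        # w is still marked 'new'
                tree.append((x, w))
                old.add(w)
                queue.append(w)     # insert at the back
    return tree

# Assumed example graph on vertices A to E.
adj = {
    "A": ["D", "E", "B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D", "E"],
    "D": ["A", "C"],
    "E": ["A", "B", "C"],
}
tree = bfs_tree(adj, "A")
print(tree)  # [('A', 'D'), ('A', 'E'), ('A', 'B'), ('D', 'C')]
```

Marking a vertex 'old' at the moment it is inserted, rather than when it is removed, keeps each vertex from entering the queue twice.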

BFS


Implementing BFS using Queues
Start from A. Add it to the queue.
When A is visited it is removed from the queue, and D, E, B are added to the queue (the order does not matter).
Next, when D is pulled out, vertex C, which is adjacent to D, is added.
[Figure: the graph and the queue contents at each step]

Running time of BFS
G = (V, E), |V| = n, |E| = m
Running time of BFS = O(n + m).
Inserting into or deleting from the queue takes constant time.
A vertex is always inserted at the back of the queue and deleted from the front.

Application of BFS – Shortest Path
[Figure 1: a graph on vertices A, B, C, D, E, F. Figure 2: the corresponding BFS spanning tree.]
Using BFS we can find the shortest path (fewest edges) from a start vertex to all the other vertices. To find the shortest paths from A to vertices B, C, D, E, F, we first create the BFS spanning tree rooted at A and then read the paths off the tree. Figure 1 shows the graph and Figure 2 shows the corresponding spanning tree. From the tree we can see that the shortest path from A to E goes from A to C to E.
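Reading shortest paths off the BFS tree amounts to following parent pointers back to the start. The sketch below assumes a hypothetical six-vertex graph. The edges of Figure 1 are not recoverable, so `adj` is chosen only so that the shortest path from A to E is A, C, E as stated. The function names are illustrative.

```python
from collections import deque

def bfs_parents(adj, source):
    """Run BFS from source and record each vertex's parent in the
    BFS tree. The source has parent None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if w not in parent:
                parent[w] = x
                queue.append(w)
    return parent

def path_to(parent, target):
    """Follow parent pointers from target back to the source.
    Returns None if target was never reached."""
    if target not in parent:
        return None
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]

# Hypothetical unweighted graph on A to F.
adj = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A", "E"],
    "D": ["A", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
parent = bfs_parents(adj, "A")
print(path_to(parent, "E"))  # ['A', 'C', 'E']
print(path_to(parent, "F"))  # ['A', 'D', 'F']
```

This gives shortest paths only in the sense of fewest edges; with edge costs a different algorithm is needed.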

DFS Versus BFS
[Figure: a graph with its DFS tree and its BFS tree]
DFS trees are usually tall and thin. For a complete (dense) graph the DFS tree is one straight line all the way down.
BFS trees are usually bushy, like a star.
Both trees contain the same number of vertices.
Both follow the edges and hence can be used to find connected components.
DFS can be written recursively. It is difficult to do so for BFS.

DFS and BFS break the graph into 3 parts:
    Visited: nodes that have been visited.
    Fringe: nodes that will be considered for visiting next.
    Remaining: the rest of the vertices.
For DFS the fringe nodes are in the stack, while for BFS they are in the queue.
DFS and BFS each create a different shape.
[Figures: Visited, Fringe, and Remaining regions for BFS and for DFS]

MCST
MCST stands for minimum cost spanning tree.
The minimum cost spanning tree is the one whose edge weights add up to the least among all the spanning trees of the graph.
Equivalently, it is a connected subgraph that contains all the vertices, contains no cycles, and has the minimum sum of edge costs.
A given graph may have more than one spanning tree.

MCST
Consider a network of computers connected through bidirectional links. Each link has a positive cost: the cost of sending a message over that link. The network is represented as an undirected graph with a positive cost on each edge. In a bidirectional network we can assume that the cost of sending a message over a link does not depend on the direction.
Suppose we want to broadcast a message to all the computers from an arbitrary computer. The cost of the broadcast is the sum of the costs of the links used to forward the message.
[Figures: an undirected graph with positive costs on each edge; the graph with the corresponding sorted order written beside the edges; the minimum cost spanning tree.]

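MCST
Why does the least cost edge have to be in the MCST? Proof by contradiction:
Assume the minimum cost edge is not in the MCST, and say it lies between u and v (the dotted line in the figure). Since this edge is not in the MCST, the tree contains another path between u and v. Adding the minimum cost edge to the tree closes a cycle along that path. Remove one other edge of that cycle (say, the edge between x and y) and keep the minimum cost edge. The result is still a spanning tree, and it is cheaper whenever the removed edge costs strictly more than the minimum cost edge. This contradicts the assumption, hence the least cost edge has to be in the MCST.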

Kruskal’s Algorithm
Given a graph, find its MCST.

Kruskal’s Algorithm
[Figure: on the left, a graph on vertices A to F with the cost written beside each edge and the sorted order shown in text boxes; on the right, the resulting spanning tree.]
Initially the tree has only the vertices. The edge from A to C has order 1, so it is added to the spanning tree first.
Edges AD and AB are next, as their orders are 2 and 3 respectively.
The edge between C and D has order 4, but it creates a cycle through A, C, D, so it is not added to the tree.
Next the edge between D and F is added, and the edge between B and C is dropped as it creates a cycle through A, B, C. Lastly the edge between C and E is added and the edge between E and F is dropped.
Each edge left out of the tree is at least as heavy as every tree edge on the cycle it would close.
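The steps above can be sketched with a simple union-find structure for the cycle test. This is a sketch under assumptions. The actual edge costs in the figure are not recoverable, so each edge's position in the sorted order (1 to 8) is used here as a stand-in cost. The function name `kruskal` is illustrative.

```python
def kruskal(vertices, weighted_edges):
    """Kruskal's algorithm. weighted_edges is a list of (cost, u, v).
    Returns the chosen tree edges as (u, v, cost) in the order added."""
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the representative of v's component,
        # shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for cost, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:         # different components: no cycle
            parent[root_u] = root_v  # merge the two components
            tree.append((u, v, cost))
    return tree

# Stand-in costs: the sorted order from the walkthrough, not the
# costs printed on the slide.
vertices = list("ABCDEF")
weighted_edges = [
    (1, "A", "C"), (2, "A", "D"), (3, "A", "B"), (4, "C", "D"),
    (5, "D", "F"), (6, "B", "C"), (7, "C", "E"), (8, "E", "F"),
]
tree = kruskal(vertices, weighted_edges)
total_cost = sum(cost for _, _, cost in tree)
print(tree)
# [('A', 'C', 1), ('A', 'D', 2), ('A', 'B', 3), ('D', 'F', 5), ('C', 'E', 7)]
print(total_cost)  # 18
```

The edges CD, BC and EF are skipped because both endpoints already lie in the same component, which matches the walkthrough.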

Kruskal’s Algorithm
Running time: with |V| = n and |E| = m, the running time of sorting the edges = O(m log n) (since m is at most n^2, log m = O(log n)).

Example
Why checking that both endpoints are already visited is not sufficient to detect a cycle
[Figure: on the left, a graph on vertices A to F with the sorted order (inside text boxes) written beside the edges, plus the cost of each edge.]
The edge from A to C has order 1, so it is added to the spanning tree first.
Second in order is the edge between E and F, so we add it.
Next we add the edge between A and D and the edge between A and B. The edge between C and D is discarded as it creates a cycle.
Next the edge between D and F is added, even though both D and F have already been visited: they lie in different components, so the edge does not create a cycle. The edge between B and C is dropped and the edge between C and E is dropped.
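To make the point concrete, the sketch below runs the edges in the order given above. For each edge it records both the naive test (both endpoints already visited) and the correct test (both endpoints in the same component). The function name `compare_cycle_checks` is hypothetical, and only the edge order from the example is used, not the costs.

```python
def compare_cycle_checks(vertices, ordered_edges):
    """Process edges in sorted order as Kruskal's algorithm would.
    Returns a list of (edge, naive_reject, creates_cycle)."""
    component = {v: v for v in vertices}

    def find(v):
        # Representative of the component containing v.
        while component[v] != v:
            v = component[v]
        return v

    visited = set()
    log = []
    for u, v in ordered_edges:
        naive_reject = u in visited and v in visited
        creates_cycle = find(u) == find(v)
        if not creates_cycle:
            component[find(u)] = find(v)  # merge the two components
            visited.update((u, v))
        log.append(((u, v), naive_reject, creates_cycle))
    return log

vertices = list("ABCDEF")
# Edge order taken from the example above.
ordered_edges = [("A", "C"), ("E", "F"), ("A", "D"), ("A", "B"),
                 ("C", "D"), ("D", "F"), ("B", "C"), ("C", "E")]
log = compare_cycle_checks(vertices, ordered_edges)

# Edges the naive test would throw away although they create no cycle.
wrongly_rejected = [edge for edge, naive, cycle in log if naive and not cycle]
print(wrongly_rejected)  # [('D', 'F')]
```

The naive test would drop DF and leave E and F disconnected from the rest of the tree, while the component test keeps it.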

Example
When an edge is added and when it is not.