
Graphs

Motivating Problem
- Konigsberg bridge problem (1736): Starting in one land area, is it possible to walk across all bridges exactly once and return to the initial land area?
- [Figure: map of Konigsberg showing land areas A, B, C, D joined by bridges a through g]
- The answer is no, but how do we prove that?

Konigsberg = Graph Problem
- The Konigsberg bridge problem is an instance of a graph problem.
- Definition of a graph: a graph G consists of two sets, V and E.
  - V: A finite, non-empty set of vertices
  - E: A set of pairs of vertices; these pairs are called edges.

Example Graphs
- Graph 1: V: 0, 1, 2, 3; E: (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)
- Graph 2: V: 0, 1, 2, 3; E: empty
- Graph 3: V: 0, 1, 2, 3; E: (0, 1), (0, 2), (1, 3)
- Trees are a subset of graphs.

Graph Definitions
- Undirected graph: The pair of vertices representing any edge is unordered.
  - (u, v) is the same edge as (v, u).
- Directed graph: Each edge is represented by a directed pair (u, v).
  - Drawn with an arrow from u to v indicating the direction of the edge.
  - (u, v) is not the same edge as (v, u).

Directed vs. Undirected
- Graph A: Undirected. (1, 0) is a valid edge; it is the same edge as (0, 1).
- Graph B: Directed. Not equivalent to A; (1, 0) is not a valid edge.
- Graph C: Directed. Equivalent to A; (1, 0) is a valid edge, and a (0, 1) edge also exists.
- [Figure: drawings of graphs A, B, and C on vertices 0, 1, 2, 3]

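Graph Restrictions
- For now, let's assume vertices and edges are sets:
  - No self edges (an edge from a vertex back to itself)
  - No repeated edges (a graph with repeated edges is a multigraph)
- [Figure: vertices 0, 1, 2, 3 with one self edge and one repeated edge marked]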

Motivating Problem: Graph Restrictions
- Konigsberg bridge problem (1736): the appropriate graph representation has one vertex per land area (0, 1, 2, 3) and one edge per bridge.
- We're not going to solve it for now, because several bridges join the same pair of land areas and we have assumed no repeated edges.

Graph Definitions
- The maximum possible number of distinct unordered pairs (u, v) in an undirected graph with n vertices is n*(n-1)/2.
- A graph with this many edges is called a complete graph.
- Complete (4 vertices): 6 edges = (4*3)/2
- Not complete (4 vertices): 4 edges != (4*3)/2

Graph Definitions
- Directed graph: maximum of n*(n-1) edges. [Twice the undirected maximum, because the two directed edges (u, v) and (v, u) are equivalent to one undirected edge.]
- Proof of the n*(n-1) bound: each of the n nodes can point to every other node except itself, giving n-1 outgoing edges for each of the n nodes.

Graph Definitions
- If (u, v) is an edge in E(G):
  - Vertices u and v are called adjacent.
  - The edge (u, v) is called incident on vertices u and v.
- Examples, for the graph with V: 0, 1, 2, 3 and E: (0, 1), (0, 2), (1, 2), (1, 3):
  - Vertex 0 is adjacent to 1 and 2.
  - Vertex 1 is adjacent to 0, 2, and 3.
  - Vertex 2 is adjacent to 0 and 1.
  - Vertex 3 is adjacent to 1.
  - Edges incident on vertex 2: (0, 2), (1, 2)
  - Edges incident on vertex 3: (1, 3)

Graph Definitions
- A subgraph G' of a graph G is a graph such that V(G') is a subset of V(G) and E(G') is a subset of E(G).
- [Figure: a graph on vertices 0, 1, 2, 3 and several of its subgraphs]

Graph Definitions
- A path from vertex u to vertex v in a graph G is a sequence of vertices u, i_1, i_2, ..., i_k, v such that (u, i_1), (i_1, i_2), ..., (i_k, v) are edges in G.
  - In a directed graph, the edges have to point in the right direction.
- The length of a path is the number of edges on the path.
- A simple path is a path in which all vertices, except possibly the first and last, are distinct.

Graph Definitions
Paths from 1 to 3 in the example graph on vertices 0, 1, 2, 3:
    Edges                             Vertices        Length   Simple?
    (1, 3)                            1, 3            1        yes
    (1, 2), (2, 3)                    1, 2, 3         2        yes
    (1, 0), (0, 2), (2, 3)            1, 0, 2, 3      3        yes
    (1, 0), (0, 2), (2, 1), (1, 3)    1, 0, 2, 1, 3   4        no
    (1, 2), (2, 0), (0, 1), (1, 3)    1, 2, 0, 1, 3   4        no
- Many more paths exist that repeat vertices internally; none of them is simple, and every path of length > 4 is of this kind.
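
The path definitions can be checked mechanically. The sketch below is illustrative only: the names `isPath`, `pathLength`, and `isSimplePath` are made up for this example, and the edge set (0, 1), (0, 2), (1, 2), (1, 3), (2, 3) is an assumption inferred from the paths listed on the slide.

```cpp
#include <cstddef>
#include <set>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// True if every consecutive pair of vertices in p is joined by an edge of a.
bool isPath(const Matrix& a, const std::vector<int>& p) {
    if (p.empty()) return false;
    for (std::size_t i = 0; i + 1 < p.size(); ++i)
        if (a[p[i]][p[i + 1]] == 0) return false;
    return true;
}

// Length of a (non-empty) path = number of edges on it.
int pathLength(const std::vector<int>& p) {
    return static_cast<int>(p.size()) - 1;
}

// True if all vertices are distinct, except that the last may equal the first.
bool isSimplePath(const std::vector<int>& p) {
    std::set<int> seen;
    for (std::size_t i = 0; i < p.size(); ++i) {
        bool closesCycle = (i + 1 == p.size()) && p.size() > 1 && p[i] == p[0];
        if (closesCycle) continue;                    // allowed repetition
        if (!seen.insert(p[i]).second) return false;  // vertex seen before
    }
    return true;
}
```

With the assumed edge set, the vertex sequence 1, 0, 2, 1, 3 is a path of length 4 but not a simple one, while 1, 0, 2, 3 is simple.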

Graph Definitions
- A cycle is a simple path where the first and last vertices are the same.
- Cycles through vertex 1 in the example graph on vertices 0, 1, 2, 3:
  - 1, 0, 2, 3, 1
  - 1, 0, 2, 1
  - 1, 3, 2, 1
  - 1, 0, 1
  - 1, 3, 1

Graph Definitions
- Two vertices u and v are connected if there is a path in G from u to v.
- An undirected graph is said to be connected (at the graph level) if and only if for every pair of distinct vertices u and v in V(G) there is a path from u to v in G.
- A connected component of a graph is a maximal connected subgraph.

Graph Definitions
- Graph G4:
  - V(G4): 0, 1, 2, 3, 4, 5, 6, 7
  - E(G4): (0, 1), (0, 2), (1, 3), (2, 3), (4, 5), (5, 6), (6, 7)
- There are two connected components of G4: H1 (vertices 0-3) and H2 (vertices 4-7).
- To verify that H1 and H2 are connected: check that there is a path between all pairs of vertices in each.
- Directed graphs are different: because paths are directed, it is harder to get connected components.

Graph Definitions
- A tree is a connected, acyclic graph.
  - For any node there is a path to any other node (usually back through a "parent node").
  - The acyclic property forces a unique path between nodes.
- [Figure: a tree on vertices 0, 1, 2, 3, 4]

Graph Definitions
- We need a counterpart to "connected" for directed graphs.
- A directed graph G is strongly connected if for every pair of distinct vertices u and v in V(G), there is a directed path from u to v and from v to u.
- A strongly connected component is a maximal subgraph of a directed graph that is strongly connected.

Graph Definitions
- Graph G5:
  - V(G5): 0, 1, 2
  - E(G5): (0, 1), (1, 0), (1, 2)
- G5 is not strongly connected (there is no path from 2 to 1).
- There are two strongly connected components of G5: H1 (vertices 0-1) and H2 (vertex 2).
- To verify that H1 and H2 are strongly connected: check that there is a directed path between all pairs of vertices in each.

Graph Definitions
- The degree of a vertex v is the number of edges incident to that vertex.
- For a directed graph:
  - The in-degree of a vertex v is the number of edges for which the arrow points at v.
  - The out-degree of a vertex v is the number of edges for which the arrow points away from v.

Graph Definitions
- Undirected example (vertices 0, 1, 2, 3):
  - Vertices 0, 3 => Degree 2
  - Vertices 1, 2 => Degree 3
- Directed example (vertices 0, 1, 2):
  - Vertex 0: Out-Degree 1, In-Degree 0
  - Vertex 1: Out-Degree 1, In-Degree 0
  - Vertex 2: Out-Degree 0, In-Degree 2

Graph Definitions
- For an undirected graph whose vertices v_i have degree d_i, the number of edges is $|E| = \frac{1}{2} \sum_{i=0}^{n-1} d_i$.
  - Essentially just counting the edges.
  - Divide by 2 because of double counting (if node 1 and node 2 share an edge, that edge is in both of their degrees).
- Useful for computing the number of edges if you only know the number of vertices and their degrees (e.g., if every vertex has degree 2, then |E| = 2 * |V| / 2 = |V|).
- Given our graph assumptions (no repeated edges, no self edges), every degree has to be <= |V| - 1.

Graph Definitions
- Example 1 (vertices 0, 1, 2, 3): vertices 0 and 3 have degree 2; vertices 1 and 2 have degree 3.
  - |E| = Sum of Degrees / 2 = (2 + 2 + 3 + 3) / 2 = 10 / 2 = 5 {Correct!}
- Example 2 (vertices 0, 1, 2): |E| = ?, all vertices have degree 2.
  - |E| = Sum of Degrees / 2 = (2 + 2 + 2) / 2 = 6 / 2 = 3 {Correct!}
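
The degree-sum formula is a one-line computation. A minimal sketch, assuming the degrees are already known and stored in a vector (the function name `edgeCountFromDegrees` is made up for this example):

```cpp
#include <numeric>
#include <vector>

// Number of edges in an undirected graph, given the degree of every vertex.
// Each edge adds 1 to the degree of both endpoints, so the degree sum
// counts every edge exactly twice.
int edgeCountFromDegrees(const std::vector<int>& degrees) {
    int degreeSum = std::accumulate(degrees.begin(), degrees.end(), 0);
    return degreeSum / 2;
}
```

Calling it with the degrees 2, 3, 3, 2 from the four-vertex example gives 5, and with 2, 2, 2 it gives 3.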

Graph Representations
- What core functionality do we need in a representation?
  - Set of vertices
  - Set of edges
- Two major representations:
  - Adjacency matrix [array based]
  - Adjacency list [linked list based]

Adjacency Matrix
- G = (V, E) is a graph with |V| = n, n >= 1.
- Adjacency matrix: a 2-dimensional n x n array called A with the property that A[i][j] = 1 if the edge (i, j) is in E, and 0 otherwise.
- Example (vertices 0, 1, 2, 3):
          V0  V1  V2  V3
    V0     0   1   1   0
    V1     1   0   0   1
    V2     1   0   0   1
    V3     0   1   1   0
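
As a rough illustration of the definition, the sketch below builds the adjacency matrix of an undirected graph from an edge list. The `Matrix` alias and the function name `buildAdjacencyMatrix` are assumptions made for this example, and `std::vector` stands in for a raw 2-dimensional array.

```cpp
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Build the n x n adjacency matrix of an undirected graph from its edges.
// A[i][j] = 1 if (i, j) is an edge, 0 otherwise.
Matrix buildAdjacencyMatrix(int n, const std::vector<std::pair<int, int>>& edges) {
    Matrix a(n, std::vector<int>(n, 0));
    for (const auto& e : edges) {
        a[e.first][e.second] = 1;
        a[e.second][e.first] = 1;  // undirected: store both directions
    }
    return a;
}
```

For the edges (0, 1), (0, 2), (1, 3), (2, 3) on four vertices this reproduces the matrix shown on the slide. For a directed graph, the second assignment would be dropped.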

Adjacency Matrix
- Directed graphs: rows are the out indicators (a 1 in row i, column j indicates that there is an outgoing link from i to j).
- [Figure: a directed graph on vertices 0, 1, 2, 3 and its adjacency matrix; recoverable rows: V0 = 0 1 0 0, V2 = 1 0 0 1, V3 = 0 1 0 0]

Adjacency Matrix
- Note how the adjacency matrix for an undirected graph is symmetric.
  - Can save approximately half the space by storing only the upper triangle or the lower triangle.
          V0  V1  V2  V3
    V0     0   1   1   0
    V1     1   0   0   1
    V2     1   0   0   1
    V3     0   1   1   0

Adjacency Matrix
- Given a complete adjacency matrix, we can easily:
  - Determine if there is an edge between any two vertices i and j (look at the entry A[i][j]).
  - [Undirected graph] Compute the degree of a node (sum over its row).
          V0  V1  V2  V3
    V0     0   1   1   0
    V1     1   0   0   1
    V2     1   0   0   1
    V3     0   1   1   0

Adjacency Matrix
- Given a complete adjacency matrix, we can easily, for a directed graph:
  - Compute the out-degree of a node (sum over its row).
  - Compute the in-degree of a node (sum over its column).
- [Figure: a directed graph on vertices 0, 1, 2, 3 and its adjacency matrix; recoverable rows: V0 = 0 1 0 0, V2 = 1 0 0 1, V3 = 0 1 0 0]
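
The row and column sums can be sketched as follows. The function names are made up for this example, and the test graph used in the note is G5 from the strongly connected slide, with edges (0, 1), (1, 0), (1, 2), rather than the graph in the figure.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Out-degree of v: sum over row v (edges leaving v).
int outDegree(const Matrix& a, int v) {
    int sum = 0;
    for (std::size_t j = 0; j < a.size(); ++j) sum += a[v][j];
    return sum;
}

// In-degree of v: sum over column v (edges arriving at v).
int inDegree(const Matrix& a, int v) {
    int sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i][v];
    return sum;
}
```

For G5 the matrix rows are 0 1 0, 1 0 1, and 0 0 0, so vertex 1 has out-degree 2 and in-degree 1, and vertex 2 has out-degree 0 and in-degree 1. Both functions cost O(n) per vertex.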

Adjacency Matrix
- What if we want to compute a non-trivial answer?
  - How many total edges are there in the graph?
  - Is the graph connected?
- Total edges: requires O(n^2) operations
  - (n^2 entries, minus the n diagonal entries, which are always 0)
          V0  V1  V2  V3
    V0     0   1   1   0
    V1     1   0   0   1
    V2     1   0   0   1
    V3     0   1   1   0

Adjacency Matrix
- What if we have a sparse graph?
  - Sparse = very few connections out of all possible.
- [Figure: a sparse graph on vertices 0-7 and its 8 x 8 adjacency matrix, most of whose entries are 0]

Adjacency Matrix
- We would really like to do O(|E|) operations when counting edges.
- O(n^2) is a given when using an adjacency matrix.
  - For dense graphs, |E| is close to n^2.
  - Not so for sparse graphs (|E| << n^2).
- Solution: use linked lists and store only those edges that are actually present in the graph (no 0's for edges that aren't present).
  - Slightly more complicated to implement, but saves a lot of time.

Adjacency List
- The n rows of the adjacency matrix (one per vertex) are represented as n linked lists.
- Nodes in list i are those nodes adjacent to the corresponding vertex v_i.
- Array of head node pointers for the graph on vertices 0, 1, 2, 3:
    [0] -> 1 -> 2
    [1] -> 3 -> 0
    [2] -> 0 -> 3
    [3] -> 1 -> 2
- Order in a linked list doesn't matter.

Adjacency List
- Undirected graph with n vertices and e edges: requires n head nodes and 2 * e list nodes.
- For any vertex, computing the degree (number of incident edges) is counting the size of the corresponding list.
- The number of edges for the whole graph is computed in O(n + e), which for sparse graphs is much less than O(n^2).
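
A minimal sketch of the list representation follows. It is an assumption of this example that `std::vector<std::vector<int>>` stands in for the array of linked lists (a `std::forward_list` per vertex would be closer to the slides); the function names are likewise made up.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using AdjList = std::vector<std::vector<int>>;

// Build adjacency lists for an undirected graph: each edge (u, v)
// produces two list nodes, v in list u and u in list v.
AdjList buildAdjacencyList(int n, const std::vector<std::pair<int, int>>& edges) {
    AdjList adj(n);
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    return adj;
}

// Degree of v = size of its list.
int degree(const AdjList& adj, int v) {
    return static_cast<int>(adj[v].size());
}

// Total edges: visit all n heads and all 2e list nodes, then halve.
int edgeCount(const AdjList& adj) {
    std::size_t listNodes = 0;
    for (const auto& neighbours : adj) listNodes += neighbours.size();
    return static_cast<int>(listNodes / 2);
}
```

For the edges (0, 1), (0, 2), (1, 3), (2, 3) every vertex has degree 2 and `edgeCount` returns 4 after touching 4 heads and 8 list nodes.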

Adjacency List for Directed Graph
- The n rows of the adjacency matrix (one per vertex) are represented as n linked lists.
- Nodes in list i are those nodes that one can reach by leaving the corresponding vertex.
- [Figure: a directed graph on vertices 0, 1, 2, 3 and its array of head node pointers]
- Order in a linked list doesn't matter.

Adjacency List
- For a directed graph, nodes in a list are those that you can reach by leaving the corresponding vertex.
  - Computing the out-degree for any vertex: count the number of nodes in the corresponding list.
  - Computing the total number of edges in the graph: add all out-degrees => O(n + e) [visit all head nodes and all items in the lists], much less than O(n^2) for sparse graphs.

Adjacency List
- For directed graphs, this approach isn't very useful for determining in-degree.
  - This was trivial in an adjacency matrix (sum down a column).
- You can build the inverse adjacency list at the same time as building the adjacency list.
- [Figure: array of head node pointers for the inverse adjacency list of a directed graph on vertices 0, 1, 2, 3]
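
One way to picture the inverse adjacency list is to derive it from the ordinary one. This is only a sketch under that assumption (the slides suggest building both at the same time instead); the name `inverseAdjacencyList` is made up, and vectors again stand in for linked lists.

```cpp
#include <vector>

using AdjList = std::vector<std::vector<int>>;

// Given out-lists (out[u] holds every v with an edge u -> v), build
// in-lists (in[v] holds every u with an edge u -> v).
AdjList inverseAdjacencyList(const AdjList& out) {
    AdjList in(out.size());
    for (int u = 0; u < static_cast<int>(out.size()); ++u)
        for (int v : out[u])
            in[v].push_back(u);
    return in;
}
```

For G5, with out-lists 0: 1, 1: 0 2, 2: (empty), the in-lists are 0: 1, 1: 0, 2: 1, so the in-degree of each vertex is again just the size of its list.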

Primitive Graph Operations
- In essence, we can just think of a graph as a container holding edges and vertices (with a lot of special properties):
    class Graph {
    public:
        Graph();                                  // create an empty graph
        void InsertVertex(Vertex v);              // insert v into graph with no incident edges
        void InsertEdge(Vertex u, Vertex v);      // insert edge (u, v) into graph
        void DeleteVertex(Vertex v);              // delete v and all edges incident to it
        void DeleteEdge(Vertex u, Vertex v);      // delete edge (u, v) from graph
        bool IsEmpty();                           // return true if graph has no vertices
        List<Vertex> Adjacent(Vertex v);          // return list of all vertices adjacent to v
    };

Adjacency Matrix/List Construction
- Note that we will be updating these data structures as we call our graph class methods:
  - InsertVertex(Vertex v)
  - InsertEdge(Vertex u, Vertex v)
  - DeleteVertex(Vertex v)
  - DeleteEdge(Vertex u, Vertex v)
- The linked list approach is likely more useful than arrays if we don't know beforehand what we will be adding to the graph (how many edges, etc.).

Weighted Edges
- We often see weighted edges on graphs.
- Common uses:
  - Distance between vertices
  - Cost of moving from one vertex to another vertex
  - Think of vertices as cities: the weights could be real distances or flight costs, and we want to know the shortest/cheapest path.
- How do we incorporate these weights into our graph representations?

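Weighted Edges
- Could store weights in the adjacency matrix (an edge just needs a non-zero entry).
- Example: four cities (Ilam, Mehran, Ivan, Sarableh) numbered 0 to 3, with edge weights 60 for (0, 1), 40 for (0, 2), 55 for (0, 3), and 50 for (2, 3):
           0    1    2    3
    0      0   60   40   55
    1     60    0    0    0
    2     40    0    0   50
    3     55    0   50    0
- If using lists, add another field (the weight) to each list node.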
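
Both weighted representations can be sketched together. The struct `WeightedEdge` and the function `fromWeightMatrix` are hypothetical names for this example, and treating 0 as "no edge" is the same convention the slide uses (it would not work if 0 were a legal weight).

```cpp
#include <vector>

// List node with the extra weight field.
struct WeightedEdge {
    int to;      // neighbouring vertex
    int weight;  // weight of the edge leading to it
};

using WeightedAdjList = std::vector<std::vector<WeightedEdge>>;

// Convert a weight matrix into weighted adjacency lists,
// treating a 0 entry as "no edge".
WeightedAdjList fromWeightMatrix(const std::vector<std::vector<int>>& w) {
    int n = static_cast<int>(w.size());
    WeightedAdjList adj(n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (w[i][j] != 0)
                adj[i].push_back({j, w[i][j]});
    return adj;
}
```

Applied to the matrix above, list 0 holds three nodes (to vertices 1, 2, 3), list 1 holds a single node (to vertex 0 with weight 60), and lists 2 and 3 hold two nodes each.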

Elementary Graph Operations
- First operation: traversal of graphs
  - We already saw how this worked in binary trees (inorder, preorder, postorder, level-order).
  - Similar idea for general graphs.
- Given a graph G = (V, E) and a vertex v in V(G), visit all vertices in G that are reachable from v.
  - This is the subset of the graph that is connected to v.

Graph Traversal
- Depth-First Search
  - Similar to descending down a binary tree.
  - Basic algorithm:
    1. Begin by starting at vertex v.
    2. Select an edge from v, say the edge (v, w).
    3. Move to vertex w and recurse.
    4. When a node is reached whose adjacent vertices have all been visited, back up to the last node with an unvisited adjacent vertex and recurse.

DFS Traversal
    void Graph::DFS()              // driver
    {
        visited = new bool[n];
        for (int i = 0; i < n; i++) visited[i] = false;
        DFS(0);
        delete [] visited;
    }
    void Graph::DFS(const int v)   // workhorse
    {
        visited[v] = true;
        for (each vertex w adjacent to v)   // use adjacency matrix or lists
            if (!visited[w]) DFS(w);
    }

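DFS Traversal: Graph To Traverse
- Adjacency lists for the graph on vertices 0-7:
    [0] -> 1 -> 2
    [1] -> 0 -> 3 -> 4
    [2] -> 0 -> 5 -> 6
    [3] -> 1 -> 7
    [4] -> 1 -> 7
    [5] -> 2 -> 7
    [6] -> 2 -> 7
    [7] -> 3 -> 4 -> 5 -> 6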

DFS Traversal
- Order of traversal: 0, 1, 3, 7, 4, 5, 2, 6
- Note that performing DFS can find connected components. In this case, the whole graph is connected and thus all nodes were visited.
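
The traversal order can be reproduced with a small free-standing version of the DFS. This is a sketch, not the `Graph` class from the slides: the `AdjList` alias and the names `dfsVisit`, `dfsOrder`, and `exampleGraph` are assumptions for this example, and neighbours are visited in the list order given on the graph slide.

```cpp
#include <vector>

using AdjList = std::vector<std::vector<int>>;

// Workhorse: mark v, record it, then recurse into each unvisited neighbour.
void dfsVisit(const AdjList& adj, int v,
              std::vector<bool>& visited, std::vector<int>& order) {
    visited[v] = true;
    order.push_back(v);
    for (int w : adj[v])
        if (!visited[w]) dfsVisit(adj, w, visited, order);
}

// Driver: vertices reachable from start, in the order DFS visits them.
std::vector<int> dfsOrder(const AdjList& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    dfsVisit(adj, start, visited, order);
    return order;
}

// The 8-vertex graph to traverse, as adjacency lists.
AdjList exampleGraph() {
    return {{1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 7},
            {1, 7}, {2, 7}, {2, 7}, {3, 4, 5, 6}};
}
```

Calling `dfsOrder(exampleGraph(), 0)` yields 0, 1, 3, 7, 4, 5, 2, 6. A different ordering of the lists would give a different, equally valid depth-first order.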

Analysis of DFS: From a Single Node
- Running time depends on graph structure and representation.
- If graph G is represented by adjacency lists:
  - Determine the vertices adjacent to vertex v by following a chain of links.
  - Each node in each list is visited at most once, and there are 2*e list nodes. Thus the running time is bounded by the number of edges, O(e).
- If graph G is represented by an adjacency matrix:
  - Have to look at n items (n = number of vertices) for each vertex (scanning across that vertex's row in the matrix).
  - Running time is O(n*n).

Breadth First Traversal
- Similar to descending across the levels of a binary tree.
  1. Visit the starting vertex v.
  2. Visit all unvisited vertices directly adjacent to v.
  3. Repeat, visiting all unvisited vertices directly adjacent to those from the previous step.

BFS Traversal Algorithm
    void Graph::BFS(int v)
    {
        visited = new bool[n];
        for (int i = 0; i < n; i++) visited[i] = false;
        visited[v] = true;
        Queue<int> q;
        q.insert(v);
        while (!q.isEmpty()) {
            v = *q.Delete(v);
            for (all vertices w adjacent to v)
                if (!visited[w]) {
                    q.insert(w);
                    visited[w] = true;
                }
        }
        delete [] visited;
    }

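BFS Traversal: Graph to Traverse
- Adjacency lists for the graph on vertices 0-7 (the same graph as for DFS):
    [0] -> 1 -> 2
    [1] -> 0 -> 3 -> 4
    [2] -> 0 -> 5 -> 6
    [3] -> 1 -> 7
    [4] -> 1 -> 7
    [5] -> 2 -> 7
    [6] -> 2 -> 7
    [7] -> 3 -> 4 -> 5 -> 6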

BFS Traversal
- Order of traversal: 0, 1, 2, 3, 4, 5, 6, 7
- Note that performing BFS can find connected components. In this case, the whole graph is connected and thus all nodes were visited.
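
The breadth-first order can be checked the same way. The sketch below assumes `std::queue` in place of the slides' own `Queue` class; `AdjList`, `bfsOrder`, and `exampleGraph` are names made up for this example.

```cpp
#include <queue>
#include <vector>

using AdjList = std::vector<std::vector<int>>;

// Vertices reachable from start, in the order BFS visits them.
std::vector<int> bfsOrder(const AdjList& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;
    visited[start] = true;  // mark when enqueued, so each vertex enters once
    q.push(start);
    while (!q.empty()) {
        int v = q.front();
        q.pop();
        order.push_back(v);
        for (int w : adj[v]) {
            if (!visited[w]) {
                visited[w] = true;
                q.push(w);
            }
        }
    }
    return order;
}

// The 8-vertex graph to traverse, as adjacency lists.
AdjList exampleGraph() {
    return {{1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 7},
            {1, 7}, {2, 7}, {2, 7}, {3, 4, 5, 6}};
}
```

Calling `bfsOrder(exampleGraph(), 0)` yields 0, 1, 2, 3, 4, 5, 6, 7: the start vertex, then its neighbours, then their unvisited neighbours, level by level.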

Analysis of BFS
- Each visited vertex enters the queue exactly once (n vertices).
- For each vertex taken from the queue, we have to review its list of neighbors.
  - For an adjacency matrix, that list is n items long, so the total time is O(n*n).
  - For an adjacency list, that list has degree(vertex) items, and the sum of the degrees over all n vertices is O(e), so the total cost is bounded by the number of edges.
- Same cost as DFS.

Connected Components
- How do we find all connected components?
- Calling BFS or DFS will find one connected component:
  - Those vertices connected to the start node.
- To find all connected components:
  1. Select a start vertex, call DFS.
  2. Select another start vertex which hasn't been visited by a previous DFS, call DFS.
  3. Repeat until all vertices have been visited.

Connected Components Algorithm
    void Graph::Components()
    {
        visited = new bool[n];
        for (int i = 0; i < n; i++) visited[i] = false;
        for (int i = 0; i < n; i++) {
            if (!visited[i]) {
                DFS(i);              // marks every vertex in i's component
                outputComponent();
            }
        }
        delete [] visited;
    }

Analysis of Connected Components
- Using adjacency lists:
  - Any call to DFS is O(e'), where e' is the number of edges in the connected component containing the start node.
    - The sum over all e' has to equal |E|, the total number of edges.
  - The for loop itself takes O(n) time and calls DFS only for unvisited nodes.
    - DFS is not called on every iteration, since each DFS marks many nodes.
  - Total time: O(n + e)
- With an adjacency matrix: O(n*n) [have to look at all columns for all rows]
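
A self-contained sketch of the component-finding loop, using graph G4 from the connected components definition as input. The names `dfsCollect` and `connectedComponents` are made up for this example, and returning the components as vectors replaces the slides' `outputComponent()` call.

```cpp
#include <vector>

using AdjList = std::vector<std::vector<int>>;

// DFS that appends every vertex it reaches to the current component.
void dfsCollect(const AdjList& adj, int v,
                std::vector<bool>& visited, std::vector<int>& component) {
    visited[v] = true;
    component.push_back(v);
    for (int w : adj[v])
        if (!visited[w]) dfsCollect(adj, w, visited, component);
}

// One DFS per not-yet-visited vertex; each DFS yields one component.
std::vector<std::vector<int>> connectedComponents(const AdjList& adj) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<std::vector<int>> components;
    for (int v = 0; v < static_cast<int>(adj.size()); ++v) {
        if (!visited[v]) {
            components.emplace_back();
            dfsCollect(adj, v, visited, components.back());
        }
    }
    return components;
}
```

For G4, with lists 0: 1 2, 1: 0 3, 2: 0 3, 3: 1 2, 4: 5, 5: 4 6, 6: 5 7, 7: 6, the outer loop starts a DFS only at vertices 0 and 4 and returns the two components 0, 1, 3, 2 and 4, 5, 6, 7.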

Spanning Trees
- If a graph G is connected, a DFS or BFS starting at any vertex visits all vertices in G.
- There is a particular set of edges that are traversed during this process.
- Let T be the set of edges that are traversed, and N be the remaining edges.

BFS Traversal
- Order of traversal: 0, 1, 2, 3, 4, 5, 6, 7
- Edges traversed [T]: (0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 7)
- Edges not traversed [N]: (4, 7), (5, 7), (6, 7)

A Spanning Tree Algorithm
- It is not hard to record the traversed edges in BFS (or DFS): add to a dynamic list whenever a new tree edge is encountered.
    void Graph::BFS(int v)
    {
        visited = new bool[n];
        LinkedList tEdgeList;        // collects the traversed edges (v, w)
        for (int i = 0; i < n; i++) visited[i] = false;
        visited[v] = true;
        Queue<int> q;
        q.insert(v);
        while (!q.isEmpty()) {
            v = *q.Delete(v);
            for (all vertices w adjacent to v)
                if (!visited[w]) {
                    q.insert(w);
                    tEdgeList.append(v, w);
                    visited[w] = true;
                }
        }
        delete [] visited;
    }

Spanning Trees
- This set of edges is called a spanning tree for the graph.
- A spanning tree is any tree that consists solely of edges from G and that includes all vertices in G (the tree "spans" all vertices of the graph).
- Spanning trees are not unique.
  - Generated from DFS: called a depth-first spanning tree
  - Generated from BFS: called a breadth-first spanning tree

BFS Spanning Tree
- Order of traversal: 0, 1, 2, 3, 4, 5, 6, 7
- Edges traversed [T]: (0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 7)
- [Figure: the original graph on vertices 0-7 next to its BFS spanning tree]
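
The tree edges listed above can be reproduced with a runnable variant of the spanning tree algorithm. As before this is only a sketch: `std::queue` and `std::vector` replace the slides' `Queue` and `LinkedList` classes, and `bfsSpanningTree` and `exampleGraph` are hypothetical names.

```cpp
#include <queue>
#include <utility>
#include <vector>

using AdjList = std::vector<std::vector<int>>;
using Edge = std::pair<int, int>;

// Run BFS from start and return the traversed (tree) edges in discovery order.
std::vector<Edge> bfsSpanningTree(const AdjList& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<Edge> treeEdges;
    std::queue<int> q;
    visited[start] = true;
    q.push(start);
    while (!q.empty()) {
        int v = q.front();
        q.pop();
        for (int w : adj[v]) {
            if (!visited[w]) {
                visited[w] = true;
                treeEdges.emplace_back(v, w);  // edge that first reached w
                q.push(w);
            }
        }
    }
    return treeEdges;
}

// The 8-vertex graph to traverse, as adjacency lists.
AdjList exampleGraph() {
    return {{1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 7},
            {1, 7}, {2, 7}, {2, 7}, {3, 4, 5, 6}};
}
```

Starting from vertex 0, the function returns the seven edges in T; the three edges (4, 7), (5, 7), (6, 7) are never recorded because vertex 7 has already been reached through vertex 3.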

DFS Spanning Tree
- Order of traversal: 0, 1, 3, 7, 4, 5, 2, 6
- Edges traversed [T]: (0, 1), (1, 3), (3, 7), (7, 4), (7, 5), (5, 2), (2, 6)
- [Figure: the original graph on vertices 0-7 next to its DFS spanning tree]

Spanning Trees
- A spanning tree is a minimal subgraph G' of G such that V(G') = V(G) and G' is connected.
  - Given all vertices from the original graph, it is the smallest set of edges one needs for the graph to be connected (to be able to get from any one node to any other).
- Any connected graph with n vertices must have n-1 edges or more.
  - In the minimal case of a chain, every vertex has an incoming and an outgoing edge end (2n) except the first and last (-2), giving 2n-2 edge ends. Every edge is counted twice (once as outgoing and once as incoming), so divide by 2: (2n-2)/2 = n-1.
  - In general, n isolated vertices form n components and each added edge merges at most two of them, so at least n-1 edges are needed to reach a single component.
- Thus, a spanning tree has exactly n-1 edges (it is the minimal subgraph of G that is connected).

Applications of Spanning Trees
- Very useful in determining optimal connections.
- Example: communication networks (laying out cable):
  - Assume vertices are cities and edges are communication links between cities.
  - All possible spanning trees are all the possible ways you could set up communication links so that everyone could talk to everyone else.
  - Usually we are interested in finding the cheapest set of links: this requires setting weights (costs) on links and finding the "minimum spanning tree" [will see algorithm soon!].
- Interesting problem: there are potentially many spanning trees. How do you find the minimal one efficiently (can't look at them all!)?

Finding Minimum Cost Spanning Trees
- Given a weighted undirected graph and a spanning tree for the graph, define the cost of the spanning tree as the sum of the weights of the tree's edges.
- The minimum cost spanning tree is the tree with minimal cost (the smallest sum over edge weights).
- There are 3 different efficient algorithms for finding a minimum spanning tree.

Greedy Algorithms
- All 3 algorithms are greedy:
  - Work in stages.
  - From the set of feasible decisions, make the best decision possible (given some metric of best) for the current stage.
    - Can't change your mind later.
    - Feasible ensures that the solution will obey the problem constraints.
  - Repeat for the rest of the stages, given the decisions you have already made and what's left to do.
- Best is usually defined as least cost or highest profit.
- Useful in many other programming domains.