Minimum Spanning Trees, Featuring Disjoint Sets
HKOI Training 2006
Liu Chi Man (cx), 25 Mar 2006

Prerequisites
- Asymptotic complexity
- Set theory
- Elementary graph theory
- Priority queues (or heaps)

Graphs
- A graph is a set of vertices and a set of edges
- G = (V, E)
- Number of vertices = |V|
- Number of edges = |E|
- We assume a simple graph, so |E| = O(|V|^2)

Roadmap
- What is a tree?
- Disjoint sets
- Minimum spanning trees
- Various tree topics

What is a Tree?

Trees in graph theory
- In graph theory, a tree is an acyclic, connected graph
  - Acyclic means “without cycles”

Properties of trees
- |E| = |V| - 1
  - |E| = Θ(|V|)
- Between any pair of vertices, there is a unique path
- Adding an edge between a pair of non-adjacent vertices creates exactly one cycle
- Removing an edge from the tree breaks the tree into two smaller trees

Definition?
- The following four conditions are equivalent:
  - G is connected and acyclic
  - G is connected and |E| = |V| - 1
  - G is acyclic and |E| = |V| - 1
  - Between any pair of vertices in G, there exists a unique path
- G is a tree if at least one of the above conditions is satisfied
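The second condition (connected and |E| = |V| - 1) is easy to test in code. The following is a minimal sketch, not part of the original slides. The function name is_tree, the 0-based vertex labels and the edge-list input are assumptions made for illustration, and the empty graph is arbitrarily treated as not a tree.

```python
from collections import deque


def is_tree(num_vertices, edges):
    """Return True if the undirected simple graph is a tree.

    Checks the condition "G is connected and |E| = |V| - 1".
    Assumption: vertices are labelled 0 .. num_vertices - 1 and
    edges is a list of (u, v) pairs.
    """
    if num_vertices == 0 or len(edges) != num_vertices - 1:
        return False
    adjacency = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Breadth-first search from vertex 0, counting reachable vertices.
    seen = [False] * num_vertices
    seen[0] = True
    reached = 1
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if not seen[v]:
                seen[v] = True
                reached += 1
                queue.append(v)
    # Connected exactly when every vertex was reached.
    return reached == num_vertices
```

For example, a star on four vertices passes the check, while a triangle plus an isolated vertex has the right number of edges but fails the connectivity test.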

Other properties of trees
- Bipartite
- Planar
- A tree with at least two vertices has at least two leaves (vertices of degree 1)

Roadmap
- What is a tree?
- Disjoint sets
- Minimum spanning trees
- Various tree topics

The Union-Find problem
- N balls initially, each ball in its own bag
  - Label the balls 1, 2, 3, ..., N
- Two kinds of operations:
  - Pick two bags, put all balls in these bags into a new bag (Union)
  - Given a ball, find the bag containing it (Find)

The Union-Find problem
- An example with 4 balls
- Initial: {1}, {2}, {3}, {4}
- Union {1}, {3} → {1, 3}, {2}, {4}
- Find 3. Answer: {1, 3}
- Union {4}, {1, 3} → {1, 3, 4}, {2}
- Find 2. Answer: {2}
- Find 1. Answer: {1, 3, 4}

Disjoint sets
- Disjoint-set data structures can be used to solve the union-find problem
- Each bag has its own representative ball
  - {1, 3, 4} is represented by ball 3 (for example)
  - {2} is represented by ball 2

Implementation 1: Naive arrays
- Bag[x] := representative of the bag containing x
- <O(N), O(1)>
  - Union takes O(N) and Find takes O(1)
- Slight modifications give <O(U), O(1)>
  - U is the size of the union
- Worst case: O(MN) for M operations

Implementation 1: Naive arrays
- How to union Bag[x] and Bag[y]?
  - Z := Bag[x]
  - For each ball v in the bag represented by Z do Bag[v] := Bag[y]
- Can I update the balls in Bag[y] instead?
- Rule: Update the balls in the smaller bag
  - O(M lg N) for M union operations
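The smaller-bag rule can be sketched as below. This is an illustrative sketch rather than the lecturer's code. The class name NaiveBags and the members lists (used to enumerate the balls of a bag without scanning the whole array) are assumptions introduced here.

```python
class NaiveBags:
    """Array-based union-find with the "update the smaller bag" rule."""

    def __init__(self, n):
        # bag[x] is the representative of the bag containing ball x.
        # Balls are labelled 1..n; index 0 is unused.
        self.bag = list(range(n + 1))
        # members[r] lists the balls of the bag represented by r
        # (hypothetical helper, not named on the slides).
        self.members = [[x] for x in range(n + 1)]

    def find(self, x):
        # O(1): the representative is stored directly.
        return self.bag[x]

    def union(self, x, y):
        small, large = self.bag[x], self.bag[y]
        if small == large:
            return  # already in the same bag
        # Relabel the balls of the smaller bag only.
        if len(self.members[small]) > len(self.members[large]):
            small, large = large, small
        for v in self.members[small]:
            self.bag[v] = large
        self.members[large].extend(self.members[small])
        self.members[small] = []
```

Each ball is relabelled only when its bag at least doubles in size, so a ball is relabelled at most lg N times; that is where the O(M lg N) bound comes from.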

Implementation 2: Forest
- A forest is a collection of trees
- Each bag is represented by a rooted tree, with the root being the representative ball
- Example: two bags, {1, 3, 5} and {2, 4, 6, 7}
[Figure: two rooted trees, one for each bag]

Implementation 2: Forest
- Find(x)
  - Traverse from x up to the root
- Union(x, y)
  - Merge the two trees containing x and y

Implementation 2: Forest
[Figure: the forest on balls 1 to 4, drawn initially and after each of Union 1 3, Union 2 4 and Find 4]

Implementation 2: Forest
[Figure: the same forest after Union 1 4, followed by Find 4]

Implementation 2: Forest
- How to represent the trees?
  - Leftmost-Child-Right-Sibling (LCRS)? Too complicated
  - Parent array
    - Parent[x] := parent of x
    - If x is a tree root, set Parent[x] := x
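A minimal sketch of the parent-array forest, without any balancing heuristics, might look as follows. The function names make_forest, find and union are chosen here for illustration, and the choice of which root is attached to which in union is arbitrary.

```python
def make_forest(n):
    """Create n single-ball trees for balls 1..n (index 0 is unused)."""
    # parent[x] == x marks x as a tree root.
    return list(range(n + 1))


def find(parent, x):
    """Traverse from x up to the root and return the root."""
    while parent[x] != x:
        x = parent[x]
    return x


def union(parent, x, y):
    """Merge the two trees containing x and y."""
    root_x, root_y = find(parent, x), find(parent, y)
    if root_x != root_y:
        # Arbitrary choice: hang the tree of x under the root of y.
        parent[root_x] = root_y
```

Because nothing limits the height of the trees, a long chain can form and each Find may then take O(N) steps.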

Implementation 2: Forest
- The worst case is still O(MN) for M operations
  - What is the worst case?
- Improvements
  - Union-by-rank
  - Path compression

Union-by-rank
- We should avoid tall trees
- Root of the taller tree becomes the new root on union
- So, keep track of tree heights (ranks)
[Figure: a “good” union next to a “bad” union]

Path compression
- See also the solution for Symbolic Links (HKOI 2005 Senior Final)
- Find(x): traverse from x up to root
- Compress the x-to-root path at the same time

Path compression
- Find(4): the root is 3
[Figure: snapshots of a tree on balls 1 to 7 as Find(4) walks up to the root, 3, and compresses the path]

U-by-rank + Path compression
- We ignore the effect of path compression on tree heights to simplify U-by-rank
- U-by-rank alone gives O(M lg N)
- U-by-rank + path compression gives O(M α(N))
  - α: inverse Ackermann function
  - α(N) ≤ 5 for practically large N
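Both improvements can be combined in a few lines. The class below is a sketch under assumptions: the name DisjointSet and the two-pass, iterative form of find are choices made here, not taken from the slides. As the slide suggests, ranks are not adjusted when paths are compressed.

```python
class DisjointSet:
    """Disjoint-set forest with union-by-rank and path compression."""

    def __init__(self, n):
        # Balls are labelled 1..n; index 0 is unused.
        self.parent = list(range(n + 1))
        # rank[x] is an upper bound on the height of the tree rooted at x.
        self.rank = [0] * (n + 1)

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the x-to-root path at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        """Merge the bags of x and y; return False if already merged."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False
        # The root of the taller tree becomes the new root.
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        return True
```

The iterative find avoids deep recursion on long paths, which matters in languages or settings with a small call stack.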

Roadmap
- What is a tree?
- Disjoint sets
- Minimum spanning trees
- Various tree topics

Minimum spanning trees
- Given a connected graph G = (V, E), a spanning tree of G is a graph T such that
  - T is a subgraph of G
  - T is a tree
  - T contains every vertex of G
- A connected graph must have at least one spanning tree

Minimum spanning trees
- Given a weighted connected graph G, a minimum spanning tree T* of G is a spanning tree of G with minimum total edge weight
- Application: Minimizing the total length of wires needed to connect up a collection of computers

Minimum spanning trees
- Two algorithms
  - Kruskal’s algorithm
  - Prim’s algorithm

Kruskal’s algorithm
- Choose edges in ascending weight greedily, while preventing cycles

Kruskal’s algorithm
- Algorithm
  - T is an empty set
  - Sort the edges in G by their weights
  - For each edge e (in ascending weight) do
    - If T ∪ {e} is acyclic then add e to T
  - Return T

Kruskal’s algorithm
- How to detect a cycle?
  - Depth-first search (DFS)
    - O(V) per check
    - O(VE) overall
  - Disjoint set
    - Vertices are balls, connected components are bags

Kruskal’s algorithm
- Algorithm (using disjoint-set)
  - T is an empty set
  - Create bags {1}, {2}, ..., {V}
  - Sort the edges in G by their weights
  - For each edge e (in ascending weight) do
    - Suppose e connects vertices x and y
    - If Find(x) ≠ Find(y) then add e to T, then Union(Find(x), Find(y))
  - Return T
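The steps above can be sketched in Python as follows. This is an illustration under assumptions: the function name kruskal, the (weight, x, y) edge tuples and the 1-based vertex labels are choices made here. To keep the sketch short, the embedded disjoint set uses path halving (a simpler relative of path compression) and no union-by-rank.

```python
def kruskal(num_vertices, edges):
    """Return (total_weight, tree_edges) of a minimum spanning tree.

    Assumptions: vertices are labelled 1..num_vertices, edges is a list
    of (weight, x, y) tuples, and the graph is connected.
    """
    # Create bags {1}, {2}, ..., {V}; parent[x] == x marks a root.
    parent = list(range(num_vertices + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree_edges = []
    total_weight = 0
    # Tuples sort by weight first, giving ascending weight order.
    for weight, x, y in sorted(edges):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            # x and y are in different components, so e creates no cycle.
            parent[root_x] = root_y
            tree_edges.append((weight, x, y))
            total_weight += weight
    return total_weight, tree_edges
```

For a graph with edges (1,2) of weight 1, (2,3) of weight 2, (1,3) of weight 3, (3,4) of weight 4 and (2,4) of weight 5, the sketch keeps the edges of weight 1, 2 and 4 and skips the other two because they would close a cycle.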

Kruskal’s algorithm
- The improved time complexity is O(E lg V)
- The bottleneck is sorting

Prim’s algorithm
- In Kruskal’s algorithm, the MST-in-progress scatters around
- Prim’s algorithm grows the MST from a “seed”
- Prim’s algorithm iteratively chooses the lightest grow-able edge
  - A grow-able edge connects a grown vertex and a non-grown vertex

Prim’s algorithm
- Algorithm
  - Let seed be any vertex, and Grown := {seed}
  - Initially T is an empty set
  - Repeat |V|-1 times
    - Let e = (x, y) be the lightest grow-able edge
    - Add e to T
    - Add x and y to Grown
  - Return T

Prim’s algorithm
- How to find the lightest grow-able edge?
  - Check all (grown, non-grown) vertex pairs
    - Too slow
  - Each non-grown vertex x keeps a value nearest[x], which is the weight of the lightest edge connecting x to some grown vertex
    - nearest[x] = ∞ if no such edge

Prim’s algorithm
- How to use nearest?
  - Grow the vertex (x) with the minimum nearest-value
    - Which edge? Keep track of it!
  - Since x has just been grown, we need to update the nearest-values of all non-grown vertices
    - Only need to consider edges incident to x
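One way to turn the nearest-value idea into code is sketched below. It is not the lecturer's program. The name prim_dense, the adjacency-matrix input with math.inf for missing edges, the 0-based labels and the via array (which records the edge behind each nearest-value) are all assumptions made for illustration.

```python
import math


def prim_dense(weight):
    """Return the MST edges as (grown_vertex, new_vertex, edge_weight).

    Assumptions: weight is a symmetric matrix with math.inf where there
    is no edge, vertices are 0..n-1, vertex 0 is the seed, and the graph
    is connected.
    """
    n = len(weight)
    grown = [False] * n
    nearest = [math.inf] * n  # lightest edge weight into each vertex
    via = [-1] * n            # grown endpoint of that lightest edge
    nearest[0] = 0            # makes the seed the first vertex grown
    tree_edges = []
    for _ in range(n):
        # Grow the non-grown vertex with the minimum nearest-value: O(V).
        x = min((v for v in range(n) if not grown[v]),
                key=lambda v: nearest[v])
        grown[x] = True
        if via[x] != -1:  # the seed has no incoming tree edge
            tree_edges.append((via[x], x, nearest[x]))
        # Update nearest-values using only the edges incident to x.
        for y in range(n):
            if not grown[y] and weight[x][y] < nearest[y]:
                nearest[y] = weight[x][y]
                via[y] = x
    return tree_edges
```

Each of the V rounds scans all vertices twice, which gives the O(V^2) running time of the array-based version.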

Prim’s algorithm
- Try to program Prim’s algorithm
- You may find that it’s very similar to Dijkstra’s algorithm for finding shortest paths!
  - Almost only a one-line difference

Prim’s algorithm
- Per round...
  - Finding minimum nearest-value: O(V)
  - Updating nearest-values: O(V) (Overall O(E))
- Overall: O(V^2 + E) = O(V^2) time
- Using a binary heap,
  - O(lg V) per Finding minimum
  - O(lg V) per Updating
  - Overall: O(E lg V) time
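A heap-based variant can be sketched with Python's heapq module. Since heapq has no decrease-key operation, this sketch pushes a fresh entry on every update and discards stale entries when they are popped (often called lazy deletion). That is a substitution for the in-place update the slide describes, and it still runs in O(E lg V) because lg E = O(lg V) in a simple graph. The name prim_heap and the adjacency-list format are assumptions.

```python
import heapq


def prim_heap(num_vertices, adjacency):
    """Return the total weight of a minimum spanning tree.

    Assumptions: vertices are 0..num_vertices-1, adjacency[x] is a list
    of (edge_weight, y) pairs, vertex 0 is the seed, and the graph is
    connected.
    """
    grown = [False] * num_vertices
    heap = [(0, 0)]  # (candidate nearest-value, vertex); seed costs 0
    total_weight = 0
    grown_count = 0
    while heap and grown_count < num_vertices:
        nearest_value, x = heapq.heappop(heap)
        if grown[x]:
            continue  # stale entry: x was already grown via a lighter edge
        grown[x] = True
        grown_count += 1
        total_weight += nearest_value
        # Only edges incident to x can improve a nearest-value.
        for edge_weight, y in adjacency[x]:
            if not grown[y]:
                heapq.heappush(heap, (edge_weight, y))
    return total_weight
```

The heap may hold up to E entries at once, so this trades some memory for not having to track each vertex's position inside the heap.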

MST Extensions
- Second-best MST
  - We don’t want the best!
- Online MST
  - See IOI 2003 Path Maintenance
- Minimum bottleneck spanning tree
  - The bottleneck of a spanning tree is the weight of its maximum-weight edge
  - An algorithm that runs in O(V+E) exists

MST Extensions (NP-Hard)
- Minimum Steiner Tree
  - No need to connect all vertices, but at least a given subset B ⊆ V
- Degree-bounded MST
  - Every vertex of the spanning tree must have degree not greater than a given value K
- For a discussion of NP-hardness, please attend [Talk] Introduction to Complexity Theory on 3 June

Roadmap
- What is a tree?
- Disjoint sets
- Minimum spanning trees
- Various tree topics

Various tree topics (List)
- Center, eccentricity, radius, diameter
- Tree isomorphism
  - Canonical representation
- Prüfer code
- Lowest common ancestor (LCA)
- Counting spanning trees

Supplementary readings
- Advanced:
  - Disjoint set forest (Lecture slides)
  - Prim’s algorithm
  - Kruskal’s algorithm
  - Center and diameter
- Post-advanced (so-called Beginners):
  - Lowest common ancestor
  - Maximum branching