Data Structure Algorithm 11 Minimal Spanning Tree JJCAO

Data Structure & Algorithm 11 – Minimal Spanning Tree
JJCAO
Some material taken from Prof. Yoram Moses & Princeton COS 226

Weighted Graphs

Sub-Graphs
Note: G' is not a spanning sub-graph of G

Minimum Spanning Tree
• A subgraph
• A tree
• Spans G
• Of minimal weight

MST Origin
Otakar Borůvka (1926).
• Electrical Power Company of Western Moravia in Brno.
• Most economical construction of an electrical power network.
• A concrete engineering problem that is now a cornerstone problem in combinatorial optimization.

MST describes the arrangement of nuclei in the epithelium for cancer research
http://www.bccrc.ca/ci/ta01_archlevel.html

Normal Consistency [Hoppe et al. 1992]
• Based on angles between unsigned normals
• May produce errors on close-by surface sheets

MST is a fundamental problem with diverse applications
• Network design.
  – telephone, electrical, hydraulic, TV cable, computer, road
• Approximation algorithms for NP-hard problems.
  – traveling salesperson problem, Steiner tree
• Indirect applications.
  – max bottleneck paths
  – LDPC codes for error correction
  – image registration with Renyi entropy
  – learning salient features for real-time face verification
  – reducing data storage in sequencing amino acids in a protein
  – model locality of particle interactions in turbulent fluid flows
  – autoconfig protocol for Ethernet bridging to avoid cycles in a network
• Cluster analysis.

Minimum Spanning Tree on Surface of Sphere (5000 Vertices)

Minimum Spanning Tree
Input: a connected, undirected graph G, with a weight function wt on the edges
Goal: find a minimum-weight spanning tree for G
Fact: If all edge weights are distinct, the MST is unique
Brute force: Try all possible spanning trees
• problem 1: not so easy to implement
• problem 2: far too many of them
Ex: [Cayley, 1889]: V^{V-2} spanning trees on the complete graph on V vertices.
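To make the brute-force idea and Cayley's count concrete, here is a minimal Python sketch. It enumerates every subset of V-1 edges and keeps the ones that form a spanning tree. The function names and the K4 weights are made up for illustration; they are not from the slides.

```python
from itertools import combinations

def is_spanning_tree(n, edges):
    """True if the given n-1 edges join all n vertices without closing a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v, _w in edges:
        ru, rv = find(u), find(v)
        if ru == rv:              # this edge would close a cycle
            return False
        parent[ru] = rv
    return True                   # n-1 acyclic edges on n vertices span the graph

def brute_force_mst(n, edges):
    """Try every (n-1)-edge subset; return (number of spanning trees, best weight)."""
    count, best = 0, None
    for subset in combinations(edges, n - 1):
        if is_spanning_tree(n, subset):
            count += 1
            weight = sum(w for _u, _v, w in subset)
            if best is None or weight < best:
                best = weight
    return count, best

# Complete graph K4 with hypothetical distinct weights 1..6, edges as (u, v, weight).
k4 = [(0, 1, 1), (0, 2, 2), (0, 3, 3), (1, 2, 4), (1, 3, 5), (2, 3, 6)]
count, best = brute_force_mst(4, k4)
print(count, best)
```

For V = 4 the count matches Cayley's formula, 4^{4-2} = 16. At V = 20 the formula already gives 20^18 trees, which is why brute force is hopeless.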

Main algorithms of MST
1. Kruskal's algorithm
2. Prim's algorithm
3. …
Both run in O(E lg V) using ordinary binary heaps.
Both are greedy algorithms => global solution

Two Greedy Algorithms
• Kruskal's algorithm. Consider edges in ascending order of cost. Add the next edge to T unless doing so would create a cycle.
• Prim's algorithm. Start with any vertex s and greedily grow a tree T from s. At each step, add the cheapest edge to T that has exactly one endpoint in T.
"Greed is good. Greed is right. Greed works. Greed clarifies, cuts through, and captures the essence of the evolutionary spirit." - Gordon Gekko
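Prim's rule above can be sketched in a few lines of Python. This is a "lazy" variant: it uses the standard `heapq` module and simply skips stale edges when they are popped, instead of updating keys inside the heap. The adjacency format and the small example graph are assumptions made for illustration.

```python
import heapq

def prim_mst(adj, s):
    """adj: {vertex: [(weight, neighbor), ...]} for a connected undirected graph."""
    in_tree = {s}
    tree_edges = []
    heap = [(w, s, v) for w, v in adj[s]]     # candidate edges leaving the tree
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)         # cheapest candidate edge
        if v in in_tree:                      # both endpoints already in T: stale
            continue
        in_tree.add(v)
        tree_edges.append((u, v, w))
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return tree_edges

# Hypothetical example graph given as (u, v, weight) triples.
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4), ("b", "d", 5)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, []).append((w, v))
    adj.setdefault(v, []).append((w, u))

tree = prim_mst(adj, "a")
total = sum(w for _u, _v, w in tree)
print(tree, total)
```

The lazy variant may hold up to E edges in the heap, so it runs in O(E lg E), which is still O(E lg V) because E ≤ V².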

Cycle Property
• Let T be a minimum spanning tree of a weighted graph G
• Let e be an edge of G that is not in T, and let C be the cycle formed by e with T
• For every edge f of C, weight(f) ≤ weight(e)
Proof:
• By contradiction
• If weight(f) > weight(e) for some edge f of C, then replacing f with e in T gives a spanning tree of smaller weight, contradicting the minimality of T

Edges cross the cut

Cut (/Partition) Property
Lemma: Let G = (V, E) and X ⊂ V. If e is a lightest edge connecting X and V-X, then e appears in some MST of G.
Proof:
• Let T be an MST of G
• If T does not contain e, consider the cycle C formed by e with T, and let f ≠ e be an edge of C across the partition
• By the cycle property, weight(f) ≤ weight(e)
• Since e is a lightest edge across the partition, weight(e) ≤ weight(f)
• Thus, weight(f) = weight(e)
• We obtain another MST by replacing f with e
Locally optimal choice (of lightest edges) ⇒ globally optimal solution (MST)

Disjoint Set ADT

An application of disjoint-set data structures

Linked List Implementation

Union in Linked List Implementation

Worst-Case Example

Weighted Union Heuristic
• Each set id includes the length of the list
• In Union, append the shorter list at the end of the longer one
Theorem: Performing m > n operations takes O(m + n lg n) time
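A rough Python sketch of the list-based scheme with the weighted union heuristic follows. Python lists stand in for the linked lists, and the class name and the `pointer_updates` counter are illustrative additions, not part of the slides.

```python
class ListDisjointSets:
    def __init__(self, items):
        self.rep = {x: x for x in items}        # element -> representative of its set
        self.members = {x: [x] for x in items}  # representative -> list of its elements
        self.pointer_updates = 0                # counts representative updates

    def find_set(self, x):
        return self.rep[x]                      # O(1): every element stores its set id

    def union(self, x, y):
        a, b = self.rep[x], self.rep[y]
        if a == b:
            return a
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a                         # make a the longer list
        for z in self.members[b]:               # only the shorter list is relabeled
            self.rep[z] = a
            self.pointer_updates += 1
        self.members[a].extend(self.members.pop(b))
        return a

n = 8
ds = ListDisjointSets(range(n))
for i in range(n - 1):
    ds.union(i, i + 1)      # always merges a singleton into the growing list
print(ds.pointer_updates)
```

Each element is relabeled only when its list is the shorter one, so its set at least doubles every time. That caps the relabelings per element at lg n and gives the n lg n term in the theorem.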

Simple Forest Implementation
• Find-Set(x): follow pointers from x up to the root
• Union(c, f): make c a child of f and return f

Worst-Case Example
[Figure: a chain of nodes n, …, 3, 2, 1]

Weighted Union Heuristic
• Each node includes a weight field: weight = # elements in the sub-tree rooted at the node
• Find-Set(x): as before, O(depth(x))
• Union(x, y): always attach the smaller tree below the root of the larger tree, O(1)

Weighted Union
Theorem: Any k-node tree created using the weighted-union heuristic has height ≤ lg(k)
Proof: By induction on k
Find-Set running time: O(lg n)
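The height bound can be checked on a small case. Below is a minimal Python sketch of the forest with weighted union. The class and method names are illustrative, and `union` takes arbitrary elements and looks up their roots first, so it is not the O(1) root-to-root operation of the slide.

```python
import math

class WeightedForest:
    def __init__(self, n):
        self.parent = list(range(n))   # each node starts as its own root
        self.weight = [1] * n          # number of elements in the subtree rooted here

    def find_set(self, x):
        while self.parent[x] != x:     # follow pointers up to the root
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return rx
        if self.weight[rx] < self.weight[ry]:
            rx, ry = ry, rx            # rx is the root of the larger tree
        self.parent[ry] = rx           # attach the smaller tree below it
        self.weight[rx] += self.weight[ry]
        return rx

    def depth(self, x):
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d

# Merge equal-sized trees in rounds, which is the case that makes trees tallest.
n = 16
f = WeightedForest(n)
step = 1
while step < n:
    for i in range(0, n, 2 * step):
        f.union(i, i + step)
    step *= 2
height = max(f.depth(x) for x in range(n))
print(height, math.log2(n))
```

With 16 nodes the height reaches exactly lg 16 = 4, so the bound in the theorem is tight for this merge order.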

2nd heuristic: Path Compression
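As a sketch of the idea: after Find-Set has located the root, a second pass points every node on the traversed path directly at that root. The Python below is one possible way to write it. The class name is illustrative, and the chain in the usage lines is set up by hand purely for demonstration.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.weight = [1] * n

    def find_set(self, x):
        root = x
        while self.parent[root] != root:   # first pass: locate the root
            root = self.parent[root]
        while self.parent[x] != root:      # second pass: repoint the path at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return rx
        if self.weight[rx] < self.weight[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx               # weighted union, as in the first heuristic
        self.weight[rx] += self.weight[ry]
        return rx

uf = UnionFind(5)
uf.parent = [0, 0, 1, 2, 3]   # hand-built chain 4 -> 3 -> 2 -> 1 -> 0 (demo only)
root = uf.find_set(4)
print(root, uf.parent)
```

After the single call, every node on the path is a direct child of the root, so later Find-Set calls on those nodes take one step.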

The function lg* n = the number of times we have to take log₂ repeatedly, starting from n, to reach a value ≤ 1
lg* 2 = 1
lg* 2^2 = 2
lg* 2^4 = lg* 16 = 3
lg* 2^16 = lg* 65536 = 4
lg* 2^65536 = 5
=> lg* n ≤ 5 for all practical values of n
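The iterated logarithm is easy to compute directly. A small Python sketch, with the function name `log_star` chosen here for illustration:

```python
import math

def log_star(n):
    """Number of times log2 must be applied to n before the value drops to <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

for n in (2, 4, 16, 65536, 2 ** 65536):
    print(log_star(n))
```

Even 2^65536, a number with nearly 20,000 decimal digits, only reaches 5.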

Theorem (Tarjan): Let S be a sequence of O(n) Unions and Find-Sets.
The worst-case time for S with
– Weighted Unions, and
– Path Compressions
is O(n lg* n).
The average time is O(lg* n) per operation.

Theorem (Tarjan): Let S be a sequence of O(n) Unions and Find-Sets.
The worst-case time for S with
– Weighted Unions, and
– Path Compressions
is O(n α(n)), where α is the inverse Ackermann function.
The average time is O(α(n)) per operation; α(n) < 5 in practice.

Connected Components using Union-Find
Reminder:
• Every node v is connected to itself
• If u and v are in the same connected component, then v is connected to u and u is connected to v
• Connected components form a partition of the nodes and so are disjoint
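Because components are disjoint sets, one Union per edge is enough to compute them. The Python sketch below uses a plain forest and leaves out both heuristics for brevity. The function name and the example graph are illustrative.

```python
def connected_components(n, edges):
    """Vertices are 0..n-1; edges is a list of (u, v) pairs."""
    parent = list(range(n))          # Make-Set for every vertex

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                 # endpoints in different sets: merge them
            parent[ru] = rv

    groups = {}
    for v in range(n):               # group vertices by their set representative
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

components = connected_components(6, [(0, 1), (1, 2), (3, 4)])
print(components)
```

Two vertices are in the same component exactly when their Find-Set results agree.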

MST-Kruskal
Kruskal's algorithm for the minimum spanning tree works by inserting edges in order of increasing cost, adding to the tree those edges that connect two previously disjoint components.
The minimum spanning tree describes the cheapest network to connect all of a given set of vertices.
[Figure: Kruskal's algorithm on a graph of distances between 128 North American cities]
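The description above maps directly onto a union-find loop. The following Python sketch is one possible rendering, not the slides' own pseudocode. The edge format `(weight, u, v)` and the example graph are assumptions.

```python
def kruskal_mst(n, edges):
    """edges: list of (weight, u, v) with vertices 0..n-1. Returns MST edges."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:        # path compression
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    tree = []
    for w, u, v in sorted(edges):       # edges in order of increasing cost
        ru, rv = find(u), find(v)
        if ru == rv:                    # same component: edge would close a cycle
            continue
        if size[ru] < size[rv]:         # weighted union
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]
        tree.append((u, v, w))
        if len(tree) == n - 1:          # spanning tree complete
            break
    return tree

# Hypothetical example graph.
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
tree = kruskal_mst(4, edges)
print(tree, sum(w for _u, _v, w in tree))
```

Sorting costs O(E lg E) = O(E lg V) and dominates the near-linear union-find work.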

Example

MST-Kruskal

MST-Kruskal Running Time

MST-Prim-Jarník

Example

MST-Prim-Jarník

MST-Prim

Decrease_key(v, x)
We use a min-heap to hold the edges in G-T.
How can we implement Decrease_key(v, x)?
Simple solution:
• Change the value for v
• Follow the strategy for Heap_insert from v upwards
• Cost: O(lg V)
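To change the value for v, the heap first has to know where v sits. A common way to get that is to keep a position map next to the array. The Python sketch below assumes that design; the class and method names are illustrative, and it is not meant as the slides' exact data structure.

```python
class IndexedMinHeap:
    def __init__(self):
        self.heap = []     # array of [key, item] pairs in heap order
        self.pos = {}      # item -> its current index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            p = (i - 1) // 2
            if self.heap[p][0] <= self.heap[i][0]:
                break
            self._swap(i, p)
            i = p

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            small = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.heap[c][0] < self.heap[small][0]:
                    small = c
            if small == i:
                return
            self._swap(i, small)
            i = small

    def insert(self, item, key):
        self.heap.append([key, item])
        self.pos[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def decrease_key(self, item, new_key):
        i = self.pos[item]                 # O(1) lookup of v's slot
        if new_key > self.heap[i][0]:
            raise ValueError("new key is larger than the current key")
        self.heap[i][0] = new_key          # change the value for v
        self._sift_up(i)                   # same upward walk as insert: O(lg V)

    def extract_min(self):
        self._swap(0, len(self.heap) - 1)
        key, item = self.heap.pop()
        del self.pos[item]
        if self.heap:
            self._sift_down(0)
        return item, key

h = IndexedMinHeap()
h.insert("a", 5)
h.insert("b", 3)
h.insert("c", 8)
h.decrease_key("c", 1)
order = [h.extract_min() for _ in range(3)]
print(order)
```

Python's standard `heapq` module has no decrease-key operation, which is why this sketch maintains its own array and position map.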

MST-Prim Running Time

Does a linear-time MST algorithm exist?

Euclidean MST

Scientific application: clustering
k-clustering. Divide a set of objects into k coherent groups.
Distance function. Numeric value specifying "closeness" of two objects.
Goal. Divide into clusters so that objects in different clusters are far apart.
[Figure: outbreak of cholera deaths in London in the 1850s (Nina Mishra)]
Applications.
• Routing in mobile ad hoc networks.
• Document categorization for web search.
• Similarity searching in medical image databases.
• Skycat: cluster 10^9 sky objects into stars, quasars, galaxies.

Single-link clustering
k-clustering. Divide a set of objects into k coherent groups.
Distance function. Numeric value specifying "closeness" of two objects.
Goal. Divide into clusters so that objects in different clusters are far apart.
Single link. Distance between two clusters equals the distance between the two closest objects (one in each cluster).
Single-link clustering. Given an integer k, find a k-clustering that maximizes the distance between the two closest clusters.

Single-link clustering algorithm
"Well-known" algorithm for single-link clustering:
• Form V clusters of one object each.
• Find the closest pair of objects such that each object is in a different cluster, and merge the two clusters.
• Repeat until there are exactly k clusters.
Observation. This is Kruskal's algorithm (stop when there are k connected components).
Alternate solution. Run Prim's algorithm and delete the k-1 max-weight edges.
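The observation above can be sketched as Kruskal's loop with an early stop. In the Python below, the function name, the 1-D example points, and the distance function are all assumptions for illustration. It also assumes 1 ≤ k ≤ number of points.

```python
def single_link_clusters(points, k, dist):
    """Merge closest clusters (Kruskal order) until exactly k clusters remain."""
    n = len(points)
    parent = list(range(n))              # start with one cluster per object

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    # All pairs sorted by distance, i.e. the edges of the complete graph.
    pairs = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    components = n
    for _d, i, j in pairs:
        if components == k:              # stop when k connected components remain
            break
        ri, rj = find(i), find(j)
        if ri != rj:                     # closest pair lying in different clusters
            parent[ri] = rj
            components -= 1

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(points[i])
    return sorted(clusters.values())

points = [1, 2, 3, 10, 11, 20]           # hypothetical 1-D data
clusters = single_link_clusters(points, 3, lambda a, b: abs(a - b))
print(clusters)
```

Enumerating all pairs costs O(V² lg V), which is fine for a demonstration but too slow for large inputs.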

Dendrogram
Tree diagram that illustrates the arrangement of clusters.

Dendrogram of cancers in human
Tumors in similar tissues cluster together.