Chapter 3 Algorithm Design and Analysis

3.1 Introduction
• Complexity of algorithms: size of a problem, measuring running time, efficiency criteria, …
• Techniques for developing polynomial-time network algorithms: geometric improvement, (bit) scaling, dynamic programming, binary search
• Search algorithms for identifying basic subgraphs: finding all nodes reachable from s, finding all nodes that can reach node t, identifying connected components, numbering the nodes of an acyclic graph, …
• Flow decomposition
Network Theory and Applications 2010

3.2 Complexity Analysis
• Complexity measures:
  - Empirical analysis
    · Write a program and test it on some classes of problem instances.
    · Results may not be consistent.
  - Average-case analysis (statistical analysis)
    · Estimate the expected number of steps.
    · Requires a probability distribution over problem instances.
    · Usually hard to carry out.
  - Worst-case analysis (guaranteed performance)
    · Usually easier to carry out.
    · A few bad instances may dominate the result (e.g., the simplex method for LP).
    · The most popular measure.
• Problem size: the amount of computer storage needed to describe the problem.
  - An integer x: $\lfloor \log_2 x \rfloor + 1$ bits, i.e., $O(\log x)$.
  - A rational number p/q: about $\log_2 p + \log_2 q + 2$ bits.
  - A network problem (C: largest arc cost, U: largest arc capacity): approximately $n \log n + m \log m + m \log C + m \log U$, i.e., a function $f(m, n, \log C, \log U)$.
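
A small Python sketch of these size measures. The names `int_size` and `network_size` are illustrative; `network_size` simply evaluates the approximate formula above with base-2 logarithms and assumes n, m, C, U ≥ 1.

```python
import math


def int_size(x):
    """Bits needed to write a positive integer x in binary: floor(log2 x) + 1."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return x.bit_length()


def network_size(n, m, C, U):
    """Approximate encoding length n log n + m log m + m log C + m log U (base-2 logs).

    Assumes n, m, C, U >= 1, where C and U bound the arc costs and capacities.
    """
    lg = math.log2
    return n * lg(n) + m * lg(m) + m * lg(C) + m * lg(U)
```

For example, doubling U adds only m bits to the estimate, which is why a running time that grows linearly in U is not polynomial in the input size.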
• Worst-case complexity:
  - Counting running time: assume each elementary operation (comparison, addition/subtraction, multiplication, division) takes unit time, as long as the sizes of the numbers produced during the computation remain polynomial in the size of the input data (e.g., the number $\alpha^k$ has size $\log \alpha^k = k \log \alpha$).
  - The running time g(n) for problems of size n is the largest time (number of steps) taken over all problem instances of size n (the worst-case viewpoint). We then take an asymptotic upper bound f(n) on g(n).
• Def: An algorithm is said to run in O(f(n)) time if for some numbers c and $n_0$, the time taken by the algorithm is at most $c f(n)$ for all $n \ge n_0$.
• Similarity assumption: $C = O(n^k)$ and $U = O(n^k)$ for some constant k.
• Polynomial- and exponential-time algorithms:
  - Def: An algorithm is a polynomial-time algorithm if its worst-case running time is bounded by a polynomial function of the problem size, i.e., of the problem parameters and data sizes m, n, $\log C$, $\log U$.
  - Def: An algorithm is an exponential-time algorithm if it is not a polynomial-time algorithm.
  - Def: An algorithm is a strongly polynomial-time algorithm if its running time is bounded by a polynomial function of the problem parameters only (n and m; the size of the problem data is not involved), e.g., $O(n^2 m)$ or $O(m^2)$, but not $O(n \log U)$. Such algorithms are very desirable.
  - The running time of an algorithm is bounded by a polynomial in the problem size if and only if it is bounded by a polynomial in the problem parameters and the encoding length of the data.
• Note that if the running time is a polynomial function of C or U (the cost and capacity values themselves), the algorithm is not polynomial-time: $C = 2^{\log C}$, which is not polynomial in $\log C$. Ex: the binary knapsack problem has a dynamic programming algorithm that runs in O(nb) time, where b is the knapsack capacity; this is a pseudopolynomial-time algorithm.
• Def: An algorithm is said to run in $\Omega(f(n))$ time if for some numbers $c'$ and $n_0$ and all $n \ge n_0$, the algorithm takes at least $c' f(n)$ time on some problem instance (asymptotic lower bound).
• Def: An algorithm is said to be $\Theta(f(n))$ if it is both O(f(n)) and $\Omega(f(n))$ (tight bound).
• Potential functions and amortized complexity: the bound (maximum possible steps in each iteration) × (maximum possible number of iterations) can be an extreme overestimate.
• Ex: a stack S with two operations:
  - push(x, S): add element x to the top of the stack S.
  - popall(S): pop (i.e., take out) every element of S.
  What is the worst-case complexity of a sequence of n operations?
  - Naive approach: push(x, S) takes O(1) and popall(S) takes O(|S|). There are at most n popall operations, each taking O(n) time, giving $O(n^2)$.
  - Potential function approach: let $\Phi(k) = |S|$ denote the number of items in the stack at the end of the k-th step. Each push increases $\Phi(k)$ by one unit and takes 1 unit of time.
    Each popall decreases $\Phi(k)$ to zero, and its time is proportional to that decrease |S| (plus O(1) for the call itself). The total increase in $\Phi(k)$ is at most n, hence the total decrease in $\Phi(k)$ is at most n, and the total time is O(n).
• Amortized complexity: an operation is said to be of amortized complexity O(f(n)) if the time to perform a sequence of k such operations is O(k f(n)) for sufficiently large k (an average worst-case complexity). Hence the amortized complexity of the popall operation is O(1).
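
A minimal Python sketch of the stack example that charges one step per push and 1 + |S| steps per popall, so the potential argument can be checked numerically. The class `CountingStack` and the driver `run` are hypothetical names used only for this illustration.

```python
class CountingStack:
    """A stack with push and popall that counts elementary steps."""

    def __init__(self):
        self.items = []
        self.steps = 0  # total elementary steps charged so far

    def push(self, x):
        # One unit of time; the potential |S| rises by exactly 1.
        self.items.append(x)
        self.steps += 1

    def popall(self):
        # One unit for the call plus one per element removed;
        # the potential |S| drops to 0, paying for the removals.
        self.steps += 1 + len(self.items)
        self.items.clear()


def run(ops):
    """Apply a sequence of 'push'/'popall' operations and return the total step count."""
    s = CountingStack()
    for op in ops:
        if op == "push":
            s.push(op)
        else:
            s.popall()
    return s.steps
```

Every element removed by a popall was pushed earlier, so n operations cost at most 2n steps in total, i.e., O(1) amortized per operation, even though a single popall can cost Θ(n).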

3.3 Developing Polynomial-Time Algorithms
• Geometric improvement approach: consider a minimization problem and assume the objective function values are integers. Let H be the difference between the maximum and minimum objective function values.
• Thm 3.3. Let $z^k$ be the objective function value at the k-th iteration and $z^*$ the optimal value. If the algorithm guarantees, for every k, $(z^k - z^{k+1}) \ge \alpha (z^k - z^*)$ with $0 < \alpha < 1$, then the algorithm terminates in $O((\log H)/\alpha)$ iterations.
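
A short derivation of the bound in Thm 3.3, written as a sketch under the assumptions stated above: integer objective values and an initial gap $z^0 - z^* \le H$.

```latex
\begin{align*}
z^{k+1} - z^{*} &= (z^{k} - z^{*}) - (z^{k} - z^{k+1}) \le (1 - \alpha)(z^{k} - z^{*}), \\
z^{k} - z^{*} &\le (1 - \alpha)^{k}(z^{0} - z^{*}) \le (1 - \alpha)^{k} H \le e^{-\alpha k} H .
\end{align*}
Hence $z^{k} - z^{*} < 1$ as soon as $k > (\ln H)/\alpha$; since both values are
integers, $z^{k} = z^{*}$ at that point, so $O((\log H)/\alpha)$ iterations suffice.
```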
• (Bit) scaling approach: represent the data in binary (using up to K digits). Consider the problems $P_1, P_2, \ldots, P_K$, where $P_i$ is the problem whose data use only the i leading binary digits. Use the optimal solution of $P_i$ as the starting solution of $P_{i+1}$.
• Ex: maximum flow problem, arc (i, j) with $u_{ij} = 5 = (101)_2$.
  [Figure: arc (i, j) in $P_1$, $P_2$, $P_3$ with capacity $(1)_2 = 1$, $(10)_2 = 2$, $(101)_2 = 5$.]
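• Let $u^{k}_{ij}$ denote the capacity of arc (i, j) in $P_k$. Then $u^{k}_{ij} = 2u^{k-1}_{ij} + b$ with $b \in \{0, 1\}$. Set the initial solution of $P_k$ to twice the optimal solution of $P_{k-1}$; it is still feasible in $P_k$, since flow conservation is preserved and $2x_{ij} \le 2u^{k-1}_{ij} \le u^{k}_{ij}$. Let $v_k$ be the optimal value of $P_k$. Then $v_k - 2v_{k-1} \le m$: by the max-flow min-cut theorem a minimum cut of $P_{k-1}$ has capacity $v_{k-1}$, and in $P_k$ its at most m arcs have total capacity at most $2v_{k-1} + m$.
• The total number of reoptimizations is $O(\log C)$ or $O(\log U)$ (polynomial).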
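
A Python sketch that builds the capacities of $P_1, \ldots, P_K$ by keeping the k leading bits of each capacity. The function name `scaling_phases` and the dict-of-arcs input format are assumptions for this illustration; it does not solve the flow problems themselves.

```python
def scaling_phases(capacities):
    """Arc capacities in P_1, ..., P_K: keep the k leading bits of each K-bit capacity.

    `capacities` maps an arc (i, j) to a nonnegative integer capacity u_ij;
    K is the number of binary digits of the largest capacity.
    """
    K = max((u.bit_length() for u in capacities.values()), default=0)
    # Shifting right by K - k drops the K - k trailing bits, leaving the k leading ones.
    return [{arc: u >> (K - k) for arc, u in capacities.items()}
            for k in range(1, K + 1)]
```

For the arc in the example, `scaling_phases({("i", "j"): 5})` gives capacities 1, 2 and 5. Since K = ⌊log₂ U⌋ + 1, there are O(log U) phases, and each capacity in $P_k$ is twice its value in $P_{k-1}$ plus 0 or 1.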
• Dynamic programming: a table-filling approach, as in the text.
• Computing binomial coefficients: how can we compute ${}_pC_q = \frac{p!}{(p-q)!\,q!}$ easily? Use ${}_iC_j = {}_{i-1}C_j + {}_{i-1}C_{j-1}$ with ${}_iC_0 = {}_iC_i = 1$. Define a lower triangular table $D = \{d(i, j)\}$ with p rows and q columns, where $d(i, j) = {}_iC_j$ for $i \ge j$. Scan the rows from 1 to p; when scanning row i, scan the columns from 1 to $\min(i, q)$.
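
A Python sketch of the table-filling computation. As an implementation choice (not from the slides), the table also has a row 0 and a column 0, so the base cases ${}_iC_0 = 1$ need no special handling; `binomial` is an illustrative name.

```python
def binomial(p, q):
    """Compute pCq by filling d(i, j) = iCj row by row with Pascal's rule."""
    if q < 0 or q > p:
        return 0
    # d[i][j] holds iCj for 0 <= j <= min(i, q); unused entries stay 0 (iCj = 0 for j > i).
    d = [[0] * (q + 1) for _ in range(p + 1)]
    for i in range(p + 1):
        d[i][0] = 1  # iC0 = 1
        for j in range(1, min(i, q) + 1):
            d[i][j] = d[i - 1][j] + d[i - 1][j - 1]
    return d[p][q]
```

Each entry is filled once from two entries of the previous row, so the work is O(pq) additions.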
• Knapsack problem: maximize $\sum_{i=1}^{p} u_i x_i$ subject to $\sum_{i=1}^{p} w_i x_i \le W$, $x_i \in \{0, 1\}$ for all i. Construct a p × W table D, where d(i, j) is the maximum objective value when we may use items 1 to i with knapsack capacity j. Recursive equation: $d(i, j) = \max\{d(i-1, j),\ u_i + d(i-1, j - w_i)\}$, where the second term applies only when $w_i \le j$. Scan the rows from 1 to p; when scanning row i, scan the columns from 1 to W. The running time is O(pW).
• Binary search: to locate x in [L, U], test the center point of the interval and discard the half that cannot contain x. It runs in $O(\log(U - L))$ iterations.
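
A Python sketch of both techniques. `knapsack` fills the table row by row with the recursion above (plus a row 0 and column 0 as an implementation convenience). `smallest_true` is a hypothetical integer version of binary search that assumes a monotone predicate (False on a prefix of [L, U], True afterwards, with pred(U) True).

```python
def knapsack(u, w, W):
    """0-1 knapsack via d(i, j) = best value using items 1..i with capacity j."""
    p = len(u)
    # Row 0 (no items yet) is all zeros; columns are capacities 0..W.
    d = [[0] * (W + 1) for _ in range(p + 1)]
    for i in range(1, p + 1):
        ui, wi = u[i - 1], w[i - 1]
        for j in range(W + 1):
            d[i][j] = d[i - 1][j]  # option 1: leave item i out
            if wi <= j:  # option 2: put item i in, if it fits
                d[i][j] = max(d[i][j], ui + d[i - 1][j - wi])
    return d[p][W]


def smallest_true(pred, L, U):
    """Smallest integer x in [L, U] with pred(x) True.

    Assumes pred is monotone (False ... False True ... True) and pred(U) is True.
    Each iteration keeps only one half of the interval.
    Returns (x, number of iterations).
    """
    iterations = 0
    while L < U:
        mid = (L + U) // 2  # center point of [L, U]
        if pred(mid):
            U = mid  # answer lies in [L, mid]
        else:
            L = mid + 1  # answer lies in [mid + 1, U]
        iterations += 1
    return L, iterations
```

For instance, `knapsack([60, 100, 120], [10, 20, 30], 50)` returns 220. The table has (p + 1)(W + 1) entries, which is why O(pW) is only pseudopolynomial: W itself, not log W, appears in the bound.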

3.4 Search Algorithms
• Basic graph techniques that attempt to find all the nodes in a network satisfying a particular property. They are frequently used as subroutines of other, more complex algorithms:
  - Finding all nodes that are reachable by directed paths from a specific node s.
  - Finding all nodes that can reach a specific node t along directed paths.
  - Identifying all connected components.
  - Determining whether a given network is bipartite.
  - Identifying whether a directed cycle exists and, if the network is acyclic, finding a numbering of the nodes such that $(i, j) \in A$ implies $i < j$ (topological ordering).
• Finding all nodes that are reachable by directed paths from a specific node s: starting from node s, identify the nodes reachable from s.
  - Node states: marked or unmarked. An arc (i, j) is admissible if i is marked and j is unmarked, and inadmissible otherwise.
• Initially, only the source node s is marked. From a marked node, mark another node using an admissible arc, then add the newly marked node to the LIST of marked nodes. Different results are obtained depending on the data structure used for LIST.
• To identify admissible arcs, use the current-arc data structure: in the adjacency list A(i) of node i, the current arc (i, j) is the next candidate arc that we wish to examine. Initially, the current arc is the first arc in A(i).
• The running time is O(n + m), i.e., O(m) when $m \ge n$.
• Breadth-first search: maintain LIST as a queue, i.e., select nodes from the front of LIST and add nodes to the rear. The result is a breadth-first search tree.
• Depth-first search: maintain LIST as a stack, i.e., select nodes from the front of LIST and add nodes to the front. The result is a depth-first search tree.
• Reverse search algorithm: identify all the nodes from which we can reach a given node t along directed paths.
  1. Initialize LIST = {t}.
  2. While scanning a node, scan its incoming arcs instead of its outgoing arcs.
  3. An arc (i, j) is admissible if i is unmarked and j is marked.
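
A Python sketch of the generic search with current-arc pointers, where the LIST discipline selects breadth-first or depth-first search. The adjacency format (a dict mapping each node to its list of out-neighbours) and the names `search` and `reverse` are assumptions. Instead of scanning incoming arcs, the reverse search is realized here by searching the graph with every arc reversed, which marks the same set of nodes.

```python
from collections import defaultdict, deque


def search(adj, s, discipline="bfs"):
    """Mark all nodes reachable from s along directed paths.

    adj[i] is the arc list A(i), given as a list of heads j. LIST is a queue
    for breadth-first search ("bfs") and a stack for depth-first search ("dfs").
    Returns (nodes in the order they were marked, pred), where pred defines the search tree.
    """
    marked = {s}
    pred = {s: None}
    order = [s]
    current = defaultdict(int)  # current-arc pointer: index of the next candidate arc in A(i)
    LIST = deque([s])
    while LIST:
        # Queue: select from the front; stack: select the most recently added node.
        i = LIST[0] if discipline == "bfs" else LIST[-1]
        arcs = adj.get(i, [])
        # Skip inadmissible arcs (head already marked); they never become admissible again.
        while current[i] < len(arcs) and arcs[current[i]] in marked:
            current[i] += 1
        if current[i] < len(arcs):
            j = arcs[current[i]]  # admissible arc (i, j): i marked, j unmarked
            marked.add(j)
            pred[j] = i
            order.append(j)
            LIST.append(j)
        elif discipline == "bfs":
            LIST.popleft()  # no admissible arc left at i: remove i from LIST
        else:
            LIST.pop()
    return order, pred


def reverse(adj):
    """Reverse every arc, so that searching from t visits the nodes that can reach t."""
    radj = defaultdict(list)
    for i, heads in adj.items():
        for j in heads:
            radj[j].append(i)
    return dict(radj)
```

Because each current-arc pointer only moves forward, every arc is examined a bounded number of times, which gives the O(n + m) running time. In the stack version the top of the stack is kept at the right end of the deque.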
• Determining strong connectivity: G is strongly connected if there is a directed path from i to j for every node pair i and j. G is strongly connected if and only if, for an arbitrary node s, every node is reachable from s and s is reachable from every node. Hence two applications of the search algorithm (a forward search and a reverse search from s) suffice.
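
A Python sketch of the two-search test. The input format (a node collection and a list of arcs (i, j)) and the function names are assumptions; for brevity the search here is a plain stack-based traversal rather than the current-arc version.

```python
def reachable(adj, s):
    """Nodes reachable from s along directed paths (a simple stack-based search)."""
    marked = {s}
    stack = [s]
    while stack:
        i = stack.pop()
        for j in adj.get(i, []):
            if j not in marked:
                marked.add(j)
                stack.append(j)
    return marked


def is_strongly_connected(nodes, arcs):
    """G is strongly connected iff every node is reachable from s and s is reachable from every node."""
    nodes = set(nodes)
    if not nodes:
        return True
    forward, backward = {}, {}
    for i, j in arcs:
        forward.setdefault(i, []).append(j)
        backward.setdefault(j, []).append(i)  # reversed arc (j, i)
    s = next(iter(nodes))  # an arbitrary node
    return reachable(forward, s) == nodes and reachable(backward, s) == nodes
```

Searching the reversed arcs from s is the same as a reverse search toward s, so the whole test costs two searches, O(n + m).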
• Topological ordering: a labeling order(i) of the nodes such that $(i, j) \in A \Rightarrow order(i) < order(j)$.
  - If G contains a directed cycle, no topological ordering exists. (The contrapositive: a topological ordering exists $\Rightarrow$ G is acyclic.)
  - We give an algorithm that finds a topological ordering of any acyclic graph, which shows that G is acyclic $\Rightarrow$ a topological ordering exists. Together: G is acyclic $\Leftrightarrow$ a topological ordering exists.
• Thm: Let G = (N, A) be a directed network. If each node has indegree at least one, the network contains a directed cycle (Exercise 3.38).
• Hence, if G is acyclic, there exists a node with indegree 0.
• Algorithm: choose a node with indegree 0, give it label 1, and delete the node and all arcs emanating from it. Select a node with indegree 0 in the remaining graph (which is still acyclic if G is) and give it label 2, and so on. Repeat until no node has indegree 0. If some nodes and arcs remain, every remaining node has indegree at least one, so by the theorem above that subnetwork contains a directed cycle; otherwise, we have a topological ordering.
• Implementation: start with a set LIST containing the nodes with indegree 0. Choose a node i in LIST, and for every arc $(i, j) \in A(i)$ reduce the indegree of node j by 1; if the indegree of node j becomes 0, add node j to LIST.
• The running time is O(m).
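
A Python sketch of this indegree-based labeling (often called Kahn's algorithm). The function name and the (nodes, arcs) input format are assumptions; LIST is kept as a queue here, although any selection rule works.

```python
from collections import deque


def topological_order(nodes, arcs):
    """Label nodes 1..n so that every arc (i, j) has order[i] < order[j].

    Returns (order, None) when the network is acyclic, or (None, remaining),
    where `remaining` is the set of unlabeled nodes; each of them still has
    indegree >= 1 inside that set, so the subnetwork contains a directed cycle.
    """
    indegree = {i: 0 for i in nodes}
    out = {i: [] for i in nodes}
    for i, j in arcs:
        out[i].append(j)
        indegree[j] += 1
    LIST = deque(i for i in nodes if indegree[i] == 0)
    order = {}
    next_label = 1
    while LIST:
        i = LIST.popleft()
        order[i] = next_label
        next_label += 1
        # Deleting node i removes its outgoing arcs: lower the indegree of each head.
        for j in out[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                LIST.append(j)
    if len(order) < len(indegree):
        return None, {i for i in indegree if i not in order}
    return order, None
```

Each arc is processed once when its tail is labeled, so apart from O(n) initialization the work is O(m).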

3.5 Flow Decomposition Algorithms
• The current model uses arc flow variables $x_{ij}$. We may instead use path and cycle flows as decision variables.
• Arc flow representation $\Leftrightarrow$ path and cycle flow representation (?)
• Arc flow: $\sum_{\{j:(i,j) \in A\}} x_{ij} - \sum_{\{j:(j,i) \in A\}} x_{ji} = -e(i)$ for all $i \in N$, and $0 \le x_{ij} \le u_{ij}$ for all $(i, j) \in A$, where $\sum_{i=1}^{n} e(i) = 0$.
• e(i) = inflow - outflow of node i:
  - e(i) > 0: node i is an excess node (inflow > outflow)
  - e(i) < 0: node i is a deficit node (inflow < outflow)
  - e(i) = 0: node i is balanced
• Let $\mathcal{P}$ be the set of all directed paths and $\mathcal{W}$ the set of all directed cycles.
  - f(P): decision variable for the flow value on path P
  - f(W): decision variable for the flow value on directed cycle W
  - $\delta_{ij}(P) = 1$ if (i, j) is contained in path P, and 0 otherwise; $\delta_{ij}(W)$ is defined similarly.
• $x_{ij} = \sum_{P \in \mathcal{P}} \delta_{ij}(P) f(P) + \sum_{W \in \mathcal{W}} \delta_{ij}(W) f(W)$. Converse?
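
A Python sketch of the easy direction, path and cycle flows to arc flows, by evaluating the sum above. The input format is an assumption: each path or cycle is a node sequence paired with its flow value, and a cycle repeats its first node at the end.

```python
from collections import defaultdict


def arc_flows(path_flows, cycle_flows):
    """x_ij = sum_P delta_ij(P) f(P) + sum_W delta_ij(W) f(W).

    Each entry is (node_sequence, flow); delta_ij(.) = 1 exactly when the
    consecutive pair (i, j) occurs in the sequence, so we add f to that arc.
    """
    x = defaultdict(int)
    for nodes, f in list(path_flows) + list(cycle_flows):
        for i, j in zip(nodes, nodes[1:]):
            x[(i, j)] += f
    return dict(x)
```

This direction always works; the converse is what the flow decomposition theorem establishes.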
• Thm 3.5 (Flow Decomposition Theorem). Arc flow $\Leftrightarrow$ path and cycle flow: every nonnegative arc flow x can be represented as a path and cycle flow (not necessarily uniquely) with the following properties:
  a. Every directed path with positive flow connects a deficit node to an excess node.
  b. At most n + m paths and cycles have positive flow; of these, at most m cycles have positive flow.
• Pf) We show arc flow $\Rightarrow$ path and cycle flow.
  (1) Choose a deficit node $i_0$ (inflow < outflow) and follow directed arcs with positive flow until an excess node $i_k$ is met or a cycle is found.
  - Path P found: $f(P) = \min\{-e(i_0),\ e(i_k),\ \min\{x_{ij} : (i, j) \in P\}\}$. Update the flows and imbalances.
  - Cycle W found: $f(W) = \min\{x_{ij} : (i, j) \in W\}$, and set $x_{ij} = x_{ij} - f(W)$ for all $(i, j) \in W$.
  Continue until all node imbalances are zero. Then eliminate all remaining flows using directed cycles.
  Each time a path is found, we reduce the excess/deficit of some node to 0 or the flow on some arc to 0. Each time a cycle is found, the flow on some arc becomes 0. Hence there are at most n + m paths and cycles in total, and at most m directed cycles. ∎
• Note that the flow decomposition may not be unique.
• Property 3.6. A circulation x can be represented as cycle flow along at most m directed cycles.
• Implementation: maintain the LIST of deficit nodes as a doubly linked list. When LIST eventually becomes empty, initialize it with the nodes that still have outgoing arcs with positive flow. To identify an arc with positive flow emanating from a node (an admissible arc), use the current-arc data structure.
  - One iteration takes O(n) time plus the time spent scanning arcs to identify admissible arcs.
  - Arc flows are nonincreasing, so once an arc becomes inadmissible, it remains inadmissible; with the current-arc structure, the total time spent scanning arcs is O(m).
  - The total number of iterations is O(n + m), so the running time is O(m + (n + m)n) = O(nm) for $m \ge n$.
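
A Python sketch of the decomposition procedure from the proof. Assumptions: arc flows are given as a dict {(i, j): flow} of nonnegative integers, and the name `decompose` is illustrative. For simplicity it rescans the nodes to find the next start (O(n) per iteration, as above) and keeps each arc list as a stack whose zero-flow arcs are discarded, a simplified stand-in for the current-arc structure rather than the doubly linked LIST.

```python
from collections import defaultdict


def decompose(x):
    """Decompose nonnegative arc flows x {(i, j): flow} into path and cycle flows.

    Returns (paths, cycles); each entry is (node_sequence, flow), and a cycle
    repeats its first node at the end. Starts at a deficit node or, once all
    imbalances are zero, at any node with an outgoing arc of positive flow.
    """
    x = {a: f for a, f in x.items() if f > 0}  # work on a copy
    e = defaultdict(int)  # e(i) = inflow - outflow
    out = defaultdict(list)
    for (i, j), f in x.items():
        e[j] += f
        e[i] -= f
        out[i].append(j)
    paths, cycles = [], []

    def positive_arc_from(i):
        # Flows only decrease, so an arc that reaches zero never becomes admissible again.
        while out[i] and x[(i, out[i][-1])] == 0:
            out[i].pop()
        return out[i][-1] if out[i] else None

    while True:
        start = next((i for i in list(e) if e[i] < 0), None)
        if start is None:  # all imbalances are zero: only cycles remain
            start = next((i for i in list(out) if positive_arc_from(i) is not None), None)
            if start is None:
                break
        walk, pos = [start], {start: 0}
        while True:
            i = walk[-1]
            if e[i] > 0:  # reached an excess node: path found
                f = min(-e[start], e[i], min(x[a] for a in zip(walk, walk[1:])))
                for a in zip(walk, walk[1:]):
                    x[a] -= f
                e[start] += f
                e[i] -= f
                paths.append((tuple(walk), f))
                break
            j = positive_arc_from(i)  # exists: i has positive outflow
            if j in pos:  # revisited a node: cycle found
                cyc = walk[pos[j]:] + [j]
                f = min(x[a] for a in zip(cyc, cyc[1:]))
                for a in zip(cyc, cyc[1:]):
                    x[a] -= f
                cycles.append((tuple(cyc), f))
                break
            pos[j] = len(walk)
            walk.append(j)
    return paths, cycles
```

Every pass through the outer loop zeroes an arc flow or a node imbalance, which is exactly the counting argument behind the n + m bound.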
• Given two feasible solutions x and $x^0$ to the MCF (minimum cost flow problem), the flow difference $x - x^0$ has e(i) = 0 for all i (interpret $x_{ij} - x^0_{ij} < 0$ as sending $|x_{ij} - x^0_{ij}|$ units of flow on (j, i)). Hence $x - x^0$ is a circulation. By the flow decomposition theorem, we can construct any x from $x^0$ by adding flows on directed cycles (when arcs in the opposite direction are allowed).
• Augmenting cycle: a cycle W (not necessarily directed) in G is called an augmenting cycle with respect to the flow x if, after augmenting a positive amount of flow f(W) around the cycle, the flow remains feasible. W is an augmenting cycle in G if $x_{ij} < u_{ij}$ for every forward arc (i, j) (flow increases) and $x_{ij} > 0$ for every backward arc (i, j) (flow decreases). Hence we can construct any feasible flow x from a feasible flow $x^0$ by using augmenting cycles.
• Let $\delta_{ij}(W) = 1$ if arc (i, j) is a forward arc in W, $-1$ if it is a backward arc in W, and 0 otherwise. The cost of an augmenting cycle W is $c(W) = \sum_{(i,j) \in W} c_{ij}\,\delta_{ij}(W)$.
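
A Python sketch that tests the augmenting-cycle condition and computes c(W). The cycle format, a list of (i, j, d) with d = δ_ij(W), and the name `augmenting_cycle` are assumptions; the returned amount (the largest change keeping x feasible) is an extra convenience, not part of the slide's definition.

```python
def augmenting_cycle(cycle, x, u, c):
    """Test whether a (not necessarily directed) cycle W is augmenting for flow x.

    `cycle` lists the arcs of W in traversal order as (i, j, d) with d = delta_ij(W):
    +1 if arc (i, j) is traversed forward, -1 if it is traversed backward.
    Returns (is_augmenting, c(W), largest amount that keeps the flow feasible).
    """
    # Forward arcs can gain up to u_ij - x_ij; backward arcs can lose up to x_ij.
    room = min(u[(i, j)] - x[(i, j)] if d == 1 else x[(i, j)] for i, j, d in cycle)
    cost = sum(c[(i, j)] * d for i, j, d in cycle)  # c(W) = sum of c_ij * delta_ij(W)
    return room > 0, cost, room
```

For example, with arcs (1, 2) and (2, 3) traversed forward and (1, 3) traversed backward, augmenting f units changes the cost by f · c(W) = f (c₁₂ + c₂₃ - c₁₃).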
• Now interpret augmenting cycles using the residual network $G(x^0)$: each augmenting cycle in G with respect to the flow $x^0$ corresponds to a directed cycle W in the residual network $G(x^0)$, and vice versa.
• Cost of a feasible flow x in G: $cx = cx^0 + cx^1$, where x and $x^0$ are feasible flows in G and $x^1$ is a feasible circulation in $G(x^0)$.
• (Augmenting Cycle Theorem) Let x and $x^0$ be any two feasible solutions of a network flow problem. Then x equals $x^0$ plus the flow on at most m directed cycles in $G(x^0)$. Furthermore, the cost of x equals the cost of $x^0$ plus the cost of flow on these augmenting cycles.
• Thm 3.8 (Negative Cycle Optimality Theorem). A feasible solution $x^*$ of the MCF is an optimal solution if and only if the residual network $G(x^*)$ contains no negative cost directed cycle.
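
A Python sketch of checking the condition of Thm 3.8. The slides do not say how to detect a negative cycle; this sketch swaps in the Bellman-Ford algorithm, with every distance started at 0 as if from a virtual source joined to all nodes. The dict inputs and the names `residual_arcs` and `has_negative_cycle` are assumptions.

```python
def residual_arcs(x, u, c):
    """Arcs of G(x) as (tail, head, cost): (i, j) if x_ij < u_ij, (j, i) with cost -c_ij if x_ij > 0."""
    arcs = []
    for (i, j), cap in u.items():
        if x[(i, j)] < cap:
            arcs.append((i, j, c[(i, j)]))
        if x[(i, j)] > 0:
            arcs.append((j, i, -c[(i, j)]))
    return arcs


def has_negative_cycle(nodes, arcs):
    """Bellman-Ford from a virtual source: any improvement after n - 1 extra passes means a negative cycle."""
    dist = {i: 0 for i in nodes}  # one pass from the virtual source, already done
    for _ in range(len(dist) - 1):
        for i, j, w in arcs:
            if dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
    return any(dist[i] + w < dist[j] for i, j, w in arcs)
```

For instance, sending 2 units from node 1 to node 3 directly on an arc of cost 5, when a path 1 → 2 → 3 of cost 1 + 1 with enough capacity exists, leaves the residual cycle 1 → 2 → 3 → 1 with cost -3, so that flow is not optimal; rerouting the flow along 1 → 2 → 3 removes every negative residual cycle.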