Randomization in Graph Optimization Problems
David Karger, MIT
http://theory.lcs.mit.edu/~karger
Randomized Algorithms
- Flip coins to decide what to do next
- Avoid hard work of making the "right" choice
- Often faster and simpler than deterministic algorithms
- Different from average-case analysis:
  - Input is worst case
  - Algorithm adds randomness
Methods
- Random selection: if most candidate choices are "good", then a random choice is probably good
- Monte Carlo simulation: simulations estimate event likelihoods
- Random sampling: generate a small random subproblem; solve it, extrapolate to the whole problem
- Randomized rounding: for approximation
Cuts in Graphs
- Focus on undirected graphs
- A cut is a vertex partition
- Value is the number (or total weight) of crossing edges
Optimization with Cuts
- Cut values determine the solution of many graph optimization problems:
  - min-cut / max-flow
  - multicommodity flow (sort of)
  - bisection / separator
  - network reliability
  - network design
- Randomization helps solve these problems
Presentation Assumption
- For the entire presentation, we consider unweighted graphs (all edges have weight/capacity one)
- All results apply unchanged to arbitrarily weighted graphs:
  - Integer weights = parallel edges
  - Rational weights scale to integers
  - Analysis unaffected; only some implementation details change
Basic Probability
- Conditional probability: Pr[A ∩ B] = Pr[A] × Pr[B | A]
- Independent events multiply: Pr[A ∩ B] = Pr[A] × Pr[B]
- Linearity of expectation: E[X + Y] = E[X] + E[Y]
- Union bound: Pr[X ∪ Y] ≤ Pr[X] + Pr[Y]
Random Selection for Minimum Cuts
Random choices are good when problems are rare
Minimum Cut
- Smallest cut of the graph (not the s-t min-cut)
- Cheapest way to separate the graph into 2 parts
- Various applications:
  - network reliability (small cuts are weakest)
  - subtour elimination constraints for TSP
  - separation oracle for network design
Max-flow/Min-cut
- s-t flow: edge-disjoint packing of s-t paths
- s-t cut: a cut separating s and t
- [FF]: s-t max-flow = s-t min-cut
  - max-flow saturates all s-t min-cuts
  - most efficient way to find s-t min-cuts
- [GH]: min-cut is the "all-pairs" s-t min-cut
  - find using n flow computations
Flow Algorithms
- Push-relabel [GT]:
  - push "excess" around the graph till it's gone
  - max-flow in O*(mn) (note: O* hides logs)
  - recent O*(m^(3/2)) [GR]
  - min-cut in O*(mn^2) --- "harder" than flow
- Pipelining [HO]:
  - save push/relabel data between flows
  - min-cut in O*(mn) --- "as easy" as flow
Contraction
- Find an edge that doesn't cross the min-cut
- Contract (merge) its endpoints into 1 vertex
Contraction Algorithm
- Repeat n − 2 times:
  - find a non-min-cut edge
  - contract it (keep parallel edges)
- Each contraction decrements the vertex count
- At the end, 2 vertices are left
  - the unique cut between them corresponds to the min-cut of the starting graph
Picking an Edge
- Must contract only non-min-cut edges
- [NI]: O(m)-time algorithm to pick such an edge
  - n contractions: O(mn) time for min-cut
  - slightly faster than flows
- If only we could find an edge faster...
- Idea: min-cut edges are few
Randomize
Repeat until 2 vertices remain:
  pick a random edge
  contract it
(keep fingers crossed)
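The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the O(m)-per-trial implementation discussed later: union-find stands in for explicit vertex merging, and self-loops are skipped by rejection, which is equivalent to picking uniformly among surviving multigraph edges.

```python
import random

def contract_min_cut(edges, n, rng):
    """One trial of the contraction algorithm.

    edges: list of (u, v) pairs on vertices 0..n-1 (parallel edges allowed).
    Returns the value of the cut defined by the final 2 super-vertices.
    """
    parent = list(range(n))            # union-find forest over vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    vertices = n
    while vertices > 2:
        u, v = rng.choice(edges)       # uniform over original edge slots
        ru, rv = find(u), find(v)
        if ru != rv:                   # skip self-loops from earlier merges
            parent[ru] = rv
            vertices -= 1
    # count edges crossing the partition of the two remaining super-vertices
    return sum(1 for u, v in edges if find(u) != find(v))

def karger_min_cut(edges, n, trials, seed=0):
    """Repeat independent trials and keep the smallest cut found."""
    rng = random.Random(seed)
    return min(contract_min_cut(edges, n, rng) for _ in range(trials))
```

Each trial returns the value of *some* cut, so the minimum over trials only improves with repetition, exactly as the repetition analysis below requires.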
Analysis I
- Min-cut is small --- few edges:
  - Suppose the graph has min-cut c
  - Then the minimum degree is at least c
  - Thus at least nc/2 edges
- A random edge is probably safe:
  Pr[min-cut edge] ≤ c/(nc/2) = 2/n
  (easy generalization to the capacitated case)
Analysis II
- The algorithm succeeds if it never accidentally contracts a min-cut edge
- It contracts the vertex count from n down to 2
- When k vertices remain, the chance of error is 2/k
  - thus, the chance of being right is 1 − 2/k
- Pr[always right] is the product of the probabilities of being right each time
Analysis III
- Pr[always right] ≥ (1 − 2/n)(1 − 2/(n−1)) ··· (1 − 2/3) = 2 / n(n−1)
- ... not too good!
Repetition
- Repetition amplifies the success probability:
  - basic failure probability 1 − 2/n^2
  - so repeat 7n^2 times
How Fast?
- Easy to perform 1 trial in O(m) time
  - just use an array of edges, no data structures
- But we need n^2 trials: O(mn^2) time
- Simpler than flows, but slower
An Improvement [KS]
- When k vertices remain, the error probability is 2/k
  - big when k is small
- Idea: once k is small, change the algorithm
  - the algorithm needs to be safer
  - but can afford to be slower
- Amplify by repetition!
  - Repeat the base algorithm many times
Recursive Algorithm
RCA(G, n):   {G has n vertices}
  repeat twice:
    randomly contract G down to n/√2 vertices
    (50-50 chance of avoiding the min-cut)
    RCA(G, n/√2)
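A runnable sketch of RCA, under simplifying assumptions: the contraction target n/√2 is rounded up, and the recursion bottoms out at a small base case that just runs a handful of full contractions (the talk's base case and data structures are more careful, which is what gives the O*(n^2) bound).

```python
import math
import random

def contract_to(edges, n, t, rng):
    """Contract a multigraph (edge list on vertices 0..n-1) down to t
    super-vertices; return the relabeled edge list with self-loops dropped."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    live = n
    while live > t:
        u, v = rng.choice(edges)          # uniform over surviving multi-edges
        ru, rv = find(u), find(v)
        if ru != rv:                      # skip self-loops
            parent[ru] = rv
            live -= 1
    label, out = {}, []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            out.append((label.setdefault(ru, len(label)),
                        label.setdefault(rv, len(label))))
    return out

def rca(edges, n, rng):
    """Two trials, each contracting to ~n/sqrt(2) vertices before recursing;
    returns the smallest cut value seen."""
    if n <= 6:                            # base case: several full contractions
        return min(len(contract_to(edges, n, 2, rng)) for _ in range(10))
    t = int(n / math.sqrt(2)) + 1
    return min(rca(contract_to(edges, n, t, rng), t, rng) for _ in range(2))

def min_cut(edges, n, runs=10, seed=3):
    """O(log n)-style repetition of RCA, keeping the best answer."""
    rng = random.Random(seed)
    return min(rca(edges, n, rng) for _ in range(runs))
```

Usage: on two cliques joined by a single bridge edge, repeated runs of `rca` find the bridge cut of value 1 with high probability.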
Main Theorem
- On any capacitated, undirected graph, Algorithm RCA:
  - runs in O*(n^2) time with simple structures
  - finds the min-cut with probability ≥ 1/log n
- Thus, O(log n) repetitions suffice to find the minimum cut (failure probability 10^(-6)) in O(n^2 log^2 n) time.
Proof Outline
- The graph has O(n^2) (capacitated) edges
- So O(n^2) work to contract, then two subproblems of size n/√2:
  - T(n) = 2 T(n/√2) + O(n^2) = O(n^2 log n)
- The algorithm fails only if both iterations fail
  - An iteration succeeds if its contractions and its recursion succeed
  - P(n) = 1 − [1 − ½ P(n/√2)]^2 = Ω(1 / log n)
Failure Modes
- Monte Carlo algorithms always run fast and probably give you the right answer
- Las Vegas algorithms probably run fast and always give you the right answer
- To make a Monte Carlo algorithm Las Vegas, we need a way to check the answer
  - repeat till the answer is right
- No fast min-cut check is known (flow is slow!)
How do we verify a minimum cut?
Enumerating Cuts
The probabilistic method, backwards
Cut Counting
- The original contraction algorithm finds any given min-cut with probability at least 2/n(n−1)
- Only one cut is found per run
- Disjoint events, so the probabilities add
- So there are at most n(n−1)/2 min-cuts
  - otherwise the probabilities would sum to more than one
- Tight: a cycle has exactly this many min-cuts
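The tightness claim is easy to check by brute force on a small cycle: every pair of cycle edges determines a distinct min-cut of value 2, giving n(n−1)/2 of them. A small enumeration sketch (exponential, so only for tiny n):

```python
def min_cuts_of_cycle(n):
    """Enumerate all 2-way vertex partitions of an n-cycle and return
    (min cut value, number of cuts achieving it)."""
    edges = [(i, (i + 1) % n) for i in range(n)]
    best, count = float('inf'), 0
    # fix vertex 0's side so each partition is counted exactly once
    for mask in range(2 ** (n - 1)):
        side = {0} | {i + 1 for i in range(n - 1) if mask >> i & 1}
        if len(side) == n:             # skip the trivial "everything" side
            continue
        val = sum((u in side) != (v in side) for u, v in edges)
        if val < best:
            best, count = val, 1
        elif val == best:
            count += 1
    return best, count
```

For n = 6 this reports min-cut 2 with 15 = 6·5/2 distinct min-cuts, matching the bound.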
Enumeration
- RCA as stated has constant probability of finding any given min-cut
- If run O(log n) times, the probability of missing a given min-cut drops to 1/n^3
- But there are only n^2 min-cuts
- So the probability of missing any of them is at most 1/n
- So, with probability 1 − 1/n, we find all min-cuts
  - O(n^2 log^3 n) time
Generalization
- If G has min-cut c, a cut of value ≤ αc is an α-mincut
- Lemma: the contraction algorithm finds any given α-mincut with probability Ω(n^(−2α))
  - Proof: just add a factor of α to the basic analysis
- Corollary: there are O(n^(2α)) α-mincuts
- Corollary: can find all of them in O*(n^(2α)) time
  - Just change the contraction factor in RCA
Summary
- A simple, fast min-cut algorithm
  - Random selection avoids rare problems
- Generalization to near-minimum cuts
- Bound on the number of small cuts
  - The probabilistic method, backwards
Network Reliability
Monte Carlo estimation
The Problem
- Input:
  - Graph G with n vertices
  - Edge failure probabilities (for simplicity, fix a single p)
- Output:
  - FAIL(p): the probability G is disconnected by edge failures
Approximation Algorithms
- Computing FAIL(p) is #P-complete [V]
- An exact algorithm seems unlikely
- Approximation scheme:
  - Given G, p, ε, outputs an ε-approximation
  - May be randomized: succeeds with high probability
  - Fully polynomial (FPRAS) if runtime is polynomial in n, 1/ε
Monte Carlo Simulation
- Flip a coin for each edge, test the graph
- k failures in t trials ⇒ FAIL(p) ≈ k/t
- E[k/t] = FAIL(p)
- How many trials are needed for confidence?
  - "bad luck" on trials can yield a bad estimate
  - clearly need at least 1/FAIL(p)
- Chernoff bound: O*(1/ε^2 FAIL(p)) trials suffice to give probable accuracy within ε
  - Time O*(m/ε^2 FAIL(p))
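The naive simulation is a one-liner per trial: fail each edge with probability p, test connectivity, and average. A minimal sketch (DFS connectivity check; not the O*-tuned version):

```python
import random

def is_connected(n, edges):
    """DFS connectivity check on the surviving edges."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def estimate_fail(n, edges, p, trials, seed=0):
    """Naive Monte Carlo estimate of FAIL(p): each edge fails
    independently with probability p; return the disconnection rate."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        surviving = [e for e in edges if rng.random() >= p]
        failures += not is_connected(n, surviving)
    return failures / trials
```

For a triangle at p = 1/2, FAIL(p) is exactly 1/2 (it disconnects iff at least two edges fail), so the estimate should concentrate there.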
Chernoff Bound
- Random variables X_i ∈ [0, 1]
- Sum X = Σ X_i
- Bound deviation from expectation:
  Pr[ |X − E[X]| ≥ ε E[X] ] < exp(−ε^2 E[X]/4)
- If E[X] ≥ 4(log n)/ε^2, "tight concentration":
  - Deviation by ε has probability < 1/n
- No one variable is a big part of E[X]
Application
- Let X_i = 1 if trial i is a failure, else 0
- Let X = X_1 + … + X_t
- Then E[X] = t FAIL(p)
- Chernoff says X is within relative error ε of E[X] with probability 1 − exp(−ε^2 t FAIL(p)/4)
- So choose t to cancel the other terms:
  - "High probability": t = O(log n / ε^2 FAIL(p))
  - Deviation by ε has probability < 1/n
Review
- Contraction Algorithm
  - O(n^(2α)) α-mincuts
  - Enumerate them in O*(n^(2α)) time
Network Reliability Problem
- Random edge failures
  - Estimate FAIL(p) = Pr[graph disconnects]
- Naive Monte Carlo simulation
  - Chernoff bound --- "tight concentration":
    Pr[ |X − E[X]| ≥ ε E[X] ] < exp(−ε^2 E[X]/4)
  - O(log n / ε^2 FAIL(p)) trials expect O(log n / ε^2) network failures --- good for Chernoff
  - So estimate within ε in O*(m/ε^2 FAIL(p)) time
Rare Events
- When FAIL(p) is too small, it takes too long to collect sufficient statistics
- Solution: skew the trials to make the interesting event more likely
- But in a way that lets you recover the original probability
DNF Counting
- Given a DNF formula (an OR of ANDs):
  (e1 ∧ e2 ∧ e3) ∨ (e1 ∧ e4) ∨ (e2 ∧ e6)
- Each variable is set true with probability p
- Estimate Pr[formula true]
  - #P-complete
- [KL, KLM]: FPRAS
  - Skew to make true outcomes "common"
  - Time linear in the formula size
Rewrite Problem
- Assume p = 1/2
  - Count satisfying assignments
- "Satisfaction matrix":
  - S_ij = 1 if the ith assignment satisfies the jth clause
- We want the number of nonzero rows
- Randomly sampling rows won't work
  - There might be too few nonzeros
New Sample Space
- So normalize every nonzero row to sum to one (divide by its number of nonzeros)
  - Now the sum of all nonzeros is the desired value
  - So it suffices to estimate the average nonzero
Sampling Nonzeros
- We know the number of nonzeros per column:
  - To satisfy a given clause, all variables in the clause must be true
  - All other variables are unconstrained
- Estimate the average by random sampling:
  - Know the number of nonzeros per column
  - So can pick a random column
  - Then pick a random true-for-that-column assignment
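The sampler above fits in a short sketch, under the slide's assumptions: p = 1/2 and a monotone DNF (each clause is a set of variables that must all be true). Column j has 2^(n − |clause j|) nonzeros; we pick a column proportionally, pick a random assignment satisfying it, and weight the sample by 1 over the number of clauses it satisfies.

```python
import random

def karp_luby_count(clauses, n, samples, seed=0):
    """[KL, KLM]-style estimator for the number of satisfying assignments
    of a monotone DNF over n variables.  clauses: list of sets of variable
    indices.  Returns (total nonzeros) * (average sample value)."""
    rng = random.Random(seed)
    weights = [2 ** (n - len(c)) for c in clauses]   # nonzeros per column
    total_nonzeros = sum(weights)
    acc = 0.0
    for _ in range(samples):
        j = rng.choices(range(len(clauses)), weights=weights)[0]
        # random assignment satisfying clause j: its vars true, rest coin flips
        assignment = [rng.random() < 0.5 for _ in range(n)]
        for v in clauses[j]:
            assignment[v] = True
        satisfied = sum(all(assignment[v] for v in c) for c in clauses)
        acc += 1.0 / satisfied           # normalized-row sample value
    return total_nonzeros * acc / samples
```

For clauses {x0∧x1, x1∧x2} on 3 variables, the true count is 2 + 2 − 1 = 3, and the estimate concentrates there.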
Few Samples Needed
- Suppose there are k clauses
- Then E[sample] > 1/k:
  - 1 ≤ satisfied clauses ≤ k
  - 1 ≥ sample value ≥ 1/k
- Taking O(k log n / ε^2) samples gives a "large" mean
- So Chernoff says the sample mean is probably a good estimate
Reliability Connection
- Reliability as DNF counting:
  - A variable per edge, true if the edge fails
  - A cut fails if all its edges do (AND of edge variables)
  - The graph fails if some cut does (OR of cuts)
  - FAIL(p) = Pr[formula true]
- Problem: the DNF has 2^n clauses
Focus on Small Cuts
- Fact: FAIL(p) > p^c
- Theorem: if p^c = 1/n^(2+δ), then Pr[some cut larger than an α-mincut fails] < n^(−αδ)
- Corollary: FAIL(p) ≈ Pr[some ≤ α-mincut fails], where α = 1 + 2/δ
- Recall: O(n^(2α)) α-mincuts
- Enumerate them with RCA, then run DNF counting
Proof of Theorem
- Given p^c = 1/n^(2+δ)
- At most n^(2α) cuts have value αc
- Each fails with probability p^(αc) = 1/n^(α(2+δ))
- Pr[any cut of value αc fails] = O(n^(−αδ))
- Sum over all α > 1
Algorithm
- RCA can enumerate all α-minimum cuts with high probability in O(n^(2α)) time
- Given the α-minimum cuts, can ε-estimate the probability one fails via Monte Carlo simulation for DNF counting (formula size O(n^(2α)))
- Corollary: when FAIL(p) < n^(−(2+δ)), can ε-approximate it in O(cn^(2+4/δ)) time
Combine
- For large FAIL(p), naive Monte Carlo
- For small FAIL(p), RCA / DNF counting
- Balance: ε-approximation in O(mn^3.5/ε^2) time
- Implementations show this is practical for hundreds of nodes
- Again, no way to verify correctness
Summary
- Naive Monte Carlo simulation works well for common events
- Need to adapt for rare events
- Cut structure and DNF counting let us do this for network reliability
Random Sampling
More min-cut algorithms
Random Sampling
- A general tool for faster algorithms:
  - pick a small, representative sample
  - analyze it quickly (small)
  - extrapolate to the original (representative)
- Speed-accuracy tradeoff:
  - a smaller sample means less time
  - but also less accuracy
Min-cut Duality
- [Edmonds]: min-cut = max tree packing
  - convert to a directed graph
  - "source" vertex s (doesn't matter which)
  - spanning trees directed away from s
- [Gabow]: "augmenting trees"
  - add a tree in O*(m) time
  - min-cut c (via max packing) in O*(mc)
  - great if m and c are small...
Example
(figure: a graph with min-cut 2, its 2 directed spanning trees, and a directed min-cut of 2)
Sampling for Approximation
Random Sampling
- [Gabow]'s scheme is great if m, c are small
- Random sampling:
  - reduces m and c
  - scales cut values (in expectation)
  - if we pick half the edges, we get about half of each cut
- So find tree packings and cuts in samples
- Problem: maybe some large deviations
Sampling Theorem
- Given graph G, build a sample G(p) by including each edge with probability p
- A cut of value v in G has expected value pv in G(p)
- Definition: "constant" ρ = 8 (ln n) / ε^2
- Theorem: with high probability, all cuts in G(ρ/c) have (1 ± ε) times their expected values.
A Simple Application
- [Gabow] packs trees in O*(mc) time
- Build G(ρ/c):
  - minimum expected cut ρ
  - by the theorem, the sample's min-cut is probably near ρ
  - find its min-cut in O*(ρm) time using [Gabow]
  - it corresponds to a near-min-cut in G
- Result: (1+ε) times min-cut in O*(ρm) time
Proof of Sampling: Idea
- The Chernoff bound says the probability of a large deviation in one cut's value is small
- Problem: exponentially many cuts; perhaps some deviate a great deal
- Solution: we showed there are few small cuts
  - only small cuts are likely to deviate much
  - but they are few, so the Chernoff bound applies
Proof of Sampling
- Sampling with probability ρ/c:
  - a cut of value αc has mean αρ
  - [Chernoff]: it deviates from its expected size by more than ε with probability at most n^(−3α)
- At most n^(2α) cuts have value αc
- Pr[any cut of value αc deviates] = O(n^(−α))
- Sum over all α ≥ 1
Las Vegas Algorithms
Finding good certificates
Approximate Tree Packing
- Break the edges into c/ρ random groups
- Each looks like a sample at rate ρ/c:
  - O*(ρm/c) edges
  - each has min expected cut ρ
  - so the theorem says min-cut (1 − ε)ρ
- So each group has a tree packing of size (1 − ε)ρ
- [Gabow] finds it in time O*(ρ^2 m/c) per group
  - so the overall time is c × O*(ρ^2 m/c) = O*(ρ^2 m)
Las Vegas Algorithm
- The packing algorithm is Monte Carlo
- Previously we found an approximate cut (faster)
- If they are close, each "certifies" the other:
  - the cut exceeds the optimum cut
  - the packing is below the optimum cut
- If not, re-run both
- Result: Las Vegas, expected time O*(ρ^2 m)
Exact Algorithm
- Randomly partition the edges into two groups
  - each is like a 1/2-sample: ε = O*(c^(−1/2))
- Recursively pack trees in each half
  - c/2 − O*(c^(1/2)) trees
- Merge the packings
  - gives a packing of size c − O*(c^(1/2))
  - augment to a maximum packing: O*(mc^(1/2))
- T(m, c) = 2 T(m/2, c/2) + O*(mc^(1/2)) = O*(mc^(1/2))
Nearly Linear Time
Analyze Trees
- Recall: [G] packs c (directed-)edge-disjoint spanning trees
- Corollary: in such a packing, some tree crosses the min-cut only twice
- To find the min-cut:
  - find a tree packing
  - find the smallest cut with 2 tree edges crossing it
- Problem: the packing takes O*(mc) time
Constraint Trees
- Min-cut c:
  - c directed trees
  - 2c directed min-cut edges
  - on average, two min-cut edges per tree
- Definition: a tree 2-crosses a cut
Finding the Cut
- From the crossing tree edges, deduce the cut:
  - Remove the tree edges
  - No other edges cross
  - So each component is on one side
  - And opposite its "neighbor's" side
Sampling
- Solution: use G(ρ/c) with ε = 1/8
  - pack O*(ρ) trees in O*(m) time
  - the original min-cut has (1+ε)ρ edges in G(ρ/c)
  - some tree 2-crosses it in G(ρ/c)
  - ...and thus 2-crosses it in G
- Analyze O*(ρ) trees in G
  - time O*(m) per tree
  - Monte Carlo
Simplify
Discuss the case where one tree edge crosses the min-cut
Analyzing a Tree
- Root the tree, so a cut is a subtree
- Use a dynamic program up from the leaves to determine subtree cuts efficiently
- Given the cuts at the children of a node, compute the cut at the parent
- Definitions:
  - v↓ is the set of nodes below v
  - C(v↓) is the value of the cut at subtree v↓
The Dynamic Program
(figure: node u with children v and w; the edges whose least common ancestor is u are split into "keep" and "discard" sets)
Algorithm: 1-Crossing Trees
- Compute the edges' LCAs: O(m)
- Compute the "cuts" at the leaves:
  - cut values = degrees
  - each edge is incident on at most two leaves
  - total time O(m)
- Dynamic program upwards: O(n)
- Total: O(m + n)
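The bottom-up computation can be sketched concretely. One way to organize it (an illustration, not the talk's exact O(m) routine — the LCA step here is a naive ancestor walk): maintain D(v) = total degree of the subtree v↓ and I(v) = edges internal to v↓; then C(v↓) = D(v) − 2·I(v), and a non-tree edge becomes internal exactly at its LCA.

```python
from collections import defaultdict

def subtree_cut_values(n, tree_edges, graph_edges, root=0):
    """For each vertex v, compute C(v_down): the number of graph edges with
    exactly one endpoint in the subtree rooted at v."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, children, order = {root: None}, defaultdict(list), []
    stack = [root]
    while stack:                       # root the tree
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                children[u].append(w)
                stack.append(w)
    depth = {root: 0}
    for u in order[1:]:
        depth[u] = depth[parent[u]] + 1

    def lca(u, v):                     # naive ancestor walk, for clarity
        while depth[u] > depth[v]:
            u = parent[u]
        while depth[v] > depth[u]:
            v = parent[v]
        while u != v:
            u, v = parent[u], parent[v]
        return u

    deg = defaultdict(int)
    lca_count = defaultdict(int)       # non-tree edges whose LCA is this node
    tree_set = {frozenset(e) for e in tree_edges}
    for u, v in graph_edges:
        deg[u] += 1
        deg[v] += 1
        if frozenset((u, v)) not in tree_set:
            lca_count[lca(u, v)] += 1

    D, I, C = {}, {}, {}
    for v in reversed(order):          # leaves first
        D[v] = deg[v] + sum(D[c] for c in children[v])
        I[v] = lca_count[v] + sum(I[c] + 1 for c in children[v])
        C[v] = D[v] - 2 * I[v]         # cut separating v's subtree from the rest
    return C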
2-Crossing Trees
- The cut corresponds to two subtrees v↓ and w↓ (one kept, one discarded)
- n^2 table entries
- fill them in O(n^2) time with a dynamic program
Linear Time
- The bottleneck is the C(v↓, w↓) computations
- Avoid them: find the right "twin" w for each v
- Compute using the addpath and minpath operations of dynamic trees [ST]
- Result: O(m log^3 n) time (messy)
How do we verify a minimum cut?
Network Design
Randomized rounding
Problem Statement
- Given vertices, and a cost c_vw to buy an edge from v to w, find the minimum-cost purchase that creates a graph with desired connectivity properties
- Example: minimum-cost k-connected graph
- Generally NP-hard
- Recent approximation algorithms [GW], [JV]
Integer Linear Program
- Variable x_vw = 1 if we buy edge vw
- Solution cost: Σ x_vw c_vw
- Constraint: for every cut, Σ x_vw ≥ k
- Relaxing integrality gives a tractable LP
  - Exponentially many cuts
  - But separation oracles exist (e.g. min-cut)
- What is the integrality gap?
Randomized Rounding
- Given LP solution values x_vw
- Build a graph where edge vw is present with probability x_vw
- Expected cost is at most opt: Σ x_vw c_vw
- The expected number of edges crossing any cut satisfies the constraint
- If the expected number is large for every cut, the sampling theorem applies
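The rounding step itself is tiny; a minimal sketch (the LP solve is assumed done — `frac_x` is just a dict of fractional values, and the helper averaging many roundings only illustrates that E[cost] equals the LP cost):

```python
import random

def randomized_round(frac_x, rng):
    """Keep edge e independently with probability x_e, so the expected
    cost is sum x_e * c_e and every cut's expected crossing count equals
    its fractional value."""
    return [e for e in frac_x if rng.random() < frac_x[e]]

def average_cost(frac_x, costs, trials, seed=0):
    """Sanity check: the mean rounded cost approaches the LP cost."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sum(costs[e] for e in randomized_round(frac_x, rng))
    return total / trials
```

With four edges at x_e = 1/2 and unit costs, the LP cost is 2 and the average rounded cost concentrates there.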
k-Connected Subgraph
- The fractional solution is k-connected
- So every cut has (in expectation) k edges crossing it in the rounded solution
- The sampling theorem says every cut has at least k − (k log n)^(1/2) edges
- A close approximation for large k
- Can often repair: e.g., get a k-connected subgraph at cost 1 + ((log n)/k)^(1/2) times the minimum
Nonuniform Sampling
Concentrate on the important things
[Benczur-Karger, Karger-Levine]
s-t Min-Cuts
- Recall: if G has min-cut c, then in G(ρ/c) all cuts approximate their expected values to within ε.
- Applications:
  - Min-cut in O*(mc) time [G]; approximate/exact in O*((m/c) c) = O*(m)
  - s-t min-cut of value v in O*(mv); approximate in O*(mv/c) time
- Trouble if c is small and v is large.
The Problem
- Cut sampling relied on the Chernoff bound
- Chernoff bounds require that no one edge is a large fraction of the expectation of a cut it crosses
- If the sample rate is ~1/c, each edge across a min-cut is too significant
- But: if an edge only crosses large cuts, then a sample rate of ~1/c is OK!
Biased Sampling
- The original sampling theorem is weak when:
  - m is large
  - c is small
- But if m is large:
  - then G has dense regions
  - where c must be large
  - where we can sample more sparsely
Problem               | Old Time    | New Time
Approx. s-t min-cut   | O*(mn)      | O*(n^2 / ε^2)
Approx. s-t max-flow  | O*(m^(3/2)) | O*(mn^(1/2) / ε)
Flow of value v       | O*(mv)      | O*(n^(11/9) v)
Approx. bisection     | O*(m^2)     | O*(n^2 / ε^2)

m ⇒ n/ε^2 in weighted, undirected graphs
Strong Components
- Definition: a k-strong component is a maximal vertex-induced subgraph with min-cut k.
(figure: components of strength 2 and 3)
Nonuniform Sampling
- Definition: an edge is k-strong if its endpoints are in the same k-strong component.
- Stricter than k-connected endpoints.
- Definition: the strong connectivity c_e of edge e is the largest k for which e is k-strong.
- Plan: sample dense regions lightly
Nonuniform Sampling
- Idea: if an edge is k-strong, then it is in a k-connected graph
- So it is "safe" to sample it with probability 1/k
- Problem: if we sample edges with different probabilities, E[cut value] gets messy
- Solution: if we sample e with probability p_e, give it weight 1/p_e
- Then E[cut value] = original cut value
Compression Theorem
- Definition: given compression probabilities p_e, the compressed graph G[p_e]:
  - includes edge e with probability p_e, and
  - gives it weight 1/p_e if included
- Note E[G[p_e]] = G
- Theorem: G[ρ/c_e]:
  - approximates all cuts by ε
  - has O(ρn) edges
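The compression step itself is a two-line transform once strengths are known. A sketch (strengths are supplied as a dict here — computing them is the construction discussed below — and the averaging helper just demonstrates the unbiasedness E[G[p_e]] = G):

```python
import random

def compress(edges, strength, rho, rng):
    """Keep edge e with probability p_e = min(1, rho/c_e) and weight 1/p_e,
    so every cut's expected weight equals its original value."""
    out = []
    for e in edges:
        p = min(1.0, rho / strength[e])
        if rng.random() < p:
            out.append((e, 1.0 / p))
    return out

def average_weight(edges, strength, rho, trials, seed=0):
    """Mean total weight of the compressed graph over many trials."""
    rng = random.Random(seed)
    return sum(sum(w for _, w in compress(edges, strength, rho, rng))
               for _ in range(trials)) / trials
```

With six edges of strength 8 compressed at ρ = 4, each edge survives with probability 1/2 at weight 2, so the expected total weight stays 6.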
Proof (Approximation)
- Basic idea: in a k-strong component, edges get sampled with probability ρ/k
  - the original sampling theorem works
- Problem: some edges may be in stronger components, sampled less
- Induct up from the strongest components:
  - apply the original sampling theorem inside
  - then "freeze" so they don't affect weaker parts
Strength Lemma
- Lemma: Σ 1/c_e ≤ n
  - Consider a connected component C of G
  - Suppose C has min-cut k
  - Then every edge e in C has c_e ≥ k
  - So the k edges crossing C's min-cut have Σ 1/c_e ≤ Σ 1/k ≤ k (1/k) = 1
  - Delete these edges ("cost" 1)
  - Repeat n − 1 times: no more edges!
Proof (Edge Count)
- Edge e is included with probability ρ/c_e
- So the expected number of edges is Σ ρ/c_e
- We saw Σ 1/c_e ≤ n
- So the expected number is at most ρn
Construction
- To sample, we must find the edge strengths
  - we can't, but an approximation suffices
- Sparse certificates identify weak edges:
  - construct in linear time [NI]
  - contain all edges crossing cuts ≤ k
  - iterate until strong components emerge
- Iterate for 2^i-strong edges, for all i
  - tricks turn it strongly polynomial
Certificate Algorithm
- Repeat k times:
  - Find a spanning forest
  - Delete it
- Each iteration deletes one edge from every cut (the forest is spanning)
- So at the end, every edge crossing a cut of size ≤ k has been deleted
- [NI] merge all iterations into O(m) time
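A direct sketch of the k-round version (O(km), not the merged O(m) routine from [NI]): the certificate is the union of the k deleted forests, and it contains every edge crossing a cut of value ≤ k.

```python
def sparse_certificate(n, edges, k):
    """Naive sparse certificate: k rounds, each peeling off a spanning
    forest of what remains.  Returns the union of the forests, which
    contains all edges crossing cuts of value <= k and has <= k(n-1) edges."""
    remaining = list(edges)
    kept = []
    for _ in range(k):
        parent = list(range(n))            # union-find for this round's forest

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        forest, rest = [], []
        for u, v in remaining:
            ru, rv = find(u), find(v)
            if ru != rv:                   # edge joins two forest components
                parent[ru] = rv
                forest.append((u, v))
            else:
                rest.append((u, v))
        kept.extend(forest)
        remaining = rest
    return kept
```

On two triangles joined by a bridge, a k = 1 certificate is a single spanning tree, which necessarily contains the bridge (the unique min-cut edge).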
Flows
- Uniform sampling led to flow algorithms:
  - Randomly partition the edges
  - Merge the flows from each partition element
- Compression is problematic:
  - Edge capacities changed
  - So flow path capacities are distorted
  - A flow in the compressed graph doesn't fit in the original graph
Smoothing
- If an edge has strength c_e, divide it into βρ/c_e edges of capacity c_e/βρ
  - Creates βρ Σ 1/c_e ≤ βρn edges
- Now each edge is only a 1/βρ fraction of any cut of its strong component
- So sampling a 1/β fraction works
- So dividing into β groups works
- Yields a (1 − ε) max-flow
Cleanup
- An approximate max-flow can be made exact by augmenting paths
- Integrality problems:
  - Augmenting paths is fast for small integer flows
  - But breakup by smoothing ruins integrality
- Surmountable:
  - Flows in the dense and sparse parts are separable
- Result: max-flow in O*(n^(11/9) v) time
Proof by Picture
(figures: identify the dense regions between s and t; compress the dense regions; solve the sparse flow; replace the dense parts, keeping the flow in the sparse bits; "fill in" the dense parts)
Conclusions
Conclusion
- Randomization is a crucial tool for algorithm design
- It often yields algorithms that are faster or simpler than their traditional counterparts
- In particular, it gives significant improvements for core problems in graph algorithms
Randomized Methods
- Random selection: if most candidate choices are "good", then a random choice is probably good
- Monte Carlo simulation: simulations estimate event likelihoods
- Random sampling: generate a small random subproblem; solve it, extrapolate to the whole problem
- Randomized rounding: for approximation
Random Selection
- When most choices are good, make one at random
- Recursive contraction algorithm for minimum cuts:
  - Extremely simple (also to implement)
  - Fast in theory and in practice [CGKLS]
Monte Carlo
- To estimate an event's likelihood, run trials
- Slow for very rare events
- Bias the samples to reveal the rare event
- FPRAS for network reliability
Random Sampling
- Generate a representative subproblem
- Use it to estimate the solution to the whole
  - Gives an approximate solution
  - May be quickly repaired to an exact solution
- Bias the sample toward "important" or "sensitive" parts of the problem
- New max-flow and min-cut algorithms
Randomized Rounding
- Convert fractional to integral solutions
- Get approximation algorithms for integer programs
- "Sampling" from a well-designed sample space of feasible solutions
- Good approximations for network design
Generalization
- Our techniques work because undirected graphs are matroids
- All our results extend / are special cases:
  - Packing bases
  - Finding minimum "quotients"
  - Matroid optimization (MST)
Directed Graphs?
- Directed graphs are not matroids
- Directed graphs can have lots of minimum cuts
- Sampling doesn't appear to work
- Residual graphs for flows are directed
  - Precludes obvious recursive solutions to flow problems
Open Problems
- Flow in O(nv) time (complete the m ⇒ n reduction)
  - Eliminate the v dependence
  - Apply to weighted graphs with large flows
  - Flow in O(m) time?
- Las Vegas algorithms
  - Finding good certificates
- Deterministic algorithms
  - Deterministic construction of "samples"
  - Deterministically compress a graph
Randomization in Graph Optimization Problems
David Karger, MIT
http://theory.lcs.mit.edu/~karger
karger@mit.edu