Discrepancy and SDPs Nikhil Bansal (TU Eindhoven)
Outline
• Discrepancy: definitions and applications
• Basic results: upper/lower bounds
• Partial coloring method (non-constructive)
• SDPs: basic method
• Algorithmic six standard deviations
• Lovett-Meka result
• Lower bounds via SDP duality (Matousek)
Material
Classic: Geometric Discrepancy by J. Matousek
Papers:
• Bansal. Constructive algorithms for discrepancy minimization. FOCS 2010
• Matousek. The determinant lower bound is almost tight. arXiv 2011
• Lovett, Meka. Discrepancy minimization by walking on the edges. arXiv 2012
• Other related recent works
Survey (main ideas): Bansal. Semidefinite optimization and discrepancy theory.
Discrepancy: What is it?
Study of gaps in approximating the continuous by the discrete.
Original motivation: numerical integration / sampling. How well can you approximate a region by discrete points?
Discrepancy: max over intervals I of |(# points in I) − (length of I)|
Discrepancy: What is it?
Problem: How uniformly can you distribute points in a grid?
"Uniform": for every axis-parallel rectangle R, |(# points in R) − (area of R)| should be low.
Discrepancy: max over rectangles R of |(# points in R) − (area of R)|
Distributing points in a grid
Problem: How uniformly can you distribute points in a grid? (figure: n = 64 points)
"Uniform": for every axis-parallel rectangle R, |(# points in R) − (area of R)| should be low.
Uniform grid: n^{1/2} discrepancy
Random: n^{1/2} (log log n)^{1/2}
Van der Corput set: O(log n) discrepancy!
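The Van der Corput set has a very short description: point i pairs i/n with the base-2 bit reversal of i. A minimal sketch (not from the slides; the helper names are mine):

```python
def bit_reverse(i, k):
    """Base-2 bit reversal of i (0 <= i < 2^k), returned as a fraction in [0, 1)."""
    x, denom = 0.0, 2.0
    for _ in range(k):
        x += (i & 1) / denom   # least significant bit contributes 1/2, next 1/4, ...
        i >>= 1
        denom *= 2
    return x

def van_der_corput_set(k):
    """n = 2^k points (i/n, bit_reverse(i)) in the unit square."""
    n = 1 << k
    return [(i / n, bit_reverse(i, k)) for i in range(n)]

pts = van_der_corput_set(6)    # n = 64 points, as in the slide's figure
```

The bit reversal spreads consecutive indices far apart vertically, which is what keeps every axis-parallel rectangle balanced.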
Quasi-Monte Carlo Methods
Discrepancy: Example 2
Input: n points placed arbitrarily in a grid. Color them red/blue such that each rectangle is colored as evenly as possible.
Discrepancy: max over rectangles R of |# red in R − # blue in R|
Continuous: color each element ½ red and ½ blue (0 discrepancy).
Discrete: random achieves about O(n^{1/2} log^{1/2} n). Can achieve O(log^{2.5} n).
Discrepancy: Example 2
Input: n points placed arbitrarily in a grid. Color them red/blue such that each rectangle is colored as evenly as possible.
Discrepancy: max over rectangles R of |# red in R − # blue in R|
Discrete: can achieve O(log^{2.5} n). Exercise: O(log^4 n). Optional: O(log^{2.5} n).
Why do we care?
Combinatorial Discrepancy
[figure: a set system with sets S1, S2, S3, S4 over the universe]
Applications CS: Computational Geometry, Comb. Optimization, Monte-Carlo simulation, Machine learning, Complexity, Pseudo-Randomness, … Math: Dynamical Systems, Combinatorics, Mathematical Finance, Number Theory, Ramsey Theory, Algebra, Measure Theory, …
Hereditary Discrepancy
Discrepancy is a useful measure of the complexity of a set system, but it is not robust: e.g., duplicating each element (1', 2', …, n') and extending the sets accordingly (S'1, S'2, …) drives the discrepancy to zero without simplifying the system.
Hereditary discrepancy: herdisc(U, S) = max over U' ⊆ U of disc(U', S|U')
A robust version of discrepancy (usually same as discrepancy).
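Both quantities can be brute-forced for tiny systems directly from the definitions (my own exponential-time sketch, for illustration only; sets are given as tuples of element indices):

```python
from itertools import combinations, product

def disc(n, sets):
    """disc = min over colorings x in {-1,+1}^n of max_S |sum_{i in S} x(i)|."""
    best = float('inf')
    for x in product((-1, 1), repeat=n):
        worst = max((abs(sum(x[i] for i in S)) for S in sets), default=0)
        best = min(best, worst)
    return best

def herdisc(n, sets):
    """herdisc = max over subsets U' of the elements of disc restricted to U'."""
    best = 0
    for r in range(n + 1):
        for U in combinations(range(n), r):
            idx = {e: j for j, e in enumerate(U)}   # re-index U' as 0..r-1
            restricted = [tuple(idx[i] for i in S if i in idx) for S in sets]
            best = max(best, disc(r, restricted))
    return best
```

For a single set {0, 1}, disc = 0 (color the two elements oppositely) while herdisc = 1 (restrict to one element), which is exactly why the hereditary version is the more robust measure.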
Rounding
More rounding approaches
Dynamic Data Structures
N points in a 2-d region; weights updated over time.
Query: given an axis-parallel rectangle R, determine the total weight of the points in R.
Preprocess for: 1) low query time, 2) low update time (upon weight change).
Example

What about other queries?
Idea
Any data structure answering such queries is implicitly maintaining the point weights D and computing the query answers as D = A·P:
A = aggregator, row sparse ⇒ low query time
P = precompute, column sparse ⇒ low update time
Outline
• Discrepancy: definitions and applications
• Basic results: upper/lower bounds
• Partial coloring method (non-constructive)
• SDPs: basic method
• Algorithmic six standard deviations
• Lovett-Meka result
• Lower bounds via SDP duality (Matousek)
Basic Results What is the discrepancy of a general system on m sets?
Best Known Algorithm
Random: Color each element i independently as x(i) = +1 or −1 with probability ½ each.
Thm: Discrepancy = O((n log m)^{1/2})
Pf: For each set, expect O(n^{1/2}) discrepancy.
Standard tail bounds: Pr[|Σ_{i∈S} x(i)| ≥ c n^{1/2}] ≈ e^{−c²/2}
Union bound + choose c ≈ (log m)^{1/2}.
The analysis is tight: random actually incurs Ω((n log m)^{1/2}).
Henceforth, focus on the m = n case.
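The random-coloring bound is easy to see empirically; a quick simulation (my own sketch; the random set system and constants are arbitrary):

```python
import math
import random

random.seed(0)
n, m = 256, 256
# a random set system: each element joins each set with probability 1/2
sets = [[i for i in range(n) if random.random() < 0.5] for _ in range(m)]

x = [random.choice([-1, 1]) for _ in range(n)]        # random +-1 coloring
disc = max(abs(sum(x[i] for i in S)) for S in sets)   # its discrepancy

bound = math.sqrt(2 * n * math.log(2 * m))            # ~ (n log m)^{1/2}
```

The tail bound plus union bound from the slide predicts disc = O(bound); on random instances the maximum over the m set-sums indeed concentrates at this scale.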
Better Colorings Exist!
[Spencer 85] (six standard deviations suffice): There always exists a coloring with discrepancy ≤ 6 n^{1/2}.
(In general, for arbitrary m: discrepancy = O(n^{1/2} log^{1/2}(m/n)).)
Inherently non-constructive proof (pigeonhole principle on an exponentially large universe).
Challenge: Can we find such a coloring algorithmically?
Certain natural algorithms do not work [Spencer]. Conjecture [Alon-Spencer]: it may not be possible.
Beck-Fiala Theorem
U = {1, …, n}, sets S_1, S_2, …, S_m. Suppose each element lies in at most t sets (t ≪ n).
[Beck-Fiala '81]: Discrepancy ≤ 2t − 1. (Elegant linear-algebraic argument, algorithmic result; note: random does not work.)
Beck-Fiala Conjecture: O(t^{1/2}) discrepancy is possible.
Other results:
O(t^{1/2} log t log n) [Beck]
O(t^{1/2} log n) [Srinivasan]
O(t^{1/2} log^{1/2} n) [Banaszczyk] (non-constructive)
Approximating Discrepancy
Question: If a set system has low discrepancy (say ≪ n^{1/2}), can we find a good low-discrepancy coloring?
[Charikar, Newman, Nikolov 11]: Even distinguishing discrepancy 0 vs. Ω(n^{1/2}) is NP-hard.
(Matousek): What if the system has low hereditary discrepancy? herdisc(U, S) = max over U' ⊆ U of disc(U', S|U')
Useful for the rounding application.
Two Results
Other problems with constructive bounds (matching the current best): the k-permutation problem [Spencer, Srinivasan, Tetali], geometric problems, the Beck-Fiala setting (Srinivasan's bound), …
Relaxations: LPs and SDPs
Not clear how to use relaxations. The linear program is useless: color each element ½ red and ½ blue, and the discrepancy of each set is 0!
SDPs (an LP on the inner products v_i·v_j; we cannot control the dimension of the v's):
|Σ_{i∈S} v_i|² ≤ n for all S, |v_i|² = 1
Intended solution: v_i = (+1, 0, …, 0) or (−1, 0, …, 0).
Trivially feasible: v_i = e_i (all v_i orthogonal).
Yet, SDPs will be a major tool.
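A numerical sanity check of the "trivially feasible" point (my own numpy sketch): with v_i = e_i the vectors are orthonormal, so every constraint holds even though the vectors encode no ±1 coloring at all:

```python
import numpy as np

n = 16
V = np.eye(n)        # v_i = e_i: orthonormal vectors, the trivially feasible point
sets = [list(range(0, 8)), list(range(4, 12)), list(range(n))]   # arbitrary sets

unit_norms = [float(V[i] @ V[i]) for i in range(n)]    # each |v_i|^2 = 1
set_norms = []
for S in sets:
    s = V[S].sum(axis=0)                               # sum_{i in S} v_i
    set_norms.append(float(s @ s))                     # = |S| <= n, by orthogonality
```

This is exactly why the SDP alone says nothing; the rounding and entropy arguments later in the deck are what extract a coloring from it.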
Punch line
The SDP is very helpful if "tighter" bounds are needed for some sets:
|Σ_{i∈S} v_i|² ≤ 2n, |Σ_{i∈S'} v_i|² ≤ n/log n (tighter bound for S'), |v_i|² ≤ 1
It is not a priori clear why one can do this: the entropy method.
The algorithm will construct the coloring over time and use several SDPs in the process.
Partial Coloring Method
A Question
[figure: the interval [−n, n]]
An Improvement

Algorithmically?
Yet another enhancement
s' = (1, −1, 1, …, 1, −1)
s'' = (−1, −1, …, 1, 1, 1)
Proof of Claim

Spencer's proof
Spencer's O(n^{1/2}) result
Partial Coloring Lemma: For any system with m sets, there exists a coloring on ≥ n/2 elements with discrepancy O(n^{1/2} log^{1/2}(2m/n)). [For m = n, disc = O(n^{1/2}).]
Algorithm for total coloring: repeatedly apply the partial coloring lemma.
Total discrepancy: O(n^{1/2} log^{1/2} 2) [Phase 1] + O((n/2)^{1/2} log^{1/2} 4) [Phase 2] + O((n/4)^{1/2} log^{1/2} 8) [Phase 3] + … = O(n^{1/2})
Let us prove the lemma for m = n.
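The phase bounds sum to O(n^{1/2}) because they decay geometrically; writing phase k's bound out:

```latex
\sum_{k \ge 0} O\!\left( \Bigl(\tfrac{n}{2^k}\Bigr)^{1/2} \log^{1/2} 2^{k+1} \right)
  \;=\; O\!\left(n^{1/2}\right) \sum_{k \ge 0} \frac{(k+1)^{1/2}}{2^{k/2}}
  \;=\; O\!\left(n^{1/2}\right),
```

since the last series converges to a constant.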
Proving Partial Coloring Lemma
Call two colorings X_1 and X_2 "similar" for set S if |X_1(S) − X_2(S)| ≤ 20 n^{1/2}.
Key Lemma: There exist k = 2^{4n/5} colorings X_1, …, X_k such that every pair X_i, X_j is similar for every set S_1, …, S_n.
Some pair X_1, X_2 differs on ≥ n/2 positions. Consider X = (X_1 − X_2)/2:
X_1 = ( 1, −1, 1, …, 1, −1)
X_2 = (−1, −1, …, 1, 1, 1)
X = ( 1, 0, 1, …, 0, −1)
Pf: X(S) = (X_1(S) − X_2(S))/2 ∈ [−10 n^{1/2}, 10 n^{1/2}]
Proving Partial Coloring Lemma
Bucket the value X(S) into intervals of width 20 n^{1/2}: bucket 0 = [−10 n^{1/2}, 10 n^{1/2}], bucket ±1 = ±(10 n^{1/2}, 30 n^{1/2}], and so on.
Entropy
Proving Partial Coloring Lemma
Pf: Associate with a coloring X its signature σ = (b_1, b_2, …, b_n), where b_i = the bucket in which X(S_i) lies.
Wish to show: there exist 2^{4n/5} colorings with the same signature.
Choose X randomly: this induces a distribution on signatures. Entropy(σ) ≤ n/5 implies some signature has probability ≥ 2^{−n/5}.
Entropy(σ) ≤ Σ_i Entropy(b_i) [subadditivity of entropy]
b_i = 0 w.p. ≈ 1 − 2e^{−50}, = ±1 w.p. ≈ e^{−50}, = ±2 w.p. ≈ e^{−450}, …
Ent(b_i) ≤ 1/5
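The last step, Ent(b_i) ≤ 1/5, can be verified numerically by modeling X(S_i) as a Gaussian with standard deviation at most n^{1/2} per the Chernoff estimates above (my own sketch; bucket boundaries measured in units of n^{1/2}):

```python
import math

def tail(x):
    """P(standard normal > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# bucket 0 = [-10, 10] standard deviations, bucket +-k = +-(20k-10, 20k+10]
p0 = 1 - 2 * tail(10)
side = [tail(20 * k - 10) - tail(20 * k + 10) for k in range(1, 5)]
probs = [p0] + [p for p in side for _ in (0, 1)]      # count buckets +-k twice

entropy = -sum(p * math.log2(p) for p in probs if p > 0)   # in bits
```

The entropy comes out astronomically smaller than 1/5; the slide's bound is extremely generous.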
A useful generalization
Partial coloring with a non-uniform discrepancy bound Δ_S for each set S.
For each set S, bucket X(S) into intervals of width 2Δ_S: bucket 0 = [−Δ_S, Δ_S], bucket ±1 = ±(Δ_S, 3Δ_S], bucket ±2 = ±(3Δ_S, 5Δ_S], …
Suffices to have Σ_S Ent(b_S) ≤ n/5.
Or, if Δ_S = λ_S |S|^{1/2}, then it suffices that Σ_S g(λ_S) ≤ n/5, where
g(λ) ≈ e^{−λ²/2} for λ > 1 and g(λ) ≈ ln(1/λ) for λ < 1.
A bucket width of n^{1/2}/100 (i.e., λ = 1/100) has penalty ≈ ln(100).
General Partial Coloring
Suffices to have Σ_S g(Δ_S/|S|^{1/2}) ≤ n/5, where
g(λ) ≈ e^{−λ²/2} for λ > 1 and g(λ) ≈ ln(1/λ) for λ < 1.
Recap
If only we could find the partial coloring efficiently…
PART 2
Outline
• Discrepancy: definitions and applications
• Basic results: upper/lower bounds
• Partial coloring method (non-constructive)
• SDPs: basic method
• Algorithmic six standard deviations
• Lovett-Meka result
• Lower bounds via SDP duality (Matousek)
Algorithms
Algorithm (at high level)
Cube: {−1, +1}^n. Each dimension: an element. Each vertex: a coloring. Walk from the start (the center) to the finish (a vertex).
Algorithm: a "sticky" random walk; each step is generated by rounding a suitable SDP. Moves in the various dimensions are correlated, e.g. γ_t1 + γ_t2 ≈ 0.
Analysis: few steps are needed to reach a vertex (the walk has high variance); disc(S_i) does a random walk (with low variance).
An SDP
Hereditary discrepancy ≤ λ ⇒ the following SDP is feasible.
SDP: Low discrepancy: |Σ_{i∈S_j} v_i|² ≤ λ², |v_i|² = 1
Obtain v_i ∈ R^n. Rounding: pick a random Gaussian g = (g_1, g_2, …, g_n), each coordinate g_i iid N(0, 1). For each i, consider γ_i = g·v_i.
Properties of Gaussians
Lemma: If g ∈ R^n is a random Gaussian, then for any v ∈ R^n, g·v is distributed as N(0, |v|²).
Pf: N(0, a²) + N(0, b²) = N(0, a² + b²), so g·v = Σ_i v(i) g_i ~ N(0, Σ_i v(i)²).
Lemma: The Gaussian distribution is rotationally invariant.
Properties of Rounding
Lemma: If g ∈ R^n is a random Gaussian, then for any v ∈ R^n, g·v is distributed as N(0, |v|²).
Recall: γ_i = g·v_i.
1. Each γ_i ~ N(0, 1).
2. For each set S, Σ_{i∈S} γ_i = g·(Σ_{i∈S} v_i) ~ N(0, ≤ λ²) (std deviation ≤ λ).
SDP: |v_i|² = 1, |Σ_{i∈S} v_i|² ≤ λ²
The γ's mimic a low-discrepancy coloring (but are not {−1, +1}).
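A small check of these properties (my own sketch; it plugs in the trivially feasible solution v_i = e_i, so it only verifies that Var(Σ_{i∈S} γ_i) equals |Σ_{i∈S} v_i|², not any discrepancy claim):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
V = np.eye(n)               # a feasible solution: v_i = e_i, so |v_i|^2 = 1
S = list(range(50))         # one set; here |sum_{i in S} v_i|^2 = 50

samples = []
for _ in range(2000):
    g = rng.standard_normal(n)       # g with iid N(0, 1) coordinates
    gamma = V @ g                    # gamma_i = g . v_i
    samples.append(gamma[S].sum())   # = g . (sum_{i in S} v_i)

var = float(np.var(samples))   # should concentrate near |sum_{i in S} v_i|^2 = 50
```

Swapping in any other feasible V would shrink this variance to the SDP's bound on |Σ_{i∈S} v_i|², which is exactly what the rounding exploits.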
Algorithm Overview
Construct the coloring iteratively. Initially: start with the coloring x_0 = (0, 0, …, 0) at t = 0.
At time t: update the coloring as x_t = x_{t−1} + ε(γ_t1, …, γ_tn) (ε tiny: 1/n suffices).
x_t(i) = ε(γ_1i + γ_2i + … + γ_ti)
Color of element i: does a random walk over time with step size ≈ εN(0, 1); it is fixed once it reaches −1 or +1.
Set S: x_t(S) = Σ_{i∈S} x_t(i) does a random walk with step εN(0, ≤ λ²).
Analysis
Consider time T = O(1/ε²).
Claim 1: With prob. ½, an element reaches −1 or +1.
Pf: Each element does a random walk with step size ≈ ε. Recall: a random walk with step 1 is ≈ O(t^{1/2}) away after t steps.
Claim 2: Each set has O(λ) discrepancy in expectation.
Pf: For each S, x_t(S) does a random walk with step size ≈ ελ.
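Claim 1 can be simulated by replacing the SDP-generated correlated steps with independent Gaussian steps (a sketch with my own parameters; it illustrates only the convergence of the sticky walk, not the discrepancy guarantee):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
eps = 1.0 / n
x = np.zeros(n)                        # start at the center of the cube
alive = np.ones(n, dtype=bool)         # coordinates not yet fixed at +-1

T = int(4 / eps**2)                    # O(1/eps^2) steps
for _ in range(T):
    step = eps * rng.standard_normal(n)
    x[alive] += step[alive]            # only unfixed coordinates move
    np.clip(x, -1.0, 1.0, out=x)
    alive &= np.abs(x) < 1.0           # "sticky": freeze on hitting +-1
    if not alive.any():
        break

frozen = int((~alive).sum())           # how many colors got fixed
```

After O(1/ε²) steps each coordinate's walk has constant variance, so a constant fraction (in fact most) of the coordinates hit ±1 and freeze.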
Recap
At each step of the walk: formulate an SDP on the unfixed variables; use some (existential) property to argue the SDP is feasible; rounding the SDP solution gives one step of the walk.
Properties of the walk: high variance ⇒ quick convergence; low variance for the discrepancy of sets ⇒ low discrepancy.
Refinements
Spencer's six standard deviations result. Goal: obtain O(n^{1/2}) discrepancy for any set system on m = O(n) sets.
Random coloring has n^{1/2}(log n)^{1/2} discrepancy.
The previous approach seems useless: the expected discrepancy of a set is O(n^{1/2}), but some of the random walks will deviate by up to a (log n)^{1/2} factor. Need an additional idea to prevent this.
Spencer's O(n^{1/2}) result
Partial Coloring Lemma: For any system with m sets, there exists a coloring on ≥ n/2 elements with discrepancy O(n^{1/2} log^{1/2}(2m/n)). [For m = n, disc = O(n^{1/2}).]
Algorithm for total coloring: repeatedly apply the partial coloring lemma.
Total discrepancy: O(n^{1/2} log^{1/2} 2) [Phase 1] + O((n/2)^{1/2} log^{1/2} 4) [Phase 2] + O((n/4)^{1/2} log^{1/2} 8) [Phase 3] + … = O(n^{1/2})
Algorithm (at high level)
Cube: {−1, +1}^n. Each dimension: an element. Each vertex: a coloring. Walk from the start (the center) to the finish (a vertex).
Algorithm: a "sticky" random walk; each step is generated by rounding a suitable SDP. Moves in the various dimensions are correlated, e.g. γ_t1 + γ_t2 ≈ 0.
Analysis: few steps are needed to reach a vertex (the walk has high variance); disc(S_i) does a random walk (with low variance).
An SDP
Suppose there exists a partial coloring X: 1. on ≥ n/2 elements; 2. each set S has |X(S)| ≤ Δ_S.
SDP: Low discrepancy: |Σ_{i∈S_j} v_i|² ≤ Δ_{S_j}². Many colors: Σ_i |v_i|² ≥ n/2, |v_i|² ≤ 1.
Obtain v_i ∈ R^n. Pick a random Gaussian g = (g_1, g_2, …, g_n), each coordinate g_i iid N(0, 1); for each i, consider γ_i = g·v_i.
Properties of Rounding
Lemma: If g ∈ R^n is a random Gaussian, then for any v ∈ R^n, g·v is distributed as N(0, |v|²).
Recall: γ_i = g·v_i.
1. Each γ_i ~ N(0, σ_i²), where σ_i² ≤ 1.
2. Total variance is large: Σ_i σ_i² ≥ n/2.
3. For each set S, Σ_{i∈S} γ_i ~ N(0, ≤ Δ_S²) (std deviation ≤ Δ_S).
SDP: |v_i|² ≤ 1, Σ_i |v_i|² ≥ n/2, |Σ_{i∈S} v_i|² ≤ Δ_S²
The γ's are sort of like a partial coloring, but not quite!
Algorithm Overview
Construct the coloring iteratively. Initially: start with the coloring x_0 = (0, 0, …, 0) at t = 0.
At time t: update the coloring as x_t = x_{t−1} + ε(γ_t1, …, γ_tn) (ε tiny: say 1/n).
x_t(i) = ε(γ_1i + γ_2i + … + γ_ti)
Color of i does a random walk over time (martingale) with step size ≈ εσ_i; it is fixed once it reaches −1 or +1.
x_t(S) = Σ_{i∈S} x_t(i) also does a random walk over time (step ≈ εΔ_S).
Algorithm idea
Fact: a random walk with step size 1 is ≈ O(t^{1/2}) away after t steps. Consider time t = O(1/ε²).
Claim 1: High total variance Σ_i σ_i² ≥ n/2 ⇒ Ω(n) variables reach −1 or +1.
Pf: Each element does a random walk with step size ≈ ε. The walks in different coordinates are correlated, so no direct aggregate bound applies; use an "energy increment" argument + Markov.
Claim 2: Low std deviation ≤ Δ_S for S ⇒ each set has about Δ_S discrepancy in expectation.
Pf: For each S, x_t(S) does a random walk with step size ≈ εΔ_S.
Same as partial coloring!! Back to square one?
Proving Progress
Algorithm idea
Fact: a random walk with step size 1 is ≈ O(t^{1/2}) away after t steps. Consider time t = O(1/ε²).
Claim 1: High total variance Σ_i σ_i² ≥ n/2 ⇒ Ω(n) variables reach −1 or +1.
Pf: Each element does a random walk with step size ≈ ε. The walks in different coordinates are correlated, so no direct aggregate bound applies; use an "energy increment" argument + Markov.
Claim 2: Low std deviation ≤ Δ_S for S ⇒ each set has about Δ_S discrepancy in expectation.
Pf: For each S, x_t(S) does a random walk with step size ≈ εΔ_S.
Second Idea
Use the flexibility in choosing Δ_S in the entropy condition.
Entropy condition: Σ_S g(Δ_S/n^{1/2}) ≤ n/5. Initially Δ_S ≈ n^{1/2}.
If some set S starts to get discrepancy way more than n^{1/2}, we can reduce Δ_S to λn^{1/2} for λ < 1.
Key point: this barely incurs any penalty: for λ < 1, g(λ) ≈ ln(1/λ). Can even set Δ_S = 1 for an O(1/log n) fraction of the sets!
Proof: O((n log log n)^{1/2}) Bound
Each element's color does a random walk with step size about ε; run for O(1/ε²) steps.
Initially: set Δ_S = 5 n^{1/2}. Each set's discrepancy does a random walk with step size ≈ 5εn^{1/2}.
Call a set S dangerous if its discrepancy exceeds 10 n^{1/2} (log log n)^{1/2}. For a dangerous set S, set Δ_S = n^{1/2}/log n (very unlikely now to exceed n^{1/2} discrepancy in the remaining 1/ε² steps).
Chance a set is ever dangerous ≤ exp(−2 log log n) < 1/(log n)².
Entropy for a dangerous set: g(1/log n) ≈ log n. So the total entropy increase is ≤ (log n)·(n/(log n)²) ≪ n.
Algorithm
Initially write the SDP with Δ_S = c n^{1/2}. Each set S does a random walk and expects to reach discrepancy O(Δ_S) = O(n^{1/2}).
Some sets will become problematic: reduce their Δ_S on the fly. Not many sets become problematic, and the entropy penalty stays low.
[figure: danger thresholds for a set's discrepancy at 20 n^{1/2}, 30 n^{1/2}, 35 n^{1/2}, …]
Remarks
Construct the coloring over time by solving a sequence of SDPs (guided by existence results). Works quite generally.
Can be derandomized [Bansal-Spencer] (use the entropy method itself for derandomizing + the usual techniques). E.g., deterministic six standard deviations can be viewed as a way to derandomize something stronger than Chernoff bounds.
Lovett-Meka '12
Our algorithm still uses the partial coloring method; it did not give a new proof of Spencer's result. Is there a purely constructive proof?
Lovett-Meka '12: Yes. Gaussian random walks + linear algebra.
The new algorithm

Analysis
Matousek Lower Bound

Detlb (the determinant lower bound)

Proof Sketch
Hoffman's example

Matousek's result

Recap: LP Duality

SDP Duality
Proof Sketch

Conclusions