Optimization: Convex Relaxations
M. Pawan Kumar
Slides available online: http://mpawankumar.info

Energy Function
Random variables $V = \{V_a, V_b, \dots\}$
Labels $L = \{l_0, l_1, \dots\}$
Labelling $f: \{a, b, \dots\} \to \{0, 1, \dots\}$
[Figure: random variables Va, Vb, Vc, Vd, each with candidate labels l0 and l1]

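Energy Function: Unary Potential
$Q(f) = \sum_a \theta_a(f(a))$
Easy to minimize: each variable takes its cheapest label independently of the others
[Figure: unary costs for Va, Vb, Vc, Vd: label l1 costs 2, 4, 6, 3; label l0 costs 5, 2, 3, 7]
Next: neighbourhood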
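
Why the unary-only energy is easy to minimize can be sketched in a few lines. The cost table below is an assumption read off the slide figure, stored as (cost of l0, cost of l1), and `minimize_unary` is a hypothetical helper name.

```python
# Unary costs as (cost of l0, cost of l1); values assumed from the slide figure.
unary = {"a": (5, 2), "b": (2, 4), "c": (3, 6), "d": (7, 3)}


def minimize_unary(unary):
    """Pick the cheapest label for every variable independently."""
    labelling = {
        v: min(range(len(costs)), key=costs.__getitem__)
        for v, costs in unary.items()
    }
    energy = sum(unary[v][i] for v, i in labelling.items())
    return labelling, energy


labelling, energy = minimize_unary(unary)
print(labelling, energy)
```

The cost is linear in the number of variables and labels, because no term couples two variables.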

Energy Function: Neighbourhood
$E$: $(a, b) \in E$ iff $V_a$ and $V_b$ are neighbours
$E = \{(a, b), (b, c), (c, d)\}$
[Figure: Va, Vb, Vc, Vd connected in a chain]

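Energy Function: Pairwise Potential
$Q(f) = \sum_a \theta_a(f(a)) + \sum_{(a,b) \in E} \theta_{ab}(f(a), f(b))$
[Figure: the chain Va, Vb, Vc, Vd with unary costs on the labels and pairwise costs on the edges between labels of neighbouring variables]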

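Energy Minimization
$\min_f \sum_a \theta_a(f(a)) + \sum_{(a,b) \in E} \theta_{ab}(f(a), f(b))$
[Figure: two-variable example V1, V2; unary costs (l0, l1) are (5, 2) for V1 and (2, 4) for V2; pairwise costs are 0 for equal labels, 3 for (V1 = l0, V2 = l1) and 1 for (V1 = l1, V2 = l0)]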
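
As a baseline for the relaxations that follow, the energy can be minimized by exhaustive search on a tiny instance. This is a minimal sketch: the unary costs are assumed from the four-variable figure, while the pairwise cost (0 for equal labels, 1 otherwise) is a hypothetical stand-in, since the edge costs of the figure are not recoverable here.

```python
from itertools import product

# Unary costs as (cost of l0, cost of l1); assumed from the slide figure.
unary = {"a": (5, 2), "b": (2, 4), "c": (3, 6), "d": (7, 3)}
edges = [("a", "b"), ("b", "c"), ("c", "d")]


def pairwise(i, k):
    """Hypothetical pairwise cost: 0 if the labels agree, 1 otherwise."""
    return 0 if i == k else 1


def energy(f):
    """Q(f) = sum of unary terms + sum of pairwise terms over the edges."""
    return (sum(unary[v][f[v]] for v in unary)
            + sum(pairwise(f[a], f[b]) for a, b in edges))


def brute_force():
    """Enumerate all |L|^|V| labellings and keep the cheapest one."""
    variables = sorted(unary)
    best = None
    for labels in product(range(2), repeat=len(variables)):
        f = dict(zip(variables, labels))
        q = energy(f)
        if best is None or q < best[1]:
            best = (f, q)
    return best


best_f, best_q = brute_force()
print(best_f, best_q)
```

The loop visits 2^4 labellings here; the count grows exponentially with the number of variables, which is what motivates the relaxations.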

Integer Program
$\min_f \sum_a \theta_a(f(a)) + \sum_{(a,b) \in E} \theta_{ab}(f(a), f(b))$
$x_a(i) \in \{0, 1\}$
Does $V_a$ take the label $l_i$ ($x_a(i) = 1$) or not ($x_a(i) = 0$)?

Constraint
$\min_f \sum_a \theta_a(f(a)) + \sum_{(a,b) \in E} \theta_{ab}(f(a), f(b))$
$x_a(i) \in \{0, 1\}$
Constraint: $V_a$ takes exactly one label

Constraint
$\min_f \sum_a \theta_a(f(a)) + \sum_{(a,b) \in E} \theta_{ab}(f(a), f(b))$
$x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$

Unary Potentials
$\sum_a \theta_a(f(a)) = \sum_a \sum_i \theta_a(i) x_a(i)$
$x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$

Pairwise Potentials
$\sum_{(a,b) \in E} \theta_{ab}(f(a), f(b)) = \sum_{(a,b) \in E} \sum_{i,k} \theta_{ab}(i, k) x_a(i) x_b(k)$
$x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$

Integer Program
$\min_x \sum_a \sum_i \theta_a(i) x_a(i) + \sum_{(a,b) \in E} \sum_{i,k} \theta_{ab}(i, k) x_a(i) x_b(k)$
s.t. $x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$

Outline
• QP Relaxation (Ravikumar and Lafferty, 2006)
• LP Relaxation for Potts Model
• LP Relaxation for Pairwise Energy
• A Hierarchy of Relaxations

Unary Potential Vector
For $x$, total unary cost? $u^\top x$
Unary potential vector $u = [5, 2, 2, 4]^\top$
Entries in order: cost of V1 = 0, cost of V1 = 1, cost of V2 = 0, cost of V2 = 1

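Pairwise Potential Matrix
For $x$, total pairwise cost? $\tfrac{1}{2} x^\top P x$
Pairwise potential matrix, with rows and columns ordered V1 = 0, V1 = 1, V2 = 0, V2 = 1:
$P = \begin{bmatrix} 0 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 3 & 0 & 0 & 0 \end{bmatrix}$
Entry (1, 1): cost of V1 = 0 and V1 = 0, which is 0
Entry (1, 4): cost of V1 = 0 and V2 = 1, which is 3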

Integer Program
$\min_x u^\top x + \tfrac{1}{2} x^\top P x$
s.t. $x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$
Convex? No. The diagonal of $P$ is 0, so its trace is 0 and a nonzero symmetric $P$ must have a negative eigenvalue.
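
The vector and matrix form can be checked against the energy on the two-variable example. This is a sketch under assumptions: the values of `u` and `theta12` are read off the slide figure as best recoverable, and the helper names are hypothetical. It also evaluates the quadratic form at one direction where it is negative, which illustrates the non-convexity.

```python
# Index order: [V1 = 0, V1 = 1, V2 = 0, V2 = 1].
u = [5, 2, 2, 4]                 # unary costs, assumed from the figure
theta12 = [[0, 3], [1, 0]]       # theta12[i][k], assumed from the figure

# Symmetric pairwise matrix with zero diagonal blocks.
P = [[0] * 4 for _ in range(4)]
for i in range(2):
    for k in range(2):
        P[i][2 + k] = theta12[i][k]
        P[2 + k][i] = theta12[i][k]


def indicator(f1, f2):
    """Binary vector x with x_a(i) = 1 iff variable a takes label i."""
    x = [0] * 4
    x[f1] = 1
    x[2 + f2] = 1
    return x


def quad_form(x):
    """x^T P x."""
    return sum(x[r] * P[r][c] * x[c] for r in range(4) for c in range(4))


def qp_objective(x):
    """u^T x + 1/2 x^T P x."""
    return sum(ui * xi for ui, xi in zip(u, x)) + 0.5 * quad_form(x)


def energy(f1, f2):
    return u[f1] + u[2 + f2] + theta12[f1][f2]


for f1 in range(2):
    for f2 in range(2):
        print((f1, f2), energy(f1, f2), qp_objective(indicator(f1, f2)))

# A direction along which the quadratic form is negative.
print(quad_form([1, 0, 0, -1]))
```

The factor 1/2 compensates for each pairwise cost appearing twice in the symmetric matrix.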

Integer Program
Consider a vector $d$ and define $D = \mathrm{diag}(d)$
$\min_x u^\top x + \tfrac{1}{2} x^\top P x$
s.t. $x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$

Integer Program
Consider a vector $d$ and define $D = \mathrm{diag}(d)$
Add $D$ to the quadratic term: $u^\top x + \tfrac{1}{2} x^\top (P + D) x$
s.t. $x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$

Integer Program
Consider a vector $d$ and define $D = \mathrm{diag}(d)$
$\min_x (u - \tfrac{1}{2} d)^\top x + \tfrac{1}{2} x^\top (P + D) x$
s.t. $x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$
Equivalent to the old problem. Why? $x_a(i) \cdot x_a(i) = x_a(i)$ for binary $x$, so $\tfrac{1}{2} x^\top D x = \tfrac{1}{2} d^\top x$, which the linear term subtracts again.

Integer Program
Choose an appropriate $d$: $d(i)$ = sum of absolute values of the $i$-th row of $P$
$\min_x (u - \tfrac{1}{2} d)^\top x + \tfrac{1}{2} x^\top (P + D) x$
s.t. $x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$
The objective is convex. Why? Because $P + D \succeq 0$ (it is symmetric and diagonally dominant).

QP Relaxation
Choose an appropriate $d$: $d(i)$ = sum of absolute values of the $i$-th row of $P$
$\min_x (u - \tfrac{1}{2} d)^\top x + \tfrac{1}{2} x^\top (P + D) x$
s.t. $x_a(i) \in [0, 1]$
$\sum_i x_a(i) = 1$
Solver? Conditional Gradient (Frank-Wolfe)
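
The claim $P + D \succeq 0$ can be verified directly. The derivation below is a sketch that assumes $P$ is symmetric and uses a flat index $i$ over all (variable, label) pairs, so $x_i$ and $P_{ij}$ are shorthand introduced here rather than the slides' notation. With $d_i = \sum_j |P_{ij}|$, for any real vector $x$:

$$x^\top (P + D) x = \sum_{i,j} |P_{ij}|\, x_i^2 + \sum_{i,j} P_{ij}\, x_i x_j = \tfrac{1}{2} \sum_{i,j} |P_{ij}| \left(x_i^2 + x_j^2\right) + \sum_{i,j} P_{ij}\, x_i x_j$$

$$= \tfrac{1}{2} \sum_{i,j} |P_{ij}| \left(x_i + \mathrm{sign}(P_{ij})\, x_j\right)^2 \;\ge\; 0.$$

The second equality uses the symmetry of $P$ to split the diagonal term evenly between the indices $i$ and $j$.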

Outline
• QP Relaxation
  – Conditional Gradient (Frank and Wolfe, 1956)
• LP Relaxation for Potts Model
• LP Relaxation for Pairwise Energy
• A Hierarchy of Relaxations

Conditional Gradient
$\min_x f(x)$ s.t. $x \in X$
The objective $f(x)$ is assumed smooth: gradients are defined everywhere
The feasible region $X$ is convex and bounded

Conditional Gradient
$\min_x f(x)$ s.t. $x \in X$
Compute the gradient $g$ of $f(x)$ at the current iterate $x_t$
Compute the conditional gradient $c_t$

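Conditional Gradient
$c_t = \arg\min_x g^\top x$ s.t. $x \in X$
Update $x_{t+1} = \eta_t x_t + (1 - \eta_t) c_t$ with $\eta_t \in [0, 1]$
$x_{t+1} \in X$: no need for projection. Why? $x_{t+1}$ is a convex combination of two points of the convex set $X$.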

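CG for QP
Objective: $f(x) = (u - \tfrac{1}{2} d)^\top x + \tfrac{1}{2} x^\top (P + D) x$
Initialize $x_0$ and $t = 0$
While the objective can be reduced:
  $g = (u - \tfrac{1}{2} d) + (P + D) x_t$
  $c_t = \arg\min_x g^\top x$ s.t. $x_a(i) \in [0, 1]$, $\sum_i x_a(i) = 1$
  Easy. Why? The linear problem decouples over variables: each $V_a$ puts all its mass on the label with the smallest gradient entry.
  Update $x_{t+1} = \eta_t x_t + (1 - \eta_t) c_t$
  $t = t + 1$
We can compute the optimal $\eta_t$. Why? Along the segment between $x_t$ and $c_t$ the objective is a one-dimensional quadratic in $\eta_t$, so its minimizer has a closed form.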
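
The loop above can be sketched for a generic convex QP, $\min_x c^\top x + \tfrac{1}{2} x^\top Q x$, over one simplex per variable. Here `c` and `Q` stand in for the linear term and for $P + D$; their numeric values are illustrative assumptions for a two-variable, two-label instance, and the function names are hypothetical. The code uses the step $\gamma_t = 1 - \eta_t$, so the update reads $x_{t+1} = x_t + \gamma_t (c_t - x_t)$.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(Q, x):
    return [dot(row, x) for row in Q]


def objective(c, Q, x):
    return dot(c, x) + 0.5 * dot(x, matvec(Q, x))


def frank_wolfe(c, Q, num_vars, num_labels, max_iters=200, tol=1e-9):
    """Conditional gradient for min c^T x + 1/2 x^T Q x, one simplex per variable."""
    n = num_vars * num_labels
    x = [1.0 / num_labels] * n               # uniform start, feasible
    for _ in range(max_iters):
        g = [ci + qi for ci, qi in zip(c, matvec(Q, x))]
        # Linear minimization: per variable, all mass on the cheapest label.
        s = [0.0] * n
        for a in range(num_vars):
            block = range(a * num_labels, (a + 1) * num_labels)
            s[min(block, key=lambda j: g[j])] = 1.0
        direction = [si - xi for si, xi in zip(s, x)]
        gap = -dot(g, direction)              # nonnegative optimality gap
        if gap <= tol:
            break
        # Exact line search for a quadratic, clipped to the segment.
        curvature = dot(direction, matvec(Q, direction))
        step = 1.0 if curvature <= 0.0 else min(1.0, gap / curvature)
        x = [xi + step * di for xi, di in zip(x, direction)]
    return x


# Illustrative values (assumptions), index order [V1=0, V1=1, V2=0, V2=1].
c = [3.5, 1.5, 1.5, 2.5]
Q = [[3.0, 0.0, 0.0, 3.0],
     [0.0, 1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0, 0.0],
     [3.0, 0.0, 0.0, 3.0]]

x = frank_wolfe(c, Q, num_vars=2, num_labels=2)
value = objective(c, Q, x)
print(x, value)
```

For these values the relaxed optimum is 4.875 at x = (0, 1, 0.75, 0.25), below the best binary value of 5, so the iterates approach a fractional point. Every iterate stays feasible without projection because each update is a convex combination of feasible points.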

Outline
• QP Relaxation
• LP Relaxation for Potts Model (Kleinberg and Tardos, 1999)
• LP Relaxation for Pairwise Energy
• A Hierarchy of Relaxations

Integer Program (Potts Model)
$\theta_{ab}(i, k) = w_{ab}$ if $i \neq k$, and $0$ if $i = k$
$\min_x \sum_a \sum_i \theta_a(i) x_a(i) + \sum_{(a,b) \in E} \sum_{i,k} \theta_{ab}(i, k) x_a(i) x_b(k)$
s.t. $x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$

Integer Program (Potts Model)
$\theta_{ab}(i, k) = w_{ab}$ if $i \neq k$, and $0$ if $i = k$
$\min_x \sum_a \sum_i \theta_a(i) x_a(i) + \tfrac{1}{2} \sum_{(a,b) \in E} \sum_i w_{ab} |x_a(i) - x_b(i)|$
s.t. $x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$
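
The rewrite of the Potts term is an identity on binary indicator vectors only, which a short check makes concrete. This is a sketch: the weight `w`, the number of labels, and the helper names are hypothetical.

```python
from itertools import product


def one_hot(label, num_labels):
    return [1 if i == label else 0 for i in range(num_labels)]


def potts_quadratic(xa, xb, w):
    """sum_{i,k} theta_ab(i,k) xa(i) xb(k), with theta_ab = w off the diagonal."""
    n = len(xa)
    return sum((w if i != k else 0) * xa[i] * xb[k]
               for i in range(n) for k in range(n))


def potts_absolute(xa, xb, w):
    """1/2 * w * sum_i |xa(i) - xb(i)|."""
    return 0.5 * w * sum(abs(p - q) for p, q in zip(xa, xb))


num_labels, w = 3, 2.5
for la, lb in product(range(num_labels), repeat=2):
    xa, xb = one_hot(la, num_labels), one_hot(lb, num_labels)
    assert potts_quadratic(xa, xb, w) == potts_absolute(xa, xb, w)

# On a fractional point the two expressions differ.
half = [0.5, 0.5, 0.0]
print(potts_quadratic(half, half, w), potts_absolute(half, half, w))
```

Two different one-hot vectors differ in exactly two coordinates, which is where the factor 1/2 comes from. The fractional point shows why the choice of expression matters once $x$ is relaxed: the quadratic form charges 1.25 for two identical fractional vectors, the absolute-value form charges 0.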

LP Relaxation (Potts Model)
$\theta_{ab}(i, k) = w_{ab}$ if $i \neq k$, and $0$ if $i = k$
$\min_x \sum_a \sum_i \theta_a(i) x_a(i) + \tfrac{1}{2} \sum_{(a,b) \in E} \sum_i w_{ab} |x_a(i) - x_b(i)|$
s.t. $x_a(i) \in [0, 1]$
$\sum_i x_a(i) = 1$
For 2 labels this is a min-cut problem: integer optimal solutions
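
The absolute values do not stop this from being a linear program. One standard way to see it, assuming $w_{ab} \ge 0$ and introducing auxiliary variables $y_{ab}(i)$ (a symbol added here for illustration, not part of the slides), is:

$$\min_{x, y} \; \sum_a \sum_i \theta_a(i)\, x_a(i) + \tfrac{1}{2} \sum_{(a,b) \in E} \sum_i w_{ab}\, y_{ab}(i)$$

$$\text{s.t.} \quad y_{ab}(i) \ge x_a(i) - x_b(i), \quad y_{ab}(i) \ge x_b(i) - x_a(i), \quad x_a(i) \in [0, 1], \quad \sum_i x_a(i) = 1.$$

Because the weights are nonnegative, each $y_{ab}(i)$ is pushed down onto $|x_a(i) - x_b(i)|$ at the optimum, so the two problems have the same optimal value.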

Outline
• QP Relaxation
• LP Relaxation for Potts Model
• LP Relaxation for Pairwise Energy (Chekuri et al., 2001)
• A Hierarchy of Relaxations

Integer Program
$\min_x \sum_a \sum_i \theta_a(i) x_a(i) + \sum_{(a,b) \in E} \sum_{i,k} \theta_{ab}(i, k) x_a(i) x_b(k)$
s.t. $x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$

Integer Program
$\min_x \sum_a \sum_i \theta_a(i) x_a(i) + \sum_{(a,b) \in E} \sum_{i,k} \theta_{ab}(i, k) x_{ab}(i, k)$
s.t. $x_a(i) \in \{0, 1\}$
$\sum_i x_a(i) = 1$
$x_{ab}(i, k) \in \{0, 1\}$
$\sum_k x_{ab}(i, k) = x_a(i)$

LP Relaxation
$\min_x \sum_a \sum_i \theta_a(i) x_a(i) + \sum_{(a,b) \in E} \sum_{i,k} \theta_{ab}(i, k) x_{ab}(i, k)$
s.t. $x_a(i) \in [0, 1]$
$\sum_i x_a(i) = 1$
$x_{ab}(i, k) \in [0, 1]$
$\sum_k x_{ab}(i, k) = x_a(i)$ (marginalization constraint)
Guarantees: UGC (Unique Games Conjecture) hardness
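
A small feasibility check shows what the marginalization constraint admits. This is a sketch: the helper names are hypothetical, and the check also enforces the symmetric constraint $\sum_i x_{ab}(i, k) = x_b(k)$, which is an assumption here since the slide lists only the sum over $k$.

```python
def integral_solution(fa, fb, num_labels):
    """Indicator variables of a labelling, with xab(i, k) = xa(i) * xb(k)."""
    xa = [1 if i == fa else 0 for i in range(num_labels)]
    xb = [1 if k == fb else 0 for k in range(num_labels)]
    xab = [[xa[i] * xb[k] for k in range(num_labels)] for i in range(num_labels)]
    return xa, xb, xab


def satisfies_marginalization(xa, xb, xab, tol=1e-9):
    """Rows of xab must sum to xa; columns must sum to xb (assumed symmetric)."""
    n = len(xa)
    rows_ok = all(abs(sum(xab[i]) - xa[i]) <= tol for i in range(n))
    cols_ok = all(abs(sum(xab[i][k] for i in range(n)) - xb[k]) <= tol
                  for k in range(n))
    return rows_ok and cols_ok


# Every integral labelling is feasible for the relaxation.
print(satisfies_marginalization(*integral_solution(0, 1, 2)))

# Fractional points are feasible too, with different pairwise tables.
half = [0.5, 0.5]
agree = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]
inconsistent = [[0.5, 0.5], [0.0, 0.0]]
print(satisfies_marginalization(half, half, agree))
print(satisfies_marginalization(half, half, independent))
print(satisfies_marginalization(half, half, inconsistent))
```

The same unary values admit several pairwise tables, which is the extra freedom the LP uses compared with fixing $x_{ab}(i, k) = x_a(i) x_b(k)$.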

Outline
• QP Relaxation
• LP Relaxation for Potts Model
• LP Relaxation for Pairwise Energy
• A Hierarchy of Relaxations (Sherali and Adams, 1990)

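LP Relaxation (with triple terms)
$\min_x \sum_a \sum_i \theta_a(i) x_a(i) + \sum_{(a,b)} \sum_{i,k} \theta_{ab}(i, k) x_{ab}(i, k) + \sum_{(a,b,c)} \sum_{i,k,m} \theta_{abc}(i, k, m) x_{abc}(i, k, m)$
s.t. $x_a(i) \in [0, 1]$, $\sum_i x_a(i) = 1$
$x_{ab}(i, k) \in [0, 1]$, $\sum_k x_{ab}(i, k) = x_a(i)$
$x_{abc}(i, k, m) \in [0, 1]$
$\sum_{k,m} x_{abc}(i, k, m) = x_a(i)$
$\sum_m x_{abc}(i, k, m) = x_{ab}(i, k)$

LP Relaxation (pairwise energy, triple constraints)
$\min_x \sum_a \sum_i \theta_a(i) x_a(i) + \sum_{(a,b)} \sum_{i,k} \theta_{ab}(i, k) x_{ab}(i, k)$
s.t. $x_a(i) \in [0, 1]$, $\sum_i x_a(i) = 1$
$x_{ab}(i, k) \in [0, 1]$, $\sum_k x_{ab}(i, k) = x_a(i)$
$x_{abc}(i, k, m) \in [0, 1]$
$\sum_{k,m} x_{abc}(i, k, m) = x_a(i)$
$\sum_m x_{abc}(i, k, m) = x_{ab}(i, k)$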
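
A classic way to see what the triple constraints buy is a triangle of three binary variables in which every pair is forced to disagree. The sketch below assumes consistency constraints for all three pairs of the triple (the slide writes out only the pair (a, b)), and the names are hypothetical. Since $x_{abc} \ge 0$ must sum to a pairwise entry, a zero pairwise entry forces every joint assignment extending it to zero.

```python
from itertools import product

# Pairwise pseudo-marginals: each pair disagrees with total mass 1.
# Rows and columns sum to (0.5, 0.5), so they are pairwise consistent.
disagree = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}
pairs = {("a", "b"): disagree, ("b", "c"): disagree, ("a", "c"): disagree}
order = ("a", "b", "c")


def allowed_assignments(pairs):
    """Joint assignments to which a triple marginal may give positive mass."""
    allowed = []
    for labels in product(range(2), repeat=len(order)):
        f = dict(zip(order, labels))
        if all(table[(f[p], f[q])] > 0 for (p, q), table in pairs.items()):
            allowed.append(labels)
    return allowed


print(allowed_assignments(pairs))
```

No joint assignment survives, because three binary variables cannot pairwise disagree. The triple variables would have to be all zero while also summing to 1, so this point is feasible for the pairwise relaxation and infeasible one level up in the hierarchy.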

LP Relaxation Hierarchy
Higher and higher orders of marginalization
Eventually you will find a tight relaxation
But it may be exponential in size
Exponential blow-up is very rare in practice
The real world is full of structure

Questions?