Large-scale semidefinite programs and applications to polynomial optimization

  • Slides: 46
Large-scale semidefinite programs and applications to polynomial optimization and machine learning. Georgina Hall, Princeton ORFE.

What is a semidefinite program? • Definition: minimize ⟨C, X⟩ subject to ⟨A_i, X⟩ = b_i for i = 1, …, m and X ⪰ 0, where X is a symmetric matrix variable and X ⪰ 0 means X is positive semidefinite.

Why semidefinite programming? • Natural generalization of linear programming (LP) • Much more expressive than LP but still solvable in polynomial time • Powerful tool to approximately solve intractable problems. Idea: inner/outer approximation of the feasible set of the initial problem; optimizing over the inner or outer approximation is an SDP.

A few examples.
Combinatorial optimization. Example: how to schedule the maximum number of meetings with no conflicts? Nodes are the meetings; there is an edge between two nodes if at least one person has to attend both meetings.
Game theory and economics. Example: how to compute a Nash equilibrium? Given payoff matrices for players A and B, how to find a set of strategies such that neither A nor B has an incentive to change their strategy unilaterally?
Machine learning and statistics. Example: how to choose (as a bank) the interest rate for a prospective house buyer, based on income, number of claims of nonpayment, years spent in current job, etc.? This is regression with monotonicity constraints.
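The meeting-scheduling example is a maximum independent set problem, which is NP-hard in general; that hardness is what motivates the SDP relaxations discussed later. A brute-force sketch in Python (the slide does not specify the conflict edges, so the five-meeting graph below is an assumed illustration):

```python
from itertools import combinations

# Assumed conflict graph for the five-meeting example: an edge means the
# two meetings share an attendee and cannot both be scheduled.
edges = {(1, 2), (2, 3), (3, 4), (4, 5)}

def conflict_free(subset):
    # A subset is schedulable if no pair of its meetings conflicts.
    return all((a, b) not in edges and (b, a) not in edges
               for a, b in combinations(subset, 2))

def max_meetings(nodes):
    # Brute force over subsets, largest first: fine for 5 nodes,
    # exponential in general, which is why SDP relaxations are useful.
    for r in range(len(nodes), 0, -1):
        for subset in combinations(nodes, r):
            if conflict_free(subset):
                return subset
    return ()

print(max_meetings([1, 2, 3, 4, 5]))  # → (1, 3, 5)
```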

A specific example in random networks (1/2). Example: how to recognize a fake news website from a legitimate news website? How to classify each website as legitimate vs. fake from the graph alone? Idea: members of the same community are more likely to be connected.
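The "same community, more likely connected" idea can be made concrete with a two-community stochastic block model. The parameters below are assumptions for illustration; with the within-community edge probability much larger than the across-community one, the community structure shows up plainly in the edge counts that SDP-based recovery methods exploit:

```python
import random

random.seed(0)

# Two communities of 20 nodes each; edge probabilities are assumed values.
n, p_in, p_out = 40, 0.5, 0.05
community = [0] * (n // 2) + [1] * (n // 2)

within = across = 0
for i in range(n):
    for j in range(i + 1, n):
        p = p_in if community[i] == community[j] else p_out
        if random.random() < p:
            if community[i] == community[j]:
                within += 1
            else:
                across += 1

# With p_in >> p_out, within-community edges dominate; recovering the
# hidden partition from the adjacency structure is what the SDP targets.
print(within, across)
```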

A specific example in random networks (2/2)

My work in a nutshell (1/2). Designing SDP-based approximations for intractable problems, and solving SDPs faster. Semidefinite programming; efficiently solving optimization problems over nonnegative polynomials (a fundamental problem in computational algebra). Applications: polynomial optimization problems (POPs), control, ML, etc.

My work in a nutshell (2/2). Large-scale semidefinite programs; theory of nonnegative polynomials; applications of optimizing over nonnegative polynomials; applications of semidefinite programming. This talk: Part I: large-scale SDPs. Part II.a: monotone regression. Part II.b: difference of convex programming.

Part I: Large-scale semidefinite programs

Reminder: semidefinite programming. • Faster ways of solving large-scale SDPs?

LP and SOCP-based inner approximations. Inner approximate the SDP with an SOCP, and further inner approximate with an LP [Ahmadi, Majumdar].

LP and SOCP-based inner approximations. Hard problem ⇒ semidefinite program ⇒ linear or second order cone program. How to bridge this gap? Idea: iteratively construct a sequence of improving LP/SOCP-based inner approximations. Initialization: start with dd (diagonally dominant) or sdd (scaled diagonally dominant) matrices. Method 1: Basis Pursuit [Ahmadi, H.]. Method 2: Column Generation [Ahmadi, Dash, H.].
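The dd initialization is easy to make concrete: a symmetric matrix is diagonally dominant (dd) if each diagonal entry is at least the sum of the absolute off-diagonal entries in its row. These are linear conditions, so restricting an SDP's matrix variable to the dd cone yields an LP, and dd matrices with nonnegative diagonals are PSD. A minimal sketch, with illustrative matrices:

```python
def is_diagonally_dominant(M):
    # M: symmetric matrix given as a list of rows.
    # dd means M[i][i] >= sum of |M[i][j]| over j != i, for every row i.
    # These linear conditions are what turns the SDP restriction into an LP.
    n = len(M)
    return all(M[i][i] >= sum(abs(M[i][j]) for j in range(n) if j != i)
               for i in range(n))

A = [[3, 1, -1],
     [1, 2,  1],
     [-1, 1, 4]]          # dd: 3 >= 2, 2 >= 2, 4 >= 2, hence PSD
B = [[1, 2],
     [2, 1]]              # not dd (and in fact indefinite)

print(is_diagonally_dominant(A), is_diagonally_dominant(B))  # → True False
```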

Method 1: Basis Pursuit. SDP we want to solve. The cheap restriction forces the matrix variable to be dd; idea: try and find a basis U that makes this restriction better, i.e., optimize over DD(U) = {UᵀQU : Q dd}, which is still an LP for fixed U.

Method 1: Basis Pursuit. SDP to solve. Step 1: initialization (solve the LP/SOCP restriction, e.g. over dd matrices). Step 2: factor the optimal solution to obtain a new basis U. Step 3: re-solve over DD(U) and repeat.
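The basis update behind these steps can be sketched in pure Python: if X* is the current round's optimal PSD solution and X* = UᵀU is a Cholesky-type factorization, then X* = Uᵀ I U lies in DD(U), because the identity matrix is dd, so the next round's feasible set contains the previous optimum and the bound cannot get worse. The matrix below is an illustrative PSD matrix that is not dd:

```python
import math

def cholesky(M):
    # Plain Cholesky factorization of a symmetric positive definite matrix:
    # returns lower-triangular L with M = L Lᵀ (so U = Lᵀ gives M = Uᵀ U).
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

# A PSD matrix that is NOT diagonally dominant (row 0: 1 < 0.9 + 0.9).
X = [[1.0, 0.9, 0.9],
     [0.9, 1.0, 0.9],
     [0.9, 0.9, 1.0]]

L = cholesky(X)
# Reconstruct X = L Lᵀ to confirm the factorization; in basis-pursuit terms,
# X = Uᵀ I U with U = Lᵀ and the identity I diagonally dominant.
n = 3
R = [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
print(all(abs(R[i][j] - X[i][j]) < 1e-9 for i in range(n) for j in range(n)))
```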

Method 1: Basis Pursuit (LP and SOCP versions)

Method 2: Column Generation. The DD cone has both a facet description and an extreme ray description: every dd matrix is a nonnegative combination of atoms vvᵀ, where v has at most two nonzero entries, each equal to ±1. The SDD cone has an analogous extreme ray description as a sum of PSD matrices supported on 2×2 principal blocks.
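The extreme-ray description of the dd cone can be checked numerically: nonnegative combinations of atoms vvᵀ, with v supported on at most two coordinates and entries ±1, always produce dd matrices. A small randomized sketch (dimension, number of atoms, and weights are arbitrary choices):

```python
import random

random.seed(1)
n = 4

def atom(i, j, si, sj):
    # Rank-one atom v vᵀ with v supported on {i, j} and entries si, sj in {+1, -1}.
    v = [0] * n
    v[i], v[j] = si, sj
    return [[v[a] * v[b] for b in range(n)] for a in range(n)]

# A random nonnegative combination of such atoms ...
M = [[0.0] * n for _ in range(n)]
for _ in range(10):
    i, j = random.sample(range(n), 2)
    A = atom(i, j, random.choice([1, -1]), random.choice([1, -1]))
    w = random.random()
    for a in range(n):
        for b in range(n):
            M[a][b] += w * A[a][b]

# ... is always diagonally dominant: each atom contributes w to the two
# touched diagonal entries and at most w in absolute value off-diagonal.
dd = all(M[i][i] >= sum(abs(M[i][j]) for j in range(n) if j != i) - 1e-9
         for i in range(n))
print(dd)  # → True
```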

Method 2: Column Generation. • Restrict the initial SDP to a small set of atoms, take the dual, and use the dual solution to find (price) a new atom to add.
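The pricing step can be sketched as follows; the exact pricing rule here is an assumption, but a natural one is to look for a direction v with vᵀXv < 0 at the current dual matrix X, since adding the atom vvᵀ then cuts off that dual point. The sketch finds such a v by power iteration on a shifted matrix (the matrix X and shift are illustrative):

```python
# Pricing sketch: find v with vᵀ X v < 0 for an indefinite dual matrix X.
# Power iteration on (s·I - X) converges to the eigenvector of X's
# smallest eigenvalue, provided s exceeds X's largest eigenvalue.
X = [[1.0, 2.0], [2.0, 1.0]]   # eigenvalues 3 and -1 (illustrative)
s = 10.0
n = 2

v = [1.0, 0.7]                 # arbitrary nonzero starting vector
for _ in range(100):
    w = [s * v[i] - sum(X[i][j] * v[j] for j in range(n)) for i in range(n)]
    norm = sum(x * x for x in w) ** 0.5
    v = [x / norm for x in w]

quad = sum(v[i] * X[i][j] * v[j] for i in range(n) for j in range(n))
print(quad < 0)   # vᵀ X v ≈ -1: a violated direction, so vvᵀ is a new atom
```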

Method 2: Column Generation

Method 2: Column Generation. The SDPs are generated randomly from a relaxation of sparse PCA¹. (Figure: objective value vs. number of iterations.) Solve time and solution quality:

| Size of the SDP | Avg. solver time for DD+5 CG (s) | Avg. solver time for SDP in MOSEK (s) | Ratio of solver times | Avg. objective value for DD+5 CG | Avg. objective value for SDP | Relative error |
|---|---|---|---|---|---|---|
| 150 | 5.42 | 178.97 | 3.03% | 188.09 | 187.10 | 0.53% |
| 200 | 11.36 | 965.28 | 1.18% | 243.26 | 241.99 | 0.52% |
| 250 | 21.40 | 3817.17 | 0.56% | 306.68 | 305.92 | 0.25% |

¹ See, e.g., "A direct formulation for sparse PCA using semidefinite programming" by A. d'Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. Lanckriet.

Part II.a: Application of optimizing over nonnegative polynomials: monotone regression [Ahmadi, Curmei, H.]

Optimizing over nonnegative polynomials. Intractable constraint: how to optimize over this set? First, how to test membership in this set?

Nonnegative and sum of squares polynomials (1/2)

Nonnegative and sum of squares polynomials (2/2). Optimizing over nonnegative polynomials: inner approximate by sum of squares polynomials to obtain a semidefinite program.
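The sos-to-SDP link works through Gram matrices: p is a sum of squares if and only if p(x) = z(x)ᵀQz(x) for some PSD matrix Q, where z(x) collects monomials up to half the degree, and searching over such Q is an SDP. A numeric check on an assumed illustrative example, p(x) = x⁴ + 2x² + 1 = (x² + 1)²:

```python
import random

# Gram matrix certificate for p(x) = x^4 + 2x^2 + 1 with z(x) = (1, x, x^2):
# p(x) = z(x)ᵀ Q z(x). This Q equals vvᵀ for v = (1, 0, 1), hence is PSD,
# certifying that p is a sum of squares; indeed p = (x^2 + 1)^2.
Q = [[1, 0, 1],
     [0, 0, 0],
     [1, 0, 1]]

def p(x):
    return x**4 + 2 * x**2 + 1

def gram_form(x):
    z = [1, x, x**2]
    return sum(z[i] * Q[i][j] * z[j] for i in range(3) for j in range(3))

random.seed(0)
ok = all(abs(p(x) - gram_form(x)) < 1e-6
         for x in [random.uniform(-3, 3) for _ in range(20)])
print(ok)  # → True
```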

Monotone regression. Can this be done computationally?

Intractability. • Monotone regression problem formulation: optimize the fit to the data subject to monotonicity constraints; the sum of squares inner approximation yields an SDP.
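The slide's approach fits a polynomial whose derivative is certified nonnegative via an SDP, which needs an SDP solver; as a self-contained contrast on the same shape-constrained fitting problem, here is the classical pool-adjacent-violators algorithm (PAVA) for one-dimensional nonparametric monotone regression. This is a different, simpler technique than the sum of squares method, shown only for intuition:

```python
def isotonic_fit(ys):
    # Pool Adjacent Violators: the nondecreasing sequence minimizing
    # squared error to ys. Whenever a new value breaks monotonicity,
    # merge it with the preceding block and replace both by their mean.
    stack = []  # each entry: [block sum, block count]
    for y in ys:
        stack.append([y, 1])
        while len(stack) > 1 and stack[-2][0] / stack[-2][1] > stack[-1][0] / stack[-1][1]:
            s, c = stack.pop()
            stack[-1][0] += s
            stack[-1][1] += c
    fit = []
    for s, c in stack:
        fit.extend([s / c] * c)
    return fit

print(isotonic_fit([1, 3, 2, 4]))  # → [1.0, 2.5, 2.5, 4.0]
```

Unlike the SOS approach, PAVA returns a piecewise-constant fit and does not extend directly to multivariate monotonicity, which is where the polynomial machinery pays off.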

Approximation theorem. • Proof uses results from approximation theory and Putinar's Positivstellensatz.

Numerical experiments (1/2). (Figures: original function, monotonic, and unconstrained regressors projected onto one variable; comparison of the MSE for testing and training sets in unconstrained and monotonically constrained settings.)

Numerical experiments (2/2). Example: how to choose (as a bank) the interest rate for a prospective house buyer? • 6 features and 3707 observations. Data source: Lending Club. • Response variable is the interest rate, either increasing or decreasing with respect to each feature. • We compute average RMSE on 10-fold cross-validation. (Table compares least squares vs. monotonically constrained RMSE on the training and testing sets.)

Part II.b: Application of optimizing over nonnegative polynomials: difference of convex programming [Ahmadi, H.]. Winner of the 2016 INFORMS Computing Society Best Student Paper Award.

Difference of convex (DC) programming (1/2). • What if such a decomposition is not given to us? (Hiriart-Urruty, 1985; Tuy, 1995)

Difference of convex decomposition. • DC decomposition problem formulation. Questions: • Is this problem always feasible, i.e., can you always write any polynomial as the difference of two convex polynomials? • Can it be solved computationally? • Does it have a unique solution?

Convex and sos-convex polynomials. Optimizing over convex polynomials: inner approximate by sos-convex polynomials to obtain a semidefinite program. • Note that sos-convexity implies convexity.

Back to an efficient dc decomposition. DC decomposition problem formulation; inner approximate with sos-convexity to obtain a semidefinite program. Are these problems always feasible? Theorem [Ahmadi, H.]: any polynomial can be written as the difference of two sos-convex polynomials (and hence as the difference of two convex polynomials).
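A concrete univariate instance of the theorem (the polynomial is an assumed illustration, not from the slides): p(x) = x³ is neither convex nor concave, yet p = g - h with g(x) = x⁴ + x² + x³ and h(x) = x⁴ + x², both convex, since g''(x) = 12x² + 6x + 2 has negative discriminant (36 - 96 < 0) and h''(x) = 12x² + 2 > 0. A quick numeric check of the decomposition on a grid:

```python
# DC decomposition of p(x) = x^3: p = g - h with
#   g(x) = x^4 + x^2 + x^3   (convex: g'' = 12x^2 + 6x + 2 > 0 everywhere)
#   h(x) = x^4 + x^2         (convex: h'' = 12x^2 + 2 > 0 everywhere)
def g2(x):  # g''(x)
    return 12 * x**2 + 6 * x + 2

def h2(x):  # h''(x)
    return 12 * x**2 + 2

grid = [i / 100 for i in range(-500, 501)]
g_convex = all(g2(x) >= 0 for x in grid)
h_convex = all(h2(x) >= 0 for x in grid)
recovers_p = all(abs((x**4 + x**2 + x**3) - (x**4 + x**2) - x**3) < 1e-9
                 for x in grid)
print(g_convex, h_convex, recovers_p)  # → True True True
```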

Existence of dc decomposition (1/2)

Existence of dc decomposition (2/2). • DC decomposition problem formulation. Questions: • Is this problem always feasible? Yes. • Can it be solved computationally? Yes. • Is the solution unique? No: beyond the initial decomposition there are alternative decompositions, which raises the question of the "best" decomposition.

Convex-Concave Procedure (CCP). • A heuristic for minimizing DC programming problems. • Idea: given input f = g - h with g convex and h convex, replace h by its affine linearization at the current point and solve the resulting convex subproblem.
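A minimal CCP sketch on an assumed toy instance: f(x) = x⁴ - 2x² with g(x) = x⁴ and h(x) = 2x², both convex. Each iteration minimizes x⁴ - (2x_k² + 4x_k(x - x_k)); setting the derivative 4x³ - 4x_k to zero gives the closed-form update x_{k+1} = x_k^{1/3}, and starting from x₀ = 0.5 the iterates climb to the local (here also global) minimizer x = 1:

```python
def ccp_step(xk):
    # Convex subproblem: minimize g(x) - [h(xk) + h'(xk)(x - xk)]
    # with g(x) = x^4, h(x) = 2x^2. First-order condition:
    # 4x^3 - 4*xk = 0, so the next iterate is the cube root of xk.
    return xk ** (1.0 / 3.0)

x = 0.5
for _ in range(60):
    x = ccp_step(x)

f = x**4 - 2 * x**2
print(round(x, 6), round(f, 6))  # → 1.0 -1.0
```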

Convex-Concave Procedure (CCP). • Reiterate.

Picking the “best” decomposition for CCP: algorithm idea and its mathematical translation.

Comparing different decompositions (1/2): feasibility and “best” decomposition.

Comparing different decompositions (2/2): feasibility and “best” decomposition. • Average over 30 instances. • Solver: MOSEK. • Computer: 8 GB RAM, 2.40 GHz processor. Conclusion: the performance of CCP is strongly affected by the initial decomposition.

Main messages (1/2). • Semidefinite programming (SDP) can provide very powerful bounds for many intractable problems. • Large-scale SDPs are expensive to solve. • We proposed two methods to solve large-scale SDPs, based on generating adaptive sequences of LPs and SOCPs: Method 1, basis pursuit; Method 2, column generation.

Main messages (2/2). • Optimizing over nonnegative polynomials has many applications; we focused on monotone regression and difference of convex programming. • We gave SDP machinery for optimizing over monotone polynomials, with approximation results. • We showed that any polynomial can be decomposed into the difference of two convex polynomials, and that the choice of decomposition impacts the performance of CCP.

Where to from here? Theory: semidefinite programming relaxations of problems on random graphs; random polynomial optimization problems. Applications: use techniques from machine learning, polynomial optimization, and control to model and better understand mental health and addiction. Computation: make large-scale SDP a powerful and widely used technology, using ideas from integer programming and developing a commercial-level solver.

Thank you for listening. Questions? Want to know more? http://scholar.princeton.edu/ghall/

Difference of convex programming (1/2) • Regression • Sparsity • Convex • Concave

Method 2: Column Generation

| Experiment | Solver time (s) of DD+6 CG | Solver time (s) of SDP | Relative error |
|---|---|---|---|
| 1 | 6.885 | 167.466 | 1.09% |
| 2 | 7.092 | 169.411 | 0.11% |
| 3 | 7.343 | 154.584 | 0.19% |
| 4 | 6.815 | 176.018 | 0.05% |
| 5 | 7.509 | 160.461 | 0.07% |
| 6 | 7.413 | 169.549 | 0.22% |