How to Escape Saddle Points Efficiently Rong Ge
How to Escape Saddle Points Efficiently? Rong Ge, Duke University. IPAM Optimization and Optimal Control for Complex Energy and Property Landscapes. Based on joint works with Chi Jin, Praneeth Netrapalli, Sham M. Kakade, and Michael I. Jordan.
“Simple” Objectives • Many interesting problems have simple objectives. • Simple algorithms (e.g., gradient descent) can have new, stronger guarantees.
Outline Machine Learning, Non-convex optimization and saddle points • Why are saddle points ubiquitous in machine learning? • Why is it enough to handle saddle points? How to escape saddle points efficiently
Why non-convex? • Many machine learning problems are non-convex • Find the best clustering • Learn the best neural networks • Find communities in social networks • In many cases, don’t have other scalable algorithms.
How to Optimize Non-convex Problems? • In theory: NP-hard in the worst case! Better to avoid worst-case analysis. • In (machine learning) practice: stochastic gradient descent (SGD) + tuning. Why does it work? • Hope: tractable for real-life instances? What properties can we use?
Convex Optimization • Geometry → Algorithm: gradient descent (stochastic, accelerated, …), Newton’s method (trust region, cubic regularization, …). • Can we find a clean geometric property for non-convex functions?
Symmetry → Saddle Points • Problem asks for multiple components, but the components have no ordering. [Figure: a clustering solution = k centers]
Symmetry → Saddle Points • Problem asks for multiple components, but the components have no ordering: permuting the components of an optimal solution (a) gives an equivalent solution (b), yet their convex combination (a+b)/2 is not a solution. [Figure: components x1, x2, x3 under arrangements (a), (b), and (a+b)/2]
“Strict Saddle” Functions [G Huang Jin Yuan’15] • It can be easy to find a local minimum even with saddle points present (and sometimes all local minima are permutations of a global minimum). • A function is strict saddle if every point is in one of three cases: it has a large gradient; it is near a saddle point whose Hessian has a negative eigenvalue; or it is near a local minimum (in a strongly-convex ball).
Simple Example: Top Eigenvector
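The top-eigenvector example can be made concrete with the objective f(x) = ‖M − xxᵀ‖²_F, whose gradient is ∇f(x) = 4(‖x‖²x − Mx); for symmetric M its local minima are ±√λ₁·v₁, the scaled top eigenvector. A minimal sketch, assuming this formulation (step size and initialization scale are illustrative):

```python
import numpy as np

def top_eigenvector_gd(M, steps=2000, lr=None, seed=0):
    # Gradient descent on f(x) = ||M - x x^T||_F^2, a strict-saddle
    # objective: for symmetric M its local minima are x = ±sqrt(lambda_1) v_1,
    # the scaled top eigenvector.  Gradient: grad f(x) = 4(||x||^2 x - M x).
    rng = np.random.default_rng(seed)
    if lr is None:
        lr = 0.1 / np.linalg.norm(M, 2)        # conservative step size
    x = 0.1 * rng.standard_normal(M.shape[0])  # random init avoids saddles a.s.
    for _ in range(steps):
        x = x - lr * 4.0 * ((x @ x) * x - M @ x)
    return x

# symmetric test matrix with a known spectrum
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
M = Q @ np.diag([5.0, 4.0, 3.0, 2.0, 1.0]) @ Q.T

x = top_eigenvector_gd(M)
print(abs(x @ Q[:, 0]) / np.linalg.norm(x))    # alignment with v_1, close to 1
```

Random initialization almost surely has a nonzero component along v₁, which is what lets plain gradient descent slide past the saddle at the origin.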
What problems are strict saddle? • Eigenvector, generalized eigenvector, some tensor problems [G Huang Jin Yuan’15] • Community detection/synchronization [Bandeira, Boumal, Voroninski’16] • Dictionary learning [Sun Qu Wright’15] • Matrix completion [G Lee Ma’16] • Matrix sensing [Bhojanapalli, Neyshabur, Srebro’16] • Asymmetric versions, sparse PCA [G Jin Zheng’17]
Outline Machine Learning, Non-convex optimization and saddle points • Why are saddle points ubiquitous in machine learning? • Why is it enough to handle saddle points? How to escape saddle points efficiently
Setting • Want to optimize a function f(x). • f(x) has a Lipschitz gradient. • f(x) has a Lipschitz Hessian. • Goal: find a local minimum of f(x). (Recall: in many cases this is also a global minimum.)
Gradient Descent • x_{t+1} = x_t − η∇f(x_t) • Converges to a point with small gradient (first-order stationary point), but such a point can be a saddle.
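A minimal sketch of plain gradient descent (the objective and step size below are illustrative, not from the talk):

```python
import numpy as np

def gradient_descent(grad, x0, eta=0.1, steps=100):
    # Plain gradient descent: x_{t+1} = x_t - eta * grad(x_t).
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad(x)
    return x

# minimize f(x) = ||x - c||^2, whose gradient is 2(x - c)
c = np.array([3.0, -1.0])
x = gradient_descent(lambda x: 2.0 * (x - c), np.zeros(2))
print(x)  # close to [3, -1]
```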
What can we do at saddle points? • Rely on second-order (Hessian) information: find a negative eigenvector of the Hessian and move along that direction. • Can we do this without computing the Hessian?
Our Result • Perturbed gradient descent finds an ε-second-order stationary point (‖∇f(x)‖ ≤ ε and λ_min(∇²f(x)) ≥ −√(ρε)) in Õ(log⁴(d)/ε²) iterations. • Up to polylog factors in the dimension d, this matches the rate at which gradient descent finds first-order stationary points.
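A sketch of the perturbed-gradient-descent idea: take ordinary GD steps, but whenever the gradient is small, add a small random perturbation and watch whether the function value drops; if it does not, the point was (approximately) a local minimum rather than a saddle. The thresholds and constants below are illustrative, not the paper’s:

```python
import numpy as np

def perturbed_gd(f, grad, x0, eta=0.05, g_thresh=1e-3, radius=0.1,
                 t_thresh=50, f_thresh=1e-4, max_steps=2000, seed=0):
    # Sketch of perturbed gradient descent: ordinary GD steps, except that
    # at a point with a small gradient we add a small random perturbation
    # and take t_thresh more steps.  If the function value fails to drop,
    # the point was (approximately) a local minimum, not a saddle.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    t = 0
    while t < max_steps:
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh:
            x_before, f_before = x.copy(), f(x)
            d = rng.standard_normal(x.shape)
            x = x + radius * d / np.linalg.norm(d)   # random perturbation
            for _ in range(t_thresh):
                x = x - eta * grad(x)
                t += 1
            if f(x) > f_before - f_thresh:           # no escape happened
                return x_before
        else:
            x = x - eta * g
            t += 1
    return x

# f has a saddle at (0, 0) and minima at (+1, 0), (-1, 0); plain GD started
# on the y-axis converges to the saddle, while perturbed GD escapes.
f = lambda p: p[0] ** 4 / 4 - p[0] ** 2 / 2 + p[1] ** 2 / 2
grad_f = lambda p: np.array([p[0] ** 3 - p[0], p[1]])
x = perturbed_gd(f, grad_f, [0.0, 0.5])
print(x)  # close to (+1, 0) or (-1, 0)
```

The sufficient-decrease check after each perturbation is what distinguishes a genuine local minimum (perturbation gains nothing) from a saddle (perturbation unlocks a large decrease).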
Previous Results • Earlier guarantees either required Hessian or Hessian-vector computations, or needed a number of iterations polynomial in the dimension d (e.g., noisy SGD [G Huang Jin Yuan’15]).
Intuition: constant-Hessian case • Take-away: if x₀ has a nonzero projection onto the negative eigendirection, then GD can escape!
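The constant-Hessian intuition can be checked numerically on the quadratic saddle f(x) = ½xᵀHx with H = diag(1, −1): the GD map is linear, and the component along the negative eigendirection grows by a factor (1 + η) per step (the specific values below are illustrative):

```python
import numpy as np

# Quadratic saddle f(x) = 0.5 * x^T H x with H = diag(1, -1).
# GD iterates satisfy x_{t+1} = (I - eta*H) x_t, so the component along the
# negative eigendirection (second coordinate) is multiplied by (1 + eta)
# each step, while the positive direction shrinks by (1 - eta).
eta = 0.1
H = np.diag([1.0, -1.0])
step = lambda x: x - eta * (H @ x)

x = np.array([1.0, 0.0])   # zero projection onto the negative direction
y = np.array([1.0, 1e-6])  # tiny but nonzero projection
for _ in range(200):
    x, y = step(x), step(y)

print(np.linalg.norm(x))   # shrinks toward the saddle at the origin
print(abs(y[1]))           # grows like (1.1)^200 * 1e-6: escapes
```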
Difficulty • Away from the idealized constant-Hessian case, it is hard to characterize exactly which initial points escape: the set of points that get stuck near a saddle can have a complicated shape.
Tight analysis for Gradient Descent • Green: region where gradient descent gets stuck. • The shape of the stuck region is complicated. • Idea: prove the volume of the stuck region is small without knowing where the region is!
Tight analysis for Gradient Descent • Key observation: the width of the stuck region is small. • Points inside the stuck region stay stuck at the saddle; points outside it must be able to escape!
Summary • Saddle points are ubiquitous in machine learning problems because of symmetry. • For many problems, all local minima are global, so we only need to worry about saddle points. • Perturbed gradient descent can find second-order stationary points as efficiently as gradient descent finds first-order stationary points.
Open Problems • What other problems are strict saddle? • Extend to “not-so-simple” functions? • Can we design new objectives/modify old objectives to make them strict saddle? • Can we analyze other popular algorithms? • Stochastic gradient descent • Acceleration Thank You!