Stochastic Gradient Descent Weihang Chen, Xingchen Chen, Jinxiu Liang, Cheng Xu, Zehao Chen and Donglin He School of Computer Science & Engineering, South China University of Technology, Guangdong, P. R. China March 21, 2017

Outline • Why Stochastic Gradient Descent (SGD) • Convergence Analysis • Performance • Extensions and Variants

Why SGD •

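The bullets on this slide did not survive extraction. As a minimal sketch of the contrast the title points at (the dataset, learning rate, and seed below are illustrative assumptions, not the slide's own example): batch gradient descent (BGD) touches every sample on each step, while SGD updates from a single randomly chosen sample.

```python
import random

# Hypothetical 1-D least-squares problem: fit y = w * x (true w = 3).
random.seed(0)
data = [(x, 3.0 * x) for x in range(1, 11)]
n = len(data)

def full_gradient(w):
    # Batch gradient: averages over ALL n samples per step.
    return sum(2 * (w * x - y) * x for x, y in data) / n

def sample_gradient(w, i):
    # Stochastic gradient: uses ONE sample per step.
    x, y = data[i]
    return 2 * (w * x - y) * x

w_bgd, w_sgd = 0.0, 0.0
lr = 0.005
for t in range(200):
    w_bgd -= lr * full_gradient(w_bgd)                          # n evaluations
    w_sgd -= lr * sample_gradient(w_sgd, random.randrange(n))   # 1 evaluation
# Both estimates approach the true weight 3.0, but each SGD step
# costs 1/n of a BGD step.
```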
Example •

Experiment for Performance • Question

Solution •

Experiment for Performance (BGD) Step count: 38; amount of computation: 38 × 100

Experiment for Performance (SGD) Step count: 998; amount of computation: 998 × 1

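Taking the step counts from the two experiment slides at face value, and assuming the dataset held 100 samples (as the BGD slide's 38 × 100 suggests), the total gradient-evaluation counts compare as:

```python
# Work per step: BGD evaluates the gradient on every sample,
# SGD on a single sample.
n_samples = 100

bgd_steps, sgd_steps = 38, 998
bgd_work = bgd_steps * n_samples   # 38 * 100 = 3800 evaluations
sgd_work = sgd_steps * 1           # 998 evaluations

# SGD needs ~26x more steps here, yet ~4x less total gradient work.
```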
Convergence Analysis •

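The analysis itself was lost in extraction. For reference, the standard (Robbins–Monro) step-size conditions under which SGD converges almost surely are:

```latex
\sum_{t=1}^{\infty} \eta_t = \infty, \qquad \sum_{t=1}^{\infty} \eta_t^2 < \infty,
\qquad \text{satisfied e.g. by } \eta_t = \frac{\eta_0}{t}.
```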
Experiment for Convergence (Linear Regression) Classified by SVM

Solution •

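The slide body was not recovered; a common remedy when an SGD loss curve refuses to settle is a decaying learning rate. A minimal sketch (the 1/t-style schedule and its constants are my assumption, not necessarily the slide's choice):

```python
def lr_schedule(t, eta0=0.1, decay=0.01):
    # 1/t-style decay: the rates sum to infinity while their squares
    # have a finite sum, the usual condition for SGD convergence.
    return eta0 / (1.0 + decay * t)

rates = [lr_schedule(t) for t in range(0, 1000, 100)]
# The rate decreases monotonically toward 0 as training proceeds.
```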
Result SGD loss vs. iteration count

Result Linear Regression with SGD

Large Learning Rate

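A too-large learning rate makes gradient descent overshoot the minimum and diverge. A tiny demonstration on f(w) = w² (the test function and rates are illustrative, not from the slide):

```python
# Gradient descent on f(w) = w**2, whose gradient is 2w. The update is
# w <- (1 - 2*eta) * w, so it contracts only when |1 - 2*eta| < 1.
def run(eta, steps=20, w=1.0):
    for _ in range(steps):
        w -= eta * 2 * w
    return w

small = run(0.1)   # |1 - 0.2| = 0.8 < 1: shrinks toward 0
large = run(1.5)   # |1 - 3.0| = 2.0 > 1: blows up
```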
MBGD (Mini-Batch Gradient Descent) •

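The slide's formulas were lost; mini-batch gradient descent averages the gradient over a small random batch each step, interpolating between BGD (all samples) and SGD (one sample). A sketch with assumed data, batch size, and learning rate:

```python
import random

# Hypothetical 1-D least-squares problem: fit y = w * x (true w = 3).
random.seed(0)
data = [(x, 3.0 * x) for x in range(1, 11)]

def minibatch_gradient(w, batch):
    # Average the per-sample gradients over the batch only.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

w, lr, batch_size = 0.0, 0.005, 4
for _ in range(300):
    batch = random.sample(data, batch_size)
    w -= lr * minibatch_gradient(w, batch)
# Cheaper per step than BGD, less noisy per step than SGD.
```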
Extensions and Variants • Momentum • Average • AdaGrad • RMSProp • Adam • More…

Momentum •

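The update rule on this slide did not survive extraction. The classical (heavy-ball) momentum update, sketched on f(w) = w² with assumed hyperparameters:

```python
# Momentum keeps a velocity that accumulates past gradients, which
# damps oscillation and speeds progress along consistent directions.
def grad(w):
    return 2 * w   # gradient of f(w) = w**2

w, v = 5.0, 0.0
lr, mu = 0.1, 0.9   # learning rate and momentum coefficient (assumed)
for _ in range(200):
    v = mu * v - lr * grad(w)   # accumulate a velocity
    w = w + v                   # move along the velocity
```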
Average •

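The slide content was lost; "Average" here presumably refers to averaged SGD (Polyak–Ruppert), which returns the running mean of the iterates to damp the noise of the final steps. A sketch under that assumption:

```python
import random

# Plain SGD on a noisy gradient of f(w) = w**2, plus a running average
# of the iterates. The average is steadier than the last iterate.
random.seed(0)

def noisy_grad(w):
    return 2 * w + random.gauss(0, 1)   # true gradient plus noise

w, avg = 5.0, 0.0
for t in range(1, 2001):
    w -= 0.05 * noisy_grad(w)
    avg += (w - avg) / t                # incremental mean of all iterates
```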
AdaGrad •

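AdaGrad scales each step by the accumulated squared gradients, so frequently-updated coordinates get smaller steps. A sketch on f(w) = w² (the learning rate is an assumed value; the slide's own numbers were lost):

```python
import math

def grad(w):
    return 2 * w   # gradient of f(w) = w**2

w, cache, lr, eps = 5.0, 0.0, 0.5, 1e-8
for _ in range(500):
    g = grad(w)
    cache += g * g                        # accumulate ALL squared gradients
    w -= lr * g / (math.sqrt(cache) + eps)
# The effective step size lr / sqrt(cache) only ever shrinks, which is
# AdaGrad's known weakness on long runs.
```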
RMSProp •

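The slide body was lost; RMSProp replaces AdaGrad's ever-growing sum with an exponentially decaying average of squared gradients, so the effective step size does not vanish. Constants below follow common defaults (an assumption, not the slide's values):

```python
import math

def grad(w):
    return 2 * w   # gradient of f(w) = w**2

w, sq, lr, rho, eps = 5.0, 0.0, 0.01, 0.9, 1e-8
for _ in range(2000):
    g = grad(w)
    sq = rho * sq + (1 - rho) * g * g      # decaying average, not a sum
    w -= lr * g / (math.sqrt(sq) + eps)
```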
Adam •

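The update was not recovered; Adam combines a momentum-style first-moment estimate with an RMSProp-style second moment, both bias-corrected. The beta/epsilon defaults follow the Adam paper; the test problem f(w) = w² and learning rate are illustrative assumptions:

```python
import math

def grad(w):
    return 2 * w   # gradient of f(w) = w**2

w, m, v = 5.0, 0.0, 0.0
lr, b1, b2, eps = 0.01, 0.9, 0.999, 1e-8
for t in range(1, 2001):
    g = grad(w)
    m = b1 * m + (1 - b1) * g          # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * g * g      # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)          # bias corrections for the
    v_hat = v / (1 - b2 ** t)          # zero-initialized moments
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)
```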
Reference • Cnblogs Murongxixi (2013): Stochastic Gradient Descent • Hongmin Cai (2016): Sub-gradient Method • Abdelkrim Bennar (2007): Almost sure convergence of a stochastic approximation process in a convex set • A. Shapiro, Y. Wardi (1996): Convergence Analysis of Gradient Descent Stochastic Algorithms • Int8 (2016): Optimization techniques comparison in Julia: SGD, Momentum, Adagrad, Adadelta, Adam • Wikipedia: Stochastic gradient descent • Sebastian Ruder (2016): An overview of gradient descent optimization algorithms • Zhihua Zhou (2016): Machine Learning, Chapter 6 • Ycszen (2016): Summary and comparison of SGD, Adagrad, Adadelta, Adam

Thank you for your time