Stochastic Gradient Descent Weihang Chen, Xingchen Chen, Jinxiu Liang, Cheng Xu, Zehao Chen and Donglin He School of Computer Science & Engineering, South China University of Technology, Guangdong, P. R. China March 21, 2017
Experiment for Performance (BGD): step count 38; amount of computation 38 × 100
Experiment for Performance (SGD): step count 998; amount of computation 998 × 1 (one sample per step)
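The two performance slides can be reproduced in spirit with a toy one-dimensional least-squares sketch. The data, learning rate, and stopping tolerance below are assumptions, so the exact counts will differ from the slide's 38 and 998; the point is that BGD takes few steps but touches every sample per step, while SGD takes many cheap steps.

```python
import random

random.seed(0)
# Toy 1-D least squares: minimize mean((w*x - y)^2) over w; true slope is 3.
n = 100
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [3.0 * x for x in xs]

def grad_i(w, i):
    # gradient of (w*x_i - y_i)^2 with respect to w
    return 2.0 * (w * xs[i] - ys[i]) * xs[i]

def full_grad(w):
    return sum(grad_i(w, i) for i in range(n)) / n

lr, tol = 0.5, 1e-3

# Batch GD: every step evaluates the gradient on all n samples.
w, steps_bgd = 0.0, 0
while abs(w - 3.0) > tol:
    w -= lr * full_grad(w)
    steps_bgd += 1
evals_bgd = steps_bgd * n

# SGD: every step evaluates the gradient on a single random sample.
w, steps_sgd = 0.0, 0
while abs(w - 3.0) > tol:
    w -= lr * grad_i(w, random.randrange(n))
    steps_sgd += 1
evals_sgd = steps_sgd  # one sample per step

print(steps_bgd, evals_bgd)
print(steps_sgd, evals_sgd)
```

On this toy problem SGD needs more steps than BGD but far fewer total gradient evaluations, matching the comparison the slides are drawing.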
Convergence Analysis
Experiment for Convergence (Linear Regression): classified by SVM
Solution
Result: SGD loss vs. number of iterations
Result: Linear Regression with SGD
Large Learning Rate
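The effect of a too-large learning rate can be seen on a one-dimensional quadratic (the objective f(w) = w² and the rates 0.1 and 1.1 are illustrative assumptions): the gradient-descent iteration contracts only when |1 − 2·lr| < 1, i.e. lr < 1 here.

```python
# Minimize f(w) = w^2 with gradient descent; the gradient is 2w,
# so each step maps w to (1 - 2*lr) * w.
def run(lr, steps=20, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

small = run(0.1)   # |1 - 0.2| = 0.8 < 1: converges toward 0
large = run(1.1)   # |1 - 2.2| = 1.2 > 1: the iterates blow up
print(abs(small), abs(large))
```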
MBGD (Mini-Batch Gradient Descent)
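MBGD averages the gradient over a small random batch each step, sitting between full-batch GD and one-sample SGD. A minimal sketch on an assumed toy least-squares problem (data, learning rate, and batch size are illustrative choices):

```python
import random

random.seed(0)
# Toy 1-D least squares (assumed data): minimize mean((w*x - y)^2); true slope 3.
n = 100
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [3.0 * x for x in xs]

def minibatch_grad(w, batch):
    # average gradient of (w*x - y)^2 over the mini-batch indices
    return sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)

w, lr, batch_size = 0.0, 0.5, 10
for _ in range(200):
    batch = random.sample(range(n), batch_size)
    w -= lr * minibatch_grad(w, batch)
print(w)  # close to the true slope 3.0
```

The batch average reduces the gradient noise of plain SGD while still costing only batch_size sample gradients per step.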
Extensions and Variants • Momentum • Average • AdaGrad • RMSProp • Adam • More…
Momentum
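A sketch of the momentum (heavy-ball) update on an assumed toy quadratic: the velocity term accumulates an exponentially decaying sum of past gradients, which damps oscillation and speeds progress along consistent directions.

```python
# Heavy-ball momentum on an assumed toy objective f(w) = (w - 3)^2.
def grad(w):
    return 2.0 * (w - 3.0)

w, v = 0.0, 0.0
lr, beta = 0.1, 0.9       # step size and momentum coefficient (assumed values)
for _ in range(200):
    v = beta * v - lr * grad(w)   # velocity: decaying sum of past gradients
    w += v
print(w)  # approaches the minimizer 3.0
```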
Average
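Assuming "Average" refers to Polyak–Ruppert iterate averaging, a sketch on a toy noisy quadratic (the noise model and step size are assumptions): averaging the SGD iterates smooths out the gradient noise that the last iterate alone still carries.

```python
import random

random.seed(0)
# SGD with noisy gradients of f(w) = (w - 3)^2 (additive Gaussian noise, assumed).
def noisy_grad(w):
    return 2.0 * (w - 3.0) + random.gauss(0.0, 1.0)

w, lr = 0.0, 0.05
iterates = []
for _ in range(2000):
    w -= lr * noisy_grad(w)
    iterates.append(w)

# Polyak-Ruppert averaging: report the mean of the tail iterates,
# which averages away the noise the final iterate still fluctuates with.
avg = sum(iterates[1000:]) / 1000.0
print(w, avg)
```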
AdaGrad
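A sketch of the AdaGrad update on an assumed toy quadratic: each parameter's effective step size shrinks as its squared gradients accumulate, so frequently-updated directions slow down automatically.

```python
import math

# AdaGrad on an assumed toy objective f(w) = (w - 3)^2: the effective step
# size shrinks as squared gradients accumulate in g2_sum.
def grad(w):
    return 2.0 * (w - 3.0)

w, lr, eps, g2_sum = 0.0, 1.0, 1e-8, 0.0
for _ in range(500):
    g = grad(w)
    g2_sum += g * g                         # ever-growing accumulator
    w -= lr * g / (math.sqrt(g2_sum) + eps)
print(w)  # approaches the minimizer 3.0
```

The ever-growing accumulator is also AdaGrad's weakness: the step size decays monotonically, which motivates RMSProp on the next slide.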
RMSProp
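A sketch of the RMSProp update on the same assumed toy quadratic: like AdaGrad, but the squared gradients enter an exponential moving average rather than an ever-growing sum, so old gradients decay and the step size does not vanish.

```python
import math

# RMSProp on an assumed toy objective f(w) = (w - 3)^2.
def grad(w):
    return 2.0 * (w - 3.0)

w, lr, rho, eps, s = 0.0, 0.1, 0.9, 1e-8, 0.0
for _ in range(500):
    g = grad(w)
    s = rho * s + (1.0 - rho) * g * g     # decaying average of g^2
    w -= lr * g / (math.sqrt(s) + eps)
print(w)  # hovers near the minimizer 3.0 (fixed step size)
```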
Adam
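A sketch of the Adam update on the same assumed toy quadratic: it combines a momentum-style first-moment estimate with an RMSProp-style second-moment estimate, and bias-corrects both because they start at zero.

```python
import math

# Adam on an assumed toy objective f(w) = (w - 3)^2.
def grad(w):
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1
b1, b2, eps = 0.9, 0.999, 1e-8
m, v = 0.0, 0.0
for t in range(1, 1001):
    g = grad(w)
    m = b1 * m + (1.0 - b1) * g          # first moment (momentum) estimate
    v = b2 * v + (1.0 - b2) * g * g      # second moment (RMSProp-style) estimate
    m_hat = m / (1.0 - b1 ** t)          # bias correction for zero init
    v_hat = v / (1.0 - b2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)
print(w)  # near the minimizer 3.0
```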