Deep learning architectures
Usman Roshan, NJIT
Figures from https://www.jeremyjordan.me/convnet-architectures/ and https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5

Evolution of networks

LeNet: deep learner by Yann LeCun in 1998 to train on MNIST
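As a rough illustration, a LeNet-style network can be written in a few lines of PyTorch; the layer sizes below follow the commonly cited LeNet-5 configuration for 28x28 MNIST inputs and are an assumption, not a transcription of the slide's figure.

import torch.nn as nn
import torch.nn.functional as F

class LeNet5(nn.Module):
    # LeNet-5-style sketch for 1x28x28 MNIST images (sizes are assumptions)
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, padding=2)   # 1x28x28 -> 6x28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)             # 6x14x14 -> 16x10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, n_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> 6x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> 16x5x5
        x = x.flatten(1)
        return self.fc3(F.relu(self.fc2(F.relu(self.fc1(x)))))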

AlexNet: won ImageNet in 2012

VGG16: introduced in 2014 by Oxford, won ImageNet classification plus localization

Inception (GoogLeNet) I: introduced in 2014 by Google, won ImageNet classification only

Inception II (Inception cell): basic unit of Inception
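A minimal sketch of an Inception cell in PyTorch: four parallel branches (1x1; 1x1 then 3x3; 1x1 then 5x5; max-pool then 1x1) whose outputs are concatenated along the channel dimension. The branch widths here are illustrative assumptions, not the GoogLeNet values.

import torch
import torch.nn as nn

class InceptionCell(nn.Module):
    def __init__(self, in_ch, b1=64, b3=96, b5=32, bp=32):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, b1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, b3, kernel_size=1),
            nn.Conv2d(b3, b3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, b5, kernel_size=1),
            nn.Conv2d(b5, b5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, bp, kernel_size=1),
        )

    def forward(self, x):
        # every branch preserves spatial size, so outputs can be concatenated
        outs = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat(outs, dim=1)   # channel-wise concatenation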

Inception III (efficient large kernel width): two stacked 3x3 kernels have a similar effect to a single 5x5 kernel with fewer parameters. A 3x3 kernel can in turn be replaced by successive 1x3 and 3x1 convolutions.
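The parameter savings can be checked directly with the sketch below; the channel count C is an arbitrary assumption, and the bias terms account for the small additive constants.

import torch.nn as nn

C = 64  # illustrative channel count (assumption, not from the slides)

single_5x5 = nn.Conv2d(C, C, kernel_size=5, padding=2)
stacked_3x3 = nn.Sequential(                  # same receptive field as one 5x5
    nn.Conv2d(C, C, kernel_size=3, padding=1),
    nn.Conv2d(C, C, kernel_size=3, padding=1),
)
factored_3x3 = nn.Sequential(                 # a 3x3 factored into 1x3 then 3x1
    nn.Conv2d(C, C, kernel_size=(1, 3), padding=(0, 1)),
    nn.Conv2d(C, C, kernel_size=(3, 1), padding=(1, 0)),
)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(single_5x5))    # 25*C*C + C
print(n_params(stacked_3x3))   # 18*C*C + 2*C  -> fewer weights than one 5x5
print(n_params(factored_3x3))  #  6*C*C + 2*C  -> fewer than 9*C*C + C for one 3x3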

Inception IV

Residual networks • Deep(er) networks suffer from degradation: higher error than shallow networks. One reason is vanishing gradients • Residual connections introduced in 2015 to alleviate this problem (won ImageNet the same year)
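A minimal sketch of a basic residual block, assuming equal input and output channels so the identity shortcut needs no projection:

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(out + x)   # shortcut: gradients also flow through the identity path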

Residual networks II: ReLU before the residual addition gives better results than after
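The two placements the slide compares can be written as small variations of the block above; the argument `block` below stands for the conv/BN stack and is an illustrative placeholder.

import torch.nn.functional as F

def relu_after_addition(x, block):
    return F.relu(block(x) + x)    # original ResNet ordering

def relu_before_addition(x, block):
    return F.relu(block(x)) + x    # ReLU applied before the residual addition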

Residual networks

Residual networks

Deep vs. wide networks • Deep: few nodes per layer, many layers • Wide: few layers, many nodes per layer • Turns out that a 3-layer network can express functions that a 2-layer network would need exponentially many nodes to express (The Power of Depth for Feedforward Neural Networks, Eldan and Shamir, COLT 2016)
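For concreteness, a sketch of the two shapes; the layer sizes are arbitrary assumptions chosen only to contrast depth with width.

import torch.nn as nn

deep_narrow = nn.Sequential(               # many layers, few nodes per layer
    nn.Linear(784, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
shallow_wide = nn.Sequential(               # few layers, many nodes per layer
    nn.Linear(784, 2048), nn.ReLU(),
    nn.Linear(2048, 10),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(deep_narrow), count(shallow_wide))  # the wide net uses far more parameters here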

DenseNet: concatenate the output of a layer to all successive layers
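A minimal dense-block sketch, where growth_rate and n_layers are illustrative assumptions; the key point is that each layer's input is the concatenation of all earlier outputs rather than a sum.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate=12, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate   # each layer sees all earlier feature maps

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # concatenate, not add
            features.append(out)
        return torch.cat(features, dim=1)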

DenseNet vs. ResNet