Visual Computing Neural Network Basics I Gang Cao
Outline • Neural network • CNN • Attention in CV 3
Neural network 4
Neural network A logistic regression unit is a single neuron 5
Neural network 6
Neural network—Activation function Why do you need non-linear activation functions? With linear activations, the two layers z[1] = W[1]x + b[1], a[1] = g(z[1]), z[2] = W[2]a[1] + b[2], a[2] = g(z[2]) collapse into a single linear layer with weights W' and bias b'. 11
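A small numeric check of this collapse (a sketch with made-up weights, not the slide's numbers): stacking two linear layers is exactly one linear layer, while inserting a non-linearity breaks the collapse.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)

# Two layers WITHOUT a non-linear activation (illustrative weights)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)
a2_linear = W2 @ (W1 @ x + b1) + b2

# The same mapping as ONE linear layer with W' = W2 W1, b' = W2 b1 + b2
W_prime, b_prime = W2 @ W1, W2 @ b1 + b2
print(np.allclose(a2_linear, W_prime @ x + b_prime))   # True: the extra layer adds nothing

# With a non-linearity (here ReLU) between the layers, the collapse no longer holds
a1 = np.maximum(0.0, W1 @ x + b1)
a2 = W2 @ a1 + b2
```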
Neural network—Dropout 12
Neural network—Dropout Hinton et al., ImageNet Classification with Deep Convolutional Neural Networks, 2012. With some probability p, a neuron's activation is switched off (set to zero) during training. 13
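A minimal sketch of this idea on a layer's activations (inverted dropout with the usual keep-and-rescale convention; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, p, train=True):
    """Zero each activation with probability p during training, rescale the rest."""
    if not train:
        return a                        # at test time all neurons are kept
    mask = rng.random(a.shape) >= p     # keep a neuron with probability 1 - p
    return a * mask / (1.0 - p)         # rescale so the expected value is unchanged

a = np.array([0.5, 1.2, -0.3, 2.0])
print(dropout(a, p=0.5))
```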
Neural network—Dropout 14
Neural network—Batch Normalization 15
Neural network—Batch Normalization Normalize each layer's activations using the batch mean and variance; the scale and shift applied after normalization are learned as parameters. 17
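A minimal sketch of batch normalization for a fully connected layer (gamma and beta are the learned parameters; running statistics for test time are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a batch of activations, then scale and shift with learned gamma, beta."""
    mu = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                     # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta             # learned scale and shift

x = np.random.randn(32, 4) * 10 + 5         # batch of 32 examples, 4 features
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # ~0 and ~1 per feature
```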
Neural network—Loss function A good set of parameters should make the total loss over all training examples as small as possible. 18
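For example (an illustrative choice of loss, not specified on the slide), the total loss can be the average per-example loss, and training searches for parameters that make it small:

```python
import numpy as np

def total_loss(y_pred, y_true):
    """Average squared-error loss over all training examples (illustrative choice)."""
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 0.0, 1.0, 1.0])
print(total_loss(np.array([0.9, 0.1, 0.8, 0.7]), y_true))   # small loss: good parameters
print(total_loss(np.array([0.1, 0.9, 0.2, 0.3]), y_true))   # large loss: bad parameters
```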
Neural network 19
CNN 20
Convolutional Neural network 21
The whole CNN [Diagram: input image → Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times) → Flatten → Fully Connected Feedforward network → "cat", "dog", ……] 22
Why CNN for Image • Some patterns are much smaller than the whole image. A neuron does not have to see the whole image to discover the pattern; connecting to a small region needs fewer parameters (e.g. a "beak" detector). 23
Why CNN for Image • The same patterns appear in different regions. An "upper-left beak" detector and a "middle beak" detector do almost the same thing, so they can use the same set of parameters. 24
Why CNN for Image • Subsampling the pixels will not change the object (a subsampled bird is still a bird). We can subsample the pixels to make the image smaller, so the network needs fewer parameters to process it. 25
The whole CNN
Property 1: Some patterns are much smaller than the whole image → Convolution
Property 2: The same patterns appear in different regions → Convolution
Property 3: Subsampling the pixels will not change the object → Max Pooling
[Diagram: Convolution → Max Pooling, can repeat many times, then Flatten.] 26
The whole CNN [Diagram: input image → Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times) → Flatten → Fully Connected Feedforward network → "cat", "dog", ……] 27
CNN – Convolution [Figure: a 6 x 6 binary image and two 3 x 3 filters (Filter 1, Filter 2), each a small matrix. The filter values are the network parameters to be learned.] Each filter detects a small 3 x 3 pattern (Property 1). 28
CNN – Convolution [Figure: Filter 1 slides over the 6 x 6 image with stride = 1; the first inner products give 3 and -1.] 29
CNN – Convolution [Figure: if stride = 2, Filter 1 jumps two pixels at a time, giving outputs 3 and -3.] We set stride = 1 below. 30
CNN – Convolution [Figure: with stride = 1, sliding Filter 1 over the whole 6 x 6 image gives a 4 x 4 map of output values.] 31
CNN – Convolution Do the same process for every filter. [Figure: with stride = 1, Filter 1 and Filter 2 each produce a 4 x 4 map of values; together these maps form the Feature Map.] 32
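A minimal NumPy sketch of this sliding-filter computation (the image and filter values are illustrative, chosen to resemble the 6 x 6 example above):

```python
import numpy as np

def conv2d(image, filt, stride=1):
    """Valid 2-D convolution (no padding); output size = (N - F) // stride + 1."""
    n, f = image.shape[0], filt.shape[0]
    out = (n - f) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i * stride:i * stride + f, j * stride:j * stride + f]
            result[i, j] = np.sum(patch * filt)   # inner product of patch and filter
    return result

# Illustrative 6 x 6 binary image and a diagonal 3 x 3 filter
image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])
filt = np.array([[ 1, -1, -1],
                 [-1,  1, -1],
                 [-1, -1,  1]])
print(conv2d(image, filt, stride=1))   # 4 x 4 feature map
print(conv2d(image, filt, stride=2))   # 2 x 2 feature map
```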
CNN – Convolution 33
CNN – Convolution 34
CNN – Convolution VGG-16 conv5_3 [Xiu-Shen Wei et al., 2017] 35
Padding [Figure: zero padding. Convolving a 6 x 6 image with a 3 x 3 filter gives a 4 x 4 output; padding the input with a border of zeros (p) keeps the output at 6 x 6.] 36
CNN – Colorful image [Figure: a color image has three channels (R, G, B), so Filter 1 and Filter 2 each become a stack of three 3 x 3 matrices and are convolved across all channels at once.] 37
Convolution vs. Fully Connected [Figure: the 6 x 6 image is processed either by convolution, or flattened into a 36-dimensional vector and fed to a fully connected layer.] 38
[Figure: Filter 1 applied to the 6 x 6 image; the neuron that outputs the value 3 is connected to only 9 of the 36 inputs (one 3 x 3 patch), not fully connected.] Fewer parameters! 39
[Figure: two neurons that apply Filter 1 at different positions each connect to 9 inputs and share the same 9 weights.] Shared weights: even fewer parameters! 40
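A rough worked count (illustrative numbers, not from the slides): a fully connected layer from the 36 pixels to 36 hidden units would need 36 x 36 = 1296 weights, whereas each convolutional neuron uses only 9 weights, and every position covered by the same filter reuses those same 9 weights, so one filter costs just 9 parameters no matter how many places it is applied.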
The whole CNN [Diagram: input image → Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times) → Flatten → Fully Connected Feedforward network → "cat", "dog", ……] 41
CNN – Max Pooling [Figure: the 4 x 4 output maps of Filter 1 and Filter 2; within each 2 x 2 block, max pooling keeps only the largest value.] 42
CNN – Max Pooling [Figure: after convolution and max pooling, the 6 x 6 image becomes a new but smaller 2 x 2 image; each filter gives one channel.] 43
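A minimal sketch of 2 x 2 max pooling on one channel (the feature-map values are illustrative):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    n = feature_map.shape[0] // size
    pooled = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            block = feature_map[i * size:(i + 1) * size, j * size:(j + 1) * size]
            pooled[i, j] = block.max()
    return pooled

fmap = np.array([[ 3, -1, -3, -1],
                 [-3,  1,  0, -3],
                 [-3, -3,  0,  1],
                 [ 3, -2, -2, -1]])   # an illustrative 4 x 4 feature map
print(max_pool(fmap))                 # 2 x 2 result: [[3, 0], [3, 1]]
```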
The whole CNN Convolution followed by Max Pooling produces a new image, smaller than the original; the number of channels equals the number of filters. This Convolution + Max Pooling block can repeat many times. 44
The whole CNN [Diagram: input image → Convolution → Max Pooling → a new image → Convolution → Max Pooling → a new image → Flatten → Fully Connected Feedforward network → "cat", "dog", ……] 45
The whole CNN Hyperparameters • K: number of filters • F: their spatial extent (filter size F x F) • S: stride • P: the amount of zero padding 46
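These hyperparameters fix the output size. Using the standard formula (not stated on the slide, but consistent with the 6 x 6 examples above), the spatial output size is (N - F + 2P) / S + 1:

```python
def conv_output_size(n, f, s=1, p=0):
    """Spatial output size of a convolution: (N - F + 2P) // S + 1."""
    return (n - f + 2 * p) // s + 1

print(conv_output_size(6, 3))         # -> 4, as in the 6 x 6 image with a 3 x 3 filter
print(conv_output_size(6, 3, p=1))    # -> 6, zero padding keeps the size
print(conv_output_size(6, 3, s=2))    # -> 2, as in the stride = 2 example
```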
Flatten [Figure: the pooled 2 x 2 feature maps are flattened into a single vector and fed into the Fully Connected Feedforward network.] 47
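A minimal PyTorch sketch of the whole Convolution → Max Pooling → (repeat) → Flatten → Fully Connected pipeline; the channel counts, kernel sizes and the 32 x 32 input are illustrative assumptions, not the slide's exact numbers:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=2):          # e.g. "cat" vs "dog"
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolution
            nn.ReLU(),
            nn.MaxPool2d(2),                              # max pooling
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # repeat
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # fully connected

    def forward(self, x):                      # x: (batch, 3, 32, 32)
        x = self.features(x)                   # -> (batch, 32, 8, 8)
        x = torch.flatten(x, start_dim=1)      # flatten
        return self.classifier(x)

logits = SmallCNN()(torch.randn(4, 3, 32, 32))   # -> shape (4, 2)
```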
Modern Architectures LeNet Gradient-Based Learning Applied to Document Recognition, Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, 1998. 49
Modern Architectures AlexNet 50
Modern Architectures VGG Karen Simonyan, Andrew Zisserman (VGG = Visual Geometry Group), Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv, 2014. 51
Modern Architectures VGG 52
Modern Architectures GoogLeNet Auxiliary loss units at intermediate layers: computing a loss there pushes the lower-layer features to be discriminative as well, so the network trains better. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going Deeper with Convolutions, CVPR 2015. 53
Modern Architectures GoogLeNet
1. The input feature map is processed in four parallel ways: 3 x 3 pooling plus convolutions at three different scales (1 x 1, 3 x 3 and 5 x 5).
2. To reduce computation, and to pass information through fewer connections for a sparser structure, 1 x 1 convolutions are used for dimensionality reduction.
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going Deeper with Convolutions, CVPR 2015. 54
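A minimal sketch of such a four-branch (Inception-style) module in PyTorch, with illustrative channel counts; the 1 x 1 convolutions in the 3 x 3 and 5 x 5 branches do the dimensionality reduction described above:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Four parallel branches over the same input; channel counts are illustrative."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)            # 1x1 branch
        self.b2 = nn.Sequential(                                  # 1x1 reduce -> 3x3
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1))
        self.b3 = nn.Sequential(                                  # 1x1 reduce -> 5x5
            nn.Conv2d(in_ch, 8, kernel_size=1), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=5, padding=2))
        self.b4 = nn.Sequential(                                  # 3x3 pool -> 1x1
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1))

    def forward(self, x):
        # Concatenate the four branches along the channel dimension
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

out = InceptionBlock(64)(torch.randn(1, 64, 28, 28))   # -> (1, 80, 28, 28)
```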
Modern Architectures GoogLeNet 55
Modern Architectures ResNet More than a hundred layers • Residual units are introduced to address the degradation problem. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016. 56
Modern Architectures ResNet
• zero_padding: pad the identity shortcut with zeros to make up the missing dimensions; this adds no extra parameters.
• projection: use a 1 x 1 convolution on the shortcut to increase the dimensions; this adds extra parameters.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016. 57
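A minimal sketch of a residual unit in PyTorch (a simplified basic block, not the paper's exact block); when the channel count changes, the shortcut uses the 1 x 1 projection option described above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """y = F(x) + shortcut(x); a simplified basic residual block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Projection shortcut (1 x 1 conv) only when the channel count changes
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch
                         else nn.Identity())

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))   # add the identity / projection path

y = ResidualBlock(64, 128)(torch.randn(1, 64, 16, 16))   # -> (1, 128, 16, 16)
```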
Modern Architectures ResNet Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016. 58
Modern Architectures The Deeper, the Better 59
Modern Architectures The Deeper, the Better
Deeper networks can cover more complex problems: increasingly large receptive field size and richer patterns.
• From 2 to 10 layers (2010-2012): ReLUs, Dropout
• From 10 to 20 layers (2015): Batch Normalization
• From 20 to 100/1000 layers: Residual networks 60
Attention in CV 61
Attention 1) Learn a weight distribution: Attention Proposal Network. Jianlong Fu, Heliang Zheng, Tao Mei (Microsoft Research), Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition, CVPR 2017. 63
Attention 1) Learn a weight distribution: Attention Proposal Network. Jianlong Fu, Heliang Zheng, Tao Mei (Microsoft Research), Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition, CVPR 2017. 64
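A minimal sketch of the general "learn a weight distribution" idea: a generic soft spatial attention layer that scores each location, turns the scores into a distribution, and pools the features with it (this is an illustrative module, not the Attention Proposal Network from the paper):

```python
import torch
import torch.nn as nn

class SoftSpatialAttention(nn.Module):
    """Learn one attention weight per spatial location and pool features with it."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)   # one score per location

    def forward(self, feat):                                  # feat: (batch, C, H, W)
        b, c, h, w = feat.shape
        scores = self.score(feat).view(b, -1)                 # (batch, H*W)
        weights = torch.softmax(scores, dim=1)                # weight distribution over locations
        weighted = feat.view(b, c, -1) * weights.unsqueeze(1) # weight each location's feature
        return weighted.sum(dim=2)                            # (batch, C) attended feature

pooled = SoftSpatialAttention(512)(torch.randn(2, 512, 14, 14))   # -> (2, 512)
```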
References
1. CreativeAI: Deep Learning for Graphics, NN Tricks & Architectures
2. Deep Learning Tutorial, 李宏毅 (Hung-yi Lee)
3. https://blog.csdn.net/qq_38284951/article/details/82769994 (Understanding the attention mechanism in computer vision)
4. CS231n 66
DOWNLOADS at http://vcc.szu.edu.cn Thank You!