Convolutional Neural Network (CNN): a network architecture designed for images

Image Classification: The model takes a 100 x 100 image and outputs a score for each class (dog, cat, tree, ……). Training uses cross entropy. (All the images to be classified have the same size.)
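
A minimal sketch of the cross entropy named on this slide, assuming the model's raw scores are turned into probabilities with softmax (softmax appears later in the deck). The scores below are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Cross entropy between predicted probabilities and a one-hot target."""
    return -math.log(probs[target_index])

classes = ["dog", "cat", "tree"]   # class names from the slide
scores = [2.0, 1.0, 0.1]           # hypothetical model outputs
probs = softmax(scores)
loss = cross_entropy(probs, classes.index("dog"))
print(probs, loss)
```

The loss shrinks toward 0 as the probability assigned to the correct class approaches 1.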

Image Classification: A color image is a 3-D tensor of 100 x 100 pixels with 3 channels (100 x 100 x 3). Each value represents intensity.

Fully Connected Network: The input has 100 x 100 x 3 values. With 1000 neurons in the first layer, that layer needs 100 x 100 x 3 x 1000 = 3 x 10^7 weights. Do we really need “fully connected” in image processing?
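
The weight count on this slide can be checked with a few lines. This is only arithmetic; the function name is made up, and biases are left out.

```python
def fully_connected_weights(height, width, channels, neurons):
    """Weights (biases excluded) when every neuron sees every input value."""
    return height * width * channels * neurons

n_weights = fully_connected_weights(100, 100, 3, 1000)
print(n_weights)  # 30000000, i.e. 3 x 10^7
```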

Observation 1: Identifying some critical patterns. Input -> Layer 1 -> Layer 2 -> …… -> “Bird”? Perhaps humans also identify birds in a similar way …

https://www.dcard.tw/f/funny/p/233833012

Observation 1: Need to see the whole image? A neuron does not have to see the whole image. Input -> Layer 1 (basic detector) -> Layer 2 (advanced detector) -> …… -> bird. Some patterns are much smaller than the whole image.

Simplification 1: Each neuron looks only at its receptive field, e.g. a 3 x 3 x 3 region of the image, so it has 3 x 3 x 3 weights and a bias. [Figure: image grid with one 3 x 3 receptive field highlighted]
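
A rough sketch of what one such neuron computes: a weighted sum over its 3 x 3 x 3 receptive field plus a bias. The image, weights, and function name are hypothetical, and no activation function is applied.

```python
def neuron_output(image, weights, bias, top, left):
    """Weighted sum over a 3 x 3 x C receptive field with top-left corner (top, left).

    image[r][c] is a list of C channel values; weights[i][j] is a list of C weights.
    """
    total = bias
    for i in range(3):
        for j in range(3):
            pixel = image[top + i][left + j]
            for ch, w in enumerate(weights[i][j]):
                total += w * pixel[ch]
    return total

# Hypothetical 4 x 4 image with 3 channels where every value is 1.0.
image = [[[1.0, 1.0, 1.0] for _ in range(4)] for _ in range(4)]
# 3 x 3 x 3 = 27 weights, all 0.5 purely for illustration.
weights = [[[0.5, 0.5, 0.5] for _ in range(3)] for _ in range(3)]
out = neuron_output(image, weights, bias=1.0, top=0, left=0)
print(out)  # 27 * 0.5 + 1.0 = 14.5
```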

Simplification 1: Receptive fields can overlap, and several neurons can have the same receptive field.
• Can different neurons have different sizes of receptive field?
• Cover only some channels?
• Not square receptive field?

Simplification 1 – Typical Setting: Each receptive field has a set of neurons (e.g., 64 neurons). A receptive field covers all channels, so only the kernel size (e.g., 3 x 3) is specified. Neighboring receptive fields are offset by the stride (stride = 2 here) and overlap. Padding fills in where a field extends past the image border. The receptive fields cover the whole image.
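
How many receptive fields fit along one side of the image follows from kernel size, stride, and padding. The sketch below uses the standard formula. The padding of 1 is an assumption, since the slide does not give a padding amount.

```python
def positions_along_axis(size, kernel, stride, padding):
    """Number of receptive-field positions along one axis of the image."""
    return (size + 2 * padding - kernel) // stride + 1

# 100 x 100 image, 3 x 3 kernel, stride 2, assumed padding of 1 pixel.
per_axis = positions_along_axis(size=100, kernel=3, stride=2, padding=1)
fields = per_axis * per_axis
neurons = fields * 64   # 64 neurons per receptive field, as on the slide
print(per_axis, fields, neurons)  # 50 2500 160000
```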

Observation 2:
• The same patterns appear in different regions.
Two neurons at different positions each say: I detect “beak” in my receptive field. Each receptive field needs a “beak” detector?

Simplification 2: Parameter sharing. Two neurons with different receptive fields, each with 3 x 3 x 3 weights and a bias, share their parameters.

Simplification 2: Two neurons with the same receptive field would not share parameters.

Simplification 2 – Typical Setting: Each receptive field has a set of neurons (e.g., 64 neurons).

Simplification 2 – Typical Setting: Each receptive field has a set of neurons (e.g., 64 neurons). Each receptive field has the neurons with the same set of parameters: filter 1, filter 2, filter 3, filter 4, ……
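
To see what sharing saves, one can count parameters with and without it. This is a back-of-the-envelope sketch: the 50 x 50 count of receptive fields is an assumed value for a 100 x 100 image, and the function names are made up.

```python
def shared_parameters(kernel, channels, filters):
    """Unique parameters when every receptive field reuses the same filters."""
    return filters * (kernel * kernel * channels + 1)  # +1 for each filter's bias

def unshared_parameters(kernel, channels, neurons_per_field, fields):
    """Parameters if every receptive field kept its own separate neurons."""
    return fields * neurons_per_field * (kernel * kernel * channels + 1)

shared = shared_parameters(kernel=3, channels=3, filters=64)
unshared = unshared_parameters(kernel=3, channels=3,
                               neurons_per_field=64, fields=50 * 50)
print(shared, unshared)  # 1792 4480000
```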

Benefit of Convolutional Layer: A fully connected layer is a jack of all trades, master of none. Restricting it with receptive fields and then parameter sharing gives the convolutional layer, which has larger model bias (suited to images):
• Some patterns are much smaller than the whole image.
• The same patterns appear in different regions.

Another story based on filter: A convolutional layer has a set of filters (Filter 1, Filter 2, ……), each a 3 x 3 x channel tensor, with channel = 1 for black and white images and channel = 3 for color images. Each filter detects a small pattern (3 x 3 x channel).

Convolutional Layer: Consider channel = 1 (black and white image). A 6 x 6 image is processed by 3 x 3 filters (Filter 1, Filter 2, ……). (The values in the filters are unknown parameters.) [Figure: 6 x 6 image with Filter 1 and Filter 2]

Convolutional Layer: With stride = 1, Filter 1 slides over the 6 x 6 image and produces one value at each position. [Figure: 6 x 6 image, Filter 1, and the resulting values]

Convolutional Layer: Do the same process for every filter (stride = 1). The outputs form the feature map. [Figure: 6 x 6 image, Filter 2, and the feature map]
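
The sliding process can be sketched in plain Python. The 6 x 6 image and the two filters below are reconstructed from the slide figures, so treat the exact values as an assumption. `convolve2d` is a made-up helper (square inputs, no padding), not a library call.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over a square `image` (no padding) and return the feature map."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    feature_map = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Assumed values, reconstructed from the slide figures.
image = [[1, 0, 0, 0, 0, 1],
         [0, 1, 0, 0, 1, 0],
         [0, 0, 1, 1, 0, 0],
         [1, 0, 0, 0, 1, 0],
         [0, 1, 0, 0, 1, 0],
         [0, 0, 1, 0, 1, 0]]
filter1 = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]   # responds to a diagonal
filter2 = [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]   # responds to a vertical line

map1 = convolve2d(image, filter1)
map2 = convolve2d(image, filter2)
print(map1[0])  # [3, -1, -3, -1]
print(map2[3])  # [-1, 0, -4, 3]
```

The largest values (3) appear where the image contains the pattern a filter looks for.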

Convolutional Layer: Convolution with 64 filters produces a feature map that can be treated as an “image” with 64 channels.

Multiple Convolutional Layers: The first convolution (64 filters) outputs an “image” with 64 channels, so each filter in the next convolution is 3 x 3 x 64.

Multiple Convolutional Layers: [Figure: the input image and the feature maps after convolution with 64 filters]

Comparison of Two Stories: The weights a neuron applies to its receptive field are the values of a filter (a 3 x 3 x channel tensor). (Ignore bias in this slide.)

The neurons with different receptive fields share the parameters. Each filter convolves over the input image.

Convolutional Layer
Neuron Version Story: Each neuron only considers a receptive field. The neurons with different receptive fields share the parameters.
Filter Version Story: There is a set of filters detecting small patterns. Each filter convolves over the input image.
They are the same story.
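
One way to check that the two stories agree is to compute both on a small input. The sketch ignores bias. The 4 x 4 image is hypothetical, and the kernel uses the Filter 1 values that appear in the slides. In the neuron version, each neuron is a row of a fully connected weight matrix that is zero outside its receptive field.

```python
def filter_version(image, kernel):
    """Filter story: slide the kernel over the image (stride 1, no padding)."""
    n, k = len(image), len(kernel)
    out = n - k + 1
    return [
        sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
        for r in range(out) for c in range(out)
    ]

def neuron_version(image, kernel):
    """Neuron story: one neuron per receptive field, all sharing the same weights."""
    n, k = len(image), len(kernel)
    out = n - k + 1
    flat = [v for row in image for v in row]      # flatten the image into a vector
    outputs = []
    for r in range(out):
        for c in range(out):
            weights = [0] * (n * n)               # zero outside the receptive field
            for i in range(k):
                for j in range(k):
                    # every neuron copies the same kernel values (parameter sharing)
                    weights[(r + i) * n + (c + j)] = kernel[i][j]
            outputs.append(sum(w * x for w, x in zip(weights, flat)))
    return outputs

image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 0, 1, 0],
         [1, 1, 0, 2]]                            # hypothetical 4 x 4 image
kernel = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
same = filter_version(image, kernel) == neuron_version(image, kernel)
print(same)  # True
```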

Observation 3:
• Subsampling the pixels will not change the object. [Figure: a bird image and its subsampled version]

Pooling – Max Pooling: Each feature map (from Filter 1 and Filter 2) is split into groups, and only the maximum value in each group is kept. [Figure: the two feature maps divided into groups]
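
A minimal sketch of max pooling, assuming non-overlapping 2 x 2 groups. The two 4 x 4 feature maps are read off the slide figures, so their exact layout is an assumption, and `max_pool` is a made-up helper.

```python
def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size group."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, cols - size + 1, size)]
        for r in range(0, rows - size + 1, size)
    ]

# Feature maps for Filter 1 and Filter 2 (assumed layout of the slide's numbers).
map_filter1 = [[3, -1, -3, -1], [-3, 1, 0, -3], [-3, -3, 0, 1], [3, -2, -2, -1]]
map_filter2 = [[-1, -1, -1, -1], [-1, -1, -2, 1], [-1, -1, -2, 1], [-1, 0, -4, 3]]

pooled1 = max_pool(map_filter1)
pooled2 = max_pool(map_filter2)
print(pooled1)  # [[3, 0], [3, 1]]
print(pooled2)  # [[-1, 1], [0, 3]]
```

Pooling has no parameters to learn. It only reduces each 4 x 4 map to 2 x 2.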

Convolutional Layers + Pooling: Convolution produces an “image” with 64 channels. Pooling then shrinks each channel (4 x 4 to 2 x 2 in the example) and keeps the number of channels. Repeat convolution and pooling.

The whole CNN: image -> Convolution -> Pooling -> Convolution -> Pooling -> …… -> Flatten -> Fully Connected Layers -> softmax -> cat, dog, ……
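
To follow the pipeline, it can help to trace how the tensor shape changes from stage to stage. The sketch tracks shapes only and does no learning. The layer sizes (64 filters, 3 x 3 kernels, 2 x 2 pooling, no padding) are assumptions chosen to match the deck's examples.

```python
def conv_shape(shape, filters, kernel=3, stride=1, padding=0):
    """Output shape (height, width, channels) of a convolutional layer."""
    h, w, _ = shape
    out_h = (h + 2 * padding - kernel) // stride + 1
    out_w = (w + 2 * padding - kernel) // stride + 1
    return (out_h, out_w, filters)

def pool_shape(shape, size=2):
    """Output shape of non-overlapping pooling; channels are unchanged."""
    h, w, c = shape
    return (h // size, w // size, c)

def flatten_length(shape):
    """Length of the vector passed to the fully connected layers."""
    h, w, c = shape
    return h * w * c

shape = (100, 100, 3)                   # input image
shape = conv_shape(shape, filters=64)   # (98, 98, 64)
shape = pool_shape(shape)               # (49, 49, 64)
shape = conv_shape(shape, filters=64)   # (47, 47, 64)
shape = pool_shape(shape)               # (23, 23, 64)
features = flatten_length(shape)        # 23 * 23 * 64 = 33856
print(shape, features)
```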

Application: Playing Go. Input: the board as a 19 x 19 matrix (black: 1, white: -1, none: 0), which can be viewed as a 19 x 19 vector or as a 19 x 19 image (48 channels in Alpha Go). Output: the next move (19 x 19 positions), i.e. a classification over 19 x 19 classes. A fully-connected network can be used, but CNN performs much better.
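
The single-channel encoding described here can be sketched as follows. The stone coordinates are made up, and mapping a class index to a board position in row-major order is an assumption. This is not Alpha Go's actual 48-channel input.

```python
BOARD_SIZE = 19

def encode_board(black_stones, white_stones):
    """Encode a position as a 19 x 19 matrix: black 1, white -1, empty 0."""
    board = [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
    for r, c in black_stones:
        board[r][c] = 1
    for r, c in white_stones:
        board[r][c] = -1
    return board

def class_to_position(class_index):
    """Map one of the 19 x 19 output classes to (row, col), row-major (assumed)."""
    return divmod(class_index, BOARD_SIZE)

board = encode_board(black_stones=[(3, 3)], white_stones=[(15, 15)])
n_classes = BOARD_SIZE * BOARD_SIZE
print(n_classes, class_to_position(60))  # 361 (3, 3)
```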

Why CNN for Go playing?
• Some patterns are much smaller than the whole image. Alpha Go uses 5 x 5 for first layer.
• The same patterns appear in different regions.

Why CNN for Go playing?
• Subsampling the pixels will not change the object (pooling). How to explain this? Alpha Go does not use pooling ……

More Applications
Speech: https://dl.acm.org/doi/10.1109/TASLP.2014.2339736
Natural Language Processing: https://www.aclweb.org/anthology/S15-2079/

To learn more …
• CNN is not invariant to scaling and rotation (we need data augmentation).
Spatial Transformer Layer: https://youtu.be/SoCywZ1hZak (in Mandarin)
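
Data augmentation can be as simple as adding transformed copies of each training image. The slide does not say which transformations are meant, so the three helpers below (flip, 90-degree rotation, 2x nearest-neighbor upscaling) are illustrative choices only.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]

def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def scale_up_2x(image):
    """Double the image size by repeating every pixel (nearest neighbor)."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

image = [[1, 2],
         [3, 4]]                       # hypothetical tiny image
augmented = [image,
             flip_horizontal(image),
             rotate_90_clockwise(image),
             scale_up_2x(image)]
print(augmented[2])  # [[3, 1], [4, 2]]
```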