Convolutional Neural Network (CNN): Network Architecture Designed for Images

  • Slides: 35
Convolutional Neural Network (CNN): a network architecture designed for images. 1

Image Classification: A model takes a 100 x 100 image and outputs a score for each class (dog, cat, tree, ...); it is trained with cross entropy. (All the images to be classified have the same size.) 2

Image Classification: A color image is a 3-D tensor of 100 x 100 pixels with 3 channels, 100 x 100 x 3 values in total. Each value represents the intensity of one channel at one pixel. 3

Fully Connected Network: The flattened input has 100 x 100 x 3 values. With 1000 neurons in the first layer, that layer alone needs 100 x 100 x 3 x 1000 = 3 x 10^7 weights. Do we really need “fully connected” in image processing? 4
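
The weight count on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```python
# Weight count for the first fully connected layer described on the slide.
height, width, channels = 100, 100, 3   # image size from the slide
hidden_neurons = 1000                   # first-layer width from the slide

input_dim = height * width * channels       # length of the flattened image vector
num_weights = input_dim * hidden_neurons    # one weight per (input value, neuron) pair

print(input_dim)      # 30000
print(num_weights)    # 30000000, i.e. 3 x 10^7
```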

Observation 1: Identifying some critical patterns. Input -> Layer 1 -> Layer 2 -> ... -> bird? Perhaps humans also identify birds in a similar way. 5

https://www.dcard.tw/f/funny/p/233833012 6

Observation 1: Does a neuron need to see the whole image? No. Some patterns are much smaller than the whole image, so a neuron does not have to see the whole image. Input -> Layer 1 (basic detector) -> Layer 2 (advanced detector) -> ... -> bird. 7

Simplification 1: Each neuron connects only to its receptive field, e.g. a 3 x 3 region across all 3 channels. The neuron then has 3 x 3 x 3 weights plus a bias. [Figure: image grid with one receptive field highlighted] 8
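
A neuron restricted to a 3 x 3 x 3 receptive field can be sketched as a weighted sum over 27 values plus a bias. The function name and the numeric values below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def neuron_output(patch, weights, bias):
    """Weighted sum over a 3 x 3 x 3 receptive field plus a bias.

    patch and weights are nested lists indexed [channel][row][col].
    """
    total = bias
    for c in range(3):
        for i in range(3):
            for j in range(3):
                total += patch[c][i][j] * weights[c][i][j]
    return total

# Hypothetical values (not from the slides): the same 3 x 3 pattern in every channel.
patch = [[[1, 0, 0], [0, 1, 0], [0, 0, 1]] for _ in range(3)]
weights = [[[1, -1, -1], [-1, 1, -1], [-1, -1, 1]] for _ in range(3)]
bias = 0.5

num_params = 3 * 3 * 3 + 1                 # 27 weights + 1 bias
y = neuron_output(patch, weights, bias)    # 3 matches per channel x 3 channels + 0.5
```

Compared with the fully connected neuron, which needs one weight per input value, this neuron has only 28 parameters.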

Simplification 1: Receptive fields can overlap, and several neurons can have the same receptive field. • Can different neurons have different sizes of receptive field? • Cover only some channels? • Not square receptive field? 9

Simplification 1 – Typical Setting: Each receptive field covers all channels, so it is described by its kernel size (e.g., 3 x 3). Each receptive field has a set of neurons (e.g., 64 neurons). Neighboring receptive fields are offset by the stride (here, stride = 2) and overlap; padding fills in where a receptive field extends past the image border. The receptive fields cover the whole image. 10
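
How kernel size, stride, and padding place the receptive fields along one axis can be sketched as follows. The kernel size 3 and stride 2 come from the slide; the 6-pixel width, the padding of 1, and the function name are assumptions made for illustration.

```python
def receptive_field_starts(size, kernel, stride, padding):
    """Left edge (along one axis) of every receptive field.

    Coordinates are relative to the original image, so a negative value
    means the field starts inside the padded border.
    """
    padded = size + 2 * padding
    count = (padded - kernel) // stride + 1
    return [k * stride - padding for k in range(count)]

# Assumed 6-pixel axis with padding 1: fields cover pixels -1..1, 1..3, 3..5.
with_padding = receptive_field_starts(size=6, kernel=3, stride=2, padding=1)

# Without padding the fields cover pixels 0..2 and 2..4, so pixel 5 is missed.
without_padding = receptive_field_starts(size=6, kernel=3, stride=2, padding=0)

print(with_padding)      # [-1, 1, 3]
print(without_padding)   # [0, 2]
```

In this sketch neighboring fields share one pixel, and padding is what lets the fields reach the last pixel of the axis.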

Observation 2: • The same patterns appear in different regions. Two neurons looking at different regions may each report “I detect ‘beak’ in my receptive field.” Does each receptive field need its own “beak” detector? 11

Simplification 2: Parameter sharing. Two neurons with different receptive fields use the same 3 x 3 x 3 weights and the same bias. 12

Simplification 2: Two neurons with the same receptive field would not share parameters (otherwise they would always produce the same output). 13
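
Parameter sharing can be illustrated with two neurons that use one set of weights on two different receptive fields. This is a sketch with a single channel and hypothetical patch values; `dot`, `field_a`, and `field_b` are names introduced here.

```python
def dot(patch, weights):
    """Element-wise product of two equally sized 2-D lists, summed."""
    return sum(p * w
               for patch_row, weight_row in zip(patch, weights)
               for p, w in zip(patch_row, weight_row))

# One set of parameters, shared by both neurons.
shared_weights = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
shared_bias = 0

# Two different receptive fields (hypothetical pixel values).
field_a = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # diagonal from top-left
field_b = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]   # diagonal from top-right

out_a = dot(field_a, shared_weights) + shared_bias   # pattern present
out_b = dot(field_b, shared_weights) + shared_bias   # pattern absent

print(out_a, out_b)   # 3 -1
```

The outputs differ only because the inputs differ, which is why sharing is useful across different receptive fields and pointless within the same one.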

Simplification 2 – Typical Setting: Each receptive field has a set of neurons (e.g., 64 neurons). 14

Simplification 2 – Typical Setting: Each receptive field has a set of neurons (e.g., 64 neurons). Every receptive field uses the same sets of parameters for its neurons; each shared set is called a filter (filter 1, filter 2, filter 3, filter 4, ...). 15
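
The effect of sharing on the parameter count can be estimated as below. The 64 neurons and the 3 x 3 kernel come from the slides; the 3 input channels follow the earlier color-image example, and the 100 x 100 grid of receptive fields (stride 1 with padding 1) is an assumption made only to get a concrete number.

```python
kernel_h, kernel_w, in_channels = 3, 3, 3
num_filters = 64                                   # neurons per receptive field

params_per_filter = kernel_h * kernel_w * in_channels + 1   # 27 weights + 1 bias
shared_total = num_filters * params_per_filter              # with parameter sharing

# Assumption: one receptive field per pixel of a 100 x 100 image.
num_fields = 100 * 100
unshared_total = num_fields * shared_total                  # every field owns its neurons

print(shared_total)     # 1792
print(unshared_total)   # 17920000
```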

Benefit of Convolutional Layer: A fully connected layer is a jack of all trades, master of none. Restricting it to receptive fields and adding parameter sharing gives the convolutional layer, which has a larger model bias (for images). • Some patterns are much smaller than the whole image. • The same patterns appear in different regions. 16

Another story based on filter: A convolutional layer has a set of filters (Filter 1, Filter 2, ...), each a 3 x 3 x channel tensor, where channel = 1 for a black and white image and channel = 3 for a color image. Each filter detects a small pattern (3 x 3 x channel). 17

Convolutional Layer: Consider channel = 1 (black and white image) and a 6 x 6 image. Filter 1 = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]; Filter 2 = [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]; ... (The values in the filters are unknown parameters.) [Figure: 6 x 6 image of 0s and 1s] 18

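Convolutional Layer: With stride = 1, Filter 1 slides over the 6 x 6 image one pixel at a time. At each position the filter values are multiplied element-wise with the 3 x 3 patch beneath them and summed. The result is a 4 x 4 map: [[3, -1, -3, -1], [-3, 1, 0, -3], [-3, -3, 0, 1], [3, -2, -2, -1]]. 19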

Convolutional Layer: Do the same process for every filter. With stride = 1, Filter 2 gives [[-1, -1, -1, -1], [-1, -1, -2, 1], [-1, -1, -2, 1], [-1, 0, -4, 3]]. The outputs of all filters together form the feature map. 20
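
The sliding-window computation on these slides can be sketched in plain Python. The two filters are as read from the slides. The 6 x 6 image is an assumption: its values did not survive in this transcript, so the one below was chosen because it reproduces the feature-map values shown on the slides. The function name `convolve` is illustrative.

```python
def convolve(image, kernel, stride=1):
    """Slide a square kernel over a square image (no padding).

    At each position, multiply element-wise and sum. As is usual in deep
    learning, the kernel is not flipped (this is cross-correlation).
    """
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

# Assumed 6 x 6 black and white image (see the note above).
image = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
filter_1 = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]   # responds to a diagonal
filter_2 = [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]   # responds to a vertical line

# One 4 x 4 map per filter; stacked, they form the feature map.
feature_map = [convolve(image, f) for f in (filter_1, filter_2)]
for channel in feature_map:
    for row in channel:
        print(row)
```

The largest values (3) appear where the image patch matches the filter's pattern exactly.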

Convolutional Layer: Convolving with 64 filters yields a feature map with 64 channels, which can be treated as an “image” with 64 channels. 21

Multiple Convolutional Layers: The first layer's 64 filters produce an “image” with 64 channels. The next convolution takes that as its input, so each of its filters is 3 x 3 x 64. 22
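
The way filter depth follows the number of input channels can be checked with a small shape experiment. This is a sketch under assumptions: an 8 x 8 color image, random filter values, stride 1, no padding, no bias, and 64 filters in the second layer as well (the slide only fixes the second layer's filter size, 3 x 3 x 64). All function names are illustrative.

```python
import random

def conv_layer(x, filters):
    """x: [channels][height][width]; filters: [num_filters][channels][k][k].

    Stride 1, no padding, no bias. Returns [num_filters][out_h][out_w].
    """
    in_ch, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(filters[0][0])
    out_h, out_w = h - k + 1, w - k + 1
    out = []
    for f in filters:
        assert len(f) == in_ch, "filter depth must equal input channels"
        fmap = [[sum(x[ch][r + i][c + j] * f[ch][i][j]
                     for ch in range(in_ch)
                     for i in range(k)
                     for j in range(k))
                 for c in range(out_w)]
                for r in range(out_h)]
        out.append(fmap)
    return out

def random_filters(num_filters, in_channels, k, rng):
    """Random stand-ins for learned filter values."""
    return [[[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
             for _ in range(in_channels)]
            for _ in range(num_filters)]

def shape(t):
    return (len(t), len(t[0]), len(t[0][0]))

rng = random.Random(0)
image = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(3)]

layer1 = random_filters(64, 3, 3, rng)    # 64 filters, each 3 x 3 x 3
layer2 = random_filters(64, 64, 3, rng)   # 64 filters, each 3 x 3 x 64

h1 = conv_layer(image, layer1)   # "image" with 64 channels
h2 = conv_layer(h1, layer2)

print(shape(h1))   # (64, 6, 6)
print(shape(h2))   # (64, 4, 4)
```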

Multiple Convolutional Layers: 64 filters per convolution. [Figure: the 6 x 6 input image, the first convolution, the resulting feature map, and the second convolution] 23

Comparison of Two Stories: The weights a neuron applies to its receptive field are the values of a filter, a 3 x 3 x channel tensor, e.g. [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]. (Ignore bias in this slide.) 24

The neurons with different receptive fields share the parameters, including the bias. Put the other way: each filter convolves over the input image. 25

Convolutional Layer: Neuron version of the story: each neuron only considers a receptive field, and the neurons with different receptive fields share the parameters. Filter version of the story: there is a set of filters detecting small patterns, and each filter convolves over the input image. They are the same story. 26
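
That the two stories describe the same computation can be checked numerically. In this sketch the neuron version builds, for every receptive field, a fully connected neuron over the flattened image whose weights are zero outside the field and copied from the shared filter inside it. The 4 x 4 image is hypothetical, bias is ignored, and the function names are introduced here.

```python
def filter_version(image, kernel):
    """Slide the filter over the image (stride 1, no padding)."""
    k, n = len(kernel), len(image)
    m = n - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(m)]
            for r in range(m)]

def neuron_version(image, kernel):
    """One fully connected neuron per receptive field, with shared weights."""
    k, n = len(kernel), len(image)
    m = n - k + 1
    flat = [v for row in image for v in row]          # flattened input vector
    outputs = []
    for r in range(m):
        for c in range(m):
            weights = [0] * (n * n)                   # zero outside the field
            for i in range(k):
                for j in range(k):
                    weights[(r + i) * n + (c + j)] = kernel[i][j]   # shared values
            outputs.append(sum(w * x for w, x in zip(weights, flat)))
    return [outputs[r * m:(r + 1) * m] for r in range(m)]

# Hypothetical 4 x 4 image; the kernel is Filter 1 from the slides.
image = [[1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
kernel = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]

same = filter_version(image, kernel) == neuron_version(image, kernel)
print(same)   # True
```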

Observation 3: • Subsampling the pixels will not change the object: a subsampled image of a bird still shows a bird. 27

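Pooling – Max Pooling: Split each filter's 4 x 4 output into 2 x 2 groups and keep the maximum of each group. Filter 1 output: [[3, -1, -3, -1], [-3, 1, 0, -3], [-3, -3, 0, 1], [3, -2, -2, -1]]. Filter 2 output: [[-1, -1, -1, -1], [-1, -1, -2, 1], [-1, -1, -2, 1], [-1, 0, -4, 3]]. 28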
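
Max pooling over non-overlapping 2 x 2 groups can be sketched as follows. The two 4 x 4 maps are the filter outputs as read from the slides; the function name `max_pool` and the 2 x 2 group size default are choices made for this sketch.

```python
def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size group."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

map_1 = [[3, -1, -3, -1], [-3, 1, 0, -3], [-3, -3, 0, 1], [3, -2, -2, -1]]
map_2 = [[-1, -1, -1, -1], [-1, -1, -2, 1], [-1, -1, -2, 1], [-1, 0, -4, 3]]

pooled_1 = max_pool(map_1)
pooled_2 = max_pool(map_2)

print(pooled_1)   # [[3, 0], [3, 1]]
print(pooled_2)   # [[-1, 1], [0, 3]]
```

Pooling has no parameters to learn; it only shrinks each channel, here from 4 x 4 to 2 x 2.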

Convolutional Layers + Pooling: Convolution turns the input into an “image” with 64 channels, and pooling shrinks each channel; e.g., the two 4 x 4 maps become [[3, 0], [3, 1]] and [[-1, 1], [0, 3]]. Convolution and pooling can be repeated. 29

The whole CNN: image -> Convolution -> Pooling -> Convolution -> Pooling -> Flatten -> Fully Connected Layers -> softmax -> class (cat, dog, ...). 30
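
The pipeline on this slide can be sketched end to end in plain Python. This is a toy forward pass under assumptions: one convolution and pooling round instead of two, a single-channel 6 x 6 input, two filters, one fully connected layer, and random untrained weights. None of the values or function names come from the slides.

```python
import math
import random

def convolve(image, kernel):
    """Stride-1 convolution without padding (single channel)."""
    k = len(kernel)
    m = len(image) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(m)]
            for r in range(m)]

def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size group."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def softmax(logits):
    peak = max(logits)                         # subtract the max for stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cnn_forward(image, filters, fc_weights, fc_bias):
    maps = [convolve(image, f) for f in filters]             # convolution
    pooled = [max_pool(m) for m in maps]                     # pooling
    flat = [v for m in pooled for row in m for v in row]     # flatten
    logits = [sum(w * x for w, x in zip(ws, flat)) + b       # fully connected
              for ws, b in zip(fc_weights, fc_bias)]
    return softmax(logits)

rng = random.Random(0)
classes = ["cat", "dog"]
image = [[rng.randint(0, 1) for _ in range(6)] for _ in range(6)]
filters = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(2)]
# 2 feature maps, each pooled to 2 x 2, give 8 flattened inputs.
fc_weights = [[rng.uniform(-1, 1) for _ in range(8)] for _ in classes]
fc_bias = [0.0 for _ in classes]

probs = cnn_forward(image, filters, fc_weights, fc_bias)
print(dict(zip(classes, probs)))
```

With untrained weights the probabilities are meaningless; training would adjust the filters and the fully connected weights to minimize cross entropy.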

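Application: Playing Go: The network takes the board as input, either as a 19 x 19 vector or as a 19 x 19 matrix (an image), with black: 1, white: -1, none: 0. (Alpha Go uses 48 channels.) The output is the next move, chosen among 19 x 19 positions, so this is classification with 19 x 19 classes. A fully connected network can be used, but a CNN performs much better. 31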
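
The board encoding and the move-as-class view can be sketched as below. The 1 / -1 / 0 encoding comes from the slide; the dictionary input format, the row-major class numbering, and the function names are assumptions made for illustration, and the 48-channel Alpha Go input is not modeled.

```python
BOARD = 19

def encode_board(stones):
    """stones maps (row, col) to 'black' or 'white'.

    Returns a 19 x 19 matrix with black: 1, white: -1, none: 0.
    """
    value = {"black": 1, "white": -1}
    return [[value.get(stones.get((r, c)), 0) for c in range(BOARD)]
            for r in range(BOARD)]

def move_to_class(row, col):
    """Assumed row-major numbering of the 19 x 19 = 361 classes."""
    return row * BOARD + col

def class_to_move(index):
    return divmod(index, BOARD)

board = encode_board({(3, 3): "black", (15, 15): "white"})
num_classes = BOARD * BOARD

print(num_classes)          # 361
print(class_to_move(360))   # (18, 18)
```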

Why CNN for Go playing? • Some patterns are much smaller than the whole image. (Alpha Go uses 5 x 5 for the first layer.) • The same patterns appear in different regions. 32

Why CNN for Go playing? • Subsampling the pixels will not change the object, which is the motivation for pooling. How to explain this for Go? Alpha Go does not use pooling. 33

More Applications: Speech: https://dl.acm.org/doi/10.1109/TASLP.2014.2339736 Natural Language Processing: https://www.aclweb.org/anthology/S15-2079/ 34

To learn more: • CNN is not invariant to scaling and rotation (we need data augmentation). Spatial Transformer Layer: https://youtu.be/SoCywZ1hZak (in Mandarin) 35