Deep Convolutional Neural Network and Computer Vision Hyeonseob Nam Computer Vision Lab. 1

Outline • Basic Neural Network • Convolutional Neural Net • Recent Breakthroughs in Computer Vision – AlexNet (Krizhevsky et al., 2012) – DeCAF (Donahue et al., 2014) – R-CNN (Girshick et al., 2014) • Conclusion 2

1. Basic Neural Network 3

Artificial Neural Network 4

History of Neural Networks • First generation (1958~): – Perceptrons • Second generation (1986~): – Multilayer perceptrons • Third generation (2006~): – Deep learning 5

Simple Neuron Models • Activation function: binary threshold neuron, sigmoid neuron, rectified linear neuron [Figure: plot and defining equation of each activation function] 6
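The three neuron models can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard textbook definitions; placing the binary threshold at zero and the function names are assumptions, not taken from the slide.

```python
import math

def binary_threshold(z):
    """Output 1 when the input reaches the threshold (assumed to be 0), else 0."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Smooth, bounded output in (0, 1); equals 0.5 at z = 0."""
    return 1.0 / (1.0 + math.exp(-z))

def rectified_linear(z):
    """Pass positive inputs through unchanged, output 0 otherwise."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, binary_threshold(z), sigmoid(z), rectified_linear(z))
```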

Gradient Descent • + Robust to noise - Slow, offline [Figure: error E] 7

Stochastic Gradient Descent (SGD) • Update weights for each sample + Fast, online - Sensitive to noise • Minibatch SGD: Update weights for a small set of samples + Fast, online + Robust to noise 8
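The difference between the update schemes comes down to how many samples contribute to each weight update. The sketch below fits a one-parameter model y ≈ w·x on toy data; the data, the squared-error loss, the learning rate, and the function name are all assumptions chosen for illustration.

```python
import random

def minibatch_sgd(xs, ys, lr=0.05, batch_size=2, epochs=100, seed=0):
    """Fit y ~ w * x by minimizing squared error with minibatch SGD."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of 0.5 * (w*x - y)^2, averaged over the samples in the batch
            grad = sum((w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with w = 2
w_sgd = minibatch_sgd(xs, ys)
print(w_sgd)
```

Setting `batch_size=1` gives the per-sample update, and `batch_size=len(xs)` gives ordinary (full-batch) gradient descent.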

Momentum • Remember the previous direction + Converge faster + Avoid oscillation 9
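"Remembering the previous direction" is commonly implemented with a velocity term that accumulates past gradients. The sketch below assumes the classical momentum form (velocity = momentum × previous velocity − learning rate × gradient); the slide's exact formula is not shown, and the quadratic objective and parameter values are illustrative.

```python
def momentum_descent(grad_fn, w0, lr=0.1, mu=0.9, steps=500):
    """Gradient descent with classical momentum on a scalar parameter."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = mu * v - lr * grad_fn(w)  # keep a fraction mu of the previous direction
        w = w + v                     # move along the accumulated velocity
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = momentum_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_min)
```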

Weight Decay • Penalize the size of the weights + Improve generalization a lot! 10
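One common way to penalize weight size is an L2 penalty, which adds a term proportional to each weight to its gradient. The sketch below assumes that form; the function name and the coefficient values are illustrative.

```python
def sgd_step_with_weight_decay(weights, grads, lr=0.1, weight_decay=0.01):
    """One update step where an L2 penalty pulls every weight toward zero."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

# With zero data gradient, the penalty alone shrinks the weights slightly.
shrunk = sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0])
print(shrunk)
```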

Single-Layer Neural Net Sigmoid 11

Multi-Layer: Backpropagation Sigmoid 12

2. Convolutional Neural Network 13

Genealogy of Neural Network 14

History of CNNs 15

What is a CNN? • Weight sharing + Reduce the number of parameters significantly + Translation equivariant [Figure: feature map] 16

What is a CNN? • Weight sharing + Reduce the number of parameters significantly + Translation equivariant • Subsampling (pooling) + Reduce the size of feature maps + Locally translation invariant 17
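Both properties can be checked on a toy example. The sketch below is a pure-Python illustration, not an efficient implementation: it assumes a single-channel image, no padding, stride 1 for the convolution, and non-overlapping max pooling. The helper `one_hot_image` is hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool2d(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def one_hot_image(rows, cols, r, c):
    """Hypothetical helper: an all-zero image with a single 1 at (r, c)."""
    return [[1.0 if (i, j) == (r, c) else 0.0 for j in range(cols)]
            for i in range(rows)]

kernel = [[1.0, 2.0], [3.0, 4.0]]  # 4 shared weights, reused at every position
fmap_a = conv2d_valid(one_hot_image(5, 5, 1, 1), kernel)
fmap_b = conv2d_valid(one_hot_image(5, 5, 1, 2), kernel)  # input shifted right by one pixel

# Equivariance: the feature map shifts right by one pixel as well.
shifted = all(fmap_b[i][j + 1] == fmap_a[i][j] for i in range(4) for j in range(3))

# Local invariance: a one-pixel shift inside a pooling window leaves the output unchanged.
pooled_a = max_pool2d(one_hot_image(4, 4, 0, 0))
pooled_b = max_pool2d(one_hot_image(4, 4, 0, 1))
print(shifted, pooled_a == pooled_b)
```

The 2×2 kernel produces a 4×4 feature map from only 4 weights, whereas a fully connected mapping from 25 inputs to 16 outputs would need 400.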

Learned convolutional filters: 1st layer • Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional neural networks.” arXiv preprint arXiv:1311.2901 (2013). 18

Strongest activations: 5th layer • Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional neural networks.” arXiv preprint arXiv:1311.2901 (2013). 19

LeNet-5 (LeCun et al., 1998) 20

LeNet-5 (LeCun et al., 1998) • 82/10,000 errors on the MNIST dataset 21

3. Recent Breakthroughs in Computer Vision - AlexNet (Krizhevsky et al., 2012) - DeCAF (Donahue et al., 2014) - R-CNN (Girshick et al., 2014) 22

3-1. AlexNet • Krizhevsky et al., “ImageNet classification with deep convolutional neural networks.” NIPS, 2012. 23

Difficulty of Training Deep Neural Net • Vanishing gradient problem → Logistic unit replaced by rectified linear unit • Requiring lots of labeled data → Big data (ImageNet), data augmentation • Overfitting problem → Dropout, local response normalization, overlapping pooling • Huge amount of computations → GPU implementation 24

Architecture Learning Rule 25

Dropout (Hinton et al., 2012) • 26

Dropout (Hinton et al., 2012) • Features learned on the MNIST dataset [Figure panels: without dropout, with dropout] 27
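A minimal sketch of dropout in the spirit of Hinton et al. (2012), assuming a drop probability of 0.5: during training each hidden unit is omitted at random, and at test time all units are kept with their outputs scaled by the keep probability (equivalent to halving the outgoing weights). The function names and the example activations are illustrative.

```python
import random

def dropout_train(activations, p_drop=0.5, rng=random):
    """Training time: zero each unit independently with probability p_drop."""
    return [0.0 if rng.random() < p_drop else a for a in activations]

def dropout_test(activations, p_drop=0.5):
    """Test time: keep every unit, scaled by the keep probability (1 - p_drop)."""
    return [a * (1.0 - p_drop) for a in activations]

acts = [1.0, 2.0, 3.0, 4.0]
train_out = dropout_train(acts, rng=random.Random(0))  # seeded for repeatability
test_out = dropout_test(acts)
print(train_out, test_out)
```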

Results 28

Results • Use the last hidden layer as a feature vector [Figure: test images and their top six nearest neighbors (Euclidean distance)] 29
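Retrieval in feature space reduces to ranking stored feature vectors by Euclidean distance to the query. The sketch below uses made-up 2-D vectors in place of real last-hidden-layer activations; the function name and the data are assumptions.

```python
import math

def top_k_neighbors(query, features, k=6):
    """Return the indices of the k feature vectors closest to the query."""
    ranked = sorted((math.dist(query, f), idx) for idx, f in enumerate(features))
    return [idx for _, idx in ranked[:k]]

# Toy stand-ins for feature vectors of training images.
features = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]]
nearest = top_k_neighbors([0.0, 0.1], features, k=2)
print(nearest)
```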

3-2. DeCAF • Donahue et al., “DeCAF: A deep convolutional activation feature for generic visual recognition.” ICML, 2014. 30

Idea • Deep Learning is Cool! – Deep architectures are able to capture salient aspects of a given domain. – They perform better than traditional hand-engineered representations. – They recently outperformed all known methods on a large-scale recognition challenge. • However, – Supervised deep architectures with limited training data generally overfit. – Many computer vision challenges have limited training examples. • Solution – Pre-train on ImageNet, transfer to another domain. 31

Deep Convolutional Activation Features (DeCAF) • 32

Visualization • ILSVRC-2012 • SUN-397 33

Object recognition (Caltech-101) • Basic-level object category recognition 34

Domain adaptation (Office) • Office dataset with three domains – Amazon, Webcam, DSLR – Train on one domain (source), test on another domain (target). [Figure panels: webcam, dslr] 35

Fine-grained recognition (Caltech-UCSD) • Two approaches – Adapt ImageNet-like pipeline, extract DeCAF directly. – Adapt deformable part descriptors (DPD) method, extract DeCAF from each part. + logistic regression 36

Scene recognition (SUN-397) • Different from common object recognition task – Scene categories: abbey, diner, mosque, stadium, … 37

Software • http://caffe.berkeleyvision.org/ 38

3-3. R-CNN • Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation.” CVPR, 2014. 39

Idea • Apply CNN to object detection/segmentation tasks • Two key insights: – Applying high-capacity convolutional neural networks to bottom-up region proposals in order to localize and segment objects – Supervised pre-training on a large dataset followed by domain-specific fine-tuning on a small dataset 40

CNN Architecture • AlexNet (Krizhevsky et al., 2012) 41

Pipeline • Selective search (Uijlings et al., 2013) 42

Training 43

Visualization 44

Bounding-box Regression [Figure labels: G (ground truth), P (region proposal), CNN feature] 45
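The regression learns a mapping from a proposal P to its ground-truth box G. The sketch below follows the parameterization described in the R-CNN paper, with boxes given as (center x, center y, width, height): scale-invariant offsets for the center and log-space ratios for the size. The function names and example boxes are illustrative, and the regressor that predicts the deltas from CNN features is not shown.

```python
import math

def regression_targets(P, G):
    """Targets that a regressor is trained to predict from the proposal's features."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def apply_deltas(P, d):
    """Turn predicted deltas back into a corrected box."""
    px, py, pw, ph = P
    dx, dy, dw, dh = d
    return (pw * dx + px, ph * dy + py, pw * math.exp(dw), ph * math.exp(dh))

P = (50.0, 50.0, 20.0, 40.0)  # region proposal
G = (55.0, 60.0, 30.0, 20.0)  # ground-truth box
targets = regression_targets(P, G)
recovered = apply_deltas(P, targets)  # perfect predictions reproduce G
print(targets, recovered)
```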

Results (Detection) 46

Results (Segmentation) 47

4. Conclusion 48

Conclusion • Deep CNNs have recently been applied to various visual recognition tasks and have outperformed all known methods in many challenges. • We can combine classical computer vision techniques with deep learning. • We can apply CNNs to various computer vision tasks with scarce training data (by pre-training on a big dataset and optionally fine-tuning). 49

[Razavian et al., 2014]: “It can be concluded that from now on, deep learning with CNN has to be considered as the primary candidate in essentially any visual recognition task.” 50

Thank you 51

References • LeCun, Yann, et al. “Backpropagation applied to handwritten zip code recognition.” Neural Computation, 1989. • LeCun, Yann, et al. “Gradient-based learning applied to document recognition.” Proceedings of the IEEE, 1998. • Hinton, Geoffrey E., et al. “Improving neural networks by preventing co-adaptation of feature detectors.” arXiv preprint arXiv:1207.0580, 2012. • Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” Advances in Neural Information Processing Systems, 2012. • J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. “DeCAF: A deep convolutional activation feature for generic visual recognition.” ICML, 2014. • R. Girshick, J. Donahue, T. Darrell, and J. Malik. “Rich feature hierarchies for accurate object detection and semantic segmentation.” CVPR, 2014. 52