Deep Convolutional Neural Network and Computer Vision. Hyeonseob Nam, Computer Vision Lab. 1
Outline • Basic Neural Network • Convolutional Neural Net • Recent Breakthroughs in Computer Vision – AlexNet (Krizhevsky et al., 2012) – DeCAF (Donahue et al., 2014) – R-CNN (Girshick et al., 2014) • Conclusion 2
1. Basic Neural Network 3
Artificial Neural Network 4
History of Neural Networks • First generation (1958~): – Perceptrons • Second generation (1986~): – Multilayer perceptrons • Third generation (2006~): – Deep learning 5
Simple Neuron Models • Activation function of each neuron type: – Binary threshold neuron – Sigmoid neuron – Rectified linear neuron (figure: plot of each activation function) 6
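The formulas on this slide did not survive extraction, so the following is a minimal sketch of the three activation functions using their standard textbook definitions. The choice of firing at z >= 0 for the binary threshold neuron is an assumption, as the slide's own convention is not recoverable.

```python
import math

def binary_threshold(z):
    """Output 1 when the total input z is non-negative, 0 otherwise (assumed convention)."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Smooth, bounded output in (0, 1); equals 0.5 at z = 0."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear output: z when z is positive, 0 otherwise."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, binary_threshold(z), round(sigmoid(z), 3), relu(z))
```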
Gradient Descent • Update weights using the error E over the whole training set + Robust to noise - Slow, offline 7
Stochastic Gradient Descent (SGD) • Update weights for each sample + Fast, online - Sensitive to noise • Minibatch SGD: Update weights for a small set of samples + Fast, online + Robust to noise 8
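As a rough illustration of minibatch SGD, the sketch below fits a one-dimensional linear model on noise-free toy data. The function name `minibatch_sgd`, the model, the data, and all hyperparameters are assumptions made for this example and are not taken from the slides.

```python
import random

def minibatch_sgd(xs, ys, lr=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w * x + b by minimizing mean squared error with minibatch SGD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                      # visit samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average the gradient of 0.5 * (w*x + b - y)^2 over the minibatch.
            grad_w = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [x / 10.0 for x in range(-10, 11)]           # 21 inputs in [-1, 1]
ys = [3.0 * x - 1.0 for x in xs]                  # noise-free targets from w = 3, b = -1
w, b = minibatch_sgd(xs, ys)
print(round(w, 3), round(b, 3))
```

Setting `batch_size=1` gives the per-sample update from the first bullet, and `batch_size=len(xs)` recovers full-batch gradient descent.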
Momentum • Remember the previous direction + Converge faster + Avoid oscillation 9
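The sketch below shows the classical momentum update on a small ill-conditioned quadratic. The error function, learning rate, and momentum coefficient are assumptions chosen for illustration; the slide's own formula is not recoverable.

```python
def grad(w):
    """Gradient of the ill-conditioned quadratic E(w) = 0.5 * (w0^2 + 50 * w1^2)."""
    return [w[0], 50.0 * w[1]]

def descend(momentum, lr=0.02, steps=100):
    w = [1.0, 1.0]
    v = [0.0, 0.0]                                # velocity: the remembered previous direction
    for _ in range(steps):
        g = grad(w)
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w

def distance_to_minimum(w):
    return (w[0] ** 2 + w[1] ** 2) ** 0.5

plain = distance_to_minimum(descend(momentum=0.0))
with_momentum = distance_to_minimum(descend(momentum=0.9))
print(plain, with_momentum)
```

With these particular settings the momentum run ends much closer to the minimum after the same number of steps; other step sizes can behave differently.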
Weight Decay • Penalize the size of the weights + Improve generalization a lot! 10
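A minimal sketch of weight decay as an L2 penalty added to the error, so each update also shrinks the weight toward zero. The toy data and the penalty strength are assumptions, and the example only shows the shrinking effect, not the generalization benefit.

```python
def fit(weight_decay, lr=0.1, steps=200):
    """Gradient descent on 0.5 * mean((w*x - y)^2) + 0.5 * weight_decay * w^2."""
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    w = 0.0
    for _ in range(steps):
        data_grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * (data_grad + weight_decay * w)  # the decay term pulls w toward zero
    return w

w_plain = fit(weight_decay=0.0)
w_decayed = fit(weight_decay=1.0)
print(w_plain, w_decayed)
```

Without the penalty the fit recovers w = 2; with `weight_decay=1.0` the minimizer moves to 28/17, a smaller weight.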
Single-Layer Neural Net Sigmoid 11
Multi-Layer: Backpropagation Sigmoid 12
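The derivation on this slide is not recoverable, so here is a small sketch of backpropagation for a two-layer sigmoid network with squared error. Bias terms are omitted for brevity, and the weights, input, and target are made-up values for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W1, W2, x):
    """Hidden activations h = sigmoid(W1 x), outputs y = sigmoid(W2 h)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in W2]
    return h, y

def loss(W1, W2, x, t):
    _, y = forward(W1, W2, x)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def backprop(W1, W2, x, t):
    """Return dE/dW1 and dE/dW2 for the squared error E."""
    h, y = forward(W1, W2, x)
    # Output deltas: (y - t) times the sigmoid derivative y * (1 - y).
    d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, t)]
    # Hidden deltas: output deltas propagated back through W2.
    d_hid = [hj * (1 - hj) * sum(d_out[k] * W2[k][j] for k in range(len(W2)))
             for j, hj in enumerate(h)]
    gW2 = [[dk * hj for hj in h] for dk in d_out]
    gW1 = [[dj * xi for xi in x] for dj in d_hid]
    return gW1, gW2

W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [[0.5, -0.6]]
x, t = [1.0, 0.5], [1.0]
gW1, gW2 = backprop(W1, W2, x, t)
print(gW1, gW2)
```

A quick sanity check for code like this is to compare each analytic gradient with a finite-difference estimate of `loss`.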
2. Convolutional Neural Network 13
Genealogy of Neural Network 14
History of CNNs 15
What is a CNN? • Weight sharing across each feature map + Reduce the number of parameters significantly + Translation equivariant 16
What is a CNN? • Weight sharing + Reduce the number of parameters significantly + Translation equivariant • Subsampling (pooling) + Reduce the size of feature maps + Local translation invariant 17
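The two properties above can be sketched in one dimension. The helper names `conv1d` and `max_pool`, the kernel, and the signals are assumptions made up for this illustration.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation: the same kernel weights are reused at every position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling."""
    return [max(feature_map[i:i + size])
            for i in range(0, len(feature_map) - size + 1, size)]

kernel = [1, 0, -1]
signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0] + signal[:-1]                       # same pattern moved one step to the right

fmap = conv1d(signal, kernel)
fmap_shifted = conv1d(shifted, kernel)
# Equivariance: shifting the input shifts the feature map by the same amount.
print(fmap_shifted[1:] == fmap[:-1])              # True

# Local invariance: a peak that moves within one pooling window gives the same output.
print(max_pool([0, 0, 5, 0, 0, 0]) == max_pool([0, 0, 0, 5, 0, 0]))  # True
```

Here the shared kernel has 3 weights, whereas a fully connected map from 8 inputs to 6 outputs would need 48. The pooling invariance holds only while the shift stays inside a pooling window.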
Learned convolutional filters: 1st layer. Zeiler, Matthew D., and Rob Fergus. Visualizing and understanding convolutional neural networks. arXiv preprint arXiv:1311.2901 (2013). 18
Strongest activations: 5th layer. Zeiler, Matthew D., and Rob Fergus. Visualizing and understanding convolutional neural networks. arXiv preprint arXiv:1311.2901 (2013). 19
LeNet-5 (LeCun et al., 1998) 20
LeNet-5 (LeCun et al., 1998) • 82/10,000 errors on the MNIST test set 21
3. Recent Breakthroughs in Computer Vision - AlexNet (Krizhevsky et al., 2012) - DeCAF (Donahue et al., 2014) - R-CNN (Girshick et al., 2014) 22
3-1. AlexNet. Krizhevsky et al., “ImageNet classification with deep convolutional neural networks.” NIPS, 2012. 23
Difficulty of Training Deep Neural Net • Vanishing gradient problem: logistic unit → rectified linear unit • Requiring lots of labeled data: big data (ImageNet), data augmentation • Overfitting problem: dropout, local response normalization, overlapping pooling • Huge amount of computation: GPU implementation 24
Architecture Learning Rule 25
Dropout (Hinton et al., 2012) • 26
Dropout (Hinton et al., 2012) • Features learned on the MNIST dataset: without dropout vs. with dropout 27
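The slide's description of dropout did not survive extraction. As a hedged sketch, the code below implements the common "inverted dropout" variant, which rescales the surviving units during training. This differs in bookkeeping from Hinton et al. (2012), who keep training activations unscaled and halve the outgoing weights at test time. The function name and the drop probability are assumptions.

```python
import random

def dropout(activations, p_drop=0.5, train=True, rng=random):
    """Inverted dropout: zero each unit with probability p_drop during training
    and rescale the survivors so the expected activation is unchanged."""
    if not train:
        return list(activations)                  # test time: use every unit as-is
    keep = 1.0 - p_drop
    return [a / keep if rng.random() >= p_drop else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10000
dropped = dropout(hidden, p_drop=0.5, train=True, rng=rng)
print(sum(1 for a in dropped if a == 0.0) / len(dropped))
```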
Results 28
Results • Use the last hidden layer as a feature vector (figure: test images and their top six nearest neighbors by Euclidean distance) 29
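Retrieval by feature vector can be sketched as a Euclidean nearest-neighbor search. The two-dimensional vectors below are made-up stand-ins for last-hidden-layer features, and `nearest_neighbors` is a hypothetical helper name.

```python
import math

def nearest_neighbors(query, gallery, k=6):
    """Return indices of the k gallery feature vectors closest to query (Euclidean)."""
    order = sorted(range(len(gallery)), key=lambda i: math.dist(query, gallery[i]))
    return order[:k]

gallery = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
print(nearest_neighbors([0.9, 0.1], gallery, k=2))  # [1, 0]
```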
3-2. DeCAF. Donahue et al., “DeCAF: A deep convolutional activation feature for generic visual recognition.” ICML, 2014. 30
Idea • Deep learning is cool! – Deep architectures are able to capture salient aspects of a given domain. – They perform better than traditional hand-engineered representations. – They recently outperformed all known methods on a large-scale recognition challenge. • However, – Supervised deep architectures with limited training data generally overfit. – Many computer vision challenges have limited training examples. • Solution – Pre-train on ImageNet, transfer to another domain. 31
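The transfer recipe can be sketched as a frozen feature extractor followed by a simple classifier trained on the small target dataset. Everything below is a toy under stated assumptions: `frozen_features` is a hypothetical stand-in for a pre-trained network's activations (it is not DeCAF), and the data are synthetic 2-D points.

```python
import math
import random

def frozen_features(x):
    """Hypothetical stand-in for a pre-trained network's activations (never updated)."""
    return [x[0] ** 2, x[1] ** 2]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(feats, labels, lr=0.3, epochs=1000):
    """Full-batch gradient descent on the logistic loss; only w and b are learned."""
    w, b = [0.0] * len(feats[0]), 0.0
    n = len(feats)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            gw = [g + err * fi for g, fi in zip(gw, f)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, f):
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

rng = random.Random(0)
points = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(200)]
labels = [1 if px * px + py * py < 1.0 else 0 for px, py in points]  # inside unit circle
feats = [frozen_features(p) for p in points]      # extract once; the extractor stays fixed
w, b = train_logistic(feats, labels)
accuracy = sum(predict(w, b, f) == y for f, y in zip(feats, labels)) / len(labels)
print(accuracy)
```

The design point is that only the small classifier sees the target labels, which limits overfitting when training examples are scarce.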
Deep Convolutional Activation Features (DeCAF) • 32
Visualization • ILSVRC-2012 • SUN-397 33
Object recognition (Caltech-101) • Basic-level object category recognition 34
Domain adaptation (Office) • Dataset with three domains – Amazon, Webcam, DSLR – Train on one domain (source), test on another domain (target). (figure: example images from the Webcam and DSLR domains) 35
Fine-grained recognition (Caltech-UCSD) • Two approaches – Adapt an ImageNet-like pipeline: extract DeCAF directly, then apply logistic regression. – Adapt the deformable part descriptors (DPD) method: extract DeCAF from each part, then apply logistic regression. 36
Scene recognition (SUN-397) • Different from the common object recognition task – Scene categories: abbey, diner, mosque, stadium, … 37
Software • http://caffe.berkeleyvision.org/ 38
3-3. R-CNN. Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation.” CVPR, 2014. 39
Idea • Apply CNN to object detection/segmentation tasks • Two key insights: – Applying high-capacity convolutional neural networks to bottom-up region proposals in order to localize and segment objects – Supervised pre-training on a large dataset followed by domain-specific fine-tuning on a small dataset 40
CNN Architecture • AlexNet (Krizhevsky et al., 2012) 41
Pipeline Selective search (Uijlings et al. , 2013) 42
Training 43
Visualization 44
Bounding-box Regression (figure labels: G = ground truth, P = region proposal, CNN feature) 45
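The equations on this slide are not recoverable. The sketch below uses the box parameterization from the R-CNN paper: scale-invariant center offsets and log-space size ratios between proposal P and ground truth G. The regressor that predicts these targets from the CNN feature is omitted, and the function names and box values are assumptions for illustration.

```python
import math

def regression_targets(P, G):
    """Targets that map proposal P to ground truth G; boxes are (cx, cy, w, h)."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def apply_deltas(P, t):
    """Invert regression_targets: move and rescale P by the offsets t."""
    px, py, pw, ph = P
    tx, ty, tw, th = t
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))

P = (50.0, 50.0, 20.0, 40.0)                      # region proposal
G = (55.0, 46.0, 30.0, 40.0)                      # ground-truth box
t = regression_targets(P, G)
print(t)
print(apply_deltas(P, t))
```

Applying the targets to P recovers G up to floating-point error, which is the property a learned regressor is trained to approximate.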
Results (Detection) 46
Results (Segmentation) 47
4. Conclusion 48
Conclusion • Deep CNNs have recently been applied to various visual recognition tasks and have outperformed all known methods in many challenges. • We can combine classical computer vision techniques with deep learning. • We can apply CNNs to various computer vision tasks with scarce training data (by pre-training on a big dataset and optionally fine-tuning). 49
[Razavian et al., 2014]: "It can be concluded that from now on, deep learning with CNN has to be considered as the primary candidate in essentially any visual recognition task." 50
Thank you 51
References • LeCun, Yann, et al. “Backpropagation applied to handwritten zip code recognition.” Neural Computation, 1989. • LeCun, Yann, et al. “Gradient-based learning applied to document recognition.” Proceedings of the IEEE, 1998. • Hinton, Geoffrey E., et al. “Improving neural networks by preventing co-adaptation of feature detectors.” arXiv preprint arXiv:1207.0580, 2012. • Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” Advances in Neural Information Processing Systems, 2012. • Donahue, J., Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. “DeCAF: A deep convolutional activation feature for generic visual recognition.” ICML, 2014. • Girshick, R., J. Donahue, T. Darrell, and J. Malik. “Rich feature hierarchies for accurate object detection and semantic segmentation.” CVPR, 2014. 52