Convolutional Object Finder: A Neural Architecture for Fast and Robust Object Detection
Stefan Duffner and Christophe Garcia, France Telecom division R&D
Pascal Visual Object Classes Challenge Workshop, Southampton, 11/04/05

PASCAL Visual Object Classes Challenge
- We participated in 2 competitions for 2 object classes:
  - Detection of motorbikes using the provided training and validation sets and test set 1 (competition 5)
  - Detection of motorbikes using the provided training and validation sets and test set 2 (competition 6)
  - Detection of cars using the provided training and validation sets and test set 1 (competition 5)
  - Detection of cars using the provided training and validation sets and test set 2 (competition 6)
- Our method is based on a convolutional neural network architecture as described in:
  - Convolutional Face Finder: A neural architecture for fast and robust face detection, Christophe Garcia and Manolis Delakis, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, Issue 11, Nov. 2004, pages 1408-1423
Stefan Duffner and Christophe Garcia, FT R&D, Pascal VOCC Workshop, Southampton, 11/04/05

Challenges
Object pattern variability
- size
- orientation, pose
- form
- texture
- occlusions, …
Scene complexity
- textured background
- uncontrolled illumination
- video blurring effect
- low quality images, …

Image-based Object Detection
- Gather positive (motorbike) and negative (non-motorbike) examples
  - Virtual motorbikes, bootstrapping
- Train a two-class classifier
- Apply a search strategy in scale-space

Convolutional Neural Networks
- Class of NNs introduced by Y. LeCun in 1990
- Handwritten character recognition (US postal)
- Principles of mammalian visual processing: simple-to-complex cells
- Local receptive fields, shared weights, subsampling
- Built-in extraction + classification modules
- Natural candidates for image classification tasks
- Automatically learnt feature extractors and classifiers
- Robustness to pattern variations, occlusions, noise…
- Straightforward real-time implementation
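The three ingredients named above (local receptive fields, shared weights, subsampling) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the 3 x 3 averaging kernel, and the plain 2 x 2 averaging (without the trainable coefficient, bias and activation a real subsampling layer would have) are assumptions chosen for brevity.

```python
def convolve_valid(image, kernel, bias=0.0):
    """Slide ONE shared kernel over the image (valid positions only).

    Each output value sees only a small window of the input (local
    receptive field), and every window uses the same weights.
    As in most CNN code, this is technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            bias + sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def subsample_2x2(fmap):
    """Average non-overlapping 2x2 blocks, halving each dimension."""
    return [
        [
            (fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


# Hypothetical 8x8 test image and 3x3 averaging kernel.
image = [[float(r + c) for c in range(8)] for r in range(8)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

fmap = convolve_valid(image, kernel)  # 6 x 6 feature map
pooled = subsample_2x2(fmap)          # 3 x 3 after subsampling
```

The kernel has 9 weights regardless of image size, which is why weight sharing keeps the number of trainable parameters small.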

Learning a cascade of filters
[Figure: network architecture. Retina (52 x 30) → C1 → S1 → C2 → S2 → N1 → N2 → output "Motorbike?" in [-1, 1]]
- Feature extraction layers
  - C1: convolutional layer, 5 x 5
  - S1: subsampling layer
  - C2: convolutional layer, 3 x 3
  - S2: subsampling layer
- Feature combination: S1 → C2 (1 → 2), S2 → N1 (1 → 1)
- Classification layers: N1 (hidden layer) and N2, forming an MLP
- Feature map sizes labelled in the figure: 4 FMs 28 x 32, 4 FMs 14 x 16, 14 FMs 6 x 7
- 131,475 connections, but only 1951 trainable weights
- Learnt by modified backpropagation
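The feature-map sizes quoted on this slide can be checked with simple shape arithmetic: a valid k x k convolution shrinks each dimension by k - 1, and 2 x 2 subsampling halves it. The sketch below is illustrative only. The helper name `layer_sizes`, the 3 x 3 kernel for C2, and the 32 x 36 input are assumptions, picked because they reproduce the labelled sizes 28 x 32, 14 x 16 and 6 x 7.

```python
def layer_sizes(input_size, c1_kernel=5, c2_kernel=3):
    """Return the feature-map size after each of C1, S1, C2, S2."""

    def conv(size, k):
        # Valid convolution: lose k - 1 pixels per dimension.
        return (size[0] - k + 1, size[1] - k + 1)

    def sub(size):
        # 2x2 subsampling: halve each dimension.
        return (size[0] // 2, size[1] // 2)

    c1 = conv(input_size, c1_kernel)
    s1 = sub(c1)
    c2 = conv(s1, c2_kernel)
    s2 = sub(c2)
    return {"C1": c1, "S1": s1, "C2": c2, "S2": s2}


# Assumed 32 x 36 input (hypothetical; see the note above).
sizes = layer_sizes((32, 36))
for name, size in sizes.items():
    print(name, size)
```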

Training with examples - 1
- 217 original motorbike images (provided examples from Caltech and ETHZ)
- Artificially transformed (rotation, translation, scaling, adding noise, smoothing, …)
- 3906 motorbike examples
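Generating "virtual" examples by artificial transformation might look like the following pure-Python sketch. It covers only translation, additive noise and smoothing. The function names, shift range, noise level and the 50% smoothing probability are assumptions for illustration, not values from the slides.

```python
import random


def translate(img, dx, dy, fill=0.0):
    """Shift the image by (dx, dy) pixels, padding uncovered pixels with fill."""
    h, w = len(img), len(img[0])
    return [
        [
            img[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else fill
            for c in range(w)
        ]
        for r in range(h)
    ]


def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]


def smooth(img):
    """3x3 box filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def augment(img, rng, n_variants=3):
    """Produce n_variants randomly transformed copies of one example."""
    variants = []
    for _ in range(n_variants):
        v = translate(img, rng.randint(-2, 2), rng.randint(-2, 2))
        v = add_noise(v, 0.05, rng)
        if rng.random() < 0.5:
            v = smooth(v)
        variants.append(v)
    return variants


rng = random.Random(0)  # fixed seed so the sketch is reproducible
example = [[float((r * c) % 5) for c in range(10)] for r in range(6)]
variants = augment(example, rng)
```

Rotation and scaling need interpolation and are left out to keep the sketch short.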

Training with examples - 2
- Non-motorbike examples?
- Iterative training and bootstrapping

Training process
- 3508 initial negative examples
- 31518 grabbed false alarms
- 35026 negative examples
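The totals above are consistent (3508 + 31518 = 35026): the negative set grows by collecting false alarms between training rounds. A minimal sketch of such a bootstrapping loop follows. The one-dimensional "classifier" (a single threshold), the helper names and the toy data are all hypothetical stand-ins for the network and its image examples.

```python
def train(positives, negatives):
    """Toy stand-in for network training: a threshold between the classes."""
    return (min(positives) + max(negatives)) / 2.0


def bootstrap(positives, negatives, background_pool, max_rounds=10):
    """Retrain repeatedly, adding background samples that are wrongly accepted."""
    negatives = list(negatives)
    threshold = train(positives, negatives)
    for _ in range(max_rounds):
        # False alarms: background samples the current model calls positive.
        false_alarms = [
            x for x in background_pool if x > threshold and x not in negatives
        ]
        if not false_alarms:
            break
        negatives.extend(false_alarms)
        threshold = train(positives, negatives)
    return threshold, negatives


positives = [0.8, 0.9, 1.0]
initial_negatives = [0.1, 0.2]
background_pool = [0.05, 0.3, 0.55, 0.6, 0.7]

threshold, negatives = bootstrap(positives, initial_negatives, background_pool)
```

In this toy run the first round collects three false alarms and the second finds none, so the loop stops with five negative examples.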

Searching for objects…
1. Produce a multi-scale pyramid (factor 1.2)
2. Convolve each pyramid image; positive answers correspond to candidate object locations
3. Project the candidate objects back to the original image scale and fuse them
4. Apply the neural filter in a finer pyramid centered at each candidate object location
5. Classify considering the local volume of positive answers (ThrVol)
Processing speed (Pentium IV): 6 fps (384 x 288)
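The first step, building the pyramid, and the projection of detections back to the original scale can be sketched as below. The function names are hypothetical. Reading the 52 x 30 retina as width x height, and stopping once the retina no longer fits in the image, are assumptions rather than details given on the slides.

```python
def pyramid_sizes(width, height, retina_w, retina_h, factor=1.2):
    """List (scale, width, height) for each level, shrinking by factor per level."""
    sizes = []
    scale = 1.0
    while True:
        w, h = int(width / scale), int(height / scale)
        if w < retina_w or h < retina_h:
            break  # the retina no longer fits in this level
        sizes.append((scale, w, h))
        scale *= factor
    return sizes


def to_original(x, y, scale):
    """Map a position found at a pyramid level back to original image coordinates."""
    return (x * scale, y * scale)


# 384 x 288 frame (the size quoted for the speed figure), 52 x 30 retina.
levels = pyramid_sizes(384, 288, 52, 30)
for scale, w, h in levels:
    print(f"scale {scale:.3f}: {w} x {h}")
```

Candidates found at coarse levels correspond to large objects in the original image, which is why their positions are multiplied by the level's scale before fusion.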

Experimental Results: Motorbikes

Experimental Results: Cars

Experimental Results: Motorbikes

Experimental Results: Cars

Conclusion
- Some problems:
  - Too few provided training examples to generalize well (notably in the case of cars)
  - Fixed aspect ratio (retina) affects detection precision
  - Completely different views of objects
- Advantages:
  - Very robust in terms of lighting, form and pose variations, noise, occlusions, cluttered background
  - Fast processing: parallel implementation, imaging systems
  - A general framework for computer vision detection tasks