Generic Object Detection: Brief Review and Our Approach
Xiaoyu Wang
NEC Labs America

Outline
o Review of Object Detection
o Regionlets with a deep CNN
o Conclusion
12/15/2021 Regionlets for Generic Object Detection

Generic object detection task
o Localize objects of interest
(Figure labels: person, motorbike)

Wide Applications
o Autonomous driving
o Surveillance
o Robotics
o Biometrics

Generic object detection
o Training: Positive Samples, Negative Samples -> Feature Extraction -> Classification Model Learning
o Testing: Input Image -> Feature Extraction -> Search Objects Using Learned Model -> Output Object Locations

Generic object detection
Features?

Review of Previous Work
o Popular Features: Haar feature (face detection)
  - Intensity difference between two or more pixel blocks
  - Scalar feature
  - Fast to compute
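The block-difference idea above can be sketched in a few lines. This is a minimal illustration, not the detector from the talk: it assumes a grayscale image stored as a list of rows, builds an integral image (summed-area table) so that any block sum costs four lookups, and evaluates a single two-rectangle feature. All function names are hypothetical.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def block_sum(table, top, left, height, width):
    """Sum of the pixels in a block, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def haar_two_rect(table, top, left, height, width):
    """Left-half sum minus right-half sum: one scalar feature."""
    half = width // 2
    return (block_sum(table, top, left, height, half)
            - block_sum(table, top, left + half, height, half))


# Toy 4x4 image: bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
table = integral_image(image)
feature = haar_two_rect(table, 0, 0, 4, 4)
```

Because the table is built once per image, each additional feature costs a constant number of lookups, which is what makes this family of features fast to compute.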

Review of Previous Work
o Popular Features: Color Histogram
  - Quantize color into histogram bins
  - Vector feature
  - Very sensitive
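As a rough sketch of the quantization step, the snippet below bins 8-bit RGB pixels into a joint histogram. The bin count, the joint (rather than per-channel) layout, and the function name are assumptions chosen for brevity.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per joint RGB bin; pixels are (r, g, b) tuples in 0..255."""
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 8-bit channel value to a bin index in 0..bins_per_channel-1.
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        index = (r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin
        hist[index] += 1
    return hist


pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]
hist = color_histogram(pixels, bins_per_channel=2)
```

The result is a vector feature with one entry per bin; here the two reddish pixels fall into the same bin and the blue pixel into another.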

Review of Previous Work
o Popular Features: Histogram of Oriented Gradients (HOG)
  - Quantize gradient orientation into histogram bins
  - Robust to intensity change (gradient, normalized)
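A simplified sketch of the orientation histogram for a single cell follows. It assumes central-difference gradients, unsigned orientations over 0 to 180 degrees, 9 bins, hard bin assignment, and per-cell L2 normalization; full HOG implementations add bin interpolation and block-level normalization, which are omitted here.

```python
import math


def cell_orientation_histogram(cell, num_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % num_bins] += magnitude
    # L2 normalization removes the dependence on overall contrast.
    norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
    return [v / norm for v in hist]


# A vertical edge, and the same edge at twice the contrast.
edge = [[0, 0, 10, 10] for _ in range(4)]
brighter = [[2 * v for v in row] for row in edge]
hist_edge = cell_orientation_histogram(edge)
hist_brighter = cell_orientation_histogram(brighter)
```

The two histograms are nearly identical, which illustrates why gradients plus normalization give robustness to intensity change.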

Review of Previous Work
o Popular Features: Local Binary Patterns (LBP)
  - Binarized local pixel relations
  - Robust to intensity change
  - Well known for texture classification
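The basic 8-neighbour LBP code can be sketched as follows. The neighbour ordering and the ">=" comparison are conventions assumed here; other orderings give an equivalent descriptor with permuted bits.

```python
# Clockwise neighbour offsets (dy, dx), starting at the top-left pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """8-bit code: bit k is set when neighbour k is >= the centre pixel."""
    centre = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(NEIGHBOUR_OFFSETS):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code


patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 8, 3],
]
# Same patch under a global brightness shift.
shifted = [[v + 100 for v in row] for row in patch]
code = lbp_code(patch, 1, 1)
code_shifted = lbp_code(shifted, 1, 1)
```

Only the ordering of each neighbour relative to the centre matters, so the code is unchanged by the brightness shift.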

Review of Previous Work
o Popular Features: Scale Invariant Feature Transform (SIFT)
  - Rotate feature according to the dominant orientation
  - Scale invariant
  - Sparse feature (needs a feature detector)
  - Hard to use in detection (repeatability issue)
  - HOG is a dense version of SIFT without rotation

Review of Previous Work
o Popular Features: Covariance feature
  - Covariance on vertical and horizontal gradients (first order, second order)
  - Captures spatial visual correlation
  - Very successful in pedestrian detection
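A minimal sketch of the covariance computation: each pixel in a region contributes a small feature vector, and the region is described by the sample covariance matrix of those vectors. The two-dimensional per-pixel vector used below (a horizontal and a vertical gradient value) is a hypothetical stand-in for the first- and second-order gradient terms named on the slide.

```python
def region_covariance(vectors):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += (v[i] - mean[i]) * (v[j] - mean[j])
    return [[c / (n - 1) for c in row] for row in cov]


# Hypothetical per-pixel (gx, gy) values for three pixels in a region.
per_pixel = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
cov = region_covariance(per_pixel)
```

The off-diagonal entries record how the two gradient channels vary together across the region, which is the correlation the descriptor is meant to capture.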

Review of Previous Work
o Popular Features: Deep CNN features
  - High order information
  - Robust and discriminative
  - Very successful in image classification

Review of Previous Work
How to feed classifiers with these features?
(Diagram: HAAR, LBP, SIFT, HOG, Covariance -> ? -> Classifiers)

Review of Previous Work
o Concatenate features extracted from cells
  - HOG template
(Figure labels: 8, 8)

Review of Previous Work
o Code features extracted from cells
  - Bag-of-Words, Sparse coding, Spatial Pyramid Matching
(Diagram labels: Codebook (dictionary), Feature Vector)
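The simplest of the coding schemes listed above, Bag-of-Words with hard assignment, can be sketched as below. The codebook is assumed to be given (in practice it is usually learned by clustering), and the toy two-dimensional descriptors are hypothetical.

```python
def bag_of_words(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments."""
    hist = [0] * len(codebook)
    for descriptor in descriptors:
        distances = [sum((a - b) ** 2 for a, b in zip(descriptor, word))
                     for word in codebook]
        hist[distances.index(min(distances))] += 1
    total = sum(hist) or 1  # avoid dividing by zero for an empty input
    return [count / total for count in hist]


codebook = [(0, 0), (10, 10)]
descriptors = [(1, 0), (9, 9), (0, 2), (11, 10)]
feature_vector = bag_of_words(descriptors, codebook)
```

The output length depends only on the codebook size, not on the number of descriptors, which is why coded features yield a fixed-length vector for windows of any size.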

Review of Previous Work
o Features are mined through object part detectors
  - Deformable part-based model

Review of Previous Work
o Most popular detection frameworks
  - Haar + Boosting: face detection, pedestrian detection
  - HOG + rigid template + linear SVM: pedestrian detection
  - HOG + deformable template (DPM) + latent SVM: generic object detection [1]
  - Selective Search + Spatial Pyramid Matching: generic object detection [2]
[1] P. Felzenszwalb et al. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR 2008.
[2] K. E. A. van de Sande et al. Segmentation as selective search for object recognition. ICCV 2011.

Generic object detection
o Multiple Viewpoints
o Multiple Scales
o Rich Deformation

Review: Deformation handling
o Deformable Part-based Model (DPM)
  - Specify the number of deformable parts
o Spatial Pyramid Matching
  - Specify the number of pyramids to build
o Do we have to pre-define model parameters to handle different degrees of deformation?

Review: Multiple scales/viewpoints
(Figure labels: Size A, Aspect ratio A; Size B, Aspect ratio B)
o DPM
  - Resize an image to detect objects at a fixed scale
  - Multiple models, each deals with one viewpoint
o Spatial Pyramid Matching
  - No need to resize the image
  - One model, a codebook is used to encode features
o Can we learn a model that can be easily adapted to arbitrary scales and viewpoints?

Motivation
o A flexible and general object-level representation with
  - Hassle-free deformation handling
  - Arbitrary scale and aspect ratio handling

Detection framework
[1] K. E. A. van de Sande et al. Segmentation as selective search for object recognition. ICCV 2011.
[2] B. Alexe et al. Measuring the objectness of image windows. PAMI 2012.

Regionlet: Definition
o Figure 1

Regionlet: Definition (cont.)
o Relative normalized position
  - Traditional: (50, 180, 180)
  - Normalized: (.25, .90, .90)
Figure 2
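The normalization can be illustrated with a short sketch. It assumes a hypothetical 200 x 200 reference window and a hypothetical (left, top, right, bottom) box, chosen so that 50 maps to .25 and 180 maps to .90 as on the slide. The second function shows how a normalized box can be placed into a candidate window of any size and aspect ratio, without resizing the image.

```python
def normalize_box(box, window_width, window_height):
    """Express (left, top, right, bottom) as fractions of the window size."""
    left, top, right, bottom = box
    return (left / window_width, top / window_height,
            right / window_width, bottom / window_height)


def place_box(normalized_box, candidate):
    """Map a normalized box into a candidate window given as (x, y, w, h)."""
    x, y, width, height = candidate
    n_left, n_top, n_right, n_bottom = normalized_box
    return (x + n_left * width, y + n_top * height,
            x + n_right * width, y + n_bottom * height)


# Hypothetical box inside a hypothetical 200 x 200 reference window.
normalized = normalize_box((50, 50, 180, 180), 200, 200)

# The same relative box inside a tall, narrow candidate window.
placed = place_box(normalized, (10, 20, 100, 300))
```

Storing positions relative to the window is what lets a single learned model be stretched to any candidate bounding box.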

Regionlet: Feature extraction
Figure 3
o Non-local pooling
o Could be SIFT, HOG, LBP, Covariance features, or whatever feature you like!
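A minimal sketch of pooling across regionlets, assuming each regionlet inside a region has already produced a feature vector of the same length. The element-wise maximum used here is one simple reading of the max-pooling step; the function name and toy values are hypothetical.

```python
def pool_region(regionlet_features):
    """Element-wise max over the feature vectors of one region's regionlets."""
    return [max(values) for values in zip(*regionlet_features)]


# Three regionlets in one region, each with a 2-D feature vector.
regionlet_features = [
    [0.1, 0.7],
    [0.4, 0.2],
    [0.3, 0.5],
]
region_feature = pool_region(regionlet_features)
```

Because the maximum is taken over regionlets at different positions, the strongest response wins regardless of where inside the region it occurs, which gives tolerance to local deformation.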

Regionlets: Training
o Constructing the regions/regionlets pool
  - Small region, fewer regionlets -> fine spatial layout
  - Large region, more regionlets -> robust to deformation
o Learning RealBoost [1] cascades
  - 16K region/regionlets candidates for each cascade
  - Learning of each cascade stops when the target error rate is reached (1% for positives, 37.5% for negatives)
  - Last cascade stops after collecting 5000 weak classifiers
  - Results in 4-7 cascades
  - 2-3 hours to finish training one category on an 8-core machine
[1] C. Huang et al. Boosting nested cascade detector for multi-view face detection. ICPR 2004.
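One way to read the per-cascade targets above: if every cascade keeps 99% of positives and lets through 37.5% of negatives, and the cascades are treated as independent, the overall rates compound multiplicatively. Both the independence and the uniform per-cascade rates are assumptions made only for this illustration; in the talk the last cascade stops on a weak-classifier count instead.

```python
def cascade_rates(num_cascades, positive_error=0.01, negative_pass=0.375):
    """Overall detection rate and false-positive rate after all cascades."""
    detection_rate = (1.0 - positive_error) ** num_cascades
    false_positive_rate = negative_pass ** num_cascades
    return detection_rate, false_positive_rate


# Compounded rates for the 4 to 7 cascades mentioned on the slide.
rates = {k: cascade_rates(k) for k in range(4, 8)}
```

Under these assumptions, each extra cascade costs about one percent of the positives while rejecting well over half of the remaining negatives.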

Regionlets: Testing
o No image resizing
o Any scale, any aspect ratio
o Adapt the model size to the same size as the object candidate bounding box
(Figure panels: one model, resized image; multiple models, original image; ours: one model, original image)

Experiments
o Datasets
  - PASCAL VOC 2007, 2010
  - 20 object categories
o Investigated Features
  - HOG
  - LBP
  - Covariance
o Evaluation
  - True detection: > 0.5 intersection/union
  - Reported number: mean Average Precision (mAP)
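The intersection/union criterion can be sketched as below. Boxes are assumed to be (left, top, right, bottom) in continuous coordinates; the official PASCAL VOC toolkit uses an inclusive pixel convention, so its values can differ slightly from this sketch.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (left, top, right, bottom)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_true_detection(detection, ground_truth, threshold=0.5):
    """A detection counts as correct when its IoU exceeds the threshold."""
    return intersection_over_union(detection, ground_truth) > threshold


ground_truth = (0, 0, 10, 10)
close_box = (2, 0, 12, 10)   # overlaps 80 of 120 units of union
loose_box = (5, 0, 15, 10)   # overlaps 50 of 150 units of union
```

With these toy boxes, `close_box` passes the 0.5 threshold and `loose_box` does not, even though half of `loose_box` lies on the object.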

Experiments
o Performance of different features

Experiments: PASCAL VOC
Table 1. Performance on the PASCAL VOC 2007 dataset (evaluated using Average Precision or mean Average Precision, mAP; no DCNN feature, no outside data)
Table 2. Performance comparison with the state of the art

Experiments: PASCAL VOC

Running speed
o 0.2 seconds per image on a single core if candidate bounding boxes are given; real time (>30 frames per second) using 8 cores
o 2 seconds per image to generate candidate bounding boxes; faster methods are available now
o 2-3 hours to finish training one category on an 8-core machine

Regionlets with DCNN
o DCNN is very successful in image classification
  - Deep structure learns high-level information
  - Max-pooling is robust to part misalignment
  - Information is jointly learned
o How to build a bridge between DCNN and the Regionlets object detection framework?

Regionlets with DCNN
o Deep CNN structure
  - Features from convolution layers retain spatial information

Regionlets with DCNN
o Deep CNN structure
  - 'Network-convolution' to generate features for the whole image

Regionlets with DCNN
o Deep CNN structure
  - Deep CNN features for detection

Experiments: PASCAL VOC

Experiments: PASCAL VOC

Experiments: PASCAL VOC
o Visualization of selected neuron patterns

Conclusions
o A new object representation for object detection
  - Non-local max-pooling of regionlets
  - Relative normalized locations of regionlets
  - Flexibility to incorporate various types of features
o A principled data-driven detection framework, effective in handling deformation, multiple scales, and multiple viewpoints
o Superior performance with a fast running speed

Thank you!