Statistical Learning of Multi-View Face Detection
Microsoft Research Asia
Stan Li, Long Zhu, ZhenQiu Zhang, Andrew Blake, HongJiang Zhang, Harry Shum
Presented by Derek Hoiem

Overview
  • Viola-Jones AdaBoost
  • FloatBoost Approach
  • Multi-View Face Detection
  • FloatBoost Results
  • FloatBoost vs. AdaBoost
  • FloatBoost Discussion

Face Detection Overview
  • Evaluate windows at all locations and at many scales
  • A classifier labels each window as Object or Non-Object
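The window scan described above can be sketched as a simple generator. This is a minimal illustration only: the 20-pixel base size, the 1.25 scale step, and the stride of 10% of the window size are assumed values, not figures taken from the slides.

```python
def sliding_windows(width, height, base=20, scale_step=1.25, stride_frac=0.1):
    """Yield (x, y, size) for square windows at all locations and many scales.

    base, scale_step and stride_frac are illustrative assumptions.
    """
    size = float(base)
    while size <= min(width, height):
        s = int(round(size))
        stride = max(1, int(round(s * stride_frac)))
        # Visit every top-left corner at which the window still fits.
        for y in range(0, height - s + 1, stride):
            for x in range(0, width - s + 1, stride):
                yield x, y, s
        size *= scale_step  # move on to the next, larger scale


windows = list(sliding_windows(40, 30))
```

Each yielded window would then be handed to the classifier. The number of windows grows quickly with image size, which is why the per-window cost matters so much.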

Viola-Jones AdaBoost
  • Weak classifiers formed out of simple features
  • In sequential stages, features are selected and weak classifiers trained with emphasis on misclassified examples
  • Integral images and a cascaded classifier allow real-time face detection

Viola-Jones Features
For a 24 x 24 image: 190,800 semi-continuous features
  • Computed in constant time using the integral image
  • Weak classifiers consist of a threshold on the filter response
  • Feature types: Vertical, Horizontal, On-Off-On, Diagonal

Integral Image
y = I_8 - I_7 - I_6 + I_5 + I_4 - I_3 - I_2 + I_1
where I_k = I(x_k, y_k) is the integral image value at corner point k = 1, ..., 8
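A minimal sketch of how an integral image gives rectangle sums in constant time. The function names and the zero-padded layout are choices made for this example, not details from the slides. The eight-term expression above groups into two four-corner lookups of the kind `rect_sum` performs.

```python
def integral_image(img):
    """Summed-area table with a zero row and column prepended.

    ii[y][x] holds the sum of img[r][c] for all r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left pixel (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a rectangle (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
```

The table is built once per image in a single pass. After that, every rectangle feature costs a fixed number of lookups regardless of its size.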

Cascade of Classifiers
[Diagram: the input signal (image window) passes through Stage 1 (1 weak classifier), Stage 2 (5 weak classifiers), ..., Stage N (1200 weak classifiers). The branches at each stage are labeled 40% and 60%, and the branches at the final stage 99.999% and 0.001%. Windows rejected at any stage go to Class 2 (Non-Face); windows that pass every stage go to Class 1 (Face).]
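The early-rejection logic of a cascade can be sketched as follows. The two toy stages (`mean` and `contrast`) and their thresholds are hypothetical stand-ins for trained boosted classifiers, used only to make the example runnable.

```python
def cascade_classify(window, stages):
    """Return True (face) only if the window passes every stage.

    stages is a list of (score_fn, threshold) pairs, cheapest stage first.
    """
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # rejected early as non-face; later stages never run
    return True  # passed every stage


# Toy stages on a flat list of pixel values (illustrative only).
def mean(window):
    return sum(window) / len(window)


def contrast(window):
    return max(window) - min(window)


toy_stages = [(mean, 50), (contrast, 30)]
```

Most windows in an image are not faces, so the average cost per window is dominated by the first few cheap stages.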

Viola-Jones AdaBoost Algorithm
  • Strong classifier formed from weak classifiers: [equation not preserved]
  • At each stage, a new weak classifier is chosen to minimize a bound on the classification error (confidence weighted): [equation not preserved]
  • This gives the form for our weak classifier: [equation not preserved]
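The slide's equations are not available in this transcript. For orientation only, the standard confidence-rated (real) AdaBoost formulation that matches the three bullets is usually written as below. The symbols (H_M, h_m, J, w) are assumptions of this note and may differ from the slide's notation.

```latex
% Strong classifier: an additive combination of M weak classifiers
H_M(x) = \sum_{m=1}^{M} h_m(x), \qquad \hat{y}(x) = \operatorname{sign} H_M(x)

% Exponential bound on the classification error, with labels y_i \in \{-1, +1\}
J(H_M) = \sum_{i} \exp\bigl(-y_i \, H_M(x_i)\bigr)

% Minimizing J over the new term, with H_{M-1} fixed, gives the weak classifier
h_M(x) = \frac{1}{2} \log
  \frac{P\bigl(y = +1 \mid x, w^{(M-1)}\bigr)}{P\bigl(y = -1 \mid x, w^{(M-1)}\bigr)},
\qquad w_i^{(M-1)} = \exp\bigl(-y_i \, H_{M-1}(x_i)\bigr)
```

Here the probabilities are estimated from the training examples weighted by w, which is how misclassified examples come to receive more emphasis in later stages.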

Viola-Jones AdaBoost Algorithm

Viola-Jones AdaBoost Pros and Cons
Pros:
  • Very fast
  • Moderately high accuracy
  • Simple implementation/concept
Cons:
  • Greedy search through feature space
  • Highly constrained features
  • Very high training time

FloatBoost
  • Weak classifiers formed out of simple features
  • In each stage, the weak classifier that reduces error most is added
  • In each stage, if any previously added classifier contributes less to error reduction than the latest addition, that classifier is removed
  • Result is a smaller feature set with the same classification accuracy
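The add-then-remove idea can be sketched as a floating search over an abstract cost function. This is a simplified illustration in the style of sequential floating forward selection, not the presented algorithm itself: the `cost` table is made-up data, and the reweighting of training examples that a boosting procedure performs is omitted.

```python
def floating_select(candidates, cost, target, max_size):
    """Forward selection with conditional backward removal.

    Stops when cost(selected) <= target or max_size items are selected.
    """
    selected = []
    best_cost = {}  # lowest cost seen so far for each subset size
    while len(selected) < max_size:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Forward step: add the candidate that lowers the cost the most.
        best = min(remaining, key=lambda c: cost(selected + [c]))
        selected = selected + [best]
        k = len(selected)
        best_cost[k] = min(best_cost.get(k, float("inf")), cost(selected))
        # Backward step: drop an earlier pick while that beats the best
        # cost recorded for the smaller subset size.
        while len(selected) > 2:
            reduced = min(
                ([s for s in selected if s != c] for c in selected[:-1]),
                key=cost,
            )
            if cost(reduced) < best_cost[len(reduced)]:
                selected = reduced
                best_cost[len(reduced)] = cost(reduced)
            else:
                break
        if cost(selected) <= target:
            break
    return selected


# Made-up costs for every non-empty subset of three candidates.
table = {frozenset(k): v for k, v in {
    "a": 5, "b": 6, "c": 7, "ab": 4, "ac": 4.5, "bc": 2, "abc": 1.5,
}.items()}


def cost(subset):
    return table[frozenset(subset)]


result = floating_select(list("abc"), cost, target=2, max_size=3)
```

In this toy table the greedy order is a, b, c. Once c is added, removing a leaves a two-item set that already meets the target, so the search ends with a smaller set than pure forward selection would need.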

MS FloatBoost Features
For a 20 x 20 image: over 290,000 features (~500K?)
  • Computed in constant time using the integral image
  • Weak classifiers consist of a threshold on the filter response
[Figure: feature templates, Microsoft vs. Viola-Jones]

FloatBoost Algorithm

FloatBoost Weak Classifiers
  • Can be portrayed as density estimation on single variables using average shifted histograms with weighted examples
  • Each weak classifier is a 2-bin histogram from weighted examples
  • Weights serve to eliminate overcounting due to dependent variables
  • Strong classifier is a combination of estimated weighted PDFs for selected features
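A 2-bin weighted histogram weak classifier might look like the sketch below. The half-log-ratio output follows standard confidence-rated boosting and is an assumption here, as are the function names and the `eps` smoothing term.

```python
import math


def fit_two_bin(values, labels, weights, threshold, eps=1e-9):
    """Fit a 2-bin weighted histogram on one feature.

    Returns (low_bin_output, high_bin_output), where each output is
    0.5 * log(weighted positives / weighted negatives) in that bin.
    Labels are +1 or -1; eps avoids division by zero (an assumed detail).
    """
    mass = {(b, y): 0.0 for b in (0, 1) for y in (-1, 1)}
    for v, y, w in zip(values, labels, weights):
        mass[(int(v >= threshold), y)] += w
    return tuple(
        0.5 * math.log((mass[(b, 1)] + eps) / (mass[(b, -1)] + eps))
        for b in (0, 1)
    )


def predict_two_bin(outputs, value, threshold):
    """Confidence-rated prediction: the sign is the class, the magnitude the confidence."""
    return outputs[int(value >= threshold)]


values = [0.1, 0.2, 0.3, 0.8]
labels = [-1, -1, 1, 1]
weights = [0.25, 0.25, 0.25, 0.25]
outputs = fit_two_bin(values, labels, weights, threshold=0.5)
```

With these toy numbers the low bin holds twice as much negative weight as positive weight, so its output is negative; the high bin holds only positive weight, so its output is strongly positive.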

Multi-View Face Detection: Head Rotations
  • In-plane rotation: -45 to 45 degrees
  • Out-of-plane rotation: -90 to 90 degrees
  • Moderate nodding

Multi-View Face Detection: Detector Pyramid

Multi-View Face Detection: Merging Results
[Figure panels: Frontal, Right Side, Left Side]

Multi-View Face Detection: Summary
  • Simple, rectangular features used
  • FloatBoost selects and trains weak classifiers
  • A cascade of strong classifiers makes up the overall detector
  • A coarse-to-fine evaluation is used to efficiently find a broad range of out-of-plane rotated faces

Results: Frontal (MIT+CMU)
[Comparison: FloatBoost / AdaBoost / RBK / Schneiderman]
FloatBoost training data:
  • 20 x 20 images
  • 3000 original faces, 6000 total
  • 100,000 non-faces
[Figure: FloatBoost vs. AdaBoost]

Results: MS AdaBoost vs. Viola-Jones AdaBoost
  • More flexible features
  • Confidence-weighted AdaBoost
  • Smaller image size

Results: Profile
No quantitative results!

FloatBoost vs. AdaBoost
  • FloatBoost finds a more potent set of weak classifiers through a less greedy search
  • FloatBoost results in a faster, more accurate classifier
  • FloatBoost requires longer training times (5 times longer)

FloatBoost vs. AdaBoost
1 strong classifier, 4000 objects, 4000 non-objects, detection rate fixed at 99.5%

FloatBoost: Pros
  • Very fast detection (5 fps multi-view)
  • Fairly high accuracy
  • Simple implementation

FloatBoost: Cons
  • Very long training time
  • Not highest accuracy
  • Does it work well for non-frontal faces and other objects?