Robust Real-time Object Detection
Paul Viola, Michael Jones
SECOND INTERNATIONAL WORKSHOP ON STATISTICAL AND COMPUTATIONAL THEORIES OF VISION – MODELING, LEARNING, COMPUTING AND SAMPLING, VANCOUVER, CANADA, JULY 13, 2001.
Student: Lourdes Ramírez Cerna.
Introduction
Face recognition has become an area of active research that spans disciplines such as image processing, pattern recognition, computer vision, and neural networks. The first step in a face recognition system is face detection. Given an image or video, a face detector must be able to identify and locate all faces regardless of their position, scale, age, orientation, and lighting conditions.
[Figure panels: Lighting conditions, Orientation, Scale]
The Problem
There are hundreds of detection methods in the literature, but many of them do not run in real time; the method proposed by Viola and Jones was the first robust real-time detection system. The paper presents new algorithms that form a framework for robust and extremely rapid object detection, achieving detection and false positive rates equivalent to the best published results.
Framework Scheme
The framework consists of two steps:
1. Trainer: works with positive samples (images with faces) and negative samples (images without faces). Training is a lengthy computation.
2. Detector: uses the trained detector to analyze each input image. This second stage is very fast and allows real-time detection.
Features
The object detection procedure classifies images based on the value of simple features called Haar-like features. A feature-based system operates much faster than a pixel-based system.
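As a rough illustration of what a Haar-like feature measures, the sketch below computes a two-rectangle feature directly from pixel values. The function name `two_rect_feature`, the sign convention (right half minus left half), and the toy 4 by 4 windows are assumptions made for this example and are not taken from the slides.

```python
import numpy as np

def two_rect_feature(window, top, left, height, width):
    """Two-rectangle Haar-like feature (illustrative sign convention):
    sum of the right half minus sum of the left half.
    (top, left, height, width) describe the whole feature; width must be even."""
    half = width // 2
    left_sum = window[top:top + height, left:left + half].sum()
    right_sum = window[top:top + height, left + half:left + width].sum()
    return int(right_sum) - int(left_sum)

# Toy window with a vertical edge: dark left half (0), bright right half (10).
edge_window = np.array([[0, 0, 10, 10]] * 4, dtype=np.int64)
edge_response = two_rect_feature(edge_window, 0, 0, 4, 4)

# A uniform window produces no response.
flat_window = np.ones((4, 4), dtype=np.int64)
flat_response = two_rect_feature(flat_window, 0, 0, 4, 4)
```

The feature responds strongly to the vertical edge and gives zero on the uniform window, which is why such features can encode coarse structure with a single number.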
Integral Image
Rectangle features can be computed very rapidly using an intermediate representation of the image called the integral image.
[Figure: integral image; calculating a rectangular feature]
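A minimal sketch of the idea, assuming NumPy: the integral image stores cumulative sums, so the sum over any rectangle needs only four array references. The zero-padded layout (one extra row and column) and the names `integral_image` and `rect_sum` are choices made for this example; the paper defines the integral image with an inclusive sum and no padding.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of img[:y, :x]; row 0 and column 0 are zero padding."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle using four references into ii."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 4, 4)   # whole image
inner = rect_sum(ii, 1, 1, 2, 2)   # central 2x2 block
```

Because the cost of `rect_sum` does not depend on the rectangle size, a feature built from two or three rectangles costs the same at every scale.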
Training
The Attentional Cascade
Detection
Experiments
- The positive training set consisted of 4916 hand-labeled faces, scaled and aligned to a base resolution of 24 by 24 pixels.
- The 10,000 negative examples were selected by randomly picking sub-windows from 9500 images that did not contain faces.
- The speed of the cascaded detector is directly related to the number of features evaluated per scanned sub-window.
- The final classifier had 32 layers and 4297 features in total.
- Evaluated on the MIT-CMU test set, an average of 8 features out of the total of 4297 are evaluated per sub-window.
- The processing time for a 384 by 288 pixel image on a conventional personal computer is about 0.067 seconds.
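The link between speed and the number of features evaluated can be sketched with a toy cascade: each stage sums its feature responses, and a sub-window is rejected as soon as one stage falls below its threshold. Everything in this sketch is hypothetical (the `run_cascade` function, the stage thresholds, and the use of plain numbers in place of sub-windows); real stages use weighted sums of thresholded Haar-like features learned during training.

```python
def run_cascade(stages, window):
    """Evaluate stages in order and stop at the first one that rejects.
    Returns (accepted, number_of_features_evaluated)."""
    evaluated = 0
    for features, threshold in stages:
        score = 0.0
        for feature in features:
            score += feature(window)
            evaluated += 1
        if score < threshold:
            return False, evaluated
    return True, evaluated

# Hypothetical stages: (list of feature functions, stage threshold).
stages = [
    ([lambda w: w], 1.0),                   # stage 1: 1 feature
    ([lambda w: w, lambda w: 2 * w], 9.0),  # stage 2: 2 features
    ([lambda w: w] * 5, 20.0),              # stage 3: 5 features
]

easy_negative = run_cascade(stages, 0.5)  # rejected by stage 1
hard_negative = run_cascade(stages, 3.5)  # passes stages 1 and 2, fails stage 3
positive = run_cascade(stages, 5.0)       # passes every stage
```

Most sub-windows in an image behave like `easy_negative` and are discarded after very few features, which is consistent with the reported average of 8 features out of 4297. At about 0.067 seconds per image, the detector processes roughly 1 / 0.067 ≈ 15 images per second.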
Results
Testing of the final face detector was performed using the MIT-CMU frontal face test set, which consists of:
- 130 images
- 507 frontal faces