Object Recognition Szeliski Chapter 14
Recognition
Simultaneous recognition and segmentation (Shotton, Winn, Rother et al. 2009)
Location recognition (Philbin, Chum, Isard et al. 2007)
Using context (Russell, Torralba, Liu et al. 2007)
Recognition problems
What is it?
• Object and scene recognition
Who is it?
• Identity recognition
Where is it?
• Object detection
What are they doing?
• Activities
All of these are classification problems
• Choose one class from a list of possible candidates
Slides from Rick Szeliski
What is recognition?
A different taxonomy from [Csurka et al. 2006]:
• Recognition
– Where is this particular object?
• Categorization
– What kind of object(s) is (are) present?
• Content-based image retrieval
– Find me something that looks similar
• Detection
– Locate all instances of a given class
Readings
• Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition. Fergus, R., Perona, P. and Zisserman, A. International Journal of Computer Vision, Vol. 71(3), 273-303, March 2007
• MIT course
– http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm
Sources
• Steve Seitz, CSE 455/576, previous quarters
• Fei-Fei, Fergus, Torralba, CVPR 2007 course
• Efros, CMU 16-721 Learning in Vision
• Freeman, MIT 6.869 Computer Vision: Learning
• Linda Shapiro, CSE 576, Spring 2007
Object Detection
Every possible sub-window?
Effective special-purpose detectors: rapidly find likely regions where particular objects might occur.
How to recognize each person in this image?
Object Detection
• Face detection
– More successful
– Built into digital cameras to enhance auto-focus
– Used in video conferencing to control pan-tilt heads
– …
• General object detection
– Pedestrians
– Cars
Face Detection
• Every pixel? Scale?
– Too slow in practice
• Tutorials
– http://people.csail.mit.edu/torralba/shortCourseRLOC/ (general detection and recognition)
– http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html (face recognition)
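A rough sense of why exhaustive search is too slow: the sketch below counts the square sub-windows a detector would have to classify when it slides over every position and a range of scales. The function name `count_windows`, the 24-pixel base window, the 1.25 scale step, and the 640x480 image size are illustrative assumptions, not values from the slides.

```python
def count_windows(width, height, base=24, scale_step=1.25, stride=1):
    """Count square sub-windows over all positions and scales.

    base, scale_step and stride are illustrative assumptions.
    """
    total = 0
    size = float(base)
    while size <= min(width, height):
        s = int(size)
        # Number of top-left corners at which an s x s window still fits.
        nx = (width - s) // stride + 1
        ny = (height - s) // stride + 1
        total += nx * ny
        size *= scale_step  # move on to the next, larger scale
    return total


if __name__ == "__main__":
    print("stride 1:", count_windows(640, 480))
    print("stride 4:", count_windows(640, 480, stride=4))
```

Even with a coarser stride, every one of these windows needs a classifier evaluation, which is why cheap tests that reject most windows early are attractive.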
Face detection
How to tell if a face is present?
CSE 576, Spring 2008, Face Recognition and Detection
Skin detection
Skin pixels have a distinctive range of colors
• Corresponds to region(s) in RGB color space
Skin classifier
• A pixel X = (R, G, B) is skin if it is in the skin (color) region
• How to find this region?
Skin detection
Learn the skin region from examples
• Manually label skin/non-skin pixels in one or more “training images”
• Plot the training data in RGB space
– skin pixels shown in orange, non-skin pixels shown in gray
– some skin pixels may be outside the region, non-skin pixels inside
Skin classifier
Given X = (R, G, B): how to determine if it is skin or not?
• Nearest neighbor
– find labeled pixel closest to X
• Find plane/curve that separates the two classes
– popular approach: Support Vector Machines (SVM)
• Data modeling
– fit a probability density/distribution model to each class
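The nearest-neighbor option can be sketched in a few lines. This is a minimal illustration, assuming squared Euclidean distance in RGB space; the function name and the four training pixels are made up for the example.

```python
def nearest_neighbor_label(x, labeled_pixels):
    """Return the label of the training pixel closest to x in RGB space."""
    best_label, best_d2 = None, float("inf")
    for rgb, label in labeled_pixels:
        # Squared Euclidean distance; the square root is not needed for ranking.
        d2 = sum((a - b) ** 2 for a, b in zip(x, rgb))
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label


# Hypothetical hand-labeled training pixels (R, G, B).
training = [
    ((220, 170, 140), "skin"),
    ((200, 150, 120), "skin"),
    ((30, 90, 200), "non-skin"),
    ((40, 160, 60), "non-skin"),
]

if __name__ == "__main__":
    print(nearest_neighbor_label((210, 160, 130), training))
    print(nearest_neighbor_label((35, 100, 190), training))
```

The cost of one query grows with the number of labeled pixels, which is one reason to consider the separating-surface and data-modeling alternatives.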
Probability
• X is a random variable
• P(X) is the probability that X achieves a certain value
– P(X) is called a PDF: probability distribution/density function
– a 2D PDF is a surface, a 3D PDF is a volume
• Normalization: ∫ P(X) dX = 1 for continuous X, Σ P(X) = 1 for discrete X
Probabilistic skin classification
Model PDF / uncertainty
• Each pixel has a probability of being skin or not skin
Skin classifier
• Given X = (R, G, B): how to determine if it is skin or not?
• Choose the interpretation of highest probability
Where do we get P(skin | R) and P(~skin | R)?
Learning conditional PDF’s
We can calculate P(R | skin) from a set of training images
• It is simply a histogram over the pixels in the training images
– each bin Ri contains the proportion of skin pixels with color Ri
• This doesn’t work as well in higher-dimensional spaces. Why not?
Approach: fit parametric PDF functions
• common choice is rotated Gaussian
– center
– covariance
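A minimal sketch of the histogram estimate of P(R | skin), assuming 8-bit red values and 8 equal-width bins; the function name and the sample values are hypothetical. The last line hints at one answer to "Why not?": a full-resolution RGB histogram has far more bins than a typical training set has labeled pixels.

```python
def learn_histogram(values, num_bins=8, max_value=256):
    """Estimate P(value | class) as normalized bin counts.

    Assumes a non-empty list of integers in [0, max_value).
    """
    bin_width = max_value // num_bins
    counts = [0] * num_bins
    for v in values:
        counts[min(v // bin_width, num_bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]


# Hypothetical red-channel values of pixels labeled as skin.
skin_reds = [200, 210, 190, 220, 230, 180, 205, 215]
p_r_given_skin = learn_histogram(skin_reds)

# Bins needed for a full 3D histogram at 8 bits per channel.
full_rgb_bins = 256 ** 3

if __name__ == "__main__":
    print(p_r_given_skin)
    print(full_rgb_bins)
```

A parametric model such as a Gaussian sidesteps the sparsity problem by estimating only a center and a covariance instead of one value per bin.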
Learning conditional PDF’s
We can calculate P(R | skin) from a set of training images
But this isn’t quite what we want
• Why not? How to determine if a pixel is skin?
• We want P(skin | R), not P(R | skin)
• How can we get it?
Bayes rule
In terms of our problem:
P(skin | R) = P(R | skin) P(skin) / P(R)
– P(skin | R): what we want (posterior)
– P(R | skin): what we measure (likelihood)
– P(skin): domain knowledge (prior)
– P(R): normalization term
What can we use for the prior P(skin)?
• Domain knowledge:
– P(skin) may be larger if we know the image contains a person
– For a portrait, P(skin) may be higher for pixels in the center
• Learn the prior from the training set. How?
– P(skin) is the proportion of skin pixels in the training set
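As a worked instance of Bayes rule for the two-class skin problem, the sketch below learns the prior as the proportion of skin labels and computes the normalized posterior. The function names and all numbers are illustrative assumptions; the normalization term is expanded as P(R) = P(R | skin) P(skin) + P(R | ~skin) P(~skin).

```python
def learn_prior(labels):
    """P(skin) as the proportion of training pixels labeled 'skin'."""
    return sum(1 for label in labels if label == "skin") / len(labels)


def posterior_skin(p_r_given_skin, p_r_given_nonskin, prior_skin):
    """P(skin | R) via Bayes rule for two classes."""
    evidence = (p_r_given_skin * prior_skin
                + p_r_given_nonskin * (1.0 - prior_skin))
    return p_r_given_skin * prior_skin / evidence


# Hypothetical training labels: 2 skin pixels out of 10.
labels = ["skin"] * 2 + ["~skin"] * 8
prior = learn_prior(labels)

# Hypothetical likelihoods read off the two class histograms for one value of R.
post = posterior_skin(0.6, 0.1, prior)

if __name__ == "__main__":
    print(prior, post)
```

Here the likelihood favors skin six to one, yet the posterior is only 0.6 because skin pixels are rare in the training set.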
Bayesian estimation
P(skin | R) ∝ P(R | skin) P(skin)
– P(R | skin): likelihood
– P(R | skin) P(skin): posterior (unnormalized)
Bayesian estimation
• Goal is to choose the label (skin or ~skin) that maximizes the posterior ↔ minimizes the probability of misclassification
– this is called Maximum A Posteriori (MAP) estimation
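MAP estimation only needs the unnormalized posterior, since the normalization term is the same for every label. A minimal sketch, with a hypothetical function name and made-up likelihoods and priors:

```python
def map_label(likelihoods, priors):
    """Pick the label maximizing likelihood * prior (unnormalized posterior)."""
    scores = {label: likelihoods[label] * priors[label] for label in priors}
    return max(scores, key=scores.get)


# Hypothetical likelihoods P(R | label) for one observed value of R.
likelihoods = {"skin": 0.6, "~skin": 0.1}

if __name__ == "__main__":
    # With a uniform prior the decision follows the likelihood.
    print(map_label(likelihoods, {"skin": 0.5, "~skin": 0.5}))
    # A strong prior against skin can overturn it: 0.6*0.1 < 0.1*0.9.
    print(map_label(likelihoods, {"skin": 0.1, "~skin": 0.9}))
```

The same function works unchanged for more than two labels, because it only compares scores.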
Skin detection results
General classification
This same procedure applies in more general circumstances
• More than two classes
• More than one dimension
Example: face detection
• Here, X is an image region
– dimension = # pixels
– each face can be thought of as a point in a high-dimensional space
H. Schneiderman, T. Kanade. "A Statistical Method for 3D Object Detection Applied to Faces and Cars". CVPR 2000
Face Detection
• Feature-based
– Distinctive image features: eyes, nose, mouth
– Geometrical arrangement
• Template-based
– Active shape model (ASM), active appearance model (AAM)
– Requires good initialization near a real face
• Appearance-based
– Search for likely candidates
– Refine using a cascade of more expensive but more selective detection algorithms
– Relies heavily on training classifiers using labeled faces
CVPR 2007 Minneapolis, Short Course, June 17 Recognizing and Learning Object Categories: Year 2007 Li Fei-Fei, Princeton Rob Fergus, MIT Antonio Torralba, MIT (see other slide deck)
Today’s lecture
• Known object recognition [Lowe]
• Bag of keypoints [Csurka et al.]
• Location recognition [Schindler et al.]
• Deformable object/category recognition [Fergus et al.]
• Recognition by segmentation
Single object recognition
Single object recognition
• Lowe et al. 1999, 2003
• Mahamud and Hebert, 2000
• Ferrari, Tuytelaars, and Van Gool, 2004
• Rothganger, Lazebnik, and Ponce, 2004
• Moreels and Perona, 2005
• …
Planar object recognition [Lowe]
• Use SIFT features
• Verify affine (or homography) geometric alignment
3D object recognition [Lowe]
• Extract object outlines with background subtraction
3D object recognition [Lowe]
• Use 3 matches to recognize
• Use additional matches for verification
• Tolerant to occlusions
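One way to read "3 matches to recognize, additional matches for verification" is sketched below, assuming an affine model between model and image coordinates: three correspondences determine the six affine parameters, and the remaining matches are checked against the predicted positions. The function names, the 2-pixel tolerance, and the sample matches are illustrative assumptions, not details of Lowe's system.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 system m * x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate configuration (collinear points)")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def affine_from_3(matches):
    """Fit u = a*x + b*y + c, v = d*x + e*y + f from exactly 3 matches."""
    m = [[x, y, 1.0] for (x, y), _ in matches]
    row_u = solve3(m, [u for _, (u, _v) in matches])
    row_v = solve3(m, [v for _, (_u, v) in matches])
    return row_u, row_v


def count_inliers(affine, matches, tol=2.0):
    """Count matches whose predicted position lies within tol pixels."""
    (a, b, c), (d, e, f) = affine
    inliers = 0
    for (x, y), (u, v) in matches:
        pu, pv = a * x + b * y + c, d * x + e * y + f
        if (pu - u) ** 2 + (pv - v) ** 2 <= tol ** 2:
            inliers += 1
    return inliers


# Hypothetical (model point, image point) matches; the last one is an outlier.
matches = [((0, 0), (10, -5)), ((1, 0), (12, -5)), ((0, 1), (10, -3)),
           ((3, 4), (16, 3)), ((5, 2), (20, -1)), ((2, 2), (50, 50))]
affine = affine_from_3(matches[:3])
support = count_inliers(affine, matches)

if __name__ == "__main__":
    print(affine, support)
```

Because only three matches are needed for a hypothesis, the object can still be found when most of its features are hidden, which is consistent with the tolerance to occlusion noted above.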
Feature-based recognition
How can we scale to millions of objects? Comparison to all stored objects/features is infeasible.
Answer:
• quantize features into words [Csurka et al. 04]
• use information retrieval (inverted index)
• use a metric tree for faster quantization (NN) [Nistér & Stewénius 05]
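The first two steps of this answer can be sketched as follows: each descriptor is quantized to its nearest visual word, and an inverted index maps each word to the images containing it, so a query only touches images that share a word. The 2D descriptors, the three-word vocabulary, and the image names are toy assumptions; real descriptors such as SIFT have many more dimensions, and the vocabulary would be learned by clustering.

```python
from collections import Counter, defaultdict


def quantize(descriptor, vocabulary):
    """Index of the visual word (cluster center) nearest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[i])),
    )


def build_inverted_index(database, vocabulary):
    """Map each visual word to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, descriptors in database.items():
        for d in descriptors:
            index[quantize(d, vocabulary)].add(image_id)
    return index


def query(descriptors, index, vocabulary):
    """Rank images by the number of distinct query words they share."""
    votes = Counter()
    for word in {quantize(d, vocabulary) for d in descriptors}:
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes.most_common()


# Toy data: 3 visual words and 3 database images (all hypothetical).
vocabulary = [(0, 0), (10, 0), (0, 10)]
database = {
    "mug": [(1, 0), (9, 1)],
    "book": [(0, 9), (1, 11)],
    "shoe": [(10, 1), (0, 10)],
}
index = build_inverted_index(database, vocabulary)
ranked = query([(0.5, 0.5), (9.5, 0)], index, vocabulary)

if __name__ == "__main__":
    print(ranked)
```

The linear scan inside `quantize` is the part a metric tree is meant to speed up.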
CVPR 2007 Minneapolis, Short Course, June 17 (see other slide deck) Part 1: Bag-of-words models by Li Fei-Fei (Princeton)
How to scale to 10^6s (millions) of images?
Make “word” generation even more efficient: “Vocabulary tree”
Scalable Recognition with a Vocabulary Tree
David Nistér, Henrik Stewénius
Vocabulary Tree
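A vocabulary tree quantizes a descriptor by descending a hierarchy of cluster centers, comparing against only the children of one node per level. The sketch below uses a hand-built tree with branching factor 2 and depth 2; the `Node` class, the centers, and the word ids are hypothetical, and in practice the tree would be learned by hierarchical clustering of training descriptors.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """Tree node: a cluster center, child nodes, and a word id at the leaves."""

    def __init__(self, center, children=None, word_id=None):
        self.center = center
        self.children = children or []
        self.word_id = word_id


def lookup(root, descriptor):
    """Descend to a leaf; return (word id, number of distance comparisons)."""
    node, comparisons = root, 0
    while node.children:
        comparisons += len(node.children)
        node = min(node.children, key=lambda c: sq_dist(descriptor, c.center))
    return node.word_id, comparisons


# Hand-built toy tree: 2 children per node, 2 levels, 4 visual words.
root = Node(None, [
    Node((0, 0), [Node((-1, -1), word_id=0), Node((1, 1), word_id=1)]),
    Node((10, 10), [Node((9, 9), word_id=2), Node((11, 11), word_id=3)]),
])
word, cost = lookup(root, (10.6, 10.9))

if __name__ == "__main__":
    print(word, cost)
```

With branching factor k and depth L, a lookup costs about k·L comparisons while the vocabulary holds k^L words; a flat vocabulary of the same size would need k^L comparisons per descriptor.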
Performance
http://vis.uky.edu/~stewe/ukbench/
Location Recognition
Can we apply this to recognizing your location from a cell-phone photo?
City-Scale Location Recognition
Grant Schindler, Matthew Brown, and Richard Szeliski, CVPR 2007
The Problem
Main idea
Find N-best matches in vocabulary tree
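Returning the N best matches rather than a single winner can be sketched as below, assuming each database image is already represented by its list of visual-word ids. The scoring used here (shared word counts weighted by inverse document frequency) is a simplified stand-in chosen for illustration, not the scoring of the cited paper; the function names and the street-corner data are hypothetical.

```python
import heapq
import math
from collections import Counter


def idf_weights(database):
    """Inverse document frequency: words seen in every image get weight 0."""
    n = len(database)
    df = Counter()
    for words in database.values():
        df.update(set(words))
    return {w: math.log(n / c) for w, c in df.items()}


def n_best(query_words, database, n=2):
    """Return the n highest-scoring (image id, score) pairs."""
    idf = idf_weights(database)
    q = Counter(query_words)
    scores = {}
    for image_id, words in database.items():
        d = Counter(words)
        # Shared word occurrences, each weighted by how distinctive the word is.
        scores[image_id] = sum(min(q[w], d[w]) * idf.get(w, 0.0) for w in q)
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])


# Hypothetical visual-word ids per street-side image; words 1 and 7 occur everywhere.
database = {
    "corner_A": [1, 2, 3, 7],
    "corner_B": [1, 4, 5, 7],
    "corner_C": [1, 6, 6, 7],
}
best = n_best([1, 2, 3, 7, 4], database, n=2)

if __name__ == "__main__":
    print(best)
```

Keeping several candidates leaves room for a later, more expensive verification step to choose among them.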
Other ideas
• Use only informative features (ignore trees…)
• Integrate matches with adjacent (streetside) neighbors
CVPR 2007 Minneapolis, Short Course, June 17 (see other slide deck) Part 2: part-based models by Rob Fergus (MIT)
CVPR 2007 Minneapolis, Short Course, June 17 Part 4: Combined segmentation and recognition by Rob Fergus (MIT)
Aim
Given an image and object category, segment the object
[Figure: Object Category Model (Cow) + Image → Segmentation → Segmented Cow]
Segmentation should (ideally) be
• shaped like the object, e.g. cow-like
• obtained efficiently in an unsupervised manner
• able to handle self-occlusion
Slide from Kumar ’05
Implicit Shape Model - Leibe and Schiele, 2003
[Figure: Interest Points → Matched Codebook Entries → Probabilistic Voting Space (continuous) → Backprojection of Maxima → Backprojected Hypotheses → Refined Hypotheses (uniform sampling) → Segmentation]
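The voting step can be illustrated with a small sketch: each interest point matched to a codebook entry casts weighted votes for where the object center could be, and the strongest accumulation is taken as a hypothesis. The slide describes the voting space as continuous; this sketch discretizes it into 10-pixel grid cells for simplicity, and the codebook entries, offsets, and point coordinates are all made up.

```python
from collections import Counter


def vote_for_centers(matches, codebook_offsets, cell=10):
    """Accumulate weighted votes for the object center in a coarse grid."""
    votes = Counter()
    for (x, y), entry in matches:
        offsets = codebook_offsets[entry]
        weight = 1.0 / len(offsets)  # split one unit of evidence across offsets
        for dx, dy in offsets:
            cx, cy = x + dx, y + dy
            votes[(int(cx // cell), int(cy // cell))] += weight
    return votes


# Hypothetical codebook: offsets from a part to the object center, as
# observed in training. A "wheel" may be the front or the rear wheel.
codebook_offsets = {
    "wheel": [(30, -20), (-30, -20)],
    "head": [(-40, 10)],
}

# Hypothetical matched interest points: (image position, codebook entry).
matches = [((70, 70), "wheel"), ((130, 70), "wheel"), ((140, 40), "head")]

votes = vote_for_centers(matches, codebook_offsets)
best_cell, best_weight = votes.most_common(1)[0]

if __name__ == "__main__":
    print(best_cell, best_weight)
```

Votes that agree reinforce each other at the true center, while the ambiguous wheel votes scatter; the points that voted for the winning cell are the ones to backproject when building the segmentation.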
Other topics: context (scenes)
Antonio Torralba, Contextual Priming for Object Detection, IJCV(53), No. 2, July 2003, pp. 169-191
New work: tiny images
CVPR 2007 Minneapolis, Short Course, June 17 (see other slide deck) Datasets and object collections
Summary of object recognition
• Known object recognition [Lowe]
• Bag of keypoints [Csurka et al.]
• Location recognition [Schindler et al.]
• Deformable object/category recognition [Fergus et al.]
• Recognition by segmentation
• Context and scenes