Object Recognition
Jeremy Wyatt
Computational Vision: Object Recognition
Plan
§ David Marr: the model based approach to vision
§ Model based approaches: Geons, Model Fitting
§ Appearance based approaches: PCA, SIFT, implicit shape model
§ Psychological Evidence: View dependent vs. view independent recognition
§ Summary: who is right?
Model based vision
§ David Marr was a brilliant young British vision researcher who defined a coherent approach to the study of vision during the 1970s
§ According to one tradition coming out of Marr's work:
  • Vision is the process of reconstructing the 3D scene from 2D information
  • The vision system has representations of 3D geometric structures
  • Visual pipeline: intensity image, primal sketch, 2.5D sketch, model selection
  • So selecting models and recovering their parameters from image data is a key task in vision
Model based vision
§ There is an infinite variety of objects. How do we represent, store and access models of them efficiently?
§ One suggestion was the use of a small library of 3D parts from which many complex models can be constructed
§ There are many schemes: generalised cylinders, Geons, Superquadrics
§ Vision researchers set about applying them
Models vs Appearances
§ But they didn't work very well …
§ By the early 1990s people were experimenting with statistical techniques, e.g. PCA
§ These learn a statistical summary of the appearance of each view of an object
[Figure labels: Appearance, Model]
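A minimal sketch of what "a statistical summary of the appearance of each view" could look like with PCA. It is an illustration, not the method used in any particular system. Each view is assumed to be a small flattened image (a list of pixel values). The summary keeps only the mean and the leading principal component, found by power iteration. Recognition picks the view model that reconstructs the query with the smallest residual. The function names, the one-component choice and the toy four-pixel "images" are all assumptions.

```python
import math

def _unit(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)

def fit_view_model(views, iters=50):
    """Return (mean, component): a one-component PCA summary of the training views."""
    n = len(views)
    mean = [sum(col) / n for col in zip(*views)]
    centred = [[x - m for x, m in zip(v, mean)] for v in views]
    # Power iteration on the (unnormalised) covariance: w <- sum_i (c_i . w) c_i
    w = _unit([float(i + 1) for i in range(len(mean))])  # arbitrary non-zero start
    for _ in range(iters):
        new = [0.0] * len(mean)
        for c in centred:
            dot = sum(ci * wi for ci, wi in zip(c, w))
            for j, cj in enumerate(c):
                new[j] += dot * cj
        if not any(new):  # no variance along w: keep the current direction
            break
        w = _unit(new)
    return mean, w

def residual(image, model):
    """Distance between the image and its reconstruction from the view model."""
    mean, w = model
    d = [x - m for x, m in zip(image, mean)]
    coef = sum(di * wi for di, wi in zip(d, w))
    return math.sqrt(sum((di - coef * wi) ** 2 for di, wi in zip(d, w)))

def recognise(image, models):
    """Name of the view model with the smallest reconstruction residual."""
    return min(models, key=lambda name: residual(image, models[name]))

# Hypothetical training data: one view each of two objects, under varying brightness.
models = {
    "A_front": fit_view_model([[0.8, 0.8, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [1.2, 1.2, 0.0, 0.0]]),
    "B_front": fit_view_model([[0.0, 0.0, 0.8, 0.8], [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.2, 1.2]]),
}
print(recognise([0.9, 0.9, 0.0, 0.0], models))  # prints "A_front"
```

Because each model summarises a single view, a second view of object A would need its own entry in `models`.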
Appearance based recognition: SIFT
§ These statistical approaches characterise some aspects of the appearance of an object that can be used to recognise it
§ But this means they are (largely) view dependent: you have to learn a different statistical model for each different view
§ e.g. SIFT based recognition (David Lowe, UBC)
  • Find interest points in the scale space
  • Re-describe the interest points so that the descriptors are:
    § Invariant to image translation, scaling, rotation
    § Partially invariant to illumination changes, affine and 3D projection changes
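The slide stops at the descriptors. One common way to use them for recognition, not spelled out here, is nearest-neighbour matching with a distance-ratio test of the kind described in Lowe's SIFT work. The sketch below assumes the descriptors have already been computed. The three-dimensional toy vectors stand in for real 128-dimensional SIFT descriptors, and the function names and the 0.8 threshold are assumptions for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(query, model, ratio=0.8):
    """Return (query_index, model_index) pairs that pass the ratio test.

    A query descriptor is matched to its nearest model descriptor only if that
    neighbour is clearly closer than the second nearest; ambiguous matches are dropped.
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((euclidean(q, m), mi) for mi, m in enumerate(model))
        if len(ranked) < 2:
            continue  # the ratio test needs two neighbours
        (d1, best), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches

# Hypothetical descriptors: the first query is close to model entry 0,
# the second is equally far from entries 0 and 1 and is therefore rejected.
model_desc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_desc = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(match_descriptors(query_desc, model_desc))  # prints [(0, 0)]
```

An object view would then be accepted if enough of its stored descriptors find matches in the image. That threshold is a separate design choice and is not shown.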
Category level recognition (Thanks to Bastian Leibe)
Constellation model (Thanks to Bastian Leibe)
Implicit Shape Model (Thanks to Bastian Leibe)
Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts
Aleš Leonardis and Sanja Fidler
University of Ljubljana, Faculty of Computer and Information Science, Visual Cognitive Systems Laboratory
Reproduced with permission
Framework
§ Main properties of the framework:
  • Computational plausibility
    § Hierarchical representation
    § Compositionality (parts composed of parts)
    § Indexing & matching recognition scheme
  • Statistics driven learning (unsupervised learning)
  • Fast, incremental (continuous) learning
Recognition: Indexing and matching
[Figure labels: image; hypotheses (car, motorcycle, dog, person); verification; LEARN]
§ Gradually limiting the search
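A minimal sketch of the indexing and matching idea, under strong simplifying assumptions. Each category is reduced to a hypothetical set of part labels and spatial relations are ignored. Detected parts are used to index only the categories that contain them (the hypotheses). Each hypothesis is then verified by checking how much of its model is actually present. The part names, the `min_support` threshold and the function names are invented for illustration and are not taken from the framework.

```python
from collections import Counter

# Hypothetical category models: each category is a set of part labels.
MODELS = {
    "car": {"wheel", "window", "bumper"},
    "motorcycle": {"wheel", "handlebar", "seat"},
    "dog": {"ear", "tail", "paw"},
    "person": {"head", "arm", "leg"},
}

def build_index(models):
    """Inverted index: part label -> set of categories that contain that part."""
    index = {}
    for category, parts in models.items():
        for part in parts:
            index.setdefault(part, set()).add(category)
    return index

def index_and_match(detected_parts, models, min_support=0.6):
    """Return (hypotheses, verified) for a collection of detected part labels."""
    index = build_index(models)
    # Indexing: only categories sharing at least one detected part become hypotheses,
    # so the search is limited before any detailed matching is done.
    votes = Counter()
    for part in set(detected_parts):
        for category in index.get(part, ()):
            votes[category] += 1
    hypotheses = sorted(votes)
    # Matching / verification: keep hypotheses whose model is mostly present.
    verified = [c for c in hypotheses if votes[c] / len(models[c]) >= min_support]
    return hypotheses, verified

hypotheses, verified = index_and_match(["wheel", "window", "bumper"], MODELS)
print(hypotheses)  # prints ['car', 'motorcycle']
print(verified)    # prints ['car']
```

Here "dog" and "person" are never examined at all, which is the sense in which indexing limits the search.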
Overview of the architecture
§ Starts with simple, local features and learns more and more complex compositions
§ Learns layer after layer to exploit the regularities in natural images as efficiently and compactly as possible
§ Builds computationally feasible layers of parts by selecting only the most statistically significant compositions of specific granularity
§ Learns lower layers in a category independent way (to obtain optimally sharable parts) and category specific higher layers which contain only a small number of highly generalizable parts for each category
§ New categories can efficiently and continuously be added to the representation without the need to restructure the complete hierarchy
§ Implements parts in a robust, layered interplay of indexing & matching
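A toy sketch of learning one layer by keeping only frequent compositions. It is a simplification under stated assumptions. Each training image is reduced to the set of lower-layer part labels found in it, a composition is just an unordered pair of parts that co-occur, and raw co-occurrence count stands in for the statistical significance test. Spatial relations, granularity and the actual selection criterion of the framework are not modelled. `learn_layer` and `min_count` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

def learn_layer(images, min_count=2):
    """Return the set of part pairs that co-occur in at least `min_count` images.

    `images` is a list of sets of lower-layer part labels; each returned pair
    is a candidate part for the next layer.
    """
    counts = Counter()
    for parts in images:
        # Sorting makes each unordered pair appear in one canonical order.
        for pair in combinations(sorted(parts), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}

# Hypothetical lower-layer detections in three training images.
training = [{"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"}]
print(sorted(learn_layer(training)))  # prints [('a', 'b'), ('b', 'c')]
```

Applying the same step again, with the selected pairs relabelled as parts, would give the next layer. The discarded rare pairs are what keeps each layer small.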
Part based appearance recognition (Fidler & Leonardis 07)
Results
§ Learned hierarchy for faces and cars (first three layers are the same; links show compositionality for each of the categories; spatial variability of parts is not shown)
Results - Detections
Results - Specific categories, faces
§ Detection of Layer 5 parts
Evidence from biology
§ Is human object recognition view dependent?
§ Shepherd & Miller
§ Pinker & Tarr
§ There is quite a large body of experimental data that supports the view dependent camp.
§ Appearance based approaches fit neatly with this camp.
Summary
§ This is not a resolved debate
§ There is evidence for both sides
§ Structural 3D information is almost certainly extracted by the brain too
§ Model based: how do we extract good enough low level features (e.g. a depth map)?
§ Appearance based: only seems to be good for recognition, which is a small part of the vision problem.