Object Recognizing • Object Classes • Individual Recognition • Object parts

Object Recognizing

Object Classes

Individual Recognition

Object parts: Full Interpretation (figure labels: Window, Mirror, Window, Door knob, Headlight, Back wheel, Bumper, Headlight, Front wheel)

Action recognition (except 2)

Class Non-class

Is this an airplane?

Features and Classifiers • Same features with different classifiers • Same classifier with different features

Generic Features • Simple (wavelets) • Complex (Geons)

Marr-Nishihara

Mental Rotation

3-D Parts • Implementations – poor results • View-specific recognition • fMRI studies • Instead: Using image patches

Class-specific Features: Common Building Blocks

Optimal Class Components? • Large features are too rare • Small features are found everywhere • Find features that carry the highest amount of information

Mutual information • H(C): entropy of the class • H(C) when F=1 and H(C) when F=0: entropy of the class for each value of the feature • I(C; F) = H(C) – H(C|F)

Mutual Information I(C; F) • Class: 1 1 0 1 0 0 • Feature: 1 0 0 1 1 1 0 0 • I(C; F) = H(C) – H(C|F)
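
The definition I(C; F) = H(C) – H(C|F) can be sketched in a few lines of Python. This is only an illustration: the function names and the toy label vectors are made up for this example and are not the values shown on the slide.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def mutual_information(c, f):
    """I(C; F) = H(C) - H(C|F) for two equal-length label sequences."""
    if len(c) != len(f):
        raise ValueError("class and feature sequences must have equal length")
    n = len(c)
    h_c_given_f = 0.0
    for value in set(f):
        # Class labels of the images where the feature takes this value.
        subset = [ci for ci, fi in zip(c, f) if fi == value]
        h_c_given_f += (len(subset) / n) * entropy(subset)
    return entropy(c) - h_c_given_f


# Hypothetical toy data: 8 images, class label and feature detection per image.
classes = [1, 1, 1, 1, 0, 0, 0, 0]
features = [1, 1, 1, 0, 1, 0, 0, 0]
print(round(mutual_information(classes, features), 4))
```

A feature that fires on exactly the class images gives I(C; F) = H(C), and a feature that is independent of the class gives 0, which is why ranking candidate fragments by this quantity favors intermediate-size features.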

Horse-class features • Car-class features • Pictorial features • Learned from examples

Star model • Detected fragments ‘vote’ for the center location • Find the location with the maximal vote • In its variations, a popular state-of-the-art scheme
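
The voting step might look roughly like the sketch below. It assumes each detected fragment carries a learned offset to the object center and that votes are pooled in coarse square bins; the function name, the bin size, and the sample detections are all hypothetical.

```python
from collections import Counter


def vote_for_center(detections, bin_size=8):
    """Accumulate center votes from detected fragments and return the best bin.

    detections: list of ((x, y), (dx, dy)) pairs, where (x, y) is where a
    fragment was found and (dx, dy) is its offset to the object center.
    Returns ((center_x, center_y), number_of_votes).
    """
    if not detections:
        raise ValueError("no detections to vote with")
    votes = Counter()
    for (x, y), (dx, dy) in detections:
        cx, cy = x + dx, y + dy
        votes[(cx // bin_size, cy // bin_size)] += 1
    (bx, by), count = votes.most_common(1)[0]
    # Report the middle of the winning bin as the estimated center.
    center = (bx * bin_size + bin_size // 2, by * bin_size + bin_size // 2)
    return center, count


# Hypothetical detections: three fragments agree on a center, one is an outlier.
detections = [
    ((40, 60), (20, 0)),
    ((80, 62), (-18, -1)),
    ((61, 30), (0, 29)),
    ((10, 10), (5, 5)),
]
center, count = vote_for_center(detections)
print(center, count)
```

Binning the votes is one simple way to tolerate small disagreements between fragments; smoothing the vote map is another common choice.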

Recognition Features in the Brain

fMRI: Functional Magnetic Resonance Imaging

LO: object recognition • V1: early processing

Class-fragments and Activation (Malach et al. 2008)

Bag of words

Bag of visual words • A large collection of image patches

Each class has its own histogram of words • Limited or no geometry • Simple and popular • Visual words are used, but not for a full recognition model
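
A word histogram of this kind can be sketched as follows. The sketch assumes a vocabulary of visual words is already available (in practice it is usually obtained by clustering patch descriptors, for example with k-means); the function names, the 2-D descriptors, and the vocabulary are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(vocabulary)), key=lambda k: sq_dist(descriptor, vocabulary[k]))


def word_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences; patch positions are ignored."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical 2-D descriptors and a 3-word vocabulary.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
patches = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.1, 0.9)]
print(word_histogram(patches, vocabulary))
```

Because only counts are kept, two images with the same patches in different arrangements get the same histogram, which is the "limited or no geometry" property noted above.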

HoG Descriptor • Dalal, N. & Triggs, B. Histograms of Oriented Gradients for Human Detection

• SIFT: Scale-Invariant Feature Transform • MSER: Maximally Stable Extremal Regions • SURF: Speeded-Up Robust Features • Cross correlation • … • HoG and SIFT are the most widely used.

DPM (Felzenszwalb) • Felzenszwalb, McAllester, Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR 2008. • Many implementation details; we will describe the main points.

HoG descriptor

Using patches with HoG descriptors and classification by SVM • Person model: HoG

Object model using HoG • A bicycle and its ‘root filter’ • The root filter is a patch of HoG descriptors • The image is partitioned into 8 x 8 pixel cells • In each cell we compute a histogram of gradient orientations
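
The per-cell histogram step can be sketched as below. This is a simplified illustration, not the descriptor used in the papers: it assumes 9 unsigned orientation bins, central-difference gradients, and magnitude-weighted votes, and it omits block normalization and bin interpolation. The function name and the test image are hypothetical.

```python
from math import atan2, hypot, pi


def cell_histograms(image, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by magnitude.

    image: 2-D list of grayscale values (rows of equal length).
    Returns a dict mapping (cell_row, cell_col) -> list of `bins` floats.
    """
    h, w = len(image), len(image[0])
    hists = {(r, c): [0.0] * bins for r in range(h // cell) for c in range(w // cell)}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            mag = hypot(gx, gy)
            key = (y // cell, x // cell)
            if mag == 0 or key not in hists:
                continue
            angle = atan2(gy, gx) % pi  # unsigned orientation in [0, pi)
            b = min(int(angle / pi * bins), bins - 1)
            hists[key][b] += mag
    return hists


# Hypothetical 16 x 16 image with a vertical edge between columns 7 and 8.
image = [[0.0] * 8 + [1.0] * 8 for _ in range(16)]
hists = cell_histograms(image)
print(hists[(0, 0)])
```

For the vertical edge, all gradient energy falls into the first orientation bin of each cell, since the gradient points horizontally everywhere it is nonzero.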

Dealing with scale: multi-scale analysis • The filter is applied across a pyramid of HoG descriptors to handle unknown object scale
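
A pyramid can be sketched by repeatedly shrinking the image and recomputing the descriptor at each level. The sketch below only builds the image levels and assumes a factor-of-two step with 2 x 2 averaging; practical systems typically sample scale more finely. The function names and the minimum size are hypothetical.

```python
def downsample(image):
    """Halve resolution by averaging non-overlapping 2 x 2 blocks."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def pyramid(image, min_size=8):
    """List of images from full resolution down to roughly min_size pixels."""
    levels = [image]
    while min(len(levels[-1]), len(levels[-1][0])) // 2 >= min_size:
        levels.append(downsample(levels[-1]))
    return levels


# Hypothetical constant 32 x 32 image: levels of size 32, 16 and 8.
levels = pyramid([[1.0] * 32 for _ in range(32)])
print([len(level) for level in levels])
```

Running a fixed-size filter over every level is what lets the same model respond to objects of different sizes in the original image.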

Adding Parts • A part P_i = (F_i, v_i, s_i, a_i, b_i) • F_i is the filter for the i-th part • v_i is the center of a box of possible positions for part i relative to the root position • s_i is the size of this box • a_i and b_i are two-dimensional vectors specifying the coefficients of a quadratic function that scores each possible placement of the i-th part • That is, a_i and b_i hold two numbers each, and the penalty for a deviation (∆x, ∆y) from the expected location is a_1 ∆x + a_2 ∆y + b_1 ∆x^2 + b_2 ∆y^2
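
The quadratic term and its use in placing a part can be sketched as follows. The sketch assumes the term is treated as a penalty that is subtracted from the part filter's response at each candidate displacement; that sign convention, the function names, and the sample numbers are assumptions for illustration.

```python
def deformation_penalty(a, b, dx, dy):
    """Quadratic placement term a_1*dx + a_2*dy + b_1*dx^2 + b_2*dy^2."""
    return a[0] * dx + a[1] * dy + b[0] * dx ** 2 + b[1] * dy ** 2


def best_part_placement(filter_response, a, b):
    """Pick the displacement that maximizes response minus penalty.

    filter_response: dict mapping a displacement (dx, dy) from the expected
    part location to the part filter's response at that position.
    Returns ((dx, dy), score).
    """
    def score(d):
        return filter_response[d] - deformation_penalty(a, b, d[0], d[1])

    best = max(filter_response, key=score)
    return best, score(best)


# Hypothetical responses: the filter responds more strongly further from the anchor.
responses = {(0, 0): 1.0, (2, 0): 1.5, (4, 0): 2.0}
placement, value = best_part_placement(responses, a=(0.0, 0.0), b=(0.1, 0.1))
print(placement, round(value, 3))
```

In this toy case the strongest response at (4, 0) loses to the moderate one at (2, 0), because the quadratic term grows faster than the response as the part moves away from its expected location.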

Bicycle model: root, parts, spatial map • Person model