Visual Grouping and Object Recognition
Jitendra Malik*
U.C. Berkeley
* with S. Belongie, C. Fowlkes, T. Leung, D. Martin, G. Mori, J. Puzicha, J. Shi, X. Ren
University of California Berkeley Computer Vision Group

From images/video to objects
Labeled sets: tiger, grass, etc.

Consistency
Perceptual organization forms a tree:
[Figure: three segmentations A, B, C and a segmentation tree rooted at Image, with nodes BG, L-bird, R-bird, grass, bush, far, beak, body, eye, head]
• A, C are refinements of B
• A, C are mutual refinements
• A, B, C represent the same percept
• Attention accounts for differences
Two segmentations are consistent when they can be explained by the same segmentation tree (i.e. they could be derived from a single perceptual organization).

Outline
• Finding boundaries
• Recognizing objects
• Recognizing actions

Finding boundaries: Is texture a problem or a solution?
[Figure panels: image, orientation energy]

Statistically optimal contour detection
• Use humans to segment a large collection of natural images.
• Train a classifier for the contour/non-contour classification using orientation energy and texture gradient as features.
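The slide does not say which classifier is trained. As a minimal sketch, the code below assumes a logistic model fitted by batch gradient descent on two features per pixel (orientation energy, texture gradient). The function names, learning rate, and the six feature pairs are hypothetical; the data are synthetic values made up for illustration, not measurements.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def boundary_probability(w, x):
    """Probability of 'contour' for feature vector x; w[0] is the bias."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on the log loss."""
    n_feat = len(features[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for x, y in zip(features, labels):
            err = boundary_probability(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(features) for wi, g in zip(w, grad)]
    return w

# Synthetic (orientation energy, texture gradient) pairs, illustration only.
# Label 1 = a human marked a contour at this pixel, 0 = no contour.
X = [(0.9, 0.8), (0.8, 0.7), (0.7, 0.9), (0.1, 0.2), (0.2, 0.1), (0.3, 0.2)]
y = [1, 1, 1, 0, 0, 0]
w = train_logistic(X, y)
print(boundary_probability(w, (0.85, 0.85)))  # high for strong cues
print(boundary_probability(w, (0.15, 0.15)))  # low for weak cues
```

In practice the labels would come from the human segmentations mentioned above, with one training example per sampled pixel.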

Orientation Energy
• Gaussian 2nd derivative and its Hilbert pair
• Can detect combination of bar and edge features [Perona & Malik 90]
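The formula on this slide did not survive extraction. A common way to write orientation energy with a quadrature filter pair, given here as an assumption rather than as the slide's exact notation, is the sum of squared responses of the even filter (Gaussian second derivative) and the odd filter (its Hilbert transform) at orientation θ and scale σ:

```latex
OE_{\theta,\sigma}(x, y) =
  \left( I * f^{e}_{\theta,\sigma} \right)^{2} +
  \left( I * f^{o}_{\theta,\sigma} \right)^{2}
```

Here I is the image, * denotes convolution, and the symbols f^e and f^o are names chosen for this sketch. Squaring and summing the two responses makes the measure insensitive to the phase of the feature, which is why it responds to both bars and edges.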

Texture gradient = chi-square distance between texton histograms in half disks across edge
[Figure labels: i, j, k; example chi-square values 0.1 and 0.8]
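The texture gradient can be sketched in a few lines, assuming each pixel has already been assigned a texton label and that the two half disks are given as lists of those labels. The function names and the normalization to unit-sum histograms are assumptions of this sketch.

```python
from collections import Counter

def texton_histogram(labels, n_textons):
    """Normalized histogram of texton labels (integers in 0..n_textons-1)."""
    if not labels:
        return [0.0] * n_textons
    counts = Counter(labels)
    return [counts.get(k, 0) / len(labels) for k in range(n_textons)]

def chi_square_distance(g, h):
    """0.5 * sum over bins of (g - h)^2 / (g + h); empty bins are skipped."""
    total = 0.0
    for gi, hi in zip(g, h):
        if gi + hi > 0:
            total += (gi - hi) ** 2 / (gi + hi)
    return 0.5 * total

def texture_gradient(left_labels, right_labels, n_textons):
    """Chi-square distance between the texton histograms of two half disks."""
    return chi_square_distance(
        texton_histogram(left_labels, n_textons),
        texton_histogram(right_labels, n_textons),
    )

print(texture_gradient([0, 0, 1], [0, 0, 1], 2))  # same texture on both sides
print(texture_gradient([0, 0], [1, 1], 2))        # completely different textures
```

With normalized histograms the distance ranges from 0 (identical texture on both sides) to 1 (no textons in common), so a large value suggests a texture boundary at that orientation.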

ROC curve for local boundary detection

Outline
• Finding boundaries
• Recognizing objects
• Recognizing actions

Biological Shape
• D’Arcy Thompson: On Growth and Form, 1917
  – studied transformations between shapes of organisms

Deformable Templates: Related Work
• Fischler & Elschlager (1973)
• Grenander et al. (1991)
• von der Malsburg (1993)

Matching Framework
[Figure: model and target shapes]
• Find correspondences between points on shape
• Fast pruning
• Estimate transformation & measure similarity

Comparing Pointsets

Shape Context
Count the number of points inside each bin, e.g.:
Count = 4
...
Count = 10
• Compact representation of distribution of points relative to each point
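The bin counting can be sketched as a log-polar histogram around one reference point. The bin layout used here (5 radial by 12 angular bins, radii from 0.125 to 2.0 after dividing by the mean pairwise distance) is an assumption of this sketch, as are the function names; the slide only says that points are counted per bin.

```python
import math

def mean_pairwise_distance(points):
    """Average distance over all unordered pairs of points."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.hypot(points[i][0] - points[j][0],
                                points[i][1] - points[j][1])
    return total / (n * (n - 1) / 2)

def shape_context(points, index, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Flat log-polar histogram of all other points relative to points[index].

    Requires n_r >= 2. Points farther than r_outer (in normalized units)
    are not counted.
    """
    scale = mean_pairwise_distance(points)  # makes the descriptor scale invariant
    # Upper radius of each radial bin, log-spaced from r_inner to r_outer.
    edges = [r_inner * (r_outer / r_inner) ** (k / (n_r - 1)) for k in range(n_r)]
    px, py = points[index]
    hist = [0] * (n_r * n_theta)
    for i, (x, y) in enumerate(points):
        if i == index:
            continue
        dx, dy = x - px, y - py
        r = math.hypot(dx, dy) / scale
        r_bin = next((k for k, e in enumerate(edges) if r <= e), None)
        if r_bin is None:
            continue
        theta = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        hist[r_bin * n_theta + t_bin] += 1
    return hist

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(sum(shape_context(square, 0)))  # number of other points that fell in a bin
```

Because offsets are taken relative to the reference point and radii are divided by a global scale, the histogram is unchanged when the whole point set is translated or uniformly scaled. It is not rotation invariant as written.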

Shape Context

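Comparing Shape Contexts
Compute matching costs using the chi-squared distance between the shape context histograms $h_i$ (point $i$ on one shape) and $h_j$ (point $j$ on the other):
$C_{ij} = \frac{1}{2} \sum_{k} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}$
Recover correspondences by solving the linear assignment problem with costs $C_{ij}$ [Jonker & Volgenant 1987]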
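To make the assignment step concrete, the sketch below finds the one-to-one correspondence with the smallest total cost for a small, made-up cost matrix. It uses brute-force search over permutations, not the Jonker & Volgenant algorithm cited on the slide, so it is only practical for a handful of points. The function name and the example costs are hypothetical.

```python
from itertools import permutations

def solve_assignment(cost):
    """Return (assignment, total) minimizing sum(cost[i][assignment[i]]).

    cost is a square matrix: cost[i][j] is the cost of matching point i on
    the first shape to point j on the second. Brute force, O(n!).
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

# Made-up matching costs for three points on each shape.
C = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = solve_assignment(C)
print(assignment, total)  # [1, 0, 2] 5
```

Note that the greedy choice of the cheapest entry (row 1, column 1, cost 0) is not part of the optimal solution, which is why a global assignment solver is used. For realistic point counts a polynomial-time solver is needed; SciPy's `linear_sum_assignment` is one such library routine.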

Matching Framework
[Figure: model and target shapes]
• Find correspondences between points on shape
• Fast pruning
• Estimate transformation & measure similarity

Fast pruning
• Find best match for the shape context at only a few random points and add up cost
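A minimal sketch of this pruning step, assuming the matching costs between query points (rows) and each stored model's points (columns) are already available as matrices. The function name, the dictionary layout, and the idea of keeping a fixed-size shortlist are assumptions of this sketch.

```python
import random

def shortlist(cost_by_model, n_samples, keep, seed=0):
    """Rank models by a cheap score and keep the best few.

    cost_by_model maps a model name to a matrix whose entry [i][j] is the
    cost of matching query point i to model point j. The score sums, over a
    few randomly chosen query points, the cost of each point's best match.
    """
    rng = random.Random(seed)
    n_query = len(next(iter(cost_by_model.values())))
    rows = rng.sample(range(n_query), n_samples)  # same query points for every model
    scores = {
        name: sum(min(cost[i]) for i in rows)
        for name, cost in cost_by_model.items()
    }
    ranked = sorted(scores, key=scores.get)
    return ranked[:keep], scores

# Made-up costs: three query points against two stored models.
costs = {
    "a": [[0.1, 0.9], [0.2, 0.8], [0.1, 0.7]],
    "b": [[0.6, 0.9], [0.7, 0.8], [0.9, 0.7]],
}
kept, scores = shortlist(costs, n_samples=2, keep=1)
print(kept)  # ['a']
```

Only the shortlisted models would then go through the more expensive correspondence and transformation steps.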

Matching Framework
[Figure: model and target shapes]
• Find correspondences between points on shape
• Fast pruning
• Estimate transformation & measure similarity

Thin Plate Spline Model
• 2D counterpart to cubic spline
• Minimizes bending energy
• Solve by inverting linear system
• Can be regularized when data is inexact
Duchon (1977), Meinguet (1979), Wahba (1991)
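The equations on this slide were lost in extraction. The standard thin plate spline formulation is written out below as an assumption about what the slide showed; the symbols (a, w, U, K, P, v, λ) are the usual textbook names, not taken from the slide.

```latex
% Interpolant: affine part plus a weighted sum of radial basis functions
f(x, y) = a_1 + a_x x + a_y y + \sum_{i=1}^{n} w_i \, U\!\left( \lVert (x_i, y_i) - (x, y) \rVert \right),
\qquad U(r) = r^2 \log r^2

% Bending energy minimized by f
I_f = \iint_{\mathbb{R}^2}
  \left( \frac{\partial^2 f}{\partial x^2} \right)^{2}
  + 2 \left( \frac{\partial^2 f}{\partial x \, \partial y} \right)^{2}
  + \left( \frac{\partial^2 f}{\partial y^2} \right)^{2} \, dx \, dy

% Linear system for the coefficients, with K_{ij} = U(\lVert p_i - p_j \rVert)
% and row i of P equal to (1, x_i, y_i); v holds the target values
\begin{pmatrix} K + \lambda I & P \\ P^{\top} & 0 \end{pmatrix}
\begin{pmatrix} w \\ a \end{pmatrix}
=
\begin{pmatrix} v \\ 0 \end{pmatrix}
```

Setting λ = 0 gives exact interpolation; λ > 0 is the regularized case for inexact data. A 2D warp uses two such functions, one for each output coordinate.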

Matching Example
[Figure panels: model, target]

Outlier Test Example

Object Recognition Experiments
• Handwritten digits
• COIL 3D objects (Nayar-Murase)
• Human body configurations
• Trademarks

Terms in Similarity Score
• Shape Context difference
• Local Image appearance difference
  – orientation
  – gray-level correlation in Gaussian window
  – … (many more possible)
• Bending energy

Handwritten Digit Recognition
• MNIST 60 000:
  – linear: 12.0%
  – 40 PCA + quad: 3.3%
  – 1000 RBF + linear: 3.6%
  – K-NN: 5%
  – K-NN (deskewed): 2.4%
  – K-NN (tangent dist.): 1.1%
  – SVM: 1.1%
  – LeNet-5: 0.95%
• MNIST 600 000 (distortions):
  – LeNet-5: 0.8%
  – SVM: 0.8%
  – Boosted LeNet-4: 0.7%
• MNIST 20 000:
  – K-NN, Shape Context matching: 0.63%
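The K-NN entries classify a query digit by voting among its nearest stored examples under some distance. A minimal sketch, assuming the distances from the query to each stored example have already been computed (for the last entry, by shape context matching); the function name, the value of k, and the example numbers are hypothetical.

```python
from collections import Counter

def knn_classify(distances, labels, k=3):
    """Majority label among the k stored examples with the smallest distance.

    distances[i] is the distance from the query to stored example i and
    labels[i] is that example's class. Ties go to the label seen first
    among the nearest neighbors.
    """
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up distances from one query digit to five stored digits.
print(knn_classify([0.9, 0.1, 0.2, 0.8, 0.3], [7, 3, 3, 7, 7], k=3))  # 3
```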

COIL Object Database

Prototypes Selected for 2 Categories
Details in Belongie, Malik & Puzicha (NIPS 2000)

Error vs. Number of Views

Human body configurations

Deformable Matching
• Kinematic chain-based deformation model
• Use iterations of correspondence and deformation
• Keypoints on exemplars are deformed to locations on query image

Results

Trademark Similarity

Recognizing objects in scenes

Outline
• Finding boundaries
• Recognizing objects
• Recognizing actions

Examples of Actions
• Movement and posture change
  – run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), …
• Object manipulation
  – pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various), …
• Conversational gesture
  – point, …
• Sign Language

Key cues for action recognition
• “Morpho-kinesics” of action (shape and movement of the body)
• Identity of the object/s
• Activity context

Image/Video → Stick figure → Action
• Stick figures can be specified in a variety of ways or at various resolutions (degrees of freedom)
  – 2D joint positions
  – 3D joint positions
  – Joint angles
• Complete representation
• Evidence that it is effectively computable
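The listed representations are related: joint angles can be derived from joint positions. A minimal sketch for the 2D case, assuming each joint is an (x, y) pair; the function name and the shoulder/elbow/wrist example coordinates are hypothetical.

```python
import math

def joint_angle(parent, joint, child):
    """Unsigned angle in degrees at `joint` between the limb segments
    joint->parent and joint->child, all given as 2D (x, y) positions."""
    ax, ay = parent[0] - joint[0], parent[1] - joint[1]
    bx, by = child[0] - joint[0], child[1] - joint[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return math.degrees(math.atan2(abs(cross), dot))

shoulder, elbow = (0, 0), (1, 0)
print(joint_angle(shoulder, elbow, (1, 1)))  # bent arm, right angle at the elbow
print(joint_angle(shoulder, elbow, (2, 0)))  # straight arm
```

Angles computed from 2D positions depend on the camera view, which is one reason the 3D representations are listed separately.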

Tracking by Repeated Finding

Achievable goals in 3 years
• Reasonable competence at object recognition at crude category level (~1000)
• Detection/Tracking of humans as kinematic chains, assuming adequate resolution.
• Recognition of ~10-100 actions and compositions thereof.