Recursive Composition in Computer Vision
Leo Zhu, CSAIL MIT
Joint work with Chen, Yuille, Freeman and Torralba

Ideas behind Recursive Composition
• How to deal with image complexity
• A general framework for different vision tasks
• Rich representation and tractable computation
Pattern Theory (Grenander 94); Compositionality (Geman 02, 06); Stochastic Grammar (Zhu and Mumford 06)

Recursive Composition
• Representation: Recursive Compositional Models (RCMs)
• Inference: Recursive Optimization
• Learning: Supervised Parameter Estimation; Unsupervised Recursive Dictionary Learning
• RCM-1: Deformable Object
• RCM-2: Articulated Object
• RCM-3: Scene (Entire Image)

Modeling a Deformable Object
• Flat MRF: nodes are object parts; edges are spatial relations
• Limitations: short-range interactions; sparse

Recursive Composition

Recursive Compositional Models: RCM-1
• x: image
• y: (position, scale, orientation)
• graph = (nodes, edges)
• a: index of a node; b: a child of a
• f: appearances on node a
• g: potentials on edges (a, b)
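The model equation on the original slide is not preserved in this transcript. As a hedged sketch only, these symbols are consistent with the standard compositional energy used in this line of work: unary appearance terms on nodes plus relational terms on parent-child edges, with weights to be learned.

```latex
% Sketch of an RCM-1 style energy (a reconstruction, not the slide's exact formula)
\[
E(y; x) \;=\; \sum_{a \in \mathcal{V}} \lambda_a \, f(x, y_a)
        \;+\; \sum_{(a,b) \in \mathcal{E}} \mu_{ab} \, g(y_a, y_b)
\]
```

Here the lambda and mu weights are the parameters that the supervised learning part of the talk estimates.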

RCM-1: the Recursive Formula
• x: image; y: (position, scale, orientation)
• Vertical independence
• Self-similarity
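The recursion itself appeared as an equation on the slide and is missing here. Under the vertical-independence and self-similarity properties listed above, a plausible reconstruction (an assumption, not copied from the slide) is that the score of a subtree depends on its children only through their own subtree scores:

```latex
% Hedged reconstruction of the recursive score of the subtree rooted at node a
\[
S_a(y_a) \;=\; f(x, y_a) \;+\; \sum_{b \in \mathrm{ch}(a)} \max_{y_b}
          \big[\, g(y_a, y_b) + S_b(y_b) \,\big]
\]
```

This parent-child factorization is what makes the dynamic-programming inference on the next slides polynomial-time.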

Recursive Composition
• Representation: Recursive Compositional Models (RCMs)
• Inference: Recursive Optimization
• Learning: Supervised Parameter Estimation; Unsupervised Recursive Dictionary Learning

Polynomial-time Inference
• Inference task
• Recursive Optimization (recursion)
• Polynomial-time complexity
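The equations for the inference task and its complexity are not reproduced in this transcript. As an illustration of why recursive optimization over a tree-structured model is polynomial, here is a minimal memoized dynamic-programming sketch; the function names and arguments are illustrative assumptions, not the authors' code.

```python
def best_score(node, state, appearance, pairwise, children, states, memo):
    """Best score of the subtree rooted at `node` when `node` takes `state`:
    its own appearance term plus, for each child, the best child state
    (pairwise compatibility + the child's own subtree score).
    Memoization makes the total work about O(#nodes * #states^2)."""
    key = (node, state)
    if key in memo:
        return memo[key]
    score = appearance(node, state)
    for child in children(node):
        score += max(
            pairwise(state, s) + best_score(child, s, appearance, pairwise,
                                            children, states, memo)
            for s in states(child)
        )
    memo[key] = score
    return score
```

The full parse is recovered by rerunning the same recursion while recording the arg-max child states.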

Supervised Learning
• Perceptron algorithm (cf. MLE, max-margin / SVM)
• Parameter estimation needs fast inference
Collins 02; Taskar et al. 04

Supervised Learning by the Perceptron Algorithm
• Goal: estimate the parameter vector
• Input: a set of training images with ground truth; initialize the parameter vector
• Training algorithm (Collins 02):
  Loop over training samples i = 1 to N
    Step 1: find the best parse using inference
    Step 2: update the parameters
  End of loop
• Inference is critical for learning
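A minimal sketch of the structured perceptron loop described above (Collins 02); the feature function and the decoder (the inference step) are supplied by the caller, and all names here are illustrative.

```python
import numpy as np

def structured_perceptron(samples, features, decode, dim, epochs=5):
    """Collins-style structured perceptron.
    samples:  list of (x, y_true) pairs, where y_true is the ground-truth parse
    features: (x, y) -> np.ndarray of length `dim`
    decode:   (x, w) -> best-scoring parse under weights w (the inference step)
    """
    w = np.zeros(dim)                          # initialize parameter vector
    for _ in range(epochs):
        for x, y_true in samples:              # loop over training samples
            y_hat = decode(x, w)               # Step 1: inference with current weights
            if y_hat != y_true:                # Step 2: additive update toward the truth
                w += features(x, y_true) - features(x, y_hat)
    return w
```

Because the update requires a fresh decode for every sample on every pass, learning is only practical when inference is fast, which is the point of the previous slide.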

Recursive Composition
• Representation: Recursive Compositional Models (RCMs)
• Inference: Recursive Optimization (polynomial-time)
• Learning: Supervised Parameter Estimation
• RCM-1: Deformable Object

RCM-1: Multi-level Potentials
• Potentials for appearance: [Gabor, edge, ...]

RCM-1: Multi-level Potentials
• Potentials for shape: triplet descriptors over (position, scale, orientation)
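As a rough illustration of a triplet shape descriptor built from the (position, scale, orientation) states of three child parts, here is a toy version using relative quantities (internal angles, log-scale ratios, orientation differences); the exact descriptor in the original work may differ, so treat this as an assumption.

```python
import numpy as np

def triplet_descriptor(pts, scales, orients):
    """Toy shape descriptor for a triplet of parts: the internal angles of the
    triangle they form, pairwise log-scale ratios, and relative orientations.
    Relative quantities make the descriptor insensitive to where the whole
    triplet sits in the image. Illustrative only."""
    pts = np.asarray(pts, dtype=float)                 # shape (3, 2)
    angles = []
    for i in range(3):
        u = pts[(i + 1) % 3] - pts[i]
        v = pts[(i + 2) % 3] - pts[i]
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    log_scale = [np.log(scales[i] / scales[(i + 1) % 3]) for i in range(3)]
    rel_orient = [orients[i] - orients[(i + 1) % 3] for i in range(3)]
    return np.concatenate([angles, log_scale, rel_orient])
```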

The Inference Results after Supervised Learning

Segmentation Results

Evaluations: Segmentation and Parsing
• Segmentation (accuracy of pixel labeling): the proportion of correct pixel labels (object or non-object)
• Parsing (average position error of matching): the average distance between the positions of leaf nodes in the ground truth and those estimated in the parse tree

Methods            Testing  Segmentation  Parsing  Speed
RCM-1              228      94.7          16       23 s
Ren (Berkeley)     172      91
Winn (LOCUS)       200      93
Levin and Weiss    N/A      95
Kumar (OBJ CUT)    5        96

Recursive Composition
• Modeling (representation): Recursive Compositional Models (RCMs)
• Inference (computing): Recursive Optimization (polynomial-time)
• Learning: Supervised Parameter Estimation; Unsupervised Recursive Learning
• RCM-1: Deformable Object

Unsupervised Learning
• Task: given 10 training images with no labeling, no alignment, and highly ambiguous features:
  • Induce the structure (nodes and edges)
  • Estimate the parameters
• Correspondence is unknown: a combinatorial explosion problem

Recursive Dictionary Learning
• Multi-level dictionary (layer-wise greedy)
• Bottom-up and top-down recursive procedure
• Three principles: Recursive Composition, Suspicious Coincidence (Barlow 94), Competitive Exclusion

10 images for training

Bottom-up Learning
• Suspicious Coincidence
• Composition Clustering
• Competitive Exclusion
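A highly simplified sketch of one bottom-up dictionary-learning step, assuming a generic reading of the stages named above: candidate compositions of lower-level parts are proposed and clustered elsewhere, then filtered by suspicious coincidence (kept only if they occur far more often than chance co-occurrence of their children would predict) and by competitive exclusion (near-duplicates explaining the same image evidence are suppressed). All names and thresholds are illustrative, not the authors' code.

```python
def learn_next_layer(proposals, overlap, min_ratio=5.0, max_overlap=0.5):
    """One layer-wise greedy step of bottom-up dictionary learning (sketch).
    proposals: candidate compositions, each carrying
               .count  (observed frequency in the training images),
               .chance (frequency expected if its children were independent),
               .score  (how well it explains the images)
    overlap:   (a, b) -> fraction of image evidence two compositions share
    Returns the compositions kept as the next dictionary level."""
    # Suspicious coincidence: keep compositions that occur much more often
    # than independent co-occurrence of their children would predict.
    suspicious = [c for c in proposals if c.count > min_ratio * c.chance]

    # Competitive exclusion: greedily keep the best-scoring composition and
    # drop others that explain largely the same image evidence.
    suspicious.sort(key=lambda c: c.score, reverse=True)
    kept = []
    for c in suspicious:
        if all(overlap(c, k) < max_overlap for k in kept):
            kept.append(c)
    return kept
```

Repeating this step layer by layer is what shrinks the huge proposal counts in the dictionary-size table a few slides later.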

The Dictionary: From Generic Parts to Object Structures
• Unified representation (RCMs) and learning
• Bridges the gap between generic features and specific object structures

Dictionary Size, Part Sharing and Computational Complexity

Level  Composition  Clusters   Suspicious Coincidence  Competitive Exclusion  Seconds
0                                                      4                      1
1      167,431      14,684     262                     48                     117
2      2,034,851    741,662    995                     116                    254
3      2,135,467    1,012,777  305                     53                     99
4      236,955      72,620     30                      2                      9

Top-down Refinement
• Fill in missing parts
• Examine every node from top to bottom
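A minimal sketch of such a top-down pass, assuming each visited node predicts where a missing child should be and re-detects it locally; the traversal is the idea stated on the slide, while the helper functions and their signatures are assumptions for illustration.

```python
from collections import deque

def top_down_refine(root, children, detected, predict_pose, local_search):
    """Breadth-first top-down pass: visit every node from the root down and
    fill in children that the bottom-up pass missed (illustrative sketch).
    children:     node -> list of child nodes
    detected:     dict node -> pose; must already contain the root and any
                  nodes explained bottom-up
    predict_pose: (parent, parent_pose, child) -> expected child pose
    local_search: (child, expected_pose) -> refined pose found in the image
    """
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children(node):
            if child not in detected:                       # missing part
                guess = predict_pose(node, detected[node], child)
                detected[child] = local_search(child, guess)
            queue.append(child)
    return detected
```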

Evaluations of Unsupervised Learning

Methods       Testing  Segmentation  Parsing  Speed
Unsupervised  316      93.3                   17 s
Supervised    228      94.7          16       23 s

Scaling up the System: Issue I
• More classes/viewpoints -> more training/detection cost

Scaling up the System: Issue II
• Not enough data for rare viewpoints/classes

Our Strategy
• Joint multi-class multi-view learning
• Appearance sharing
• Part sharing

Joint Multi-Class Multi-View Learning
• 120 templates: 5 viewpoints and 26 classes

Different Viewpoints Share the Same Appearance

Different Classes Share Common Parts

Compact Hierarchical Dictionary

Dense Part Sharing at Low Levels: Layer-2

Less Part Sharing: Layer-3

Sparse Part Sharing at High Levels: Layer-4

Re-usable Parts: All Layers

The more classes/viewpoints, the greater the amount of part sharing

Multi-View Single Class Performance

Recursive Composition
• Representation: Recursive Compositional Models (RCMs)
• Inference: Recursive Optimization (polynomial-time)
• Learning: Supervised Parameter Estimation
• RCM-1: Deformable Object
• RCM-2: Articulated Object

RCM-2 for Articulated Objects: Horses in Multiple Poses
• y = (switch, position, scale, orientation)
• Composition with switch variables that select among alternative configurations (poses)
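A rough extension of the earlier dynamic-programming sketch to the switch variable above, assuming the switch acts as an OR choice among alternative child configurations; this structure is an assumption for illustration, not the authors' code.

```python
def best_score_or(node, state, configurations, appearance, pairwise, states, memo):
    """Like `best_score`, but `configurations(node)` returns alternative child
    sets, one per switch value; the switch is maximized out, so the node uses
    whichever pose-specific configuration scores best (sketch)."""
    key = (node, state)
    if key in memo:
        return memo[key]
    options = configurations(node) or [()]      # a leaf has one empty configuration
    best = float("-inf")
    for child_set in options:                   # one entry per switch value
        total = appearance(node, state)
        for child in child_set:
            total += max(
                pairwise(state, s) + best_score_or(child, s, configurations,
                                                   appearance, pairwise, states, memo)
                for s in states(child)
            )
        best = max(best, total)
    memo[key] = best
    return best
```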

RCM-2 for Human Body

Recursive Composition
• Representation: Recursive Compositional Models (RCMs)
• Inference: Recursive Optimization (polynomial-time)
• Learning: Supervised Parameter Estimation
• RCM-1: Deformable Object
• RCM-2: Articulated Object
• RCM-3: Scene (Entire Image)

Image Scene Parsing
• Task: image segmentation and labeling

Scene Modeling: RCM-3
• Flat MRF: object labeling (recognition only)
• Lacks long-range interactions
• Lacks region-level properties
• High-order potentials -> heavy computation
Geman and Geman 84; L. Zhu et al. NIPS 08

Scene Modeling: RCM-3
• Flat MRF: object labeling (recognition only)
• Joint segmentation-recognition template
Geman and Geman 84; L. Zhu et al. NIPS 08

Segmentation and Recognition Template
• (segmentation, object) pair: addresses the chicken-and-egg problem of segmentation and recognition
• Multi-level, low-dimensional abstraction, coarse to fine
  • Global: gist of the scene, object layout
  • Local: concurrent shape and appearance

RCM-3 for Scene Parsing
• Recursion over y = (segmentation, object)
• f: appearance likelihood (object texture, color)
• g: priors: object layout, segmentation prior, homogeneity, layer-wise consistency, object co-occurrence
• Example regions on the slide: horse, grass

RCM-3: Inference and Learning
• State space: C = 21 classes; D = 30 templates; K = 3 classes per template
• Inference: recursive optimization
• Supervised learning: perceptron
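As a rough illustration of the per-node state space implied by these numbers, assuming each node picks one of the D segmentation-recognition templates and labels each of its K regions with one of the C classes (this structural reading is an assumption, not spelled out on the slide):

```python
from itertools import product

C, D, K = 21, 30, 3        # classes, templates, classes per template (from the slide)

def node_states():
    """Yield (template_id, class_labels) pairs: one of D templates plus a class
    label for each of its K regions. Illustrative of the state-space size only."""
    for template in range(D):
        for labels in product(range(C), repeat=K):
            yield template, labels

n_states = D * C ** K      # 30 * 21**3 = 277,830 candidate states per node
```

A moderate per-node state space like this is what keeps the recursive optimization and the perceptron updates tractable.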

Evaluations of RCM-3
• Implementation details:

Dataset  Classes  Size  Training  Training time  Testing time
MSRC     21       591   45%       55 h           30 s

• Comparisons (MSRC pixel-labeling accuracy, %):

Method                            Average  Global
TextonBoost (Shotton et al. 04)   57.7     72.2
PLSA-MRF (Verbeek and Triggs)     64       73.5
Auto-Context (Tu 08)              68       77.7
Classifier only                   67.2     75.9
RCM-3                             74.5     81.4

(The slide also lists a value of 69 labeled "Classifier".)

Unified RCMs: Object vs. Scene
• RCM-1, RCM-2 (objects): triplets of parts; boundary only
• RCM-3 (scene): triplets of segments; region + boundary

Conclusions
• Principle: Recursive Composition
  • Composition -> complexity decomposition
  • Recursion -> universal rules (self-similarity)
  • Recursion and composition -> sparseness
• One formula for different tasks
• Key: the representation of visual patterns, i.e., y
• Low dimension, simple potentials
• Scaling up: a practical image understanding system

References
• Long Zhu, Yuanhao Chen, Antonio Torralba, William Freeman, Alan Yuille. Part and Appearance Sharing: Recursive Compositional Models for Multi-View Multi-Object Detection. CVPR 2010.
• Long Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan Yuille. Recursive Segmentation and Recognition Templates for 2D Parsing. NIPS 2008.
• Long Zhu, Chenxi Lin, Haoda Huang, Yuanhao Chen, Alan Yuille. Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion. ECCV 2008.
• Long Zhu, Yuanhao Chen, Yifei Lu, Chenxi Lin, Alan Yuille. Max Margin AND/OR Graph Learning for Parsing the Human Body. CVPR 2008.
• Long Zhu, Yuanhao Chen, Xingyao Ye, Alan Yuille. Structure-Perceptron Learning of a Hierarchical Log-Linear Model. CVPR 2008.
• Yuanhao Chen, Long Zhu, Chenxi Lin, Alan Yuille, Hongjiang Zhang. Rapid Inference on a Novel AND/OR Graph for Object Detection, Segmentation and Parsing. NIPS 2007.
• Long Zhu, Alan L. Yuille. A Hierarchical Compositional System for Rapid Object Detection. NIPS 2005.