Recursive Composition in Computer Vision
Leo Zhu, CSAIL MIT
Joint work with Chen, Yuille, Freeman and Torralba

Ideas behind Recursive Composition
• How to deal with image complexity
• A general framework for different vision tasks
• Rich representation and tractable computation
Pattern Theory (Grenander 94); Compositionality (Geman 02, 06); Stochastic Grammar (Zhu and Mumford 06)

Recursive Composition
• Representation: Recursive Compositional Models (RCMs)
• Inference: Recursive Optimization
• Learning: Supervised Parameter Estimation; Unsupervised Recursive Dictionary Learning
• RCM-1: Deformable Object
• RCM-2: Articulated Object
• RCM-3: Scene (Entire Image)

Modeling a Deformable Object
• Flat MRF: nodes are object parts; edges are spatial relations
• Limitations: short-range interactions; sparse

Recursive Composition

Recursive Compositional Models: RCM-1
• x: image
• y: (position, scale, orientation)
• graph = (nodes, edges)
• a: index of a node; b: a child of a
• f: appearances on node a
• g: potentials on edges (a, b)
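The model equation on the original slide is not preserved in this transcript. As a hedged sketch only, these symbols are consistent with the standard compositional energy used in this line of work: unary appearance terms on nodes plus relational terms on parent-child edges, with weights to be learned.

```latex
% Sketch of an RCM-1 style energy (a reconstruction, not the slide's exact formula)
\[
E(y; x) \;=\; \sum_{a \in \mathcal{V}} \lambda_a \, f(x, y_a)
        \;+\; \sum_{(a,b) \in \mathcal{E}} \mu_{ab} \, g(y_a, y_b)
\]
```

Here the lambda and mu weights are the parameters that the supervised learning part of the talk estimates.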

RCM-1: the Recursive Formula
• x: image; y: (position, scale, orientation)
• Vertical independence
• Self-similarity
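The recursion itself appeared as an equation on the slide and is missing here. Under the vertical-independence and self-similarity properties listed above, a plausible reconstruction (an assumption, not copied from the slide) is that the score of a subtree depends on its children only through their own subtree scores:

```latex
% Hedged reconstruction of the recursive score of the subtree rooted at node a
\[
S_a(y_a) \;=\; f(x, y_a) \;+\; \sum_{b \in \mathrm{ch}(a)} \max_{y_b}
          \big[\, g(y_a, y_b) + S_b(y_b) \,\big]
\]
```

This parent-child factorization is what makes the dynamic-programming inference on the next slides polynomial-time.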

Recursive Composition
• Representation: Recursive Compositional Models (RCMs)
• Inference: Recursive Optimization
• Learning: Supervised Parameter Estimation; Unsupervised Recursive Dictionary Learning

Polynomial-time Inference
• Inference task
• Recursive Optimization (recursion)
• Polynomial-time complexity
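The equations for the inference task and its complexity are not reproduced in this transcript. As an illustration of why recursive optimization over a tree-structured model is polynomial, here is a minimal memoized dynamic-programming sketch; the function names and arguments are illustrative assumptions, not the authors' code.

```python
def best_score(node, state, appearance, pairwise, children, states, memo):
    """Best score of the subtree rooted at `node` when `node` takes `state`:
    its own appearance term plus, for each child, the best child state
    (pairwise compatibility + the child's own subtree score).
    Memoization makes the total work about O(#nodes * #states^2)."""
    key = (node, state)
    if key in memo:
        return memo[key]
    score = appearance(node, state)
    for child in children(node):
        score += max(
            pairwise(state, s) + best_score(child, s, appearance, pairwise,
                                            children, states, memo)
            for s in states(child)
        )
    memo[key] = score
    return score
```

The full parse is recovered by rerunning the same recursion while recording the arg-max child states.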

Supervised Learning
• Perceptron algorithm (cf. MLE, max-margin / SVM)
• Parameter estimation needs fast inference
Collins 02; Taskar et al. 04

Supervised Learning by the Perceptron Algorithm
• Goal: estimate the parameter vector
• Input: a set of training images with ground truth; initialize the parameter vector
• Training algorithm (Collins 02):
  Loop over training samples i = 1 to N
    Step 1: find the best parse using inference
    Step 2: update the parameters
  End of loop
• Inference is critical for learning
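A minimal sketch of the structured perceptron loop described above (Collins 02); the feature function and the decoder (the inference step) are supplied by the caller, and all names here are illustrative.

```python
import numpy as np

def structured_perceptron(samples, features, decode, dim, epochs=5):
    """Collins-style structured perceptron.
    samples:  list of (x, y_true) pairs, where y_true is the ground-truth parse
    features: (x, y) -> np.ndarray of length `dim`
    decode:   (x, w) -> best-scoring parse under weights w (the inference step)
    """
    w = np.zeros(dim)                          # initialize parameter vector
    for _ in range(epochs):
        for x, y_true in samples:              # loop over training samples
            y_hat = decode(x, w)               # Step 1: inference with current weights
            if y_hat != y_true:                # Step 2: additive update toward the truth
                w += features(x, y_true) - features(x, y_hat)
    return w
```

Because the update requires a fresh decode for every sample on every pass, learning is only practical when inference is fast, which is the point of the previous slide.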

Recursive Composition
• Representation: Recursive Compositional Models (RCMs)
• Inference: Recursive Optimization (polynomial-time)
• Learning: Supervised Parameter Estimation
• RCM-1: Deformable Object

RCM-1: Multi-level Potentials
• Potentials for appearance: [Gabor, edge, ...]

RCM-1: Multi-level Potentials
• Potentials for shape: triplet descriptors over (position, scale, orientation)
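As a rough illustration of a triplet shape descriptor built from the (position, scale, orientation) states of three child parts, here is a toy version using relative quantities (internal angles, log-scale ratios, orientation differences); the exact descriptor in the original work may differ, so treat this as an assumption.

```python
import numpy as np

def triplet_descriptor(pts, scales, orients):
    """Toy shape descriptor for a triplet of parts: the internal angles of the
    triangle they form, pairwise log-scale ratios, and relative orientations.
    Relative quantities make the descriptor insensitive to where the whole
    triplet sits in the image. Illustrative only."""
    pts = np.asarray(pts, dtype=float)                 # shape (3, 2)
    angles = []
    for i in range(3):
        u = pts[(i + 1) % 3] - pts[i]
        v = pts[(i + 2) % 3] - pts[i]
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    log_scale = [np.log(scales[i] / scales[(i + 1) % 3]) for i in range(3)]
    rel_orient = [orients[i] - orients[(i + 1) % 3] for i in range(3)]
    return np.concatenate([angles, log_scale, rel_orient])
```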

The Inference Results after Supervised Learning

Segmentation Results

Evaluations: Segmentation and Parsing
• Segmentation (accuracy of pixel labeling): the proportion of correct pixel labels (object or non-object)
• Parsing (average position error of matching): the average distance between the positions of leaf nodes in the ground truth and those estimated in the parse tree

Methods            Testing  Segmentation  Parsing  Speed
RCM-1              228      94.7          16       23 s
Ren (Berkeley)     172      91
Winn (LOCUS)       200      93
Levin and Weiss    N/A      95
Kumar (OBJ CUT)    5        96

Recursive Composition
• Modeling (representation): Recursive Compositional Models (RCMs)
• Inference (computing): Recursive Optimization (polynomial-time)
• Learning: Supervised Parameter Estimation; Unsupervised Recursive Learning
• RCM-1: Deformable Object

Unsupervised Learning
• Task: given 10 training images with no labeling, no alignment, and highly ambiguous features:
  • Induce the structure (nodes and edges)
  • Estimate the parameters
• Correspondence is unknown: a combinatorial explosion problem

Recursive Dictionary Learning
• Multi-level dictionary (layer-wise greedy)
• Bottom-up and top-down recursive procedure
• Three principles: Recursive Composition, Suspicious Coincidence (Barlow 94), Competitive Exclusion

10 images for training

Bottom-up Learning
• Suspicious Coincidence
• Composition Clustering
• Competitive Exclusion
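A highly simplified sketch of one bottom-up dictionary-learning step, assuming a generic reading of the stages named above: candidate compositions of lower-level parts are proposed and clustered elsewhere, then filtered by suspicious coincidence (kept only if they occur far more often than chance co-occurrence of their children would predict) and by competitive exclusion (near-duplicates explaining the same image evidence are suppressed). All names and thresholds are illustrative, not the authors' code.

```python
def learn_next_layer(proposals, overlap, min_ratio=5.0, max_overlap=0.5):
    """One layer-wise greedy step of bottom-up dictionary learning (sketch).
    proposals: candidate compositions, each carrying
               .count  (observed frequency in the training images),
               .chance (frequency expected if its children were independent),
               .score  (how well it explains the images)
    overlap:   (a, b) -> fraction of image evidence two compositions share
    Returns the compositions kept as the next dictionary level."""
    # Suspicious coincidence: keep compositions that occur much more often
    # than independent co-occurrence of their children would predict.
    suspicious = [c for c in proposals if c.count > min_ratio * c.chance]

    # Competitive exclusion: greedily keep the best-scoring composition and
    # drop others that explain largely the same image evidence.
    suspicious.sort(key=lambda c: c.score, reverse=True)
    kept = []
    for c in suspicious:
        if all(overlap(c, k) < max_overlap for k in kept):
            kept.append(c)
    return kept
```

Repeating this step layer by layer is what shrinks the huge proposal counts in the dictionary-size table a few slides later.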

The Dictionary: From Generic Parts to Object Structures
• Unified representation (RCMs) and learning
• Bridges the gap between generic features and specific object structures

Dictionary Size, Part Sharing and Computational Complexity

Level  Composition  Clusters   Suspicious Coincidence  Competitive Exclusion  Seconds
0                                                      4                      1
1      167,431      14,684     262                     48                     117
2      2,034,851    741,662    995                     116                    254
3      2,135,467    1,012,777  305                     53                     99
4      236,955      72,620     30                      2                      9

Top-down Refinement
• Fill in missing parts
• Examine every node from top to bottom
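A minimal sketch of such a top-down pass, assuming each visited node predicts where a missing child should be and re-detects it locally; the traversal is the idea stated on the slide, while the helper functions and their signatures are assumptions for illustration.

```python
from collections import deque

def top_down_refine(root, children, detected, predict_pose, local_search):
    """Breadth-first top-down pass: visit every node from the root down and
    fill in children that the bottom-up pass missed (illustrative sketch).
    children:     node -> list of child nodes
    detected:     dict node -> pose; must already contain the root and any
                  nodes explained bottom-up
    predict_pose: (parent, parent_pose, child) -> expected child pose
    local_search: (child, expected_pose) -> refined pose found in the image
    """
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children(node):
            if child not in detected:                       # missing part
                guess = predict_pose(node, detected[node], child)
                detected[child] = local_search(child, guess)
            queue.append(child)
    return detected
```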

Evaluations of Unsupervised Learning

Methods       Testing  Segmentation  Parsing  Speed
Unsupervised  316      93.3                   17 s
Supervised    228      94.7          16       23 s

Scaling up the System: Issue I
• More classes/viewpoints -> more training/detection cost

Scaling up the System: Issue II
• Not enough data for rare viewpoints/classes

Our Strategy
• Joint multi-class multi-view learning
• Appearance sharing
• Part sharing

Joint Multi-Class Multi-View Learning
• 120 templates: 5 viewpoints and 26 classes

Different Viewpoints Share the Same Appearance

Different Classes Share Common Parts

Compact Hierarchical Dictionary

Dense Part Sharing at Low Levels: Layer-2

Less Part Sharing: Layer-3

Sparse Part Sharing at High Levels: Layer-4

Re-usable Parts: All Layers

The more classes/viewpoints, the greater the amount of part sharing

Multi-View Single Class Performance

Recursive Composition
• Representation: Recursive Compositional Models (RCMs)
• Inference: Recursive Optimization (polynomial-time)
• Learning: Supervised Parameter Estimation
• RCM-1: Deformable Object
• RCM-2: Articulated Object

RCM-2 for Articulated Objects: Horses in Multiple Poses
• y = (switch, position, scale, orientation)
• Composition with switch variables that select among alternative configurations (poses)
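A rough extension of the earlier dynamic-programming sketch to the switch variable above, assuming the switch acts as an OR choice among alternative child configurations; this structure is an assumption for illustration, not the authors' code.

```python
def best_score_or(node, state, configurations, appearance, pairwise, states, memo):
    """Like `best_score`, but `configurations(node)` returns alternative child
    sets, one per switch value; the switch is maximized out, so the node uses
    whichever pose-specific configuration scores best (sketch)."""
    key = (node, state)
    if key in memo:
        return memo[key]
    options = configurations(node) or [()]      # a leaf has one empty configuration
    best = float("-inf")
    for child_set in options:                   # one entry per switch value
        total = appearance(node, state)
        for child in child_set:
            total += max(
                pairwise(state, s) + best_score_or(child, s, configurations,
                                                   appearance, pairwise, states, memo)
                for s in states(child)
            )
        best = max(best, total)
    memo[key] = best
    return best
```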

RCM-2 for Human Body

Recursive Composition
• Representation: Recursive Compositional Models (RCMs)
• Inference: Recursive Optimization (polynomial-time)
• Learning: Supervised Parameter Estimation
• RCM-1: Deformable Object
• RCM-2: Articulated Object
• RCM-3: Scene (Entire Image)

Image Scene Parsing
• Task: image segmentation and labeling

Scene Modeling: RCM-3
• Flat MRF: object labeling (recognition only)
• Lacks long-range interactions
• Lacks region-level properties
• High-order potentials -> heavy computation
Geman and Geman 84; L. Zhu et al. NIPS 08

Scene Modeling: RCM-3
• Flat MRF: object labeling (recognition only)
• Joint segmentation-recognition template
Geman and Geman 84; L. Zhu et al. NIPS 08

Segmentation and Recognition Template
• (segmentation, object) pair: addresses the chicken-and-egg problem of segmentation and recognition
• Multi-level, low-dimensional abstraction, coarse to fine
  • Global: gist of the scene, object layout
  • Local: concurrent shape and appearance

RCM-3 for Scene Parsing
• Recursion over y = (segmentation, object)
• f: appearance likelihood (object texture, color)
• g: priors: object layout, segmentation prior, homogeneity, layer-wise consistency, object co-occurrence
• Example regions on the slide: horse, grass

RCM-3: Inference and Learning
• State space: C = 21 classes; D = 30 templates; K = 3 classes per template
• Inference: recursive optimization
• Supervised learning: perceptron
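As a rough illustration of the per-node state space implied by these numbers, assuming each node picks one of the D segmentation-recognition templates and labels each of its K regions with one of the C classes (this structural reading is an assumption, not spelled out on the slide):

```python
from itertools import product

C, D, K = 21, 30, 3        # classes, templates, classes per template (from the slide)

def node_states():
    """Yield (template_id, class_labels) pairs: one of D templates plus a class
    label for each of its K regions. Illustrative of the state-space size only."""
    for template in range(D):
        for labels in product(range(C), repeat=K):
            yield template, labels

n_states = D * C ** K      # 30 * 21**3 = 277,830 candidate states per node
```

A moderate per-node state space like this is what keeps the recursive optimization and the perceptron updates tractable.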

Evaluations of RCM-3
• Implementation details:

Dataset  Classes  Size  Training  Training time  Testing time
MSRC     21       591   45%       55 h           30 s

• Comparisons (MSRC pixel-labeling accuracy, %):

Method                            Average  Global
TextonBoost (Shotton et al. 04)   57.7     72.2
PLSA-MRF (Verbeek and Triggs)     64       73.5
Auto-Context (Tu 08)              68       77.7
Classifier only                   67.2     75.9
RCM-3                             74.5     81.4

(The slide also lists a value of 69 labeled "Classifier".)

Unified RCMs: Object vs. Scene
• RCM-1, RCM-2 (objects): triplets of parts; boundary only
• RCM-3 (scene): triplets of segments; region + boundary

Conclusions
• Principle: Recursive Composition
  • Composition -> complexity decomposition
  • Recursion -> universal rules (self-similarity)
  • Recursion and composition -> sparseness
• One formula for different tasks
• Key: the representation of visual patterns, i.e., y
• Low dimension, simple potentials
• Scaling up: a practical image understanding system

References
• Long Zhu, Yuanhao Chen, Antonio Torralba, William Freeman, Alan Yuille. Part and Appearance Sharing: Recursive Compositional Models for Multi-View Multi-Object Detection. CVPR 2010.
• Long Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan Yuille. Recursive Segmentation and Recognition Templates for 2D Parsing. NIPS 2008.
• Long Zhu, Chenxi Lin, Haoda Huang, Yuanhao Chen, Alan Yuille. Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion. ECCV 2008.
• Long Zhu, Yuanhao Chen, Yifei Lu, Chenxi Lin, Alan Yuille. Max Margin AND/OR Graph Learning for Parsing the Human Body. CVPR 2008.
• Long Zhu, Yuanhao Chen, Xingyao Ye, Alan Yuille. Structure-Perceptron Learning of a Hierarchical Log-Linear Model. CVPR 2008.
• Yuanhao Chen, Long Zhu, Chenxi Lin, Alan Yuille, Hongjiang Zhang. Rapid Inference on a Novel AND/OR Graph for Object Detection, Segmentation and Parsing. NIPS 2007.
• Long Zhu, Alan L. Yuille. A Hierarchical Compositional System for Rapid Object Detection. NIPS 2005.