JIGSAWS: JOINT APPEARANCE AND SHAPE CLUSTERING
John Winn, with Anitha Kannan and Carsten Rother
Microsoft Research, Cambridge

Patch models
Used for:
• Object recognition/detection
• Object segmentation
But also:
• Stereo matching, photo stitching
• Texture synthesis
• Super-resolution
• Motion segmentation
• Image/video compression

Patch models
• Patch clustering/codebook (e.g. Leibe & Schiele)
• Epitome (Jojic et al.): parameter sharing + translation invariant

Issues with fixed patch size/shape
• Patch includes background: patches containing the same object are not clustered together
• Patch excludes part of object: patch is less discriminative
• Patch includes occlusion: occluded and unoccluded objects are not clustered together

Patch size?
Size ranges from small (single pixel) to large (entire image).
• Small patches: more sharing
• Large patches: more discriminative, less sharing
Optimal size/shape? Depends on:
• object size/shape
• object variability
• size of training set

Aims of jigsaw model
Learn patches (jigsaw pieces) which are:
1. Shared: each piece is similar in shape and appearance to many regions of the training images;
2. Discriminative: each piece is as large as possible;
3. Exhaustive: all parts of the training images can be reconstructed from the set of jigsaw pieces.

The Jigsaw model
[Diagram: a single jigsaw J, with images I_1, I_2, ..., I_N each paired with an offset map L_1, L_2, ..., L_N.]
Potts model: (equation not preserved in this transcript)
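The relationship between a jigsaw, an image and its offset map can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes a grayscale jigsaw mean, an integer (dy, dx) offset per image pixel, wrap-around indexing into the jigsaw, and a Potts cost with a hypothetical weight `gamma` that counts 4-connected neighbours whose offsets differ. The function names are made up for this example.

```python
import numpy as np

def reconstruct(jigsaw_mean, offsets):
    """Copy jigsaw pixels into an image according to a per-pixel offset map.

    jigsaw_mean: (Hj, Wj) array. offsets: (H, W, 2) integer array of (dy, dx).
    Image pixel (y, x) is read from jigsaw location
    ((y - dy) mod Hj, (x - dx) mod Wj); the wrap-around is an assumption.
    """
    hj, wj = jigsaw_mean.shape
    h, w = offsets.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    zy = (ys - offsets[..., 0]) % hj
    zx = (xs - offsets[..., 1]) % wj
    return jigsaw_mean[zy, zx]

def potts_energy(offsets, gamma=1.0):
    """gamma times the number of 4-connected neighbour pairs with different offsets."""
    differs_vertically = np.any(offsets[1:, :] != offsets[:-1, :], axis=-1)
    differs_horizontally = np.any(offsets[:, 1:] != offsets[:, :-1], axis=-1)
    return gamma * (differs_vertically.sum() + differs_horizontally.sum())
```

Under this reading, a low Potts cost favours large regions that share one offset, which matches the stated aim of making each piece as large as possible.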

Toy example
[Figure: training image; jigsaw.]
Learned using EM + graph cuts
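One half of such an alternating scheme can be sketched as follows: with the offset maps held fixed, re-estimate the jigsaw mean from the image pixels that map to each jigsaw location. This is a simplified sketch under stated assumptions (grayscale images, plain averaging with no prior, wrap-around indexing) and the function name is hypothetical. The other half, updating the offset maps with graph cuts, is not shown.

```python
import numpy as np

def update_jigsaw_mean(images, offset_maps, jigsaw_shape, old_mean=None):
    """Average the image pixels assigned to each jigsaw location.

    images: list of (H, W) arrays. offset_maps: list of (H, W, 2) integer arrays.
    Jigsaw locations that no pixel maps to keep their old value (or 0).
    """
    hj, wj = jigsaw_shape
    total = np.zeros(jigsaw_shape)
    count = np.zeros(jigsaw_shape)
    for img, off in zip(images, offset_maps):
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w]
        zy = (ys - off[..., 0]) % hj
        zx = (xs - off[..., 1]) % wj
        # np.add.at accumulates correctly when several pixels hit the same location.
        np.add.at(total, (zy, zx), img)
        np.add.at(count, (zy, zx), 1)
    if old_mean is None:
        mean = np.zeros(jigsaw_shape)
    else:
        mean = np.array(old_mean, dtype=float)
    used = count > 0
    mean[used] = total[used] / count[used]
    return mean
```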

Dog example
[Figure: training image; 32 × 32 jigsaw mean.]

Dog example
[Figure: reconstructed image; 32 × 32 jigsaw mean; learned segmentation; epitome reconstruction.]

Faces example
Training set: 100 images of 64 × 64 pixels. Source: Olivetti face database
[Figure: jigsaw mean, axis labelled 128.]

Learning the ‘pieces’
[Diagram: a single jigsaw J, with images I_1, I_2, ..., I_N each paired with an offset map L_1, L_2, ..., L_N.]
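One way to read pieces off an offset map, offered as an assumption since the slide's diagram is not preserved, is to treat each connected region of pixels that share the same offset as one piece. The sketch below labels such regions with a breadth-first flood fill over 4-connected neighbours. The function name is hypothetical.

```python
import numpy as np
from collections import deque

def label_pieces(offsets):
    """Label 4-connected regions of constant offset.

    offsets: (H, W, 2) integer array. Returns (labels, n_pieces), where labels
    is an (H, W) integer array with values 0 .. n_pieces - 1.
    """
    h, w = offsets.shape[:2]
    labels = -np.ones((h, w), dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue  # already assigned to a piece
            labels[sy, sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny, nx] == -1
                            and np.array_equal(offsets[ny, nx], offsets[y, x])):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label
```

The resulting label map is a segmentation of the image, and the region shapes are what a shape clustering step could then group.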

Faces example
[Figure: results of shape clustering on the face images.]

Object recognition (preliminary)
• Training set: 20 street images (10 labelled)
• 64 × 64 jigsaw
• Allow patches to deform (as in LayoutCRF, CVPR 2006).
• Accuracy improves (~1%) if you include an additional 10 unlabelled images when learning the jigsaw.

Work in progress…
• Training larger jigsaws on 100s of images
• Incorporating shape clustering into the probabilistic model
• Learning additional invariances, e.g. to illumination
• Object recognition results on MSRC and other datasets

Conclusions
• The jigsaw model allows learning the shape and appearance of objects or object parts in images. It can also handle occlusion.
• Clustering shape and appearance is much more powerful for recognition than appearance alone.
• Can be used as a ‘plug-and-play’ replacement for fixed-size patches in any existing patch-based system.

Thank you
jwinn@microsoft.com
http://johnwinn.org