
Image Similarity and the Earth Mover’s Distance

• Empirical Evaluation of Dissimilarity Measures for Color and Texture. Y. Rubner, J. Puzicha, C. Tomasi and J. M. Buhmann
• The Earth Mover’s Distance as a Metric for Image Retrieval. Y. Rubner, C. Tomasi and L. J. Guibas
• The Earth Mover’s Distance is the Mallows Distance: Some Insights from Statistics. E. Levina and P. J. Bickel

Learning-Based Methods in Vision - Spring 2007
Frederik Heger (with graphics from last year’s slides)
1 February 2007

How Similar Are They?
Images from Caltech 256
LBMV Spring 2007 - Frederik Heger, fwh@cs.cmu.edu

Similarity is Important for …
• Image classification
  • Is there a penguin in this picture?
  • This is a picture of a penguin.
• Image retrieval
  • Find pictures with a penguin in them.
  • Image as search query
    • Find more images like this one.
• Image segmentation
  • Something that looked like this was called penguin before.

Image Representations: Histograms
Images from Dave Kauchak
Panels: Normal histogram, Cumulative histogram
Figure labels: Space Shuttle Cargo Bay
• Generalize to arbitrary dimensions
• Represent distribution of features
  • Color, texture, depth, …
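
The two panels named on this slide can be sketched in a few lines. This is a minimal illustration, not the code behind the slide's figures: the function name `histogram`, the gray values, and the choice of 4 equal-width bins over [0, 256] are all assumptions made for the example.

```python
from itertools import accumulate


def histogram(values, num_bins, lo, hi):
    """Count how many values fall into each of num_bins equal-width bins."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        if lo <= v <= hi:
            # A value equal to hi is clipped into the last bin.
            idx = min(int((v - lo) / width), num_bins - 1)
            counts[idx] += 1
    return counts


# Hypothetical gray values of six pixels.
gray = [10, 20, 20, 130, 200, 255]
normal = histogram(gray, num_bins=4, lo=0, hi=256)
cumulative = list(accumulate(normal))  # running sum of the normal histogram

print(normal)
print(cumulative)
```

The cumulative histogram is just the running sum of the normal one, which is why measures built on it respond to how far mass has shifted between bins.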

Image Representations: Histograms
Images from Dave Kauchak
• Joint histogram
  • Requires lots of data
  • Loss of resolution to avoid empty bins
• Marginal histogram
  • Requires independent features
  • More data/bin than joint histogram

Image Representations: Histograms
Images from Dave Kauchak
Figure labels: Space Shuttle Cargo Bay
• Adaptive binning
  • Better data/bin distribution, fewer empty bins
  • Can adapt available resolution to relative feature importance

Image Representations: Histograms
Images from Dave Kauchak
Figure labels: EASE Truss Assembly, Space Shuttle Cargo Bay
• Clusters / Signatures
  • “super-adaptive” binning
  • Does not require discretization along any fixed axis

Distance Metrics
• [point in x-y plot] - [point in x-y plot] = Euclidean distance of 5 units
• [figure] - [figure] = gray-value distance of 50 values
• [figure] - [figure] = ?

Issue: How to Compare Histograms?
• Bin-by-bin comparison
  • Sensitive to bin size.
  • Could use wider bins … but at a loss of resolution
• Cross-bin comparison
  • How much cross-bin influence is necessary/sufficient?

Overview: Similarity Measures
Heuristic Histogram Distance: Minkowski-form distance (Lp)
Special Cases:
• L1: Manhattan distance
• L2: Euclidean distance
• L∞: Maximum value distance
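
The formula on this slide did not survive extraction. The sketch below uses the standard Minkowski form, the p-th root of the summed p-th powers of the per-bin differences; the function name and the example histograms are assumptions for illustration.

```python
def minkowski_distance(h, k, p):
    """L_p distance between two histograms defined over the same bins.

    p = 1 is the Manhattan distance, p = 2 the Euclidean distance and
    p = float("inf") the maximum-value distance.
    """
    if len(h) != len(k):
        raise ValueError("histograms must have the same number of bins")
    diffs = [abs(a - b) for a, b in zip(h, k)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


h = [0.5, 0.5, 0.0, 0.0]
k = [0.0, 0.5, 0.5, 0.0]
print(minkowski_distance(h, k, 1))             # Manhattan
print(minkowski_distance(h, k, 2))             # Euclidean
print(minkowski_distance(h, k, float("inf")))  # maximum value
```

All three variants compare bin i only with bin i, so they are bin-by-bin measures.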

Overview: Similarity Measures
Heuristic Histogram Distance: Weighted-Mean-Variance (WMV)
Info:
• Per-feature similarity measure
• Based on Gabor filter image representation
• Shown to outperform several parametric models for texture-based image retrieval

Overview: Similarity Measures
Nonparametric Test Statistic: Kolmogorov-Smirnov distance (KS)
Info:
• Defined for only one dimension
• Maximum discrepancy between cumulative distributions
• Invariant to arbitrary monotonic feature transformations
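
"Maximum discrepancy between cumulative distributions" can be sketched directly. The code assumes two 1-D histograms over the same bins, each normalized to sum to 1; the function name is hypothetical.

```python
from itertools import accumulate


def ks_distance(h, k):
    """Largest absolute gap between the two cumulative histograms."""
    if len(h) != len(k):
        raise ValueError("histograms must have the same number of bins")
    cum_h = accumulate(h)
    cum_k = accumulate(k)
    return max(abs(a - b) for a, b in zip(cum_h, cum_k))


h = [0.5, 0.5, 0.0, 0.0]
k = [0.0, 0.5, 0.5, 0.0]
print(ks_distance(h, k))
```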

Overview: Similarity Measures
Nonparametric Test Statistic: Cramer/von Mises type statistic (CvM)
Info:
• Squared Euclidean distance between distributions
• Defined for single dimension
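
A minimal sketch of this statistic, assuming "distributions" on the slide means the cumulative histograms (as for KS) and that both inputs are normalized 1-D histograms over the same bins. The function name is hypothetical.

```python
from itertools import accumulate


def cvm_distance(h, k):
    """Sum of squared differences between the two cumulative histograms."""
    if len(h) != len(k):
        raise ValueError("histograms must have the same number of bins")
    cum_h = accumulate(h)
    cum_k = accumulate(k)
    return sum((a - b) ** 2 for a, b in zip(cum_h, cum_k))


h = [0.5, 0.5, 0.0, 0.0]
k = [0.0, 0.5, 0.5, 0.0]
print(cvm_distance(h, k))
```

Where KS keeps only the single largest gap, this variant accumulates the gap over every bin.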

Overview: Similarity Measures
Nonparametric Test Statistic: χ² statistic
Info:
• Very commonly used
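
The slide's formula is missing from this transcript. The sketch below assumes one common symmetric form of the χ² statistic, in which each histogram bin is compared against the mean of the two histograms; that choice, the function name, and the skipping of empty bins are assumptions.

```python
def chi_square_statistic(h, k):
    """Sum over bins of (h_i - m_i)^2 / m_i with m_i = (h_i + k_i) / 2."""
    if len(h) != len(k):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2.0
        if m > 0:  # bins that are empty in both histograms contribute nothing
            total += (a - m) ** 2 / m
    return total


h = [0.5, 0.5, 0.0, 0.0]
k = [0.0, 0.5, 0.5, 0.0]
print(chi_square_statistic(h, k))
```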

Overview: Similarity Measures
Information-theory Divergence: Kullback-Leibler divergence (KL)
Info:
• Code one histogram using the other as true distribution
• How inefficient would it be?
• Also widely used.
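
A minimal sketch of the standard KL divergence for two normalized histograms, using the natural logarithm. The function name and the example values are assumptions; the handling of zero bins follows the usual conventions (a term with h_i = 0 contributes nothing, and the divergence is infinite if k_i = 0 where h_i > 0).

```python
import math


def kl_divergence(h, k):
    """Sum over bins of h_i * log(h_i / k_i)."""
    if len(h) != len(k):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h, k):
        if a == 0:
            continue          # 0 * log(0 / b) is taken as 0
        if b == 0:
            return math.inf   # mass in h where k has none
        total += a * math.log(a / b)
    return total


h = [0.5, 0.5]
k = [0.25, 0.75]
print(kl_divergence(h, k))
print(kl_divergence(k, h))  # differs from the line above: KL is not symmetric
```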

Overview: Similarity Measures
Information-theory Divergence: Jeffrey-divergence (JD)
Info:
• Similar to KL divergence
• But symmetric and numerically stable
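
The symmetry and stability claims can be checked with a small sketch. It assumes the form in which each histogram is compared against the mean histogram m_i = (h_i + k_i) / 2; the slide's own formula is missing from the transcript, so this form and the function name are assumptions.

```python
import math


def jeffrey_divergence(h, k):
    """Sum over bins of h_i*log(h_i/m_i) + k_i*log(k_i/m_i), m_i = (h_i+k_i)/2."""
    if len(h) != len(k):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2.0
        # m is positive whenever a or b is, so no division by zero occurs.
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return total


h = [1.0, 0.0]
k = [0.0, 1.0]
print(jeffrey_divergence(h, k))  # finite even though the histograms do not overlap
```

For the same non-overlapping pair the plain KL divergence is infinite, which is the stability problem this measure avoids.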

Overview: Similarity Measures
Ground Distance Measure: Quadratic Form (QF)
Info:
• Heuristic approach
• Matrix A incorporates cross-bin information
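
A rough sketch of how the matrix A brings in cross-bin information. It assumes 1-D histograms, a ground distance d_ij = |i - j| between bins, and the common choice a_ij = 1 - d_ij / d_max; the slide does not preserve its definition of A, so these are illustrative assumptions, as is the function name.

```python
def quadratic_form_distance(h, k):
    """sqrt((h - k)^T A (h - k)) with a_ij = 1 - |i - j| / d_max."""
    n = len(h)
    if n != len(k) or n < 2:
        raise ValueError("need two histograms with the same number (>= 2) of bins")
    d_max = n - 1
    z = [a - b for a, b in zip(h, k)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            a_ij = 1.0 - abs(i - j) / d_max  # similarity of bins i and j
            total += z[i] * a_ij * z[j]
    return max(total, 0.0) ** 0.5  # guard against tiny negative round-off


h = [0.5, 0.5, 0.0, 0.0]
near = [0.0, 0.5, 0.5, 0.0]  # half of the mass shifted by two bins
far = [0.0, 0.0, 0.5, 0.5]   # all of the mass shifted by two bins
print(quadratic_form_distance(h, near))
print(quadratic_form_distance(h, far))
```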

Overview: Similarity Measures
Ground Distance Measure: Earth Mover’s Distance (EMD)
Info:
• Based on solution of linear optimization problem (transportation problem)
• Minimal cost to transform one distribution to the other
• Total cost = sum of costs for individual features
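
The general EMD needs a transportation-problem solver, which is not shown here. As a hedged illustration, the sketch below covers only a special case with a known closed form: two 1-D histograms over the same bins, equal total mass, and ground distance |i - j| measured in bins. In that case the minimal cost is the sum of the absolute cumulative differences. The function name and example histograms are assumptions.

```python
from itertools import accumulate


def emd_1d(h, k):
    """EMD for equal-mass 1-D histograms with ground distance |i - j|."""
    if len(h) != len(k):
        raise ValueError("histograms must have the same number of bins")
    mass_h, mass_k = sum(h), sum(k)
    if mass_h <= 0 or abs(mass_h - mass_k) > 1e-9:
        raise ValueError("this shortcut requires equal, positive total mass")
    # After bin i, the running surplus is the earth that must still be
    # carried one bin further to the right (or left, if negative).
    carried = accumulate(a - b for a, b in zip(h, k))
    return sum(abs(c) for c in carried) / mass_h


h = [0.5, 0.5, 0.0, 0.0]
near = [0.0, 0.5, 0.5, 0.0]  # 0.5 units of earth moved 2 bins
far = [0.0, 0.0, 0.5, 0.5]   # 1.0 units of earth moved 2 bins
print(emd_1d(h, near))
print(emd_1d(h, far))
```

Both results equal (amount moved) * (distance moved) for the cheapest way of turning one histogram into the other.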

Summary: Similarity Measures

Earth Mover’s Distance
• [figure] ≠ [figure]
• [figure] ≠ [figure]
• [figure] = [figure]
• [figure] = (amount moved) * (distance moved)

How EMD Works
• P: m clusters
• Q: n clusters
• All movements: sum of (distance moved) * (amount moved)

How EMD Works
Figure labels: P (m clusters), P’, Q (n clusters), Q’
• Move earth only from P to Q
• P cannot send more earth than there is
• Q cannot receive more earth than it can hold
• As much earth as possible must be moved
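
The cost and the four constraints above can be written as a transportation problem. The symbols are assumptions, since the slides' own notation is not preserved: $w_{p_i}$ is the weight of cluster $i$ in P, $w_{q_j}$ the weight of cluster $j$ in Q, $d_{ij}$ the ground distance between the two clusters, and $f_{ij}$ the amount of earth moved from $i$ to $j$.

```latex
\[
\begin{aligned}
\min_{f_{ij}} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{subject to} \quad
& f_{ij} \ge 0, && 1 \le i \le m,\ 1 \le j \le n \\
& \sum_{j=1}^{n} f_{ij} \le w_{p_i}, && 1 \le i \le m \\
& \sum_{i=1}^{m} f_{ij} \le w_{q_j}, && 1 \le j \le n \\
& \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}
  = \min\Bigl( \sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j} \Bigr)
\end{aligned}
\]
\[
\mathrm{EMD}(P, Q) =
\frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij}}
     {\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}}
\]
```

The constraints appear in the same order as the four bullets. The EMD is the optimal cost divided by the total flow, so that it does not favor the smaller of two distributions with unequal mass.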

Color-based Image Retrieval
• L1 distance
• Jeffrey divergence
• χ² statistics
• Quadratic form distance
• Earth Mover’s Distance

Red Car Retrievals (Color-based)

Zebra Retrieval (Texture-based)

EMD with Position Encoding
Panels: without position, with position

Issues with EMD
• High computational complexity
  • Prohibitive for texture segmentation
• Feature ordering needs to be known
  • Open eyes / closed eyes example
• Distance can be set by very few features.
  • E.g. with a partial match between distributions of unequal weight, EMD = 0, no matter how many features follow

Help From Statisticians
• For equal-mass distributions, EMD is equivalent to Mallows distance
  • (for unequal-mass distributions, the two distances behave differently)
• Trick to compute Mallows distance
  • 1-D marginals give better classification results than joint distributions (experimental results)
  • Get marginals from empirical distribution by sorting feature vectors
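
The sorting trick can be sketched for a single 1-D marginal. The code assumes two samples of equal size, each point carrying the same weight; under that assumption the optimal pairing matches the i-th smallest value of one sample with the i-th smallest of the other. The function name and the sample values are hypothetical.

```python
def mallows_distance_1d(xs, ys, p=2):
    """Mallows distance between two equal-size 1-D samples via sorting."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(a - b) ** p for a, b in pairs) / len(xs)
    return mean_cost ** (1.0 / p)


xs = [0.0, 0.0]
ys = [4.0, 3.0]
print(mallows_distance_1d(xs, ys, p=1))  # for p = 1 this equals the EMD
print(mallows_distance_1d(xs, ys, p=2))
```

Sorting costs O(n log n) per marginal, which is far cheaper than solving a transportation problem on the joint distribution.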

EMD Summary / Conclusions
• Ground distance metric for image similarity
• Uses signatures for best adaptive binning and to lessen impact of prohibitive complexity
• Can deal with partial matches
• Good performance for color/texture classification
• Statistical grounding

Last Slide
Comments? Questions?