Paper Overviews • 3 types of descriptors: SIFT / PCA-SIFT (Ke, Sukthankar), GLOH (Mikolajczyk, Schmid), DAISY (Tola et al., Winder et al.) • Comparison of descriptors (Mikolajczyk, Schmid)

Paper Overviews • PCA-SIFT: SIFT-based, but with a smaller descriptor • GLOH: modifies the SIFT descriptor for robustness and distinctiveness • DAISY: novel descriptor, used with graph cuts for matching and depth map estimation

SIFT • “Scale Invariant Feature Transform” • 4 stages: 1. Peak selection 2. Keypoint localization 3. Keypoint orientation 4. Descriptors

SIFT • “Scale Invariant Feature Transform” • 4 stages: 1. Peak selection 2. Keypoint localization 3. Keypoint orientation 4. Descriptors

SIFT • 1. Peak Selection • Make Gaussian pyramid http://www.cra.org/Activities/craw_archive/dmp/awards/2006/Bolan/DMP_Pages/filters.html

SIFT • 1. Peak Selection • Find local peaks using difference of Gaussians – Peaks are found at different scales http://www.cra.org/Activities/craw_archive/dmp/awards/2006/Bolan/DMP_Pages/filters.html
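A minimal sketch of this stage in Python (NumPy/SciPy; the sigma values, threshold, and single-octave setup are illustrative assumptions, not Lowe's exact parameters):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_peaks(image, sigmas=(1.6, 2.26, 3.2, 4.5), threshold=0.03):
    """Find local extrema of the difference of Gaussians across space and scale."""
    image = image.astype(np.float32)
    # Gaussian pyramid (one octave): progressively blurred copies of the image
    blurred = [gaussian_filter(image, s) for s in sigmas]
    # Difference of Gaussians: subtract adjacent blur levels
    dog = np.stack([blurred[i + 1] - blurred[i] for i in range(len(blurred) - 1)])
    peaks = []
    # A peak must beat all 26 neighbours in x, y and scale
    for s in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                v = dog[s, y, x]
                if abs(v) > threshold and (v == cube.max() or v == cube.min()):
                    peaks.append((x, y, s))
    return peaks, dog
```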

SIFT • “Scale Invariant Feature Transform” • 4 stages: 1. Peak selection 2. Keypoint localization 3. Keypoint orientation 4. Descriptors

SIFT • 2. Keypoint Localization –Remove peaks that are “unstable”: » Peaks in low-contrast areas » Peaks along edges » Features not distinguishable
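A rough sketch of the two stability tests, assuming the DoG stack from the previous sketch; the contrast threshold and the edge test on the 2x2 Hessian follow the usual SIFT recipe, but the exact constants here are illustrative:

```python
def is_stable(dog, x, y, s, contrast_thresh=0.03, edge_ratio=10.0):
    """Reject peaks in low-contrast areas and peaks lying along edges."""
    # Low contrast: the DoG response at the peak is too small
    if abs(dog[s, y, x]) < contrast_thresh:
        return False
    # Edge test: ratio of principal curvatures from the local 2x2 Hessian.
    # Along an edge one curvature is large and the other small, so the ratio blows up.
    dxx = dog[s, y, x + 1] - 2 * dog[s, y, x] + dog[s, y, x - 1]
    dyy = dog[s, y + 1, x] - 2 * dog[s, y, x] + dog[s, y - 1, x]
    dxy = (dog[s, y + 1, x + 1] - dog[s, y + 1, x - 1]
           - dog[s, y - 1, x + 1] + dog[s, y - 1, x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:  # curvatures of different signs: not a stable peak
        return False
    return trace ** 2 / det < (edge_ratio + 1) ** 2 / edge_ratio
```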

SIFT • “Scale Invariant Feature Transform” • 4 stages: 1. Peak selection 2. Keypoint localization 3. Keypoint orientation 4. Descriptors

SIFT • 3. Keypoint Orientation • Make histogram of gradients for a patch of pixels • Orient all patches so the dominant gradient direction is vertical http://www.inf.fu-berlin.de/lehre/SS09/CV/uebungen/uebung09/SIFT.pdf
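A small illustrative sketch of the orientation step (36 orientation bins, gradient-magnitude weighting; the real SIFT additionally Gaussian-weights the patch and keeps secondary peaks, which is omitted here):

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Histogram of gradient orientations; the peak bin gives the patch orientation."""
    gy, gx = np.gradient(patch.astype(np.float32))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)  # radians in [-pi, pi]
    bins = ((angle + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    # Rotating the patch by -theta aligns the dominant gradient direction
    theta = (hist.argmax() + 0.5) * 2 * np.pi / n_bins - np.pi
    return theta, hist
```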

SIFT • “Scale Invariant Feature Transform” • 4 stages: 1. Peak selection 2. Keypoint localization 3. Keypoint orientation 4. Descriptors

SIFT • 4. Descriptors • Ideal descriptor: • Compact • Distinctive from other descriptors • Robust against lighting / viewpoint changes

SIFT • 4. Descriptors • A SIFT descriptor is a 128-element vector: – 4 x 4 array of 8-bin histograms – Each histogram is a smoothed representation of gradient orientations of the patch
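A simplified sketch of the descriptor layout (4 x 4 cells x 8 orientation bins = 128 values); the real SIFT also applies Gaussian weighting and smoothing/interpolation between bins, which this sketch skips:

```python
import numpy as np

def sift_like_descriptor(patch):
    """Build a 4x4 grid of 8-bin orientation histograms -> 128-element vector."""
    assert patch.shape[0] % 4 == 0 and patch.shape[1] % 4 == 0
    gy, gx = np.gradient(patch.astype(np.float32))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)
    bins = ((angle + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    cell_h, cell_w = patch.shape[0] // 4, patch.shape[1] // 4
    descriptor = []
    for i in range(4):
        for j in range(4):
            sl = (slice(i * cell_h, (i + 1) * cell_h),
                  slice(j * cell_w, (j + 1) * cell_w))
            hist = np.bincount(bins[sl].ravel(), weights=magnitude[sl].ravel(), minlength=8)
            descriptor.extend(hist)
    d = np.array(descriptor)           # 4 * 4 * 8 = 128 elements
    return d / (np.linalg.norm(d) + 1e-7)  # normalize for lighting robustness
```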

PCA-SIFT • Changes step 4 of the SIFT process to create different descriptors • Rationale: – Construction of SIFT descriptors is complicated – Reason for constructing them that way is unclear – Is there a simpler alternative?

PCA-SIFT • “Principal Component Analysis” (PCA) • A widely-used method of dimensionality reduction • Used with SIFT to make a smaller feature descriptor –By projecting the gradient patch into a smaller space

PCA-SIFT –Creating a descriptor for keypoints: 1. Create patch eigenspace 2. Create projection matrix 3. Create feature vector

PCA-SIFT – 1. Create patch eigenspace –For each keypoint: • Take a 41 x 41 patch around the keypoint • Compute horizontal / vertical gradients –Put all gradient vectors for all keypoints into a matrix

PCA-SIFT – 1. Create patch eigenspace –M = matrix of gradients for all keypoints –Calculate covariance of M –Calculate eigenvectors of covariance(M)

PCA-SIFT – 2. Create projection matrix –Choose first n eigenvectors –This paper uses n = 20 –This is the projection matrix –Store for later use, no need to re-compute

PCA-SIFT – 3. Create feature vector – For a single keypoint: • Take its gradient vector and project it with the projection matrix • The feature vector is of size n – This is called “Grad PCA” in the paper – “Img PCA” uses the image patch instead of the gradients – Size difference: 128 elements (SIFT) vs. n = 20
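A compact sketch of the three PCA-SIFT steps, assuming the gradient patches have already been extracted and flattened into one row per keypoint (variable and function names are illustrative, not from the paper's code):

```python
import numpy as np

def build_projection(gradient_vectors, n=20):
    """Steps 1-2: eigenspace of the gradient patches and an n-row projection matrix."""
    M = np.asarray(gradient_vectors)          # one flattened gradient vector (41x41 patch) per keypoint
    mean = M.mean(axis=0)
    cov = np.cov(M - mean, rowvar=False)      # covariance of M
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvectors of covariance(M)
    order = np.argsort(eigvals)[::-1]         # sort by decreasing eigenvalue
    projection = eigvecs[:, order[:n]].T      # keep the first n eigenvectors; store for later use
    return mean, projection

def pca_sift_descriptor(gradient_vector, mean, projection):
    """Step 3: project one keypoint's gradient vector down to n elements."""
    return projection @ (gradient_vector - mean)
```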

PCA-SIFT –Results –Tested SIFT vs. “Grad PCA” and “Img PCA” on a series of image variations: –Gaussian noise – 45° rotation followed by 50% scaling – 50% intensity scaling –Projective warp

PCA-SIFT –Results (Precision-recall curves) –Grad PCA (black) generally outperforms Img PCA (pink) and SIFT (purple) except when brightness is reduced –Both PCA methods outperform SIFT with illumination changes

PCA-SIFT – Results – PCA-SIFT also gets more matches correct on images taken at different viewpoints

A Performance Evaluation of Local Descriptors Krystian Mikolajczyk and Cordelia Schmid

Problem Setting for Comparison Matching Problem From a slide of David G. Lowe (IJCV 2004) As we did in Project 2: Panorama, we want to find correct pairs of points in two images.

Overview of Compared Methods • Region Detector: detects interest points • Region Descriptor: describes the points • Matching Strategy: how to find pairs of points in two images?

Region Detector • Harris Points • Blob Structure Detectors: 1. Harris-Laplace Regions (similar to DoG) 2. Hessian-Laplace Regions 3. Harris-Affine Regions 4. Hessian-Affine Regions • Edge Detector: Canny Detector

Region Descriptors (Descriptor / Dimension / Category)
• SIFT – 128 – SIFT-based descriptor
• PCA-SIFT – 36 – SIFT-based descriptor
• GLOH – 128 – SIFT-based descriptor
• Shape Context – 36 – Similar to SIFT, but focuses on edge locations found with a Canny detector
• Spin – 50 – A sparse set of affine-invariant local patches is used
• Steerable Filters – 14 – Differential descriptor: focuses on the properties of local derivatives (the local jet)
• Differential Invariants – 14 – Differential descriptor (local jet)
• Complex Filters – Consists of many filters
• Gradient Moments – 20 – Moment-based descriptor
• Cross Correlation – 81 – Uniformly sampled pixel locations
Distance measures used: Euclidean and Mahalanobis.

Matching Strategy • Threshold-Based Matching • Nearest Neighbor Matching – Threshold (DB: the first neighbor) • Nearest Neighbor Matching – Distance Ratio (DB: the first neighbor, DC: the second neighbor)
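A sketch of the three strategies using Euclidean distance; the threshold values are placeholders, not the ones used in the evaluation:

```python
import numpy as np

def match(descriptors_a, descriptors_b, strategy="ratio", threshold=0.8):
    """Match descriptors in A to B with one of the three strategies."""
    matches = []
    for i, d in enumerate(descriptors_a):
        dists = np.linalg.norm(descriptors_b - d, axis=1)
        order = np.argsort(dists)
        first, second = order[0], order[1]
        if strategy == "threshold":          # threshold-based: accept every close-enough pair
            matches += [(i, j) for j in np.flatnonzero(dists < threshold)]
        elif strategy == "nn":               # nearest neighbor, accepted if D_B is below the threshold
            if dists[first] < threshold:
                matches.append((i, first))
        elif strategy == "ratio":            # nearest neighbor distance ratio D_B / D_C
            if dists[first] / dists[second] < threshold:
                matches.append((i, first))
    return matches
```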

Performance Measurements • Repeatability rate, ROC, Recall-Precision • Recall = TP (true positives) / actual positives = # of correct matches / total # of correct matches • Precision = TP (true positives) / predicted positives = # of correct matches / (# of correct matches + # of false matches)

Example of Recall-Precision • Suppose our method extracted 50 corresponding pairs and 40 of the detected pairs were correct, while the ground truth contains 200 correct pairs. Then: Recall = C/B = 40/200 = 20%, Precision = C/A = 40/50 = 80% (A = predicted positive pairs, B = actual positive pairs, C = correct detected pairs). A perfect descriptor gives 100% recall at any value of precision!
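The same computation as a small sketch, with pairs represented as (index in image 1, index in image 2) tuples:

```python
def recall_precision(detected_pairs, ground_truth_pairs):
    """Recall and precision for a set of detected correspondences."""
    detected = set(detected_pairs)
    truth = set(ground_truth_pairs)
    correct = len(detected & truth)       # C: correct detected pairs
    recall = correct / len(truth)         # C / B: all actual correspondences
    precision = correct / len(detected)   # C / A: all predicted pairs
    return recall, precision

# The slide's example: 50 detected pairs, 40 of them correct, 200 ground-truth pairs
# -> recall = 40/200 = 0.20, precision = 40/50 = 0.80
```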

Dataset • 6 different image transformations: Rotation, Image Blur, JPEG Compression, Zoom + Rotation, Viewpoint Change, Light Change

Matching Strategies (* Hessian-Affine Regions) • Nearest Neighbor Matching • Threshold-based Matching • Nearest Neighbor Matching – Distance Ratio

Viewpoint Change • With Hessian-Affine Regions • With Harris-Affine Regions

Scale Change with Rotation Hessian-Laplace Regions Harris-Laplace Regions

Image Rotation of 30–45 degrees • Harris Points

Image Blur • Hessian-Affine Regions

JPEG Compression * Hessian-Affine Regions

Illumination Changes * Hessian-Affine Regions

Ranking of Descriptors • High performance: 1. SIFT-based descriptors with 128 dimensions (GLOH, SIFT) 2. Shape Context, 36 dimensions 3. PCA-SIFT, 36 dimensions 4. Gradient Moments (20 dimensions) & Steerable Filters (14 dimensions) • Low performance: 5. Other descriptors • Note: this ranking is for the matching problem; it is not general performance.

Ranking of Image Transformations by Difficulty • 1. Scale & Rotation & Illumination (easy) 2. JPEG Compression 3. Image Blur 4. Viewpoint Change (difficult) • Viewpoint change results are shown for a structured scene, a textured scene, and two textured scenes.

Other Results • Hessian regions are better than Harris regions • Nearest-neighbor-based matching is better than simple threshold-based matching • SIFT performs better when the nearest-neighbor distance ratio is used • Robust region descriptors perform better than point-wise descriptors • Image rotation does not have a big impact on descriptor accuracy

A Fast Local Descriptor for Dense Matching Engin Tola, Vincent Lepetit, Pascal Fua Ecole Polytechnique Federale de Lausanne, Switzerland

Paper novelty • Introduces the DAISY local image descriptor – much faster to compute than SIFT for dense point matching – works on par with or better than SIFT • DAISY descriptors are fed into an expectation-maximization (EM) algorithm which uses graph cuts to estimate the scene’s depth – works on low-quality images such as those captured from video streams

SIFT local image descriptor • SIFT descriptor is a 3–D histogram in which two dimensions correspond to image spatial dimensions and the additional dimension to the image gradient direction (normally discretized into 8 bins)

SIFT local image descriptor • Each bin contains a weighted sum of the norms of the image gradients around its center, where the weights roughly depend on the distance to the bin center

DAISY local image descriptor • Gaussian-convolved orientation maps G_o^Σ = G_Σ ∗ (∂I/∂o)^+ are calculated for every direction o – G_Σ: Gaussian convolution filter with variance Σ – ∂I/∂o: image gradient in direction o – (.)^+: operator (a)^+ = max(a, 0) – G_o = (∂I/∂o)^+: orientation maps • Every location in G_o^Σ contains a value very similar to what a bin in SIFT contains: a weighted sum, computed over an area, of gradient norms

DAISY local image descriptor

DAISY local image descriptor I. Histograms at every pixel location are computed: h_Σ(u, v) = [G_1^Σ(u, v), ..., G_H^Σ(u, v)]^T – h_Σ(u, v): histogram at location (u, v) – G_o^Σ: Gaussian-convolved orientation maps II. Histograms are normalized to unit norm III. The local image descriptor concatenates the normalized histogram at the center (u, v) with those at the locations l_j(u, v, R) – l_j(u, v, R): the location with distance R from (u, v) in the direction given by j, when the directions are quantized into N values
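A rough single-scale sketch of the idea, assuming one fixed smoothing sigma and illustrative radii (the actual DAISY uses larger sigmas for outer rings and more careful sampling and normalization):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def daisy_descriptor(image, u, v, radii=(5, 10, 15), n_dirs=8, sigma=2.5):
    """Sketch of a DAISY-style descriptor at pixel (u, v):
    rectified directional gradients, Gaussian-smoothed, sampled at the center
    and on rings of n_dirs points around it."""
    image = image.astype(np.float32)
    gy, gx = np.gradient(image)
    # One Gaussian-convolved orientation map per quantized direction o
    maps = []
    for k in range(n_dirs):
        angle = 2 * np.pi * k / n_dirs
        directional = np.maximum(gx * np.cos(angle) + gy * np.sin(angle), 0.0)
        maps.append(gaussian_filter(directional, sigma))
    maps = np.stack(maps)  # shape: (n_dirs, H, W)

    def hist_at(y, x):
        # Read one histogram (one value per orientation map), clamped to the image
        yy = int(np.clip(round(y), 0, maps.shape[1] - 1))
        xx = int(np.clip(round(x), 0, maps.shape[2] - 1))
        h = maps[:, yy, xx]
        return h / (np.linalg.norm(h) + 1e-7)  # normalize to unit norm

    pieces = [hist_at(v, u)]          # center histogram
    for R in radii:                   # rings of sample points at distance R
        for j in range(n_dirs):
            angle = 2 * np.pi * j / n_dirs
            pieces.append(hist_at(v + R * np.sin(angle), u + R * np.cos(angle)))
    return np.concatenate(pieces)
```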

From Descriptor to Depth Map • The model uses EM to estimate the depth map Z and the occlusion map O by maximizing the probability of the observed descriptors – D^n: descriptor of image n

Results

Picking the Best DAISY Simon Winder, Gang Hua, Matthew Brown

Paper Contribution • Utilize a novel ground-truth training set • Test multiple configurations of low-level filters and DAISY pooling and optimize over their parameters • Investigate the effects of robust normalization • Apply PCA dimension reduction and dynamic range reduction to compress the representation of descriptors • Discuss computational efficiency and provide a list of recommendations for descriptors that are useful in different scenarios

Descriptor Pipeline • T-block takes the pixels from the image patch and transforms them to produce a vector of k non-linear filter responses at each pixel – Block T1 computes gradients at each pixel and bilinearly quantizes the gradient angle into k orientation bins, as in SIFT – Block T2 rectifies the x and y components of the gradient to produce a vector of length 4 – Block T3 uses steerable filters evaluated at a number of different orientations
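An illustrative reading of the T2 rectification (positive and negative parts of the x and y gradients give the length-4 vector; the exact filter definitions in the paper may differ slightly):

```python
import numpy as np

def t2_block(gx, gy):
    """T2-style rectification: 4 non-negative responses per pixel from the x/y gradients."""
    return np.stack([np.maximum(gx, 0), np.maximum(-gx, 0),
                     np.maximum(gy, 0), np.maximum(-gy, 0)], axis=-1)
```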

Descriptor Pipeline • S-block spatially accumulates weighted filter vectors to give N linearly summed vectors of length k; these are concatenated to form a descriptor of k·N dimensions.

Descriptor Pipeline • N-block normalizes the complete descriptor to provide invariance to lighting changes. Use a form of threshold normalization with the following stages: – Normalize the descriptor to a unit vector – Clip all the elements of the vector that are above a threshold – Scale the vector to a byte range
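A minimal sketch of this threshold normalization, assuming non-negative descriptor elements; the clip value and byte scaling here are illustrative, since the paper tunes these parameters:

```python
import numpy as np

def normalize_descriptor(d, clip=0.2):
    """N-block sketch: unit-normalize, clip, then scale to a byte range."""
    d = d / (np.linalg.norm(d) + 1e-7)                 # 1. normalize to a unit vector
    d = np.minimum(d, clip)                            # 2. clip elements above the threshold
    return np.round(d * 255 / clip).astype(np.uint8)   # 3. scale the vector to a byte range
```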

Descriptor Pipeline • Dimension reduction: apply principal component analysis to compress the descriptor – First optimize the parameters of the descriptor, then compute the matrix of principal components based on all descriptors computed on the training set – Next find the best dimensionality for reduction by computing the error rate on random subsets of the training data – Progressively increase the dimensionality by adding PCA bases until the minimum error is found
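A sketch of the dimensionality search; `error_rate` stands in for the paper's match-error evaluation on random training subsets and is hypothetical here:

```python
import numpy as np

def pick_pca_dimension(descriptors, error_rate, max_dim=64):
    """Add PCA bases one at a time and keep the dimensionality with minimum error."""
    mean = descriptors.mean(axis=0)
    centered = descriptors - mean
    # Principal components of all training descriptors (rows of Vt)
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    best_dim, best_err = 1, float("inf")
    for dim in range(1, max_dim + 1):
        projected = centered @ components[:dim].T   # project onto the first dim bases
        err = error_rate(projected)                 # caller-supplied evaluation (hypothetical)
        if err < best_err:
            best_dim, best_err = dim, err
    return best_dim, components[:best_dim]
```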

Descriptor Pipeline • Quantization further compresses the descriptor, reducing the memory requirement for large databases of descriptors, by quantizing descriptor elements into L levels.
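A simple uniform quantizer as a sketch; the paper's actual level boundaries are chosen differently, this only illustrates packing elements into L discrete levels:

```python
import numpy as np

def quantize(descriptor, levels=4):
    """Quantize each descriptor element into `levels` discrete codes to save memory."""
    d = np.asarray(descriptor, dtype=np.float32)
    lo, hi = d.min(), d.max()
    step = (hi - lo) / levels + 1e-7
    codes = np.minimum((d - lo) / step, levels - 1).astype(np.uint8)  # integer codes 0..L-1
    return codes, lo, step  # keep lo/step so values can be approximately reconstructed
```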

Training • Use 3D reconstructions as a source of training data • Use a machine learning approach to optimize parameters

Results • Gradient-based descriptor

Results • Dimension Reduction

Results • Descriptor Quantization
