Automatic Matching of Multi-View Images
Ed Bremer, University of Rochester

References
[1] Mikolajczyk, K., Schmid, C., 2004. A performance evaluation of local descriptors. Submitted to PAMI, October 2004. http://lear.inrialpes.fr/pubs/2004/MS04a
[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004. A comparison of affine region detectors. Submitted to International Journal of Computer Vision, August 2004. http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04
[3] Lowe, D., 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60, 2 (2004), pp. 91-118.
[4] Matas, J., Chum, O., Urban, M., Pajdla, T., 2002. Robust Wide Baseline Stereo From Maximally Stable Extremal Regions. Proc. British Machine Vision Conference (BMVC 2002), pages 384-393.
[5] Zisserman, A., Schaffalitzky, F., 2002. Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?". Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, vol. 1, pages 414-431.
[6] Baumberg, A., 2000. Reliable Feature Matching Across Widely Separated Views. In Proc. CVPR, pages 774-781.
[7] Mikolajczyk, K., Schmid, C., 2001. Indexing based on scale invariant interest points. In Proc. 8th ICCV, pages 525-531.

Outline
- Motivation
- Applications
- Process Components
- Region Detectors
- Descriptors
- Matching Criteria
- Performance Evaluation
- Conclusion & Next Steps

Motivation
- Multi-view/multi-image matching
  - Multiple images of a 3D scene taken by a single camera or multiple cameras with different rotation, scale, viewpoint and illumination

Motivation
- Applications: detecting matching regions is used in all of the following
  - Image registration
  - Super-resolution
  - Stereo vision
  - Object detection and recognition
  - Object and motion tracking
  - Indexing and retrieval of objects
  - 3D scene reconstruction
  - Scene recognition

Examples of Multi-view Images
(Figure from [2])

Process Components
- Covariant region detection
  - Detect image regions covariant to the class of transformations between the reference image and the transformed image
- Invariant descriptor
  - Compute invariant descriptors from the covariant regions
- Descriptor matching
  - Compute the distance between descriptors in the reference image and the transformed image
Source: [1]

Region Detectors
- Support regions for computation of descriptors
  - Determined independently in each image
  - Scale invariant or affine invariant
  - Can be points (feature points) or regions (covariant)
  - Provide dense (local) coverage, robust to occlusion
  - Need to be stable and repeatable
- Five region detectors
  - Harris points -> invariant to rotation
  - Harris-Laplace -> invariant to rotation and scale
  - Hessian-Laplace -> invariant to rotation and scale
  - Harris-Affine -> invariant to affine image transformations
  - Hessian-Affine -> invariant to affine image transformations
Source: [1]

Region Detectors
- Harris points
  - Maxima of the Harris function are used to locate interest points
  - Support region is fixed in size: a 41 x 41 neighborhood centered at the interest point
- Harris-Laplace regions
  - Scale-adapted Harris function
  - Interest point is a local minimum or maximum across scale-space, selected by the Laplacian-of-Gaussian
Source: [1]

Region Detectors
- Harris-Laplace performance
  - Approximately 10% better than the Laplacian, Lowe or gradient methods
  - The standard Harris detector performs very poorly under scale changes
Source: [7]

Region Detectors
- Hessian-Laplace regions
  - Interest point is at a local maximum of the Hessian determinant
  - Location in scale-space is found using maxima of the Laplacian-of-Gaussian (Difference-of-Gaussians can also be used)
Sources: [1], [3]

Region Detectors
- Harris-Affine regions
  - Find regions using the Harris-Laplace detector
  - Affine-adapted region based on the second moment matrix
- Hessian-Affine regions
  - Find regions using the Hessian-Laplace detector
  - Affine-adapted region based on the second moment matrix
Source: [2]

Region Detectors
- Regions produced by the Harris-Affine and Hessian-Affine detectors (figure from [2])

Region Detectors
- Affine normalization using the second moment matrix for regions L and R (figure from [2])

Region Detectors
- Region normalization
  - Detectors produce circular or elliptical regions
  - Size is dependent on the detection scale
  - Map regions to a circular region with constant radius
  - Rotate regions in the direction of the dominant gradient orientation
- Illumination normalization
  - Model the illumination change as an affine transformation -> a I(x) + b
  - Normalize with the mean and standard deviation of the pixel intensities
Source: [1]
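The illumination normalization step can be sketched in a few lines. This is a minimal illustration, not the code used in [1]: the function name `normalize_patch` and the flat list-of-intensities patch layout are assumptions made for the example. It shows that subtracting the mean and dividing by the standard deviation cancels an affine change a I(x) + b with a > 0.

```python
from statistics import fmean, pstdev


def normalize_patch(pixels):
    """Shift a patch to zero mean and scale it to unit standard deviation.

    `pixels` is a flat list of intensities (hypothetical layout for this sketch).
    """
    mu = fmean(pixels)
    sigma = pstdev(pixels, mu)
    if sigma == 0:
        # A constant patch carries no structure; return all zeros.
        return [0.0 for _ in pixels]
    return [(p - mu) / sigma for p in pixels]


patch = [10.0, 20.0, 30.0, 40.0]
# Simulated affine illumination change a*I(x) + b with a = 2, b = 15.
brighter = [2.0 * p + 15.0 for p in patch]

normalized_a = normalize_patch(patch)
normalized_b = normalize_patch(brighter)
```

After normalization the two patches agree up to floating-point rounding, so a descriptor computed from either one sees the same input.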

Descriptors
- Descriptors -> feature vector
  - Invariant to changes in scale, rotation, affine transformation and affine illumination
  - Need to be distinct, stable and repeatable
  - Distribution (histogram) type or covariance type
- Ten descriptor types
  - Scale-Invariant Feature Transform (SIFT)
  - Gradient Location and Orientation Histogram (GLOH)
  - Shape Context
  - Principal Component Analysis (PCA)-SIFT
  - Steerable Filters
  - Differential Invariants
  - Complex Filters
  - Moment Invariants
  - Cross-Correlation
  - Spin Image
Source: [1]

Descriptors
- SIFT and GLOH 3D descriptors
  - SIFT -> 4 x 4 x 8 = 128 dimension descriptor
  - GLOH -> log-polar, [(2 x 8) + 1] x 16 = 272 dimension descriptor
Source: [1]

Matching Criteria
- Distance measure
  - Find putative matches between images
  - Mahalanobis distance: used for covariance-type descriptors
  - Euclidean distance: used for distribution (histogram) descriptors
  - Direct distance comparison is not suitable for indexing or database searching
- Simple threshold
  - Descriptors match if the distance between them is below a threshold t
  - A descriptor in the reference image can have many matches to descriptors in the transformed image
- Nearest Neighbor (NN)
  - Find the closest match between descriptors in the reference and transformed images
  - A descriptor in the reference image can have only one match to a descriptor in the transformed image
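The difference between the two matching strategies is easiest to see side by side. The sketch below uses Euclidean distance on toy 2-D descriptors; the function names, the toy data, and the choice to also apply the threshold t to the nearest neighbor are assumptions of this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_matches(ref, other, t):
    """Every (i, j) pair with distance below t; one ref descriptor may match many."""
    return [
        (i, j)
        for i, r in enumerate(ref)
        for j, o in enumerate(other)
        if euclidean(r, o) < t
    ]


def nearest_neighbor_matches(ref, other, t):
    """At most one match per ref descriptor: its closest neighbor, if below t."""
    if not other:
        return []
    matches = []
    for i, r in enumerate(ref):
        distances = [(j, euclidean(r, o)) for j, o in enumerate(other)]
        j, d = min(distances, key=lambda pair: pair[1])
        if d < t:
            matches.append((i, j))
    return matches


ref = [[0.0, 0.0], [5.0, 5.0]]
other = [[0.0, 1.0], [1.0, 0.0], [5.0, 6.0]]
by_threshold = threshold_matches(ref, other, t=1.5)
by_nn = nearest_neighbor_matches(ref, other, t=1.5)
```

With this data the threshold rule returns two matches for the first reference descriptor, while the nearest-neighbor rule keeps only one of them.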

Performance Evaluation
- Criterion basis
  - Recall = #correct matches / #correspondences
  - 1 - precision = #false matches / (#correct matches + #false matches)
  - Ideal descriptor -> recall = 1 for any precision, given no overlap error
Source: [1]
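The two evaluation quantities are simple ratios of match counts. A small sketch follows; the function names and the example counts are made up for illustration and are not results from [1].

```python
def recall(correct_matches, correspondences):
    """Fraction of ground-truth correspondences that were correctly matched."""
    if correspondences == 0:
        return 0.0
    return correct_matches / correspondences


def one_minus_precision(correct_matches, false_matches):
    """Fraction of all returned matches that are false."""
    total = correct_matches + false_matches
    if total == 0:
        return 0.0
    return false_matches / total


# Hypothetical counts: 40 ground-truth correspondences, 30 correct and 10 false matches.
example_recall = recall(30, 40)
example_omp = one_minus_precision(30, 10)
```

Sweeping the matching threshold t and plotting recall against 1 - precision gives one curve per descriptor.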

SIFT - Scale Invariant Feature Transform
- Scale Invariant Feature Transform (SIFT), Lowe [3]
- Features
  - Invariant to image scale and rotation
  - Invariant to small changes in illumination and 3D camera viewpoint
  - Extracts a large number of highly distinctive features
    - Enables detection of small objects
    - Improved performance in cluttered scenes
  - Algorithms are efficient: complex operations are applied to local regions or features rather than the whole image
- Procedure
  - Scale-space extrema detection
  - Keypoint localization
  - Orientation assignment
  - Keypoint vector (descriptor)

SIFT - Scale Invariant Feature Transform [3]
- Scale-space blob detector
  - Search for stable features over all scales and image locations
  - Scale-space kernel -> Gaussian function
  - Difference of Gaussian

SIFT - Scale Invariant Feature Transform [3]
- Difference of Gaussian (DoG)
  - Simple subtraction of blurred images L
  - Approximation to the scale-normalized Laplacian of Gaussian
  - Maxima or minima of the scale-normalized Laplacian produce the most stable image features compared to the gradient, Hessian, or Harris corner function (Mikolajczyk 2002)
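For reference, the DoG relations as given by Lowe [3] can be written out as follows. Here G is the Gaussian kernel, I the input image, L = G * I the blurred image, * denotes convolution, and k is the constant factor between adjacent scales.

```latex
D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y)
                = L(x, y, k\sigma) - L(x, y, \sigma)

G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^{2}\,\nabla^{2} G
```

The factor (k - 1) is constant over all scales, so it does not affect the location of the extrema.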

SIFT - Scale Invariant Feature Transform [3]
- Scale-space image set
  - Divide each octave into s intervals
  - Compute s + 3 filtered (increasingly blurry) images, k = 2^(1/s)
  - Example: s = 3, k = 1.26
      6th -> 3.17σ
      5th -> 2.52σ
      4th -> 2.00σ
      3rd -> 1.59σ
      2nd -> 1.26σ
      1st -> 1.00σ
  - Subtract adjacent images to produce DoG images
  - Repeat for the next octave, starting from the image with twice the initial σ (two images from the top of the stack), decimated by 2
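The blur levels within one octave follow directly from k = 2^(1/s). A minimal sketch, where the function name `octave_sigmas` and the default base scale of 1.0 are assumptions of this example:

```python
def octave_sigmas(s, sigma0=1.0):
    """Blur levels of the s + 3 Gaussian images in one octave, with k = 2**(1/s)."""
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]


sigmas = octave_sigmas(3)
# Subtracting adjacent Gaussian images yields s + 2 DoG images per octave.
num_dog_images = len(sigmas) - 1
```

For any s, the image at index s has exactly twice the base σ, which is why Lowe [3] resamples that image to start the next octave.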

SIFT - Scale Invariant Feature Transform [3]
- Scale-space pyramid (figure from Lowe)

SIFT - Scale Invariant Feature Transform [3]
- Locating scale-space extrema
  - Detect local maxima or minima of D(x, y, σ)
  - Compare each sample point to its 8 neighbors in the same scale image and the 9 neighbors in each of the scale images above and below
  - Mark the sample if it is greater than or less than all of its neighbors
  - Extrema are searched in s DoG images per octave
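The 26-neighbor comparison can be sketched as below. The nested-list layout `dog[level][y][x]` and the function name are assumptions of this example, and the caller is expected to pass only interior samples so that all neighbors exist.

```python
def is_scale_space_extremum(dog, level, y, x):
    """True if dog[level][y][x] is strictly greater or strictly less than
    its 8 neighbors in the same DoG image and the 9 in each adjacent one."""
    center = dog[level][y][x]
    neighbors = [
        dog[level + dl][y + dy][x + dx]
        for dl in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dl, dy, dx) != (0, 0, 0)
    ]
    return all(center > n for n in neighbors) or all(center < n for n in neighbors)


def make_stack(center_value):
    """Toy 3 x 3 x 3 DoG stack of zeros with a chosen center value."""
    stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    stack[1][1][1] = center_value
    return stack


peak = is_scale_space_extremum(make_stack(1.0), 1, 1, 1)
valley = is_scale_space_extremum(make_stack(-1.0), 1, 1, 1)
flat = is_scale_space_extremum(make_stack(0.0), 1, 1, 1)
```

Most samples fail after the first few comparisons, so `all(...)` short-circuiting keeps the average cost low.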

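SIFT - Scale Invariant Feature Transform [3]
- Improving localization
  - Reject points that have low contrast using: $|D(\hat{\mathbf{x}})| < \text{threshold}$
  - Where: $D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}$
  - Offset of the extremum: $\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$
  - The Hessian and derivative of D(x, y, σ) are computed using differences of neighboring sample points
  - x = (x, y, σ)^T is the offset from the sample point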
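A sketch of the localization step, following the quadratic fit described by Lowe [3]: the offset of the extremum is minus the inverse Hessian times the gradient, and the interpolated value is D plus half the dot product of the gradient and the offset. The function names, the use of Cramer's rule for the 3 x 3 solve, and the test values are assumptions of this example. Lowe reports a contrast threshold of 0.03 for pixel values in [0, 1].

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested sequences."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def refine_extremum(d0, grad, hess):
    """Return (offset, value) of the fitted quadratic's extremum, or None.

    d0   : DoG value at the sample point
    grad : [dD/dx, dD/dy, dD/dsigma]
    hess : 3 x 3 matrix of second derivatives
    """
    det = det3(hess)
    if det == 0:
        return None  # singular Hessian, no unique extremum
    rhs = [-g for g in grad]
    offset = []
    for col in range(3):
        # Cramer's rule: replace one column of the Hessian with the right-hand side.
        m = [list(row) for row in hess]
        for r in range(3):
            m[r][col] = rhs[r]
        offset.append(det3(m) / det)
    value = d0 + 0.5 * sum(g * o for g, o in zip(grad, offset))
    return offset, value


def has_low_contrast(value, threshold=0.03):
    """Contrast test on the interpolated DoG value."""
    return abs(value) < threshold


hess = [[-2.0, 0.0, 0.0], [0.0, -2.0, 0.0], [0.0, 0.0, -2.0]]
offset, value = refine_extremum(0.1, [0.4, -0.2, 0.0], hess)
```

In Lowe's procedure an offset larger than 0.5 in any dimension means the extremum lies closer to a different sample point, and the fit is repeated there.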

SIFT - Scale Invariant Feature Transform [3]
- Edge rejection
  - Eliminate poorly defined peaks (edges) using the Hessian matrix
  - Verify that the ratio of principal curvatures is less than a threshold, r < 10
  - Efficient to compute -> fewer than 20 floating point operations
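The curvature-ratio check does not require computing eigenvalues. Lowe [3] tests Tr(H)^2 / Det(H) < (r + 1)^2 / r on the 2 x 2 spatial Hessian of D, which is what the sketch below does. The function name and the example derivative values are assumptions of this example.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """True if the ratio of principal curvatures of the 2 x 2 Hessian is below r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures have opposite signs (or one is zero): not a well-defined peak.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r


blob_like = passes_edge_test(dxx=-1.0, dyy=-1.0, dxy=0.0)
edge_like = passes_edge_test(dxx=-20.0, dyy=-1.0, dxy=0.0)
saddle = passes_edge_test(dxx=1.0, dyy=-1.0, dxy=0.0)
```

A blob has similar curvature in both directions and passes, while an edge has one large and one small curvature and is rejected.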

SIFT - Scale Invariant Feature Transform [3]
- Results from Lowe [3]: 832 keypoints reduced to 536 (233 x 189 image)

SIFT - Scale Invariant Feature Transform
- Results from Lowe [3]: performance measures (figures)


SIFT - Scale Invariant Feature Transform [3]
- Orientation: rotational invariance
  - Use the scale of the keypoint to select the image L(x, y, σ)
  - Compute the gradient magnitude m(x, y) and orientation θ(x, y) at each image sample using pixel differences
  - Orientation histogram of sample points: entries are weighted by gradient magnitude and by a Gaussian window around the keypoint; bins cover the 360° range
  - Peaks in the histogram correspond to dominant directions of local gradients
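A sketch of the orientation histogram using central pixel differences. The nested-list image layout `img[y][x]`, the function names, the square sampling window, and the toy ramp image are assumptions of this example. The 36-bin histogram follows Lowe [3].

```python
import math


def gradient(img, y, x):
    """Gradient magnitude and orientation (degrees in [0, 360)) from pixel differences."""
    dx = img[y][x + 1] - img[y][x - 1]
    dy = img[y + 1][x] - img[y - 1][x]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0


def orientation_histogram(img, cy, cx, radius, sigma, bins=36):
    """Histogram of gradient orientations around (cy, cx).

    Each sample adds its gradient magnitude, weighted by a Gaussian
    window centered on the keypoint.
    """
    hist = [0.0] * bins
    bin_width = 360.0 / bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            m, theta = gradient(img, y, x)
            w = math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            hist[int(theta / bin_width) % bins] += m * w
    return hist


# Toy 7 x 7 image whose intensity increases from left to right.
ramp = [[float(x) for x in range(7)] for _ in range(7)]
hist = orientation_histogram(ramp, cy=3, cx=3, radius=2, sigma=1.5)
dominant_bin = hist.index(max(hist))
```

For the ramp image every gradient points along +x, so all of the weight lands in the first bin. Rotating the descriptor frame to the dominant bin's direction is what provides rotational invariance.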

SIFT - Scale Invariant Feature Transform [3]
- Descriptor: the feature vector
  - 8 x 8 sub-region histograms allow shift in gradient positions
  - 128-element feature vector -> 4 x 4 array of histograms with 8 orientations each (the figure from Lowe shows a 2 x 2 x 8 example)
  - Feature vectors are matched by nearest neighbor (Euclidean distance)

SIFT - Scale Invariant Feature Transform [3]
- Results from Lowe [3]
  - Two training objects recognized in a cluttered image
  - Small squares show point matches
  - Large rectangles show the border of each training image after affine transformation

Conclusions
- Conclusions
  - The Harris-Laplace region detector performs better than the Laplacian, DoG and gradient scale-space operators
  - Scale-space detectors provide invariance to rotation, scale and small changes in illumination and viewpoint
  - Affine adaptation provides invariance to affine transformations
  - GLOH and SIFT descriptors provide the best performance
  - Dense, localized descriptors perform well under occlusion
- Next steps
  - Coding and testing of region detectors, descriptors and matching…