Video Face Recognition: A Literature Review
Hao Zhang
Computer Science Department

Problem Statement
• Identification (probe A; candidates B, C, D): Which has the same identity as A?
• Verification (pair A, B): Same / Different persons?
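
The two tasks differ only in the decision rule, which a minimal sketch can make concrete. It assumes each video has already been reduced to one fixed-length descriptor; the Euclidean distance, the `threshold` value and the toy vectors are illustrative assumptions, not part of the slides.

```python
import numpy as np

def identify(probe, gallery):
    """Identification: index of the gallery descriptor closest to the probe."""
    dists = np.linalg.norm(gallery - probe, axis=1)
    return int(np.argmin(dists))

def verify(a, b, threshold):
    """Verification: True if two descriptors are close enough to be one person."""
    return bool(np.linalg.norm(a - b) <= threshold)

# Toy descriptors (hypothetical): three gallery identities and one probe.
gallery = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
probe = np.array([0.9, 1.2])

match_index = identify(probe, gallery)                   # closed-set decision
same_person = verify(probe, gallery[1], threshold=0.5)   # pairwise decision
```

Identification always returns some gallery entry, whereas verification can reject a pair, which is why a method that needs every identity in its gallery does not fit the verification protocol.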

Solutions
• Extensions of still face recognition algorithms
• 3D model reconstruction
• Employing temporal information
• Set-to-set matching methods

Extensions of still face recognition algorithms
• Joint sparse representation (figure labels: probe, gallery)
  – Data: k-th partition of a query video
  – Dictionary: a concatenation of all dictionaries of the k-th partition of the training videos

Extensions of still face recognition algorithms
• Joint sparse representation: Conclusion
  – Only suitable for face identification
  – Cannot handle new faces
  – Violates the protocol of face verification

Extensions of still face recognition algorithms
• Multiple metric learning (MML)
  Figure stages: Video, Volumes, Patches, Feature Extraction, MML
* A part of this figure is from [5]

Extensions of still face recognition algorithms
• Multiple metric learning (MML): Conclusion
  – It can be easily adapted to solve both still and video problems.
  – It discards additional information in the video.

3D model reconstruction
• From a single frontal image: Analysis
  Terms of the reconstruction equation (symbols missing from the slide text):
  – reconstructed 3D shape
  – mean training 3D shape
  – PCA projection matrix of training 3D shapes
  – 2D mappings of [symbol missing]
  – input 2D shape
  – scale and translation term
* The two above images are from [8]

3D model reconstruction
• Reconstruction from a single image: Synthesis
  – Pose
  – Illumination
  – Expression
* This figure is from [8]

3D model reconstruction
• Reconstruction from a single image: Conclusion
  – Handles pose and illumination variations
  – Requires 2D images of good quality
  – Synthesis of lighting and expression is far from perfect

Employing temporal information
• Dynamic system model, ARMA
  – State vector: encodes pose at time t
  – Observation: face appearance at time t
  – Video similarity is computed using an observability matrix formed by A and C.
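
As a rough illustration of how A and C lead to a video similarity, the sketch below uses the standard linear state-space form x(t+1) = A x(t) + v(t), y(t) = C x(t) + w(t), where the symbols x, y, v and w are assumed here because the slide text does not preserve them. The SVD-based estimator follows the closed-form procedure commonly used for dynamic textures [6]; truncating the observability matrix to `m` blocks and comparing videos through principal angles are assumptions of this sketch, not details confirmed by the slides.

```python
import numpy as np

def fit_arma(frames, n_states):
    """Estimate (A, C) from a (pixels x time) matrix via a truncated SVD."""
    Y = frames - frames.mean(axis=1, keepdims=True)   # remove the temporal mean
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                               # observation matrix
    X = np.diag(s[:n_states]) @ Vt[:n_states, :]      # state sequence
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])          # least-squares transition
    return A, C

def observability(A, C, m):
    """Stack C, CA, ..., CA^(m-1) into a finite observability matrix."""
    blocks = [C]
    for _ in range(m - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def subspace_distance(O1, O2):
    """Norm of the principal angles between the column spaces of O1 and O2."""
    Q1, _ = np.linalg.qr(O1)
    Q2, _ = np.linalg.qr(O2)
    cosines = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), -1.0, 1.0)
    return float(np.linalg.norm(np.arccos(cosines)))
```

Two videos would then be compared with `subspace_distance(observability(A1, C1, m), observability(A2, C2, m))`, where a smaller value means more similar dynamics.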

Employing temporal information
• Dynamic system model: Conclusion
  – Incorporates time information for recognition
  – Linear assumption
  – Manifold learning methods can be applied using the observability matrix

Employing temporal information
• Probabilistic model
  – Can be adapted to handle occlusion
  Terms of the model (symbols missing from the slide text):
  – Image I's distance to the manifold of the k-th video
  – probability of image I's projection in [symbol missing]
* The figure is from [9]

Employing temporal information
• Probabilistic model: Conclusion
  – Incorporates time information to make decisions more robustly
  – Error can propagate
  – Majority voting
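
A toy example can show why propagated errors matter and how majority voting behaves differently. The sketch below is a simplified recursive Bayesian update with a uniform prior and no transition model, which is an assumption of this illustration and not the full model of [9]; the per-frame likelihood values are made up.

```python
import numpy as np

def recursive_posterior(likelihoods):
    """Multiply per-frame likelihoods (frames x identities) into one posterior."""
    n_ids = likelihoods.shape[1]
    posterior = np.full(n_ids, 1.0 / n_ids)        # uniform prior
    for frame_likelihood in likelihoods:
        posterior = posterior * frame_likelihood   # carry evidence forward
        posterior = posterior / posterior.sum()    # renormalize
    return posterior

def majority_vote(likelihoods):
    """Each frame votes for its most likely identity; return the winner."""
    votes = np.argmax(likelihoods, axis=1)
    return int(np.bincount(votes, minlength=likelihoods.shape[1]).argmax())

# Hypothetical likelihoods: two frames mildly favor identity 0,
# one corrupted frame strongly favors identity 1.
likelihoods = np.array([[0.60, 0.40],
                        [0.60, 0.40],
                        [0.01, 0.99]])

posterior = recursive_posterior(likelihoods)
recursive_choice = int(np.argmax(posterior))
voting_choice = majority_vote(likelihoods)
```

In this toy case the single corrupted frame flips the recursive decision to identity 1, while the vote stays with identity 0, because voting ignores how confident each frame is.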

Set-to-set matching
• Manifold-manifold distance
  – Clustering criteria: distance
  Figure labels: Manifold A, Manifold B

Set-to-set matching
• Manifold-manifold distance: Conclusion
  – Overcomes the drawbacks of voting methods
  – Clustering results will be different due to random initialization

Set-to-set matching
• Affine Hull Representation
  Figure labels: Affine hull, Convex hull
  – Reduced affine hull: [equation missing from the slide text]
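
To make the affine hull representation concrete, the sketch below models each image set as its mean plus a span of directions, and takes the set-to-set distance as the smallest gap between the two hulls, in the spirit of [3]. It covers only the unconstrained affine hull; the reduced variant, whose equation is not preserved here, is not implemented. The function names and the `tol` cutoff are assumptions of this sketch.

```python
import numpy as np

def affine_hull(samples, tol=1e-10):
    """Return the mean and an orthonormal direction basis of a sample set.

    samples has shape (n_samples, dim).
    """
    mu = samples.mean(axis=0)
    U, s, _ = np.linalg.svd((samples - mu).T, full_matrices=False)
    return mu, U[:, s > tol]          # keep directions with non-zero spread

def affine_hull_distance(set_a, set_b):
    """Smallest distance between points of the two affine hulls."""
    mu_a, U_a = affine_hull(set_a)
    mu_b, U_b = affine_hull(set_b)
    # Minimize ||(mu_a + U_a v_a) - (mu_b + U_b v_b)|| over v_a and v_b.
    basis = np.hstack([U_a, -U_b])
    target = mu_b - mu_a
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return float(np.linalg.norm(basis @ coeffs - target))

# Two skew lines in 3D: the x-axis, and a line parallel to the y-axis at z = 2.
line_x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
line_y_raised = np.array([[0.0, 0.0, 2.0], [0.0, 1.0, 2.0]])
gap = affine_hull_distance(line_x, line_y_raised)
```

Because an affine hull extends beyond the observed samples, two sets can have zero distance even when no actual frames are close, which is one motivation for the tighter convex or reduced hulls.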

Set-to-set matching
• Affine Hull Representation: Conclusion
  – “Size changeable” affine hulls
  – Unclear which representation is better
  – Which to use: convex hull, affine hull or linear span?

Set-to-set matching
• Statistical methods on Grassmann manifolds
  – The distribution is defined on the tangent plane at the Karcher mean
  – Local mapping using the exponential map preserves geodesic distance

Set-to-set matching
• Statistical methods on Grassmann manifolds: Conclusion
  – Distribution models on manifold
  – A video is simply represented as a linear space
  – Too few samples
• Thoughts:
  – Partition the video to obtain multiple points on the Grassmann manifold
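
The partitioning idea can be sketched as follows: each chunk of frames is summarized by an orthonormal basis, which is one point on a Grassmann manifold, so a single video yields several points instead of one. The chunking scheme, the subspace dimension `dim` and the use of principal angles for the geodesic distance are assumptions of this sketch, not details taken from [12].

```python
import numpy as np

def subspace(frames, dim):
    """Orthonormal basis (pixels x dim) spanning the dominant frame directions."""
    U, _, _ = np.linalg.svd(frames, full_matrices=False)
    return U[:, :dim]

def video_to_points(frames, n_parts, dim):
    """Split a (pixels x time) video along time; one subspace per part."""
    parts = np.array_split(frames, n_parts, axis=1)
    return [subspace(part, dim) for part in parts]

def geodesic_distance(U1, U2):
    """Grassmann geodesic distance: 2-norm of the principal angles."""
    cosines = np.clip(np.linalg.svd(U1.T @ U2, compute_uv=False), -1.0, 1.0)
    return float(np.linalg.norm(np.arccos(cosines)))
```

Each part needs at least `dim` frames for its basis to have `dim` columns, so the number of parts trades off against how well each subspace is estimated.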

A summary for each category
Approach: Summary
• Still extensions: Largely inherit properties of still algorithms
• 3D model: Handle pose and illumination variations; 2D image of good quality; Synthesis is not good
• Temporal: Encode face dynamics; Error may propagate
• Set-to-set: Solid mathematical background; Generally less computational burden

Important Datasets
Timeline (figure): 2001, 2003, 2009, 2011, 2013

Comparing Results?
Categories: Still extensions, Temporal, Set-to-set
Methods: SR, MML, MBGS, ARMA, Prob, Affine, M2M
Data set values, in slide order:
MoBo: x x x 0.98 (1, 3) 0.94 x (rand)
Honda: 0.97 (#frames) x x 0.92 (15, 30) ? 0.92 0.97 x (20, 39, (rand) noise)
MBGC: 0.88 x (s234) x x x 0.71 (s234)
YTF: x 0.76 (cr) x x x Alg 0.79 (cr) Stat

Summary
• Current trends:
  – Extensions of still face recognition algorithms
  – Set-to-set matching methods
• Common issues:
  – Computational burden
  – Pose variations
    • Thoughts: good training data and transfer learning
  – Need common protocols and datasets
    • Much better recently

References
[1] G. Aggarwal, A. K. R. Chowdhury, and R. Chellappa. A system identification approach for video-based face recognition. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 4, pages 175–178. IEEE, 2004.
[2] J. R. Beveridge, P. J. Phillips, D. Bolme, B. A. Draper, G. H. Givens, Y. M. Lui, M. N. Teli, H. Zhang, W. T. Scruggs, K. W. Bowyer, et al. The challenge of face recognition from digital point-and-shoot cameras. IEEE Conference on Biometrics: Theory, Applications and Systems, 2013.
[3] H. Cevikalp and B. Triggs. Face recognition based on image sets. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2567–2573. IEEE, 2010.
[4] Y.-C. Chen, V. Patel, S. Shekhar, R. Chellappa, and P. Phillips. Video-based face recognition via joint sparse representation. In Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pages 1–8, 2013.
[5] Z. Cui, W. Li, D. Xu, S. Shan, and X. Chen. Fusing robust face region descriptors via multiple metric learning for face recognition in the wild. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3554–3561, 2013.
[6] G. Doretto, A. Chiuso, Y. N. Wu, and S. Soatto. Dynamic textures. International Journal of Computer Vision, 51(2):91–109, 2003.
[7] R. Gross and J. Shi. The CMU Motion of Body (MoBo) database. Technical Report CMU-RI-TR-01-18, Robotics Institute, Pittsburgh, PA, June 2001.
[8] D. Jiang, Y. Hu, S. Yan, L. Zhang, H. Zhang, and W. Gao. Efficient 3D reconstruction for face recognition. Pattern Recognition, 38(6):787–798, 2005.
[9] K.-C. Lee, J. Ho, M.-H. Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, volume 1, pages I–313. IEEE, 2003.
[10] P. J. Phillips, P. J. Flynn, J. R. Beveridge, W. T. Scruggs, A. J. O'Toole, D. Bolme, K. W. Bowyer, B. A. Draper, G. H. Givens, Y. M. Lui, et al. Overview of the multiple biometrics grand challenge. In Advances in Biometrics, pages 705–714. Springer, 2009.
[11] J. B. Tenenbaum, V. De Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[12] P. Turaga, A. Veeraraghavan, A. Srivastava, and R. Chellappa. Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(11):2273–2286, 2011.
[13] R. Wang, S. Shan, X. Chen, and W. Gao. Manifold-manifold distance with application to face recognition based on image set. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
[14] L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 529–534. IEEE, 2011.