Automatic Feature Extraction for Multi-view 3D Face Recognition

Automatic Feature Extraction for Multi-view 3D Face Recognition
Xiaoguang Lu, Anil K. Jain
Michigan State University

OUTLINE
• Introduction
• Feature Extraction
• Automatic Face Recognition
• Experiments and Discussion
• Conclusions and Future Work

INTRODUCTION
• In both 2D and 3D face recognition systems, registration based on feature point correspondence is one of the most popular methods.
• Facial features can be of different types: region, landmark, and contour.
• Generally, landmarks provide a more accurate and consistent representation for alignment purposes than region-based features, and have lower complexity and computational burden than contour feature extraction.

INTRODUCTION
• Range-image-based 3D face recognition has been demonstrated to be effective in enhancing face recognition accuracy.
• Since each range image provides only a single viewpoint of the face, instead of the full 3D view, we use 3D face models to recognize 2.5D face (range) images.

INTRODUCTION
• We have focused on automatically extracting feature points and estimating the head pose in the presence of large pose variations.
• The extracted features are used for face alignment in three-dimensional space.
• Overview of the system.

Feature Extraction
• Face Segmentation
• Nose Tip and Pose Estimation
  – Pose quantization
  – Directional maximum
  – Pose correction
  – Nose profile extraction
  – Nose profile identification
• Extracting Eye and Mouth Corners

Face Segmentation
• Each 2.5D scan provides four matrices, X(r, c), Y(r, c), Z(r, c), and M(r, c), where X, Y, and Z are the spatial and depth coordinates in units of millimeters and M is the mask indicating which points are valid; M(r, c) equals 1 if the point p(r, c) is valid and 0 otherwise.
• Given a facial scan, the invalid points in X, Y, and Z are filtered out by the mask M. The facial area is segmented by thresholding the horizontal and vertical integral projection curves of M.
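A minimal sketch of this segmentation step (assuming M is a binary NumPy array; the projection threshold fraction is illustrative, not a value from the paper):

import numpy as np

def segment_face(M, frac=0.1):
    # Threshold the horizontal and vertical integral projections of the mask M
    # to find the row/column bounds of the facial area.
    rows = M.sum(axis=1)                    # horizontal integral projection
    cols = M.sum(axis=0)                    # vertical integral projection
    r_idx = np.where(rows > frac * rows.max())[0]
    c_idx = np.where(cols > frac * cols.max())[0]
    return r_idx[0], r_idx[-1], c_idx[0], c_idx[-1]   # top, bottom, left, right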

Face Segmentation
Figure 3. Segmentation of a facial scan: (a) mask image; (b) horizontal integral projection of M; (c) vertical integral projection of M; (d) face segmentation result.

Nose Tip and Pose Estimation
• The nose tip is a distinctive point of the human face, especially in the range map. It is also insensitive to facial expression changes.
• The nose tip has the largest depth value when projected onto the corrected pose direction; we call it the directional maximum. Since the nose tip and the pose angle are coupled, we estimate them simultaneously.

Pose quantization
• The yaw angle ranges from -90 degrees (full right profile) to 90 degrees (full left profile) in the X-Z plane. This 180-degree range (Rpose) is quantized into Npose angles with an equal angular interval (ΔΘ).

Directional maximum
• At each pose angle Θj (j = 1, …, Npose), find the point with the maximum projection value along the corresponding pose direction as the nose tip candidate. The coordinates (xi, yi, zi) of each face point pi (i = 1, …, N, where N is the total number of valid face points) are rotated to the new position according to the pose angle Θj, i.e., by a rotation about the vertical (Y) axis in the X-Z plane.

Directional maximum
• The point pk whose rotated depth is the largest, z'k = max(z'i, i = 1, …, N), is used as a nose tip candidate with the corresponding pose angle Θj. By repeating this for every Θj, M candidate pairs are obtained.
• Since the directional maximum may occur at the same face point p for multiple Θj's, M ≤ Npose.
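A minimal sketch of this candidate search, combining the pose quantization and the directional maximum; the yaw-rotation sign convention and the use of the rotated depth as the projection value are illustrative assumptions:

import numpy as np

def nose_tip_candidates(points, n_pose=15):
    # points: (N, 3) array of valid face points (x, y, z) in mm.
    # Returns (point_index, theta) candidate pairs, one per distinct point.
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_pose)   # quantized yaw angles
    candidates = {}
    for theta in thetas:
        # Depth of every point after rotating the scan by theta in the X-Z plane
        z_rot = -points[:, 0] * np.sin(theta) + points[:, 2] * np.cos(theta)
        k = int(np.argmax(z_rot))          # directional maximum at this pose angle
        candidates.setdefault(k, theta)    # the same point may win at several angles
    return list(candidates.items())        # M <= n_pose candidate pairs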

Pose correction
• For each candidate pair (p, Θ), the coordinates (x, y, z) of all the original face points are transformed to (x', y', z') so that point p is at the origin, and the face points are rotated according to the pose angle Θ (a translation bringing p to the origin followed by the yaw rotation by Θ).
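A sketch of the pose correction for one candidate pair (p, Θ), under the same illustrative yaw convention as above:

import numpy as np

def pose_correct(points, p, theta):
    # Translate the scan so that candidate nose tip p is at the origin, then
    # rotate by the candidate yaw angle theta (rotation about the Y axis).
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[  c, 0.0,   s],
                  [0.0, 1.0, 0.0],
                  [ -s, 0.0,   c]])
    return (points - p) @ R.T              # (N, 3) pose-corrected coordinates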

Nose profile extraction
• From the pose-corrected scan based on each candidate (p, Θ), extract the nose profile at p, i.e., the intersection between the facial surface and the Y-Z plane.
• Let X'(r, c), Y'(r, c), and Z'(r, c) denote the point coordinate matrices after pose correction.
• For each row ri, find the point closest to the Y-Z plane, i.e., (ri, ci) = arg min_c |X'(ri, c)|, resulting in a sequence of point pairs (Y'(ri, ci), Z'(ri, ci)).

Nose profile extraction
• To make all the profiles comparable, each profile is normalized by centering it at the nose tip candidate and re-sampling it at equal intervals along the Y axis, resulting in a nose profile vector.
• Linear interpolation is applied for re-sampling.
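A sketch of the profile extraction and normalization, assuming the pose-corrected coordinates are given as matrices X', Y', Z' with mask M and that the candidate nose tip sits at the origin; the sampling range and sample count are illustrative:

import numpy as np

def nose_profile(Xc, Yc, Zc, M, n_samples=60, y_range=80.0):
    # Extract the Y-Z curve closest to the Y-Z plane from a pose-corrected scan
    # and re-sample it at equal intervals along the Y axis (linear interpolation).
    ys, zs = [], []
    for r in range(Xc.shape[0]):
        valid = np.flatnonzero(M[r])
        if valid.size == 0:
            continue
        c = valid[np.argmin(np.abs(Xc[r, valid]))]   # column closest to the Y-Z plane
        ys.append(Yc[r, c])
        zs.append(Zc[r, c])
    ys, zs = np.asarray(ys), np.asarray(zs)
    order = np.argsort(ys)
    grid = np.linspace(-y_range / 2, y_range / 2, n_samples)
    return np.interp(grid, ys[order], zs[order])     # nose profile vector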

Nose profile identification
• To identify the nose tip from the candidate pairs (p, Θ), we apply subspace analysis in the nose profile vector space.
• A number of nose profiles from an independent group of subjects are extracted with manually labeled nose tips and poses, aligned at the nose tip, and re-sampled, resulting in a training set {Vi}.
• These (training) nose profiles are used to construct the nose profile subspace based on PCA.
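A sketch of building the profile subspace from a training matrix of profile vectors; the number of retained components is illustrative:

import numpy as np

def train_profile_subspace(V, n_components=10):
    # V: (n_train, d) matrix of training nose profile vectors {Vi}.
    # Returns the mean profile and the top principal directions as a (d, n_components) basis.
    mean = V.mean(axis=0)
    _, _, Vt = np.linalg.svd(V - mean, full_matrices=False)
    return mean, Vt[:n_components].T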

Nose profile identification
• Given a test profile vector V, the distance-from-feature-space (DFFS) is used as the distance metric.
• The nose tip candidate with the smallest DFFS is identified as the nose tip, and the associated pose angle is used as the pose estimate.
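DFFS can be read as the reconstruction error of a profile with respect to the PCA subspace; a sketch using the subspace from the previous snippet:

import numpy as np

def dffs(v, mean, basis):
    # Distance-from-feature-space: residual of profile v after projecting the
    # mean-subtracted vector onto the subspace spanned by the basis columns.
    d = v - mean
    return np.linalg.norm(d - basis @ (basis.T @ d))

# The candidate pair (p, theta) whose profile gives the smallest DFFS is taken
# as the nose tip, and its theta as the pose estimate.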

Extracting Eye and Mouth Corners
• A statistical model of the facial feature point locations is used as a prior constraint to reduce the search area for the feature points. The model contains constraints between facial feature points. This not only enhances the accuracy of feature extraction, but also improves the computational efficiency.

Extracting Eye and Mouth Corners
• The eye and mouth corners are then determined using the shape index from the range image and the cornerness from the intensity image.
• The final score F(p) for each feature point is a weighted sum of the scores from the range and intensity modalities.

Extracting Eye and Mouth Corners
• Shape Index (range): the shape index takes a value in the interval [0, 1]. The corners of the eyes and the mouth have a cup-like shape with low shape index values.
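One common [0, 1] formulation of the shape index, computed from the principal curvatures of the range surface z = f(x, y); whether a cup maps near 0 or 1 depends on the curvature sign convention, so treat this as an illustrative sketch rather than the authors' exact implementation:

import numpy as np

def shape_index(Z):
    # S = 1/2 - (1/pi) * arctan((k1 + k2) / (k1 - k2)), with k1 >= k2,
    # where k1, k2 are the principal curvatures of the range surface Z(row, col)
    # (pixel spacing taken as 1 mm, as for the scans described above).
    fy, fx = np.gradient(Z)
    fxx = np.gradient(fx, axis=1)
    fyy = np.gradient(fy, axis=0)
    fxy = np.gradient(fx, axis=0)
    g = 1.0 + fx**2 + fy**2
    K = (fxx * fyy - fxy**2) / g**2                               # Gaussian curvature
    H = ((1 + fx**2) * fyy - 2 * fx * fy * fxy + (1 + fy**2) * fxx) / (2 * g**1.5)  # mean curvature
    root = np.sqrt(np.maximum(H**2 - K, 0.0))
    k1, k2 = H + root, H - root                                   # principal curvatures
    return 0.5 - np.arctan2(k1 + k2, k1 - k2) / np.pi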

Extracting Eye and Mouth Corners
• Cornerness (intensity): we apply the Harris corner detector, which is based on the fact that, at a corner point, intensity changes in a local neighborhood should be large along all directions.
• If the two eigenvalues of H are large, then a small motion in any direction will cause a significant change of gray level, indicating that the point p is a corner.
• The stronger the corner response C(p), the more likely the point p is a corner.
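A sketch of the Harris corner response on the intensity image; the smoothing scale and the constant k are the usual illustrative defaults, not values from the paper:

import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(I, sigma=2.0, k=0.04):
    # Harris cornerness C(p) = det(H) - k * trace(H)^2, where H is the locally
    # averaged second-moment matrix of the image gradients at p.
    Iy, Ix = np.gradient(I.astype(float))
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    return (Sxx * Syy - Sxy**2) - k * (Sxx + Syy)**2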

Extracting Eye and Mouth Corners
• Both S(p) and C(p) are normalized, where {Si} and {Ci} are the sets of shape index and cornerness values of the points in the search region.
• The final score F(p) is computed by integrating the scores from the two modalities using the sum rule F(p) = (1 - S'(p)) + C'(p).
• The point with the highest F(p) in each search region is identified as the corresponding feature point.
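A sketch of the fusion step, assuming min-max normalization of S and C over the candidate points of a search region; the exact normalization is an assumption here:

import numpy as np

def pick_feature_point(S_region, C_region):
    # Fuse shape index and cornerness over one search region with the sum rule
    # F(p) = (1 - S'(p)) + C'(p); returns the index of the best point.
    def minmax(v):
        v = np.asarray(v, dtype=float)
        return (v - v.min()) / (v.max() - v.min() + 1e-12)
    F = (1.0 - minmax(S_region)) + minmax(C_region)
    return int(np.argmax(F))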

Automatic Face Recognition
• A fully automatic 3D face recognition system has been developed. Using the nose tip position and the corresponding pose angle, the pose of the test face scan can be normalized up to a rigid transformation, i.e., translation and rotation.
• An iterative closest point (ICP) scheme is applied to further improve the alignment results.

Automatic Face Recognition
• ICP algorithm:
  1. Select control points in one point set.
  2. Find the closest points in the other point set (correspondence).
  3. Calculate the optimal transformation between the two sets based on the current correspondence.
  4. Transform the points; repeat from step 2 until convergence.
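A minimal ICP sketch following the four steps above, using nearest-neighbor correspondences and the standard SVD (Procrustes) solution for the rigid transform; control-point selection and convergence testing are simplified:

import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, n_iter=30, tol=1e-6):
    # Align point set src (N, 3) to dst (M, 3); returns rotation R and translation t
    # such that src @ R.T + t approximately lies on dst.
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    prev_err = np.inf
    for _ in range(n_iter):
        moved = src @ R.T + t
        dist, idx = tree.query(moved)             # step 2: closest-point correspondences
        matched = dst[idx]
        mu_s, mu_d = moved.mean(axis=0), matched.mean(axis=0)
        U, _, Vt = np.linalg.svd((moved - mu_s).T @ (matched - mu_d))
        dR = Vt.T @ U.T                           # step 3: optimal rotation (SVD)
        if np.linalg.det(dR) < 0:                 # guard against reflections
            Vt[-1] *= -1
            dR = Vt.T @ U.T
        dt = mu_d - dR @ mu_s
        R, t = dR @ R, dR @ t + dt                # step 4: accumulate the transform
        err = dist.mean()
        if abs(prev_err - err) < tol:             # stop when the alignment stabilizes
            break
        prev_err = err
    return R, t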

Experiments and Discussion
• There are 100 subjects in the MSU database, with the corresponding 100 3D face models stored in the gallery. For each subject, three 2.5D scans were captured, with yaw pose angles of less than -45 degrees, 0 degrees (frontal), and more than 45 degrees, respectively.

Experiments and Discussion
• The UND database contains 953 facial scans from 277 subjects.
• Unlike the MSU database, there is no 3D face model stored in the gallery for each subject.

Table 1. Statistics of the distance (in 3D) between the automatically extracted and manually labeled feature points. (The distance between two adjacent pixels in the x and y directions is 1 mm.)

Conclusions and Future Work
Conclusions:
• We have proposed an automatic feature extraction scheme to locate the nose tip and estimate the head pose, along with other facial feature points, in multi-view 2.5D facial scans.
• The extracted features are used to align the multi-view face scans with stored 3D face models to conduct surface matching.

Conclusions and Future Work
Future work:
• Self-occlusion.
• Various factors can prevent the nose tip from being the global maximum, such as a beard, hair, or other objects in the view (this issue has already been addressed).
• The computational cost of handling the entire 3D pose space, covering all three directions (i.e., yaw, pitch, and roll), would be high with a brute-force search, so a more efficient search scheme is being pursued.