Estimating 3D Facial Pose in Video with Just Three Points

Ginés García Mateos, Alberto Ruiz García (Dept. de Informática y Sistemas)
P. E. López-de-Teruel, A. L. Rodríguez, L. Fernández (Dept. Ingeniería y Tecnología de Computadores)
University of Murcia, Spain
Artificial Perception and Pattern Recognition Group

3DFP'2008, Anchorage, June 2008

Introduction (1/3)
• Main objective: to develop a new method to estimate the 3D pose of the head of a human user:
  – Estimation through a video sequence.
  – Working with the minimum necessary information: a 2D location of the face.
  – A very simple method, without training, fast enough to run in real time.
  – Under realistic conditions: robust to facial expressions, lighting and movement.
  – Robustness preferred to accuracy.

Introduction (2/3)
• 3D pose estimation using 3D tracking…
  – Active Appearance Model
  – 3D morphable mesh
  – Shape & texture models
  – Cylindrical models
• Sources cited on the slide:
  – http://www.lysator.liu.se/~eru/research/
  – http://cvlab.epfl.ch/research/body
  – http://www.merl.com/projects/3Dfacerec/
  – www.cs.bu.edu/groups/ivc/html/research_list.php

Introduction (3/3)
• In short, we want to obtain something like this: [figure]
• The result is a 3D location (x, y, z) and a 3D orientation (roll, pitch, yaw): 6 degrees of freedom.

Index of the presentation
• Overview of the proposed method
  – 2D facial detection and location
  – 2D face tracking
• 3D facial pose estimation
  – 3D position
  – 3D orientation
• Experimental results
• Conclusions

Overview of the Proposed Method
• The key idea: separate the problems of 2D tracking and 3D pose estimation.
  2D face detection → 2D face tracking → 3D pose estimation
• The proposed 3D pose estimator could use any 2D facial tracker.
• By introducing some assumptions and simplifications, pose is extracted with very little information.

2D Face Detection, Location and Tracking Using I.P.
• We use a method based on integral projections (I.P.), which is simple and fast.
• Definition of I.P.: average of the gray levels of an image i(x, y) along rows and columns (the overline denotes the mean over the free variable):
  – $PV_i : [y_{min}, \ldots, y_{max}] \to \mathbb{R}$, given by $PV_i(y) := \overline{i(\cdot, y)}$
  – $PH_i : [x_{min}, \ldots, x_{max}] \to \mathbb{R}$, given by $PH_i(x) := \overline{i(x, \cdot)}$
[Figure: a face image i(x, y) with its projections PV(y) and PH(x).]
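As a rough illustration of the definition above, the following sketch computes both projections for a small grayscale image stored as a list of rows. The function name `integral_projections` and the list-of-rows layout are assumptions made for this example; they are not taken from the authors' implementation.

```python
def integral_projections(image):
    """Return (PV, PH) for a grayscale image given as a list of rows.

    PV[y] is the mean gray level of row y (a function of y);
    PH[x] is the mean gray level of column x (a function of x).
    """
    height = len(image)
    width = len(image[0])
    pv = [sum(row) / width for row in image]
    ph = [sum(image[y][x] for y in range(height)) / height
          for x in range(width)]
    return pv, ph


# Tiny 2 x 2 example image (hypothetical gray levels).
image = [
    [0, 10],
    [20, 30],
]
pv, ph = integral_projections(image)
print(pv)  # [5.0, 25.0]
print(ph)  # [10.0, 20.0]
```

Each projection reduces the 2D image to a 1D signal, which is what keeps the later detection and alignment steps cheap.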

2D Face Detection with I.P.
Global view of the I.P. face detector:
• Input image
• Step 1. Vertical projections by strips (PVface)
• Step 2. Horizontal projection of the candidates (PHeyes)
• Step 3. Grouping of the candidates
• Final result

2D Face Detection with I.P.
• To improve the results, we combine two face detectors into a combined detector:
  – Face Detector 1 looks for candidates: Haar + AdaBoost [Viola and Jones, 2001].
  – Face Detector 2 verifies the face candidates: integral projections [Garcia et al, 2007].
  – The verified candidates form the final detection result.
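The two-stage arrangement can be sketched as a simple filter: the first detector proposes candidates and the second one accepts or rejects each of them. Everything in this example is hypothetical, including the function names, the (x, y, w, h) box format and the toy rules; a real system would plug in an actual Haar + AdaBoost detector and an integral-projection verifier.

```python
def combined_detector(image, find_candidates, verify_candidate):
    """Keep only the candidates from detector 1 that detector 2 accepts."""
    return [box for box in find_candidates(image)
            if verify_candidate(image, box)]


# Toy stand-ins (hypothetical), used only to make the sketch runnable.
def toy_find_candidates(image):
    # Pretend detector 1 returned three candidate boxes (x, y, w, h).
    return [(0, 0, 24, 24), (40, 10, 24, 24), (5, 60, 8, 8)]


def toy_verify_candidate(image, box):
    # Placeholder rule, not the real verifier: reject very small boxes.
    x, y, w, h = box
    return w >= 20 and h >= 20


faces = combined_detector(None, toy_find_candidates, toy_verify_candidate)
print(faces)  # [(0, 0, 24, 24), (40, 10, 24, 24)]
```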

2D Face Detection with I.P.
ROC curves on UMU FaceDB (737 img. / 853 faces) [Garcia et al, 2007]
[Figure: ROC curves, % detected faces versus % false positives, one curve per detector.]
Detection ratio at FP = 0.5 and time on a Pentium IV at 2.6 GHz:
  – Int.Proj: 84.2%, 85 ms
  – Haar: 91.8%, 293 ms
  – NeuralNet: 88.6%, 2338 ms
  – TemMatch: 39.0%, 389 ms
  – Cont: 24.8%, 120 ms
  – IP+Haar: 88.6%, 97 ms
  – Haar+IP: 96.1%, 296 ms

2D Face Location with I.P.
Global view of the 2D face locator:
• Input image and face
• Step 1. Orientation estimation
• Step 2. Vertical alignment
• Step 3. Horizontal alignment
• Final result
[Figure: plots of MVface(y), MHeyes(x), PVface(y), PV'face(y), PVeyes(y), PV'eyes(y), PHeyes(x) and PH'eyes(x).]

2D Face Location with I.P.
Location accuracy of the 2D face locator [figure]
Av. time on a Pentium IV at 2.6 GHz:
  – Int.Proj: 1.7 ms
  – NeuralNet: 323.6 ms
  – EigenFeat: 20.5 ms

2D Face Tracking with I.P.
• Initial face detection & location, followed by face tracking on each frame t+1:
  – Step 0. Prediction of the new position
  – Step 1. Vertical alignment
  – Step 2. Horizontal alignment
  – Step 3. Orientation estimation
  – Motion model update
• Correct tracking: continue with the next frame. Lost face: face relocation.
[Figure: plots of PVface(y), PV'face(y), PHeyes(x), PH'eyes(x), PVeyes(y) and PV'eyes(y).]

2D Face Tracking with I.P.
• Sample result of the proposed tracker (320 × 240 pixels, 312 frames at 25 fps, laptop webcam).
• $(e_{1x}, e_{1y})$ = location of the left eye; $(e_{2x}, e_{2y})$ = location of the right eye; $(m_x, m_y)$ = location of the mouth.

3D Facial Pose Estimation
• In theory, 3 points should be enough to solve for the 6 degrees of freedom (if the focal length and face geometry are known).
• But… location errors are high in the mouth for non-frontal faces.
• Some assumptions are introduced to avoid the effect of this error.

3D Facial Pose Estimation
• Fixed-body assumption: the user's body is fixed and only the head moves.
  – 3D position is estimated in the first frame.
  – 3D orientation is estimated in the following frames.
• A simple perspective projection model is used to estimate 3D position.

3D Position Estimation
[Figure: perspective projection of the head position $p = (p_x, p_y, p_z)$, with the origin at (0, 0, 0).]
• f: focal length (known)
• $(c_x, c_y)$: tracked center of the face, with $c_x = (e_{1x} + e_{2x} + m_x)/3$ and $c_y = (e_{1y} + e_{2y} + m_y)/3$

3D Position Estimation
• We have: $c_x / f = p_x / p_z$; $c_y / f = p_y / p_z$
• Where: $c_x = (e_{1x} + e_{2x} + m_x)/3$; $c_y = (e_{1y} + e_{2y} + m_y)/3$
• So: $p_x = \dfrac{e_{1x} + e_{2x} + m_x}{3} \cdot \dfrac{p_z}{f}$; $p_y = \dfrac{e_{1y} + e_{2y} + m_y}{3} \cdot \dfrac{p_z}{f}$
• The depth of the face, $p_z$, is computed with $p_z = f \cdot t / r$, where r is the apparent face size* and t is the real size.
* For more information, see the paper.
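The position formulas above translate almost directly into code. This is a minimal sketch, assuming that image coordinates are measured from the image center, that f and r are in pixels, and that t is in millimetres; the function name `estimate_position` and the sample numbers are made up for illustration.

```python
def estimate_position(e1, e2, m, f, t, r):
    """Estimate p = (px, py, pz) from the two eyes and the mouth.

    e1, e2, m: (x, y) image points, measured from the image center (assumed).
    f: focal length in pixels; t: real face size; r: apparent size in pixels.
    """
    # Tracked center of the face: mean of the three points.
    cx = (e1[0] + e2[0] + m[0]) / 3
    cy = (e1[1] + e2[1] + m[1]) / 3
    # Depth from the ratio between real and apparent face size.
    pz = f * t / r
    # Invert the perspective projection cx / f = px / pz.
    px = cx * pz / f
    py = cy * pz / f
    return px, py, pz


# Hypothetical values: f = 500 px, real size t = 150 mm, apparent size r = 75 px.
print(estimate_position((0, -20), (60, -20), (30, 40), 500, 150, 75))
# (60.0, 0.0, 1000.0)
```

The result is expressed in the same unit as t, so here the face is about one metre from the camera and 60 mm to the side of the optical axis.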

Estimation of Roll Angle
• Roll angle can be approximately associated with the 2D rotation of the face in the image:
  $\mathrm{roll} = \arctan \dfrac{e_{2y} - e_{1y}}{e_{2x} - e_{1x}}$
  [Figure: sample frames with roll = -43.7º, -2.8º, 15.9º and 34.6º.]
• This equation is valid in most practical situations, but it is not precise in all cases.
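A small sketch of the roll computation follows. It uses `math.atan2` instead of the plain arctangent of the ratio, which is a substitution made here to avoid a division by zero when both eyes share the same x coordinate; the two agree whenever the right eye lies to the right of the left eye in the image. The function name and sample coordinates are illustrative only.

```python
import math


def estimate_roll(e1, e2):
    """Roll angle in degrees from the left eye e1 and right eye e2 (x, y)."""
    return math.degrees(math.atan2(e2[1] - e1[1], e2[0] - e1[0]))


# Eyes on a horizontal line: no in-plane rotation.
print(estimate_roll((100, 120), (160, 120)))  # 0.0
```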

Estimation of Pitch and Yaw
• The head-neck system can be modeled as a robotic arm with 3 rotational DOF.
  [Figure: orthographic, top and front views of the head model, showing the axes X, Y, Z, the roll, pitch and yaw rotations, and the lengths a, b and c.]
• In this model, any point of the head lies on a sphere, so its projection is related to pitch and yaw.
  [Figure: image-plane circle of radius ri, with axes Xi, Yi and the points (dx0, dy0) and (dxt, dyt).]

Estimation of Pitch and Yaw
• $r_w$: radius of the sphere where the center of the eyes lies: $r_w = \sqrt{a^2 + c^2}$
• $r_i$: radius of the circle where that sphere is projected: $r_i = r_w \cdot f / p_z$
• $(dx_0, dy_0)$: initial center of the eyes, $((e_{1x} + e_{2x})/2,\ (e_{1y} + e_{2y})/2)$
• $(dx_t, dy_t)$: current center of the eyes
[Figure: projected circle of radius ri with the eye center at the initial frame (pitch = 0, yaw = 0), at instant t = 1 and at instant t = 2.]
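The quantities defined on this slide are simple to compute. The sketch below assumes a and c share the unit of pz and that f is in pixels, so ri comes out in pixels; the function names and the sample values are hypothetical.

```python
import math


def eye_center(e1, e2):
    """Midpoint of the two eyes, used as (dx, dy)."""
    return ((e1[0] + e2[0]) / 2, (e1[1] + e2[1]) / 2)


def projected_radius(a, c, f, pz):
    """Return (rw, ri): sphere radius and radius of its projected circle."""
    rw = math.hypot(a, c)   # sqrt(a^2 + c^2)
    ri = rw * f / pz        # perspective scaling at depth pz
    return rw, ri


print(eye_center((100, 120), (160, 124)))           # (130.0, 122.0)
print(projected_radius(30.0, 40.0, 500.0, 1000.0))  # (50.0, 25.0)
```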

Estimation of Pitch and Yaw
• In essence, we have a problem of computing altitude and latitude for a given point in a circle.
• The center of the circle is: $(dx_0,\ dy_0 - a \cdot f / p_z)$
• So we have: $\mathrm{pitch} = \arcsin \dfrac{dy_t - (dy_0 - a \cdot f / p_z)}{r_i} - \arcsin(a/c)$
• And: $\mathrm{yaw} = \arcsin \dfrac{dx_t - dx_0}{r_i \cdot \cos(\mathrm{pitch} + \arcsin(a/c))}$
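The pitch and yaw formulas can be transcribed as follows. This is only a sketch of the slide's equations: the clamping of the arcsine arguments to [-1, 1] is an addition made here to guard against tracking noise, the function names are hypothetical, and the example sets a = 0 purely to keep the arithmetic easy to follow.

```python
import math


def _clamp(v):
    """Keep an arcsine argument inside [-1, 1] (added safeguard)."""
    return max(-1.0, min(1.0, v))


def estimate_pitch_yaw(d0, dt, a, c, f, pz):
    """Pitch and yaw in radians, transcribed from the slide formulas.

    d0, dt: initial and current eye centers (x, y) in the image.
    a, c: head-model lengths (c must be non-zero); f: focal length; pz: depth.
    """
    ri = math.hypot(a, c) * f / pz      # radius of the projected circle
    offset = math.asin(_clamp(a / c))   # the arcsin(a/c) term
    center_y = d0[1] - a * f / pz       # y coordinate of the circle center
    pitch = math.asin(_clamp((dt[1] - center_y) / ri)) - offset
    yaw = math.asin(_clamp((dt[0] - d0[0]) / (ri * math.cos(pitch + offset))))
    return pitch, yaw


# Toy values: a = 0, c = 50, f = 500, pz = 1000, so ri = 25 pixels.
pitch, yaw = estimate_pitch_yaw((130.0, 122.0), (142.5, 122.0),
                                0.0, 50.0, 500.0, 1000.0)
print(round(math.degrees(pitch), 1), round(math.degrees(yaw), 1))  # 0.0 30.0
```

In this toy case the eye center moved 12.5 pixels sideways on a circle of radius 25 pixels, which corresponds to a yaw of arcsin(0.5) = 30º with no change in pitch.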

Experimental Results (1/7)
• Experiments carried out:
  – Off-the-shelf webcams.
  – Different individuals.
  – Variations in facial expressions and facial elements (glasses).
• Studies of robustness and efficiency, and comparison with a projection-based 3D estimation algorithm.
• On a Pentium IV at 2.6 GHz: ~5 ms file reading, ~3 ms tracking, ~0.006 ms pose estimation.

Experimental Results (2/7)
• Sample input video: bego.a.avi (320 × 240 pixels, 312 frames at 25 fps, laptop webcam).

Experimental Results (3/7)
• 3D pose estimation results (320 × 240 pixels, 312 frames at 25 fps, laptop webcam).

Experimental Results (4/7)
[Figure: two panels comparing the proposed method with the projection-based method; one panel is labeled Pitch.]

Experimental Results (5/7)
• Range of working angles: approx. ±20º in pitch and ±40º in yaw.
• The 2D tracker is not explicitly prepared for profile faces!

Experimental Results (6/7)
• With glasses and without glasses. [figure]

Experimental Results (7/7)
• When the fixed-body assumption does not hold. [figure]
• Body/shoulder tracking could be used to compensate for body movement.

Conclusions (1/3)
• Our purpose was to design a fast, robust, generic and approximate 3D pose estimation method:
  – Separation of 2D tracking and 3D pose.
  – Fixed-body assumption.
  – Robotic head model.
• 3D position is computed in the first frame.
• 3D orientation is estimated in the remaining frames.
• The estimation process is very simple, and avoids inaccuracies in the 2D tracker.

Conclusions (2/3)
• Future work: using the 3D pose estimator in a perceptual interface.

Conclusions (3/3)
• The simplifications introduced lead to several limitations of our system, but in general…
• Human anatomy of the head/neck system could be used in 3D face trackers.
• The human head cannot move independently of the body!
• Taking advantage of these anatomical limitations could simplify and improve current trackers.

Last
• This work has been supported by the project Consolider Ingenio-2010 CSD200600046, and TIN2006-15516-C04-03.
• Sample videos: http://dis.um.es/~ginesgm/fip
• PARP Group web page: http://perception.inf.um.es/
Thank you very much