3D HUMAN BODY POSE ESTIMATION BY SUPERQUADRICS
Ilya Afanasyev, Massimo Lunardelli, Nicolo' Biasi, Luca Baglivo, Mattia Tavernini, Francesco Setti and Mariolino De Cecco
EU-FP7 Marie Curie COFUND Trentino Project N° 226070
26/02/2012
Department of Mechanical and Structural Engineering (DIMS), Mechatronics Lab.
Content
1. Introduction
2. The input data description
3. The algorithm description
4. Demo of Test Results
5. Conclusions
Introduction
We present a 3D reconstruction and human body pose estimation system based on a Superquadric (SQ) mathematical model and a RANSAC search with least-squares fitting and verification algorithms.
Pipeline: input data from the VERITAS project (video frames from a multicamera system, then a 3D point cloud) → preprocessing: segmentation → fitting SQs to the 3D data → final Human Body pose model.
Our starting point
We use a multiple-stereo system (8 pairs of cameras) and a garment with special clothing marks to recover the 3D human body surface with superimposed colored markers.
The multicamera system and garment belong to the EU-FP7-ICT VERITAS project: http://veritas-project.eu/
Segmentation
The segmentation is based on clothing analysis (i.e. recognition of the special clothing marks on the garment) and divides the human body into 9 parts (body, arms, forearms, hips and legs). The garment doesn't have a hood, so our Human Body SQ model doesn't have a head.
What is the input data?
→ A 3D video of human body movement was captured by the multicamera system and consists of 119 frames.
→ The 3D data are processed offline separately for every frame and contain the 3D coordinates of approximately 2100 data points of the human body pose.
→ The 3D data points are accompanied by a segmentation matrix whose elements assign every point to the body or to a definite limb. As a result of the clothing segmentation we have approximately 800 data points for the body, 30-70 points for each arm, 15-25 points for each forearm, 300-600 points for each hip, and 80-150 points for each leg.
What is the proposed method?
We propose a hierarchical RANSAC-based model-fitting technique with a composite SQ model of the human body (HB) and limbs. SQ models make it possible to describe complex-geometry objects with few parameters and yield a simple minimization function for estimating an object's pose. We assume the shape and dimensions of the body and limbs are known a priori, with correct anthropometric parameters in a metric coordinate system. The algorithm recovers the 3D position of the body as the largest object ("Body Pose Search") and then restores the poses of the limbs ("Limbs Pose Search"). To cope with measurement noise and outliers, the object pose is estimated by a RANSAC SQ-fitting technique. We control the fitting quality by setting inlier thresholds for the limbs and the body.
HB Pose Estimation algorithm
Inlier threshold for the body: 55%; for the limbs: 60%.
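The hierarchical search in the flowchart above can be sketched as follows. This is a minimal sketch, not the authors' implementation: `fit_body` and `fit_limb_pair` are hypothetical stand-ins for the RANSAC-SQ fits described on the later slides, returning a pose and an inlier ratio.

```python
# Hierarchical pose search sketch: fit the body first, then the limb pairs,
# accepting each result only when its inlier ratio clears the slide's
# thresholds (55% for the body, 60% for the limbs).
BODY_THRESHOLD, LIMB_THRESHOLD = 0.55, 0.60

def fit_body(points):
    # stand-in for the RANSAC Body Pose Search
    return "body_pose", 0.70  # (pose, inlier ratio)

def fit_limb_pair(points, body_pose, pair):
    # stand-in for the RANSAC Limb Pose Search of one arm/leg SQ pair
    return pair + "_pose", 0.65

def estimate_pose(points):
    body_pose, ratio = fit_body(points)
    if ratio < BODY_THRESHOLD:
        return None  # reject the frame: the body fit is too poor
    limbs = {}
    for pair in ("left_arm", "right_arm", "left_leg", "right_leg"):
        pose, r = fit_limb_pair(points, body_pose, pair)
        limbs[pair] = pose if r >= LIMB_THRESHOLD else None
    return {"body": body_pose, "limbs": limbs}
```

The body is fitted first because it has by far the most data points (about 800), and its pose anchors the limb searches.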
Human Body model in Superquadrics
We represent the Human Body (HB) as a model of 9 superquadrics (superellipsoids). HB anthropometric parameters: the shape parameters ε1 = ε2 = 0.5; the scaling parameters:
→ Body: a1 = 0.095, a2 = 0.18, a3 = 0.275 (m).
→ Arms: a1 = a3 = 0.055, a2 = 0.15 (m).
→ Forearms: a1 = a3 = 0.045, a2 = 0.13 (m).
→ Hips: a1 = a2 = 0.075, a3 = 0.2 (m).
→ Legs: a1 = a2 = 0.05, a3 = 0.185 (m).
Abbreviations: B – body, LA/RA – Left/Right Arms, LF/RF – Left/Right Forearms, LH/RH – Left/Right Hips, LL/RL – Left/Right Legs. LS – Left Shoulder, E – Elbow, LHJ – Left Hip Joint, ηLA – angular position of the Left Shoulder, K – Knee, etc.
Human Body model in Superquadrics
The explicit form of the parametric equation of a superquadric is usually used for SQ representation and visualization; the implicit equation is used for mathematical modeling when fitting 3D data. Here x, y, z are the superquadric-frame coordinates; η, ω – spherical coordinates; a1, a2, a3 – the scaling parameters; a4, a5 – the shape parameters.
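The slide's formulas did not survive conversion. For reference, the standard superellipsoid equations consistent with the parameter names above (writing the shape parameters as ε1, ε2) are:

```latex
% Explicit (parametric) form, -pi/2 <= eta <= pi/2, -pi <= omega < pi:
\mathbf{r}(\eta,\omega) =
\begin{pmatrix}
a_1 \cos^{\varepsilon_1}\eta \, \cos^{\varepsilon_2}\omega \\
a_2 \cos^{\varepsilon_1}\eta \, \sin^{\varepsilon_2}\omega \\
a_3 \sin^{\varepsilon_1}\eta
\end{pmatrix}

% Implicit (inside-outside) form; F = 1 on the surface,
% F < 1 inside, F > 1 outside:
F(x,y,z) = \left[
  \left(\frac{x}{a_1}\right)^{\frac{2}{\varepsilon_2}} +
  \left(\frac{y}{a_2}\right)^{\frac{2}{\varepsilon_2}}
\right]^{\frac{\varepsilon_2}{\varepsilon_1}} +
\left(\frac{z}{a_3}\right)^{\frac{2}{\varepsilon_1}}
```

In the superquadric literature the powers of the trigonometric terms are taken with sign-preserving absolute values so the surface is defined in all octants.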
Body model in Superquadrics
The position of the Human Body is defined by the following rotation and translation sequence of the body superquadric:
1. Translation of the center of the BODY (xc, yc, zc) along the x, y, z coordinates.
2. Rotation α about x (clockwise).
3. Rotation β about y (clockwise).
4. Rotation γ about z (clockwise).
These define the rotation matrix of the BODY, RBODY, and the transformation matrix of the BODY, TBODY.
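A sketch of building the body transform from the sequence above. The slide's clockwise convention may flip the angle signs; this sketch uses the common right-handed (counterclockwise) convention, composed in the order x, then y, then z, followed by the translation.

```python
import numpy as np

# Elementary rotation matrices (right-handed, counterclockwise convention;
# negate the angles to obtain the slide's clockwise rotations).
def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def body_transform(alpha, beta, gamma, xc, yc, zc):
    """4x4 homogeneous transform T_BODY = [R_BODY | t]."""
    T = np.eye(4)
    T[:3, :3] = rot_z(gamma) @ rot_y(beta) @ rot_x(alpha)
    T[:3, 3] = [xc, yc, zc]
    return T
```

The homogeneous 4x4 form lets the rotation and translation be applied to a point with a single matrix product, which is convenient when chaining the limb transforms later.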
Limb models in Superquadrics
The position of the Left Shoulder relative to the center of the body coordinate system is estimated from the SQ explicit equation. The transformation Left Shoulder – Left Arm (LS-LA) can be expressed by the following rotation and translation sequence:
1. Rotation α about x (clockwise).
2. Rotation β about z (anticlockwise).
3. Rotation γ about y (clockwise).
4. Translation of the SQ center by a distance a2 along y.
Here RLA is the rotation matrix of the Left Arm.
Limb models in Superquadrics
The full transformation for every point of the system "Body – Left Forearm" (B-LF) is calculated by composing the chain, where PB and PLF are the coordinates of Body and Left Forearm points respectively. The transformation Elbow – Left Forearm (E-LF) consists of:
1. Rotation δLF about x (clockwise).
2. Translation of the SQ center by -a2 along y.
The transformations Body – Left Shoulder (B-LS) and Left Arm – Elbow (LA-E) are defined similarly.
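The kinematic chain above can be sketched by composing homogeneous transforms. This is an illustrative simplification: only the x-axis rotations are kept, and the shoulder offset `SHOULDER` is a made-up value, not the paper's calibrated anthropometrics; the segment half-lengths a2 are taken from the model slide.

```python
import numpy as np

def translation(t):
    """4x4 homogeneous translation by vector t."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

def rot_x_h(a):
    """4x4 homogeneous rotation by angle a about x."""
    c, s = np.cos(a), np.sin(a)
    T = np.eye(4)
    T[1:3, 1:3] = [[c, -s], [s, c]]
    return T

A2_ARM, A2_FOREARM = 0.15, 0.13          # a2 half-lengths (m) from the SQ model
SHOULDER = np.array([0.0, 0.18, 0.22])   # illustrative shoulder offset in the body frame

def forearm_point_in_body(p_forearm, alpha, delta):
    """Chain B <- LS <- LA (angle alpha) <- E <- LF (angle delta), x-rotations only."""
    T = (translation(SHOULDER)                                # B-LS
         @ rot_x_h(alpha) @ translation([0, -A2_ARM, 0])      # LS-LA
         @ translation([0, -A2_ARM, 0])                       # LA-E
         @ rot_x_h(delta) @ translation([0, -A2_FOREARM, 0])) # E-LF
    return (T @ np.append(p_forearm, 1.0))[:3]
```

With both angles at zero, the forearm SQ center lands at the shoulder offset minus the summed segment lengths along y, as expected for a straight hanging arm.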
RANSAC Body Pose Search
We use the RANSAC ("RANdom SAmple Consensus") algorithm to find the body pose hypothesis, i.e. 6 variables: 3 rotation angles (α, β, γ) and 3 translation coordinates (xC, yC, zC). From these variables we calculate the transformation matrix TBODY. We fit a model described by the superquadric implicit equation to the 3D data of the body. We take 6 points in the world coordinate system (xWi, yWi, zWi) from the approximately 800 data points of the body and transform them into the SQ-centered coordinate system (xSi, ySi, zSi). Then we calculate the inside-outside function according to the superquadric implicit equation in the world coordinate system.
RANSAC Body Pose Search
The inside-outside function has 11 parameters: 5 are known (a1, a2, a3, ε1, ε2) and 6 (α, β, γ, xC, yC, zC) must be found by minimizing the cost function. Thus we fit the SQ model to a random dataset by minimizing the inside-outside function of the distance to the SQ surface. We used both the Trust-Region and Levenberg-Marquardt algorithms in the nonlinear least-squares minimization. After that we evaluate the number of inliers by comparing the distance between every point of the 3D point cloud and the SQ model against a distance threshold t (to accelerate the calculations we took t = 2 cm).
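The inlier evaluation above can be sketched as follows. The radial distance approximation (scaling the point's radius by a power of the inside-outside value) is one common choice from the superquadric fitting literature, not necessarily the authors' exact measure.

```python
import numpy as np

def inside_outside(p, a1, a2, a3, e1, e2):
    """Inside-outside function F for points p (N,3) in the SQ frame; F=1 on the surface."""
    x, y, z = np.abs(p[:, 0]), np.abs(p[:, 1]), np.abs(p[:, 2])
    return ((x / a1) ** (2 / e2) + (y / a2) ** (2 / e2)) ** (e2 / e1) \
           + (z / a3) ** (2 / e1)

def count_inliers(points_world, T_pose, params, t=0.02):
    """Count points within distance t (2 cm) of the posed SQ surface.

    points_world: (N,3) world points; T_pose: 4x4 pose TBODY;
    params: (a1, a2, a3, e1, e2).
    """
    Pw = np.hstack([points_world, np.ones((len(points_world), 1))])
    Ps = (np.linalg.inv(T_pose) @ Pw.T).T[:, :3]       # into the SQ frame
    F = inside_outside(Ps, *params)
    r = np.linalg.norm(Ps, axis=1)
    d = np.abs(r * (1.0 - F ** (-params[3] / 2.0)))    # radial distance approximation
    return int(np.sum(d < t))
```

In the RANSAC loop, the same residual (F minus 1, or the distance d) would be fed to a nonlinear least-squares solver over the 6 pose variables before the inlier count is taken.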
RANSAC Limb Pose Search
Analogously to the Body Pose Search, we run a RANSAC Limb Pose Search. The main differences between RANSAC body and limb fitting are:
→ SQ pairs of limbs are used: arm-forearm, hip-leg.
→ s = 3 points are picked for every limb (and the body transformation matrix TBODY obtained from the Body Pose Search is reused).
→ 4 variables are used for the Limb Pose Search: 4 rotation angles (α, β, γ, δ).
→ The joint cost function of the SQ pair is minimized as a whole, considering the two limbs simultaneously, where LA and LF denote the Left Arm and Left Forearm limbs respectively (as an example).
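The joint cost for an SQ pair can be sketched as the sum of squared inside-outside residuals of both limbs. The residual form (F - 1) is an assumption; in the full method the points would first be mapped into each limb's frame through the kinematic chain parameterized by the four joint angles.

```python
import numpy as np

def sq_residuals(points_local, a1, a2, a3, e1, e2):
    """Residuals F - 1 for points already expressed in the limb SQ frame."""
    x, y, z = (np.abs(points_local[:, i]) for i in range(3))
    F = ((x / a1) ** (2 / e2) + (y / a2) ** (2 / e2)) ** (e2 / e1) \
        + (z / a3) ** (2 / e1)
    return F - 1.0

def joint_cost(arm_pts_local, forearm_pts_local, arm_params, forearm_params):
    """Joint cost of one SQ pair (e.g. LA + LF), minimized over the 4 joint angles
    that place the points in the two local frames."""
    return (np.sum(sq_residuals(arm_pts_local, *arm_params) ** 2)
            + np.sum(sq_residuals(forearm_pts_local, *forearm_params) ** 2))
```

Minimizing the two limbs jointly couples them through the shared elbow (or knee) angle, which is why the pair is fitted together rather than one segment at a time.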
Demo of Test Results
For most of the 3D video frames, the fraction of inliers is more than 65%.
At the top: left – a pose of a human in the garment; right – the cloud of points. At the bottom: left – the result of RANSAC fitting to the 3D data (pink points – inliers, cyan – outliers); right – the final pose estimation.
Demo of Test Results
The lack of data points for the arms and forearms causes displacements of the upper-limb poses from one video frame to the next. This degrades the perceived smoothness of the human body movement when the individual frames processed by RANSAC SQ-fitting are assembled into a video. The problem can be addressed in the future by correcting the 3D Human Body Pose Estimation algorithm, improving the 3D data-point acquisition process, or using other sensors and segmentation techniques.
Conclusions
→ Real 3D data of the human body were obtained by a multi-camera system and structured by the special clothing analysis.
→ The human body was modeled by a composite SuperQuadric (SQ) model representing the body and limbs with correct, a priori known anthropometric dimensions.
→ The proposed method is based on a hierarchical RANSAC object search with robust least-squares fitting of the SQ model to the 3D data: first the body, then the limbs.
→ The solution is verified by evaluating the matching score (the number of inliers corresponding to an a priori chosen distance threshold) and comparing this score with the admissible inlier thresholds for the body and limbs.
→ For most of the 3D video frames the inlier fraction exceeds 65%, which means the algorithm works well.
→ The method can be useful for applications dealing with 3D human body recognition, localization and pose estimation.
→ The method will also work with any 3D point cloud acquired by other sensors and segmented by other algorithms.
Acknowledgements
Ilya Afanasyev worked on the creation of the algorithms for 3D object recognition and pose estimation with support from the EU-FP7 Marie Curie COFUND Trentino program. The 3D data acquisition and segmentation were carried out by the UniTN team in the framework of the VERITAS project funded by the EU FP7. The authors are very grateful to colleagues from the Mechatronics department, University of Trento (UniTN), namely Alberto Fornaser. Thank you!