Human Pose Estimation
李奎佐
2018/3/13

Outline
Motivation
Implementation
  DeepPose: Human Pose Estimation via Deep Neural Networks
  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations
  Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos
Discussion
Reference

Motivation

What is Human Pose Estimation?
https://www.youtube.com/watch?v=gA3ctWwjBNg

Application
Action Recognition
Video Surveillance
Human Computer Interaction
High-Level Analysis of Pose

Implementation

DeepPose: Human Pose Estimation via Deep Neural Networks
Alexander Toshev, Christian Szegedy
Google
CVPR 2014

Architecture
Cascade of Pose Regressors (CNN: AlexNet)
Input: Full Image → CNN (Initial stage) → CNN (Second stage) → … → CNN (Stage s-1) → CNN (Stage s) → Output: Body Joint Positions
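The cascade of pose regressors can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_cascade`, `initial_stage`, and `refinement_stages` are hypothetical names, the regressors are stand-in callables rather than AlexNet networks, and each later stage is assumed to return a displacement that is added to the previous estimate of a joint.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def run_cascade(
    image,
    initial_stage: Callable[[object], Sequence[Point]],
    refinement_stages: Sequence[Callable[[object, int, Point], Point]],
) -> List[Point]:
    """Run a cascade of pose regressors (simplified sketch).

    initial_stage(image) returns one (x, y) estimate per joint from the
    full image. Each refinement stage is called once per joint as
    stage(image, joint_index, current_estimate) and returns a displacement
    (dx, dy). A real stage would crop a sub-image around the current
    estimate before regressing; that step is omitted here (assumption).
    """
    joints = list(initial_stage(image))
    for stage in refinement_stages:
        refined = []
        for index, (x, y) in enumerate(joints):
            dx, dy = stage(image, index, (x, y))
            refined.append((x + dx, y + dy))
        joints = refined
    return joints
```

With no refinement stages the function simply returns the initial full-image estimate, which corresponds to the first stage of the cascade on its own.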

Model
y: Pose Vector
y_i: x and y coordinates of the i-th joint
Loss function: [formula image]
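For orientation, the DeepPose paper trains the regressor with an L2 loss between predicted and true joint coordinates. The block below is a sketch of that objective, assuming the slide showed the same formula. The symbols ψ(x; θ) for the network output, D_N for the normalized training set, and k for the number of joints follow the paper and do not appear in the slide text.

```latex
\theta^{*} = \arg\min_{\theta} \sum_{(x, y) \in D_N} \sum_{i=1}^{k}
  \left\lVert y_i - \psi_i(x; \theta) \right\rVert_2^2
```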

Dataset
FLIC (Frames Labeled In Cinema): 4000 training and 1000 test images obtained from popular Hollywood movies; 10 upper body joints are labeled
LSP (Leeds Sports Dataset): 11000 training and 1000 testing images from sports activities; 14 full body joints are labeled

Metrics
PCP (Percentage of Correct Parts) [figure: ground truth and predicted]
PDJ (Percent of Detected Joints): a joint is considered detected if the distance between the predicted and the true joint is within a certain fraction of the torso diameter
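The PDJ definition above translates directly into code. The following is a minimal sketch for a single image; the function name `pdj` and its parameters are hypothetical, the torso diameter is passed in rather than computed, and a joint lying exactly on the threshold is assumed to count as detected.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pdj(
    predicted: Sequence[Point],
    truth: Sequence[Point],
    torso_diameter: float,
    fraction: float,
) -> float:
    """Fraction of joints whose prediction lies within
    fraction * torso_diameter of the ground-truth joint."""
    threshold = fraction * torso_diameter
    detected = 0
    for (px, py), (tx, ty) in zip(predicted, truth):
        # Euclidean distance between predicted and true joint position.
        if math.hypot(px - tx, py - ty) <= threshold:
            detected += 1
    return detected / len(truth)
```

Sweeping `fraction` over a range of values gives the kind of curve usually plotted for PDJ results.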

Result
PDJ on FLIC dataset
Red: predicted poses
Green: ground truth poses

Result
PCP on LSP dataset
PDJ on FLIC dataset

Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos
Jie Song, Otmar Hilliges (Advanced Interactive Technologies, ETH Zurich)
Limin Wang, Luc Van Gool (Computer Vision Laboratory, ETH Zurich)
CVPR 2017

What are the challenging issues in the video-based case?
Self-occlusion
Motion Blur
Uncommon Poses
Figure: (a) Ground Truth; (b) Regress Body-part Locations (No Spatial Inference); (c) Spatial Inference

Architecture
G=(V, E)
(b) CNN: Convolutional Pose Machines (CVPR 2016)
(d) Flow Warp: compute dense optical flow between neighboring frames to propagate joint position estimates through time
(e) A Spatio-temporal Inference Layer
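The flow-warping step can be illustrated with a small, dependency-free sketch. It is only an approximation of the idea under stated assumptions: `warp_heatmap` is a hypothetical helper, heatmaps are plain nested lists rather than network tensors, `flow[y][x]` is assumed to hold the displacement (u, v) from a pixel in the current frame to its position in the neighboring frame, and sampling is nearest-neighbor instead of bilinear.

```python
from typing import List, Sequence, Tuple


def warp_heatmap(
    neighbor_heatmap: Sequence[Sequence[float]],
    flow: Sequence[Sequence[Tuple[float, float]]],
) -> List[List[float]]:
    """Bring a joint heatmap from a neighboring frame into the current frame.

    For every pixel (x, y) of the current frame, look up the neighbor-frame
    heatmap at (x + u, y + v), where (u, v) = flow[y][x]. Pixels whose
    source falls outside the image are set to 0.0.
    """
    height = len(neighbor_heatmap)
    width = len(neighbor_heatmap[0])
    warped = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u, v = flow[y][x]
            src_x = int(round(x + u))
            src_y = int(round(y + v))
            if 0 <= src_x < width and 0 <= src_y < height:
                warped[y][x] = neighbor_heatmap[src_y][src_x]
    return warped
```

In practice the dense flow field would come from an optical flow estimator, and the warped heatmaps from several neighboring frames would be combined with the current frame's estimate.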

What are the challenging issues in the video-based case?
Self-occlusion
Motion Blur
Uncommon Poses
Figure: (a) Ground Truth; (b) Regress Body-part Locations (No Spatial Inference); (c) Spatial Inference; (d) Spatio-Temporal Inference

Model
Figure panels: Single Image; A Video Sequence
V: Vertices
E: Edges

Training
First Stage: Training fully convolutional layers
Loss function: [formula image]
Second Stage: Training with flow warping and inference layers
Loss function: [formula image]
The indicator term equals 1 if the pixel lies within a circle of radius r centered on the ground truth joint position, and -1 otherwise.
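The indicator described above can be written out as a per-pixel label map. This is a minimal sketch, not the authors' training code: `indicator_map` is a hypothetical name, the map is a nested list, and pixels lying exactly on the circle are assumed to count as inside.

```python
from typing import List, Tuple


def indicator_map(
    height: int,
    width: int,
    joint_xy: Tuple[float, float],
    radius: float,
) -> List[List[int]]:
    """Return a height x width map that is +1 inside the circle of the
    given radius around the ground-truth joint and -1 everywhere else."""
    joint_x, joint_y = joint_xy
    radius_sq = radius * radius
    return [
        [
            1 if (x - joint_x) ** 2 + (y - joint_y) ** 2 <= radius_sq else -1
            for x in range(width)
        ]
        for y in range(height)
    ]
```

One such map per joint would serve as the target against which the predicted score map is compared.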

Dataset and Metric
Penn Action dataset:
2326 unconstrained videos
15 different action categories
13 human joints for each image
PCK: [formula image]
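Since the PCK formula is not preserved here, the sketch below uses a common formulation as an assumption: a joint counts as correct when its distance to the ground truth is at most `alpha * max(h, w)`, with h and w the height and width of the person bounding box. The function name, argument layout, and default `alpha` are hypothetical and may differ from the definition on the slide.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def pck_per_joint(
    predictions: Sequence[Dict[str, Point]],
    ground_truths: Sequence[Dict[str, Point]],
    box_sizes: Sequence[Tuple[float, float]],
    alpha: float = 0.2,
) -> Dict[str, float]:
    """Per-joint PCK over a set of frames (assumed threshold rule).

    Each frame supplies a dict joint_name -> (x, y) for the prediction and
    the ground truth, plus the (height, width) of the person bounding box.
    """
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for pred, truth, (box_h, box_w) in zip(predictions, ground_truths, box_sizes):
        threshold = alpha * max(box_h, box_w)
        for joint, (tx, ty) in truth.items():
            px, py = pred[joint]
            total[joint] = total.get(joint, 0) + 1
            if math.hypot(px - tx, py - ty) <= threshold:
                correct[joint] = correct.get(joint, 0) + 1
    return {joint: correct.get(joint, 0) / total[joint] for joint in total}
```

Averaging the returned values over all joints would give an overall figure alongside the per-part numbers.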

Result
PCK plots: All parts, Head, Wrists, Hips, Shoulders, Knees, Elbows, Ankles

Result

Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations
Xianjie Chen, Alan Yuille
University of California, Los Angeles
NIPS 2014

Idea

Architecture

Model
G=(V, E), V: Vertices, E: Edges
Unary Terms: [formula image]
Pairwise Terms: [formula image]
Full Score: [formula image]
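The general shape of such a score, unary terms summed over vertices plus pairwise terms summed over edges, can be sketched as follows. This shows only the generic graphical-model structure and is not the paper's exact formulation: `full_score` is a hypothetical name, and the `unary` and `pairwise` callables stand in for the image-dependent terms whose formulas are not preserved here.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Point = Tuple[float, float]


def full_score(
    vertices: Sequence[Hashable],
    edges: Sequence[Tuple[Hashable, Hashable]],
    unary: Callable[[Hashable, Point], float],
    pairwise: Callable[[Hashable, Hashable, Point, Point], float],
    locations: Dict[Hashable, Point],
) -> float:
    """Score a candidate pose on a graph G = (V, E).

    unary(i, l_i) scores placing part i at location l_i.
    pairwise(i, j, l_i, l_j) scores the relation between connected parts.
    """
    score = sum(unary(i, locations[i]) for i in vertices)
    score += sum(
        pairwise(i, j, locations[i], locations[j]) for i, j in edges
    )
    return score
```

Inference then amounts to searching for the set of locations that maximizes this score; on a tree-structured graph this can be done exactly with dynamic programming.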

Result
PCP on LSP dataset

Result
PDJ on FLIC dataset

Discussion

Discussion
Long-Range Temporal Dependencies
Handling of Groups of People
3D Pose Estimation
High-Level Analysis of Pose

Reference

1. DeepPose: Human Pose Estimation via Deep Neural Networks
2. Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations
3. Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos