The Recognition of Human Movement Using Temporal Templates

The Recognition of Human Movement Using Temporal Templates Liat Koren

Lecture subjects
• Introduction
• Prior work
• The Temporal Templates
• Usage example

Introduction
• Computer vision trends
  – Less emphasis on image or camera motion
  – More emphasis on labeling of action
• Reasons
  – More computational power
  – Wireless applications
  – Interactive environments

Introduction – cont.
• Recent efforts focus on three-dimensional object reconstruction
  – The assumption is that it will be needed for the recognition of human motion.
• This article claims otherwise
  – View-based approach
  – Direct recognition

Motivating Example
• Static pictures
  – Hard to recognize.
• Sequence in motion
  – Humans can recognize the movement without three-dimensional reconstruction.
• Conclusion
  – It is possible to recognize movement using only the motion itself.

3D-based recognition
• Process
  – Recover the pose of the person at each time instant using a 3D model.
  – The model’s projected image should be as close as possible to the object (e.g., edges of the body in the image).
• Drawbacks
  – Complicated process
  – Human intervention is usually required
  – Special imaging environment

2D-based recognition
• An action is a sequence of static poses of the object.
• Requires
  – Normalization
  – Removal of background

Wilson and Bobick’s approach
• Actions are usually hand gestures
• Representation
  – Actual image
  – Grayscale
  – No background
• Benefits
  – Hand appearance is fairly similar over a wide range of people
• Problems
  – Actions that involve the appearance of the whole body are not visually consistent across different people.

Yamato et al.’s approach
• Representation
  – No background
  – Black and white silhouettes
• Matching
  – Vector quantization
  – Usage of a mathematical method (hidden Markov models)
• Benefits
  – Helps handle the variability between people
• Problems
  – Movement inside the silhouette disappears

Summary of prior work
• An action is a sequence of static poses.
• Requires individual features or properties that can be extracted and tracked from each frame.
• Recognition of movement from a sequence of images is a complicated task.
• Usually requires previous recognition and segmentation of the person.

Motion-based recognition
• Attempt to characterize the motion itself without reference to the underlying static poses of the body.
• Possible approaches
  – Blob-like representation
  – Tracking of predefined regions (e.g., legs, head, mouth) using motion
    • Facial expression patches
    • Whole body patches
  – Measure typical patterns of muscle activation

Terms
• MEI – Motion Energy Image: where motion has occurred in the image sequence.
• MHI – Motion History Image: how the motion is moving.
• MEI + MHI together form the temporal template of a movement.

Temporal Templates
• Representation of movement
  – View specific
  – Movement is motion in time
  – A vector-valued image that can be matched against stored representations of movements.
• Assumptions
  – Background is static
  – Camera movements can be removed
  – Motion of irrelevant objects can be eliminated

Motion-Energy Images: where did the movement occur?

Motion-Energy Images
• Notice that:
  – If τ is very large, all the differences are accumulated.
  – τ has a strong influence on the temporal representation of a movement.
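
The effect of τ can be illustrated with a short sketch. It assumes the usual formulation of the MEI as the union of the last τ binary motion images D(x, y, t), and it assumes that D comes from thresholded frame differences. The function names, the threshold value and the toy sequence are illustrative choices, not taken from the slides.

```python
import numpy as np

def binary_motion(prev_frame, frame, threshold=15):
    """D(x, y, t): True where the absolute frame difference exceeds a threshold.
    Thresholded differencing is an assumed motion detector for this sketch."""
    diff = np.abs(frame.astype(np.int32) - prev_frame.astype(np.int32))
    return diff > threshold

def motion_energy_image(motion_masks, tau):
    """Union (logical OR) of the most recent tau binary motion masks."""
    recent = motion_masks[-tau:]
    return np.logical_or.reduce(recent)

# Toy sequence: one bright pixel moving one column to the right per frame.
frames = []
for t in range(5):
    f = np.zeros((3, 6), dtype=np.uint8)
    f[1, t] = 255
    frames.append(f)

masks = [binary_motion(a, b) for a, b in zip(frames, frames[1:])]
mei_short = motion_energy_image(masks, tau=1)  # only the latest change
mei_long = motion_energy_image(masks, tau=4)   # the whole sweep is accumulated
```

With a small τ only the most recent change survives; with a large τ the whole path of the pixel is marked, which is the accumulation noted above.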

Motion-Energy Images
• A smooth change in the viewing angle causes a smooth change in the viewed image, so a coarse sampling of the viewing circle (every 30°) is enough.

Motion-History Images
• Intensity of a pixel represents the temporal history in that pixel.
• Newer movement is brighter.

Motion-History Images
• A time window of size τ is used: movement older than τ is ignored.
• The results of the article use a simple replacement and decay operator.
• Notice that the MEI can be calculated from the MHI by painting white any non-black pixel.
• One may wonder: why not use only the MHI? Answers will be given later…
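
A minimal sketch of such a replacement and decay operator follows. It assumes the common formulation in which a pixel with motion is set to τ and every other pixel loses 1 per frame, never going below 0. The function names and the toy sequence are illustrative, not taken from the slides.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    """One replace-and-decay step: moving pixels are set to tau,
    all other pixels decay by 1 and are floored at 0."""
    decayed = np.maximum(mhi - 1, 0)
    return np.where(motion_mask, tau, decayed)

def mei_from_mhi(mhi):
    """Any non-black MHI pixel becomes white in the MEI."""
    return mhi > 0

tau = 3
mhi = np.zeros((1, 5), dtype=np.int32)
for t in range(5):
    mask = np.zeros((1, 5), dtype=bool)
    mask[0, t] = True          # motion moves one pixel to the right per frame
    mhi = update_mhi(mhi, mask, tau)

mei = mei_from_mhi(mhi)
```

Only the current MHI is kept between frames. Brighter values mark more recent motion, and anything older than τ frames has decayed to black.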

MEI and MHI in a nutshell
• The MEI and MHI are two components of a vector-valued image (the temporal template) designed to encode a variety of motion properties.
• A benefit of this representation is that the calculation is recursive, so only up-to-date information needs to be stored, making the computation both fast and space efficient.

Matching Temporal Templates
• Collect training examples of each movement from a variety of viewing angles.
• Compute a statistical representation of the MHI/MEI images (Hu moments).
• Given an input movement:
  – Calculate its statistical representation.
  – Use the Mahalanobis distance to find the stored movement that is nearest to the input.
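
The matching steps can be sketched as follows. For brevity the sketch computes only the first two of the seven Hu moments, and the stored models are made-up means and covariances rather than statistics from real training data. All names (`hu_features`, `mahalanobis`, `nearest_movement`, the labels "a" and "b") are hypothetical.

```python
import numpy as np

def hu_features(img):
    """First two Hu moments of a 2-D image (a subset of the seven, for brevity)."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    cx, cy = (xs * img).sum() / m00, (ys * img).sum() / m00

    def eta(p, q):
        # Normalized central moment of order (p, q).
        mu = ((xs - cx) ** p * (ys - cy) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([phi1, phi2])

def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^-1 (x - mean))."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def nearest_movement(x, models):
    """models: {label: (mean, cov)}; returns the label with the smallest distance."""
    return min(models, key=lambda k: mahalanobis(x, *models[k]))

# Made-up models: "a" is tight around the origin, "b" is far away but very spread out.
models = {
    "a": (np.array([0.0, 0.0]), np.eye(2)),
    "b": (np.array([10.0, 0.0]), 100.0 * np.eye(2)),
}
x = np.array([4.0, 0.0])
label = nearest_movement(x, models)
```

In this toy case the input is closer to "a" in Euclidean terms, but the Mahalanobis distance accounts for the spread of each class and selects "b".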

Mahalanobis Distance Example

Reasoning for the algorithm
• Mahalanobis distance provides:
  – Good matching, as shown in the results of the article.
  – Simple calculation, which makes real-time applications feasible.
• Hu moments allow a representation of images that is invariant to scale and translation.
• One problem with Hu moments: “Hu moments are difficult to reason about intuitively” (the authors)

Testing the system
• 18 exercises performed by an experienced aerobics instructor.
• The MEIs are shown in the bottom rows.

Why both MHI and MEI?
• The MEI and MHI capture two different characteristics of the movement (the “where” and the “how”), so they look different, and thus both are essential.

First experiment
• Input: 30° left of the subject
• Match against all seven views (0°, 30°, 60°, 90°, 120°, 150°, 180°) of all 18 moves
• 12 out of 18 are correctly recognized

Analysis of the results of the 1st experiment
• Input: Move 13 at 30°
• False match: Move 6 at 0°
• Correct match: Move 13 (stored view)

Combining multiple views
• Two cameras with orthogonal views
• Minimize the sum of the Mahalanobis distances between the two input templates and two stored views of a movement that are 90° apart.
• Hidden assumption: we know the angular relationship between the cameras.
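
A sketch of this two-view rule, under the stated assumption that the cameras are a known 90° apart. The layout of the model dictionary, the function names and the toy feature vectors are hypothetical and serve only to illustrate the minimization.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def match_two_views(x_a, x_b, models, view_gap=90):
    """models: {(move, angle): (mean, cov)}.
    Returns the (move, angle) of camera A's view that minimizes the summed
    distance over stored view pairs whose angles differ by view_gap degrees."""
    best, best_cost = None, float("inf")
    for (move, angle), model_a in models.items():
        model_b = models.get((move, angle + view_gap))
        if model_b is None:
            continue  # no stored partner view at the required angle
        cost = mahalanobis(x_a, *model_a) + mahalanobis(x_b, *model_b)
        if cost < best_cost:
            best, best_cost = (move, angle), cost
    return best, best_cost

eye = np.eye(2)
models = {
    ("A", 0): (np.array([0.0, 0.0]), eye),
    ("A", 90): (np.array([5.0, 5.0]), eye),
    ("B", 0): (np.array([0.5, 0.0]), eye),
    ("B", 90): (np.array([0.0, 9.0]), eye),
}
x_a = np.array([0.4, 0.0])   # camera A alone is slightly closer to move "B"
x_b = np.array([5.0, 5.0])   # camera B clearly agrees with move "A"
best, cost = match_two_views(x_a, x_b, models)
```

Here the first view alone would pick the wrong move, while the summed distance over both views resolves the ambiguity.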

Second experiment
• Input with two cameras:
  – 30° left of the subject
  – 60° right of the subject
• Match against all seven views of all 18 moves
• 15 out of 18 are correctly recognized

Analysis of the results of the 2nd experiment
• Input: Move 16
• False match: Move 15
• Correct match: Move 16 (stored views)

Segmentation and Recognition
• Problem: the speed of performance differs between people.
• Solution: segmentation
  – When training the system, calculate τmax and τmin for each movement.
  – Use an algorithm that matches over a wide range of τ.
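
One way to make the search over τ cheap is sketched below. It assumes an MHI built with a replace-and-decay rule, where a moving pixel is set to τ and every other pixel loses 1 per frame. Under that assumption the MHI for a shorter window follows from the MHI for τmax by subtracting the difference and clamping at 0, so older frames need not be revisited. The function names and the toy sequence are illustrative.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    """Replace-and-decay step: moving pixels become tau, the rest lose 1 (not below 0)."""
    return np.where(motion_mask, tau, np.maximum(mhi - 1, 0))

def shrink_window(mhi, delta):
    """MHI for a window shortened by delta frames, derived from the longer-window MHI."""
    return np.maximum(mhi - delta, 0)

def mhis_over_range(mhi_tau_max, tau_max, tau_min):
    """Yield (tau, MHI_tau) for tau = tau_max down to tau_min."""
    for tau in range(tau_max, tau_min - 1, -1):
        yield tau, shrink_window(mhi_tau_max, tau_max - tau)

def run(tau, n=6):
    """Toy sequence: motion moves one pixel to the right per frame."""
    mhi = np.zeros((1, n), dtype=np.int32)
    for t in range(n):
        mask = np.zeros((1, n), dtype=bool)
        mask[0, t] = True
        mhi = update_mhi(mhi, mask, tau)
    return mhi

mhi_5 = run(5)
candidates = dict(mhis_over_range(mhi_5, tau_max=5, tau_min=3))
```

Each candidate MHI could then be matched against the stored movements, keeping the τ that gives the best match.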

Problems
• Problems with the current system
  – One person partially occludes another
    • Solution: use several cameras
  – More than one person appears in the field of view
    • Solution: use a tracking bounding box

More Problems
• Motion of part of the body is not specified during a movement
  – Possible solutions
    • Automatically mask away regions of this type of motion
    • Always include them
• Camera motion
  – Rather easy to eliminate since camera motion is limited.
• The person performs the movement while in locomotion

The KidsRoom: An Application
• The room is aware of the children (at most 4).
• The room takes the children through a story.
• The room’s reaction is influenced by the actions of the children.
• Current story: an adventurous tour to monster land.
• In the last scene the monsters teach the children to dance.
• Then, the monsters follow the children if they perform movements they “know”.
• The narration coerces the children to room locations where occlusion is not a problem.