Towards Perceptually Realistic Talking Heads: Models, Methods and McGurk
David Marshall, Darren Cosker and Paul Rosin, Cardiff School of Computer Science
Susan Paddock and Simon Rushton, Cardiff School of Psychology
Cardiff University
APGV 04

Context: A Talking Head
• Development of a Video-Realistic Talking Head
• Animation from Continuous Speech
• Perceptual Analysis -> Realism

Contribution of this Paper: Perceptual Realism Test
• Perceptual Analysis via McGurk Test
• Perceptual Test with no prior bias
• Used to improve talking head synthesis

Outline of Talk
• Video Realistic Talking Head (Overview)
• Perceptual Analysis and Testing
• The McGurk Effect + McGurk Test
• Results: Implications of McGurk
• Conclusions + Future Work

Our Talking Head
• Image based synthesis
• Continuous Speech
• Flexible framework – emotion, behaviour
BASIC IDEA:
• Train on input video and audio
  – Extracting only low level image and audio features
  – No phonetic labelling
• Synthesise new video using only input audio
  – Unseen utterances
  – Speaker Independent

Hierarchical Facial Model
• Active Appearance Models – Control of shape and texture using a single ‘appearance parameter’
• Based on Principal Component Analysis (PCA)
• Non-linear Hierarchical PCA (developed at Cardiff)
• Greater Separation of Variation
• High Degree of Control – Sub-Facial variation not orthogonal in standard PCA model
• Coupling of Speech Model (Cardiff Idea)

Building A Talking Head - Initialisation
For Each Video Frame Extract:
• Shape – Key Landmark Points (Tracker Helps)
• Textures – Colour Pixel Values Normalised to Shape
• Speech Features – Mel-Cepstral, Linear Predictive Coding (LPC)

Building A Talking Head - Tracking
Semi Automated
• Hand Place Few Frames
• Build Interim Shape Model
• Track Other Frames
• Build Final Shape Model

Building A Talking Head - Learning/Model Building
• Active Appearance Model (AAM) -> Shape (PCA) and Texture (PCA)
• Speech/Appearance Model (SAAM, NEW) -> Speech (PCA) and AAM
Nonlinear PCA:
• Gaussian Mixture Model (GMM)
Model of Dynamics:
• Hidden Markov Model (HMM)
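
As a rough illustration of the PCA step behind the shape and texture models on this slide, the sketch below fits a linear PCA model to flattened landmark vectors. It is not the authors' implementation: it uses plain linear PCA rather than the non-linear hierarchical variant, and the function names (fit_pca, project, reconstruct), the var_kept threshold and the toy data are all assumptions made for the example.

```python
import numpy as np


def fit_pca(X, var_kept=0.95):
    """Fit linear PCA to the rows of X (one flattened shape vector per row)."""
    mean = X.mean(axis=0)
    # SVD of the centred data; the rows of Vt are the principal modes.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest number of modes whose cumulative variance reaches var_kept.
    k = min(int(np.searchsorted(explained, var_kept)) + 1, len(s))
    return mean, Vt[:k]


def project(x, mean, modes):
    """Map a shape vector (or rows of vectors) to model parameters."""
    return (x - mean) @ modes.T


def reconstruct(b, mean, modes):
    """Map model parameters back to a shape vector."""
    return mean + b @ modes


if __name__ == "__main__":
    # Toy data (assumption): 40 frames, 10 landmarks (x, y) each,
    # generated from 2 hidden factors so a few modes suffice.
    rng = np.random.default_rng(0)
    shapes = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 20))
    mean, modes = fit_pca(shapes, var_kept=0.95)
    params = project(shapes[0], mean, modes)
    print("modes kept:", modes.shape[0], "parameters:", params.round(3))
```

The same routine could be applied to shape-normalised texture vectors or to speech feature vectors; how the separate models are coupled in the SAAM is not specified on the slide and is not attempted here.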

Building A Talking Head - Synthesis + Reconstruction
• Input Speech -> Extract Speech Features + Find Best Clusters
• Bottom up reconstruction: Mouth Driven

Talking Head Examples

Talking Head Example: Independent Speaker

Perceptual Analysis of Talking Heads: Current Talking Head Analysis Methods
• Subjective Evaluation
• Analyse and Compare Trajectories
• Improved Perception in Noisy environments
• Forced Choice Testing

Perceptual Analysis of Talking Heads: Subjective and Trajectory Evaluation
• Subjective Evaluation
  – Does it “look good”?
  – No formative comparison
  – No feedback to improve model
• Analyse and Compare Trajectories
  – Ground truth quantitative assessment
  – Comparison to “seen” data
  – No perceptual quality measurement

Perceptual Analysis of Talking Heads: Noisy Environment Evaluation
• Noisy Environment Evaluation
• Perceptual Evaluation
• Compare Performance of Synthetic v Real Talking Head in realistic situations
• Good overall test of talking head
  – Lip-syncing, realism
• No Quantitative Measure of Performance

Perceptual Analysis of Talking Heads: Forced Choice Testing
• Users Asked if Video is Real or Synthetic
• Big Prior Introduced
  – Users look for artefacts
• Bored/Uninterested User
  – Randomness
  – Bias in User selection
• No Quantitative Feedback for Model Improvement
  – Only says if it looks realistic + lip sync is good
  – What makes it real/synthetic?

Perceptual Analysis of Talking Heads: A New McGurk Test
• McGurk Test for Perceptual Analysis
• Subject doesn’t develop a prior
• Helps address strengths and weaknesses
• Suggests improvements based on these
• Complements other tests

Perceptual Analysis of Talking Heads: The McGurk Effect
MacDonald and McGurk (1976):
• Auditory Syllable Dubbed onto Videotape of Different Syllables Gives Perception of an Entirely Different Syllable, e.g.:
  – Audio ‘Ba’
  – Visual ‘Ga’
  – Perception ‘Da’
• “Close Eyes – Illusion Vanishes”
Raises Psychological Audio-Visual questions:
• How are Auditory and Visual Stimuli combined?
• Why combine when audio is enough?

Perceptual Analysis of Talking Heads: Some More McGurk Effect Examples

Perceptual Analysis of Talking Heads: McGurk Effect Examples (REAL)

Perceptual Analysis of Talking Heads: McGurk Effect Examples (ANSWERS)
• Tuple: Bent/Vest/Vent
• Tuple: Mat/Dead/Gnat

Perceptual Analysis of Talking Heads: McGurk Effect Examples (Synthetic)

Perceptual Analysis of Talking Heads: McGurk Effect Examples (ANSWERS)
Synthetic Examples
• Tuple: Fame/Face/Feign
• Tuple: Mat/Dead/Gnat

Perceptual Analysis of Talking Heads: Our McGurk Test
McGurk Perceptual Evaluation Test:
• Mix Real and Synthetic tuples.
• What word do you perceive?
• Users asked to note any differences
• NO PRIORS as to real/synthetic forced choice
• User only asked about what they hear/perceive
• Best Viewing resolution
  – Tested different resolutions (72 x 75, 36 x 289, 720 x 576 pixels)

Perceptual Analysis of Talking Heads: Our McGurk Experimental Procedure
• Mix of Real and Synthetic McGurk Examples
  – Real examples are a control
• Users Presented with a series of 60 (30 real, 30 Synthetic) random examples
• Users asked only to focus on the mouth area
• Two initial example “training” sequences (not in trial)
• Soundproofed booths with adjustable volume and artificial lighting
• Replay option for all examples
• Users simply record the word they perceive
• Users asked three questions after viewing all clips
  – “Did you notice anything about the videos that you can comment on?”
  – “Could you tell that some of the videos were computer generated?”
  – “Did you use the replay button at all?”
• 20 psychology undergrad test subjects (4 Male/16 female) with normal hearing/vision
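
The mixed presentation of 30 real and 30 synthetic clips in random order could be set up roughly as follows. This is only a sketch: the original test software was written in Macromedia Director, and the function name build_trial_order, the clip names and the seed parameter are hypothetical.

```python
import random


def build_trial_order(real_clips, synthetic_clips, seed=None):
    """Return all clips in one shuffled list of (clip, kind) pairs.

    The kind label is kept for later analysis only; it would not be
    shown to the participant, so no real/synthetic prior is introduced.
    """
    rng = random.Random(seed)  # a fixed seed makes the order repeatable
    trials = [(clip, "real") for clip in real_clips]
    trials += [(clip, "synthetic") for clip in synthetic_clips]
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    # Hypothetical clip names: 30 real and 30 synthetic examples.
    real = [f"real_{i:02d}" for i in range(30)]
    synthetic = [f"synth_{i:02d}" for i in range(30)]
    for clip, kind in build_trial_order(real, synthetic, seed=1)[:5]:
        print(clip, kind)
```

Drawing a different seed per participant would give each subject an independent order; whether the original study did so is not stated on the slide.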

Perceptual Analysis of Talking Heads: How is Our McGurk Test a Test
• How is this a test?
  – Correct Lip Synch = McGurk Effect
  – Incorrect Lip Synch = Audio/Other
  – Audio should be dominant
• Questions Assess Behaviour/Output
  – After test procedure participants asked whether they noticed anything unnatural
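
The scoring rule on this slide (a McGurk response suggests correct lip synch, an audio or other response suggests incorrect lip synch) can be sketched as below. The tuple order audio/visual/McGurk is an assumption inferred from the Ba/Ga/Da example earlier in the talk, and the names classify_response and mcgurk_rate, as well as the sample responses, are hypothetical.

```python
from collections import Counter


def classify_response(response, audio, visual, mcgurk):
    """Label one recorded word as 'mcgurk', 'audio', 'visual' or 'other'."""
    word = response.strip().lower()
    if word == mcgurk.lower():
        return "mcgurk"
    if word == audio.lower():
        return "audio"
    if word == visual.lower():
        return "visual"
    return "other"


def mcgurk_rate(responses, word_tuple):
    """Return (fraction of McGurk responses, per-label counts) for one clip.

    word_tuple is assumed to be (audio, visual, mcgurk).
    """
    if not responses:
        raise ValueError("no responses recorded for this clip")
    audio, visual, mcgurk = word_tuple
    counts = Counter(
        classify_response(r, audio, visual, mcgurk) for r in responses
    )
    return counts["mcgurk"] / len(responses), counts


if __name__ == "__main__":
    # Made-up responses for the Bent/Vest/Vent tuple, for illustration only.
    rate, counts = mcgurk_rate(
        ["Vent", "bent", "vent", "tent"], ("Bent", "Vest", "Vent")
    )
    print(rate, dict(counts))
```

Comparing this rate between real and synthetic clips of the same tuple gives a per-word indication of where the synthetic lip synch falls short.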

Perceptual Analysis of Talking Heads: Results
Four Types of Analysis of Results:
• Standard McGurk Response
  – From tuples form accepted audio and accepted McGurk response
  – Original McGurk observation
• Enhanced McGurk Response
  – Assemble a List of All participants’ McGurk Responses
  – Allows for greater variability in accents/articulation
  – Allows for greater analysis and Improvement of Head Models
• Effects of Resolution on McGurk Effect
• End of Test Questions Analysis
  – General overall response, qualitative analysis

Perceptual Analysis of Talking Heads: Standard McGurk Response

Perceptual Analysis of Talking Heads: Enhanced McGurk Response

Perceptual Analysis of Talking Heads: Image Resolution

Perceptual Analysis of Talking Heads: End of Test Questions Results
• “Notice anything to comment on?”
  – Some audio didn’t match video
• “Could you tell some synthetic?”
  – No, 1 participant = some unnatural?
• “Did you use replay?”
  – Few = once, One = twice

Perceptual Analysis of Talking Heads: Overall Results Analysis
• Realistic behaviour
  – Most users were unaware of synthetic output
• Good Synthesis of /F/, /D/, /S/, /A/ and /E/
• Poor Synthesis of /V/
• More McGurk effects in real output
  – Points to some weakness in model
• Some weak real and synthetic McGurk responses
  – Beige-Gaze-Deige -> 2 X Audio v McGurk
  – Mock-Dock-Knock -> 50:50 Audio:McGurk
• Resolution has effect on real only
  – Due to overall lower synthetic McGurk response

Conclusions
• Suggested a perceptual approach to analysis and development of a Talking Head
  – Unbiased by prior forced choice making
  – Insight into performance of algorithms
  – Complements other tests

Perceptual Analysis of Talking Heads: Future Work
• Talking Head
  – Full Emotion
  – Performance Driven Animation
  – 3D Modelling – Full 3D appearance modelling
• Other perceptual tests
  – Longer videos – McGurk sentences
  – Real/Synthesised correct lip synch: McGurk = bad synch?
  – Emotion – A McGurk emotion test?

Web Links
• Paper Downloads
  – www.cs.cf.ac.uk/user/D.P.Cosker/publications.html
  – www.cs.cf.ac.uk/Dave/Publications.html
• McGurk Video Clips and McGurk Test Software (Macromedia Director)
  – www.cs.cf.ac.uk/user/D.P.Cosker/McGurk/