Towards Perceptually Realistic Talking Heads: Models, Methods and McGurk
David Marshall, Darren Cosker and Paul Rosin, Cardiff School of Computer Science
Susan Paddock and Simon Rushton, Cardiff School of Psychology
Cardiff University
APGV 04
Context: A Talking Head
• Development of a Video-Realistic Talking Head
• Animation from Continuous Speech
• Perceptual Analysis -> Realism
Contribution of this Paper: Perceptual Realism Test
• Perceptual Analysis via McGurk Test
• Perceptual Test with no prior bias
• Used to improve talking head synthesis
Outline of Talk
• Video Realistic Talking Head (Overview)
• Perceptual Analysis and Testing
• The McGurk Effect + McGurk Test
• Results: Implications of McGurk
• Conclusions + Future Work
Our Talking Head
• Image based synthesis
• Continuous Speech
• Flexible framework – emotion, behaviour
BASIC IDEA:
• Train on input video and audio
  – Extracting only low level image and audio features
  – No phonetic labelling
• Synthesise new video using only input audio
  – Unseen utterances
  – Speaker Independent
Hierarchical Facial Model
• Active Appearance Models – Control of shape and texture using a single ‘appearance parameter’
• Based on Principal Component Analysis (PCA)
• Non-linear Hierarchical PCA (developed at Cardiff)
• Greater Separation of Variation
• High Degree of Control – Sub-facial variation not orthogonal in standard PCA model
• Coupling of Speech Model (Cardiff Idea)
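The PCA idea behind an appearance parameter can be illustrated with a toy example. The sketch below is not the hierarchical or non-linear model from the talk: it is plain linear PCA restricted to one component, found by power iteration, and the helper names (`first_principal_component`, `encode`, `decode`) and the four sample vectors are invented for illustration. It shows how a vector is summarised by a single parameter b and rebuilt as mean + b * component.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (mean, unit direction of largest variance) using power iteration."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, mean)] for row in rows]
    dim = len(mean)
    # Fixed start vector; power iteration fails if this happens to be
    # orthogonal to the true component (acceptable for a sketch).
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # w = X^T (X v): the covariance matrix applied to v, up to a constant.
        proj = [sum(a * b for a, b in zip(row, v)) for row in centred]
        w = [sum(p * row[j] for p, row in zip(proj, centred)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v

def encode(x, mean, component):
    """Project a vector onto the component: one scalar parameter b."""
    return sum((xi - mi) * ci for xi, mi, ci in zip(x, mean, component))

def decode(b, mean, component):
    """Rebuild a vector from the scalar parameter: mean + b * component."""
    return [mi + b * ci for mi, ci in zip(mean, component)]

# Hypothetical 2-D "shape" vectors that all lie on one line.
samples = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [9.0, 12.0]]
mean, component = first_principal_component(samples)
b = encode([9.0, 12.0], mean, component)
rebuilt = decode(b, mean, component)
```

In a real appearance model the vectors would be landmark coordinates or shape-normalised pixel values, and several components would be kept rather than one.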
Building A Talking Head: Initialisation
For Each Video Frame Extract:
• Shape – Key Landmark Points (Tracker Helps)
• Textures – Colour Pixel Values Normalised to Shape
• Speech Features – Mel-Cepstral, Linear Predictive Coding (LPC)
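As an illustration of one of the speech features named above, the following sketch computes LPC coefficients for a single audio frame with the textbook autocorrelation method and the Levinson-Durbin recursion. The frame length, model order and absence of windowing are assumptions; the slides do not say how the authors configured their feature extraction, and Mel-cepstral features are not covered here.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(frame, order):
    """Return (coefficients, residual energy) via Levinson-Durbin.

    Coefficients follow the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p,
    so the prediction of x[n] is -(a1 x[n-1] + ... + ap x[n-p]).
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error == 0.0:
            break  # silent frame: nothing to predict
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this order
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= 1.0 - k * k
    return a, error

# Hypothetical frame: a decaying signal where each sample is half the last,
# so a first-order predictor should recover a1 close to -0.5.
frame = [0.5 ** n for n in range(50)]
coeffs, residual = lpc(frame, order=1)
```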
Building A Talking Head: Tracking
Semi Automated
• Hand Place Few Frames
• Build Interim Shape Model
• Track Other Frames
• Build Final Shape Model
Building A Talking Head: Learning/Model Building
Active Appearance Model (AAM) -> Shape (PCA) and Texture (PCA)
Speech/Appearance Model (SAAM, NEW) -> Speech (PCA) and AAM
Nonlinear PCA:
• Gaussian Mixture Model (GMM)
Model of Dynamics:
• Hidden Markov Model (HMM)
Building A Talking Head: Synthesis + Reconstruction
Input Speech -> Extract Speech Features + Find Best Clusters
Bottom up reconstruction: Mouth Driven
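One plausible reading of "find best clusters" under an HMM model of dynamics is a Viterbi search for the most likely cluster sequence given the speech features. The sketch below is only that reading, not the authors' confirmed procedure. The two clusters, their centres, the shared standard deviation, the transition probabilities and the 1-D feature values are all invented for illustration.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely state sequence for the observations (log-domain Viterbi)."""
    best = [{s: log_start[s] + log_emit(s, observations[0]) for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this time step.
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit(s, obs)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical clusters of a 1-D speech feature (assumed values).
centres = {"closed": 0.0, "open": 1.0}
sigma = 0.2

def log_emit(state, x):
    # Gaussian log-likelihood up to a constant shared by all states.
    return -((x - centres[state]) ** 2) / (2.0 * sigma ** 2)

states = list(centres)
log_start = {s: math.log(0.5) for s in states}
log_trans = {p: {s: math.log(0.9 if p == s else 0.1) for s in states}
             for p in states}

features = [0.1, 0.2, 0.9, 1.1]
path = viterbi(features, states, log_start, log_trans, log_emit)
```

Each selected cluster would then index the appearance parameters used to reconstruct the corresponding video frame.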
Talking Head Examples
Talking Head Example: Independent Speaker
Perceptual Analysis of Talking Heads: Current Talking Head Analysis Methods
• Subjective Evaluation
• Analyse and Compare Trajectories
• Improved Perception in Noisy Environments
• Forced Choice Testing
Perceptual Analysis of Talking Heads: Subjective and Trajectory Evaluation
• Subjective Evaluation
  – Does it “look good”?
  – No formative comparison
  – No feedback to improve model
• Analyse and Compare Trajectories
  – Ground truth quantitative assessment
  – Comparison to “seen” data
  – No perceptual quality measurement
Perceptual Analysis of Talking Heads: Noisy Environment Evaluation
• Noisy Environment Evaluation (Perceptual Evaluation)
  – Compare performance of synthetic v real talking head in realistic situations
  – Good overall test of talking head: lip-syncing, realism
  – No quantitative measure of performance
Perceptual Analysis of Talking Heads: Forced Choice Testing
• Users Asked if Video is Real or Synthetic
• Big Prior Introduced
  – Users look for artefacts
• Bored/Uninterested User
  – Randomness
  – Bias in user selection
• No Quantitative Feedback for Model Improvement
  – Only says if it looks realistic + lip sync is good
  – What makes it real/synthetic?
Perceptual Analysis of Talking Heads: A New McGurk Test
• McGurk Test for Perceptual Analysis
• Subject doesn’t develop a prior
• Helps address strengths and weaknesses
• Suggests improvements based on these
• Complements other tests
Perceptual Analysis of Talking Heads: The McGurk Effect
McGurk and MacDonald (1976):
• Auditory syllable dubbed onto videotape of a different syllable gives perception of an entirely different syllable, e.g.:
  – Audio ‘Ba’
  – Visual ‘Ga’
  – Perception ‘Da’
• “Close Eyes – Illusion Vanishes”
• Raises psychological audio-visual questions:
  – How are auditory and visual stimuli combined?
  – Why combine when audio is enough?
Perceptual Analysis of Talking Heads: Some More McGurk Effect Examples
Perceptual Analysis of Talking Heads: McGurk Effect Examples (REAL)
Perceptual Analysis of Talking Heads: McGurk Effect Examples (ANSWERS)
• Tuple: Bent/Vest/Vent
• Tuple: Mat/Dead/Gnat
Perceptual Analysis of Talking Heads: McGurk Effect Examples (Synthetic)
Perceptual Analysis of Talking Heads: McGurk Effect Examples (ANSWERS), Synthetic Examples
• Tuple: Fame/Face/Feign
• Tuple: Mat/Dead/Gnat
Perceptual Analysis of Talking Heads: Our McGurk Test
McGurk Perceptual Evaluation Test:
• Mix Real and Synthetic tuples
• What word do you perceive?
• Users asked to note any differences
  – NO PRIORS as to real/synthetic forced choice
  – User only asked about what they hear/perceive
• Best Viewing resolution
  – Tested different resolutions (72 x 75, 36 x 289, 720 x 576 pixels)
Perceptual Analysis of Talking Heads: Our McGurk Experimental Procedure
• Mix of Real and Synthetic McGurk Examples
  – Real examples are a control
• Users presented with a series of 60 (30 real, 30 synthetic) random examples
• Users asked only to focus on the mouth area
• Two initial example “training” sequences (not in trial)
• Soundproofed booths with adjustable volume and artificial lighting
• Replay option for all examples
• Users simply record the word they perceive
• Users asked three questions after viewing all clips
  – “Did you notice anything about the videos that you can comment on?”
  – “Could you tell that some of the videos were computer generated?”
  – “Did you use the replay button at all?”
• 20 psychology undergraduate test subjects (4 male/16 female) with normal hearing/vision
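The mixed, randomised presentation described in the procedure could be set up along these lines. This is a minimal sketch: the clip names and the helper `build_trial_order` are hypothetical, and the actual test software (Macromedia Director, per the web links slide) may have ordered clips differently.

```python
import random

def build_trial_order(real_clips, synthetic_clips, seed=None):
    """Return every clip once, in shuffled order, tagged with its source.

    The source tag is kept for later analysis only; it is never shown
    to the participant.
    """
    trials = [(clip, "real") for clip in real_clips]
    trials += [(clip, "synthetic") for clip in synthetic_clips]
    random.Random(seed).shuffle(trials)
    return trials

# Hypothetical clip identifiers: 30 real and 30 synthetic examples.
real = [f"real_{i:02d}" for i in range(30)]
synthetic = [f"synthetic_{i:02d}" for i in range(30)]
order = build_trial_order(real, synthetic, seed=4)
```

Passing a seed makes a participant's ordering reproducible; leaving it as `None` gives a fresh order per session.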
Perceptual Analysis of Talking Heads: How is Our McGurk Test a Test
• How is this a test?
  – Correct Lip Synch = McGurk Effect
  – Incorrect Lip Synch = Audio/Other (audio should be dominant)
• Questions Assess Behaviour/Output
  – After test procedure participants asked whether they noticed anything unnatural
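The scoring rule on this slide (McGurk percept means good lip synch, audio or anything else means poor lip synch) can be sketched as a small classifier. It assumes each tuple is ordered audio/visual/McGurk, which matches the Ba/Ga/Da example and tuples such as Bent/Vest/Vent. The function name and the exact-match comparison are illustrative choices, not the authors' scoring code.

```python
def classify_response(response, tuple_words):
    """Label a written response as 'mcgurk', 'audio' or 'other'.

    tuple_words is (audio_word, visual_word, mcgurk_word); the ordering
    is an assumption based on the examples in the slides.
    """
    audio, _visual, mcgurk = (w.lower() for w in tuple_words)
    word = response.strip().lower()
    if word == mcgurk:
        return "mcgurk"   # visual cue fused with audio: lip synch worked
    if word == audio:
        return "audio"    # audio dominated: visual cue was not convincing
    return "other"

bent_tuple = ("Bent", "Vest", "Vent")
labels = [classify_response(r, bent_tuple) for r in ["Vent", " bent", "went"]]
```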
Perceptual Analysis of Talking Heads: Results
Four Types of Analysis of Results:
• Standard McGurk Response
  – From tuples, form accepted audio and accepted McGurk response
  – Original McGurk observation
• Enhanced McGurk Response
  – Assemble a list of all participants’ McGurk responses
  – Allows for greater variability in accents/articulation
  – Allows for greater analysis and improvement of head models
• Effects of Resolution on McGurk Effect
• End of Test Questions Analysis
  – General overall response, qualitative analysis
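The difference between the standard and enhanced analyses can be shown as two response rates computed over the same answers: one accepting only the tuple's McGurk word, the other accepting a wider, hand-assembled list. The sketch below assumes that reading. The responses and the extra accepted word "went" are made up for illustration and are not results from the study.

```python
def mcgurk_rate(responses, accepted_words):
    """Fraction of responses that fall in the accepted McGurk word set."""
    if not responses:
        return 0.0
    accepted = {w.lower() for w in accepted_words}
    hits = sum(1 for r in responses if r.strip().lower() in accepted)
    return hits / len(responses)

# Made-up answers to the Bent/Vest/Vent tuple (illustration only).
responses = ["vent", "bent", "went", "vent"]

# Standard analysis: only the tuple's own McGurk word counts.
standard = mcgurk_rate(responses, ["vent"])

# Enhanced analysis: a hypothetical wider list of accepted McGurk responses.
enhanced = mcgurk_rate(responses, ["vent", "went"])
```

Comparing the two rates separately for real and synthetic clips would indicate which sounds the synthetic head renders less convincingly.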
Perceptual Analysis of Talking Heads: Standard McGurk Response
Perceptual Analysis of Talking Heads: Enhanced McGurk Response
Perceptual Analysis of Talking Heads: Image Resolution
Perceptual Analysis of Talking Heads: End of Test Questions Results
• “Notice anything to comment on?” Some audio didn’t match video
• “Could you tell some synthetic?” No; 1 participant = some unnatural?
• “Did you use replay?” Few = once, one = twice
Perceptual Analysis of Talking Heads: Overall Results Analysis
• Realistic behaviour
  – Most users were unaware of synthetic output
• Good synthesis of /F/, /D/, /S/, /A/ and /E/
• Poor synthesis of /V/
• More McGurk effects in real output
  – Points to some weakness in model
• Some weak real and synthetic McGurk responses
  – Beige-Gaze-Deige -> 2 x Audio v McGurk
  – Mock-Dock-Knock -> 50:50 Audio:McGurk
• Resolution has effect on real only
  – Due to overall lower synthetic McGurk response
Conclusions
• Suggested a perceptual approach to analysis and development of a Talking Head
  – Unbiased by prior forced choice making
  – Insight into performance of algorithms
  – Complements other tests
Perceptual Analysis of Talking Heads: Future Work
• Talking Head
  – Full Emotion
  – Performance Driven Animation
  – 3D Modelling: full 3D appearance modelling
• Other perceptual tests
  – Longer videos – McGurk sentences
  – Real/Synthesised correct lip synch: McGurk = bad synch?
  – Emotion – a McGurk emotion test?
Web Links
• Paper Downloads
  www.cs.cf.ac.uk/user/D.P.Cosker/publications.html
  www.cs.cf.ac.uk/Dave/Publications.html
• McGurk Video Clips and McGurk Test Software (Macromedia Director)
  www.cs.cf.ac.uk/user/D.P.Cosker/McGurk/