Creating a Speech Enabled Avatar from a Single Photograph • Dmitri Bitouk, Shree K. Nayar • Columbia University
Speech Enabled Avatar (input photograph → avatar) Applications: • mobile messaging and video conferencing • news reporting and information kiosks • novel user interfaces
Facial Motion Synthesis Challenges • Mapping phonemes to static mouth shapes produces unrealistic, jerky animations • Co-articulation: facial articulations can be dominated by the preceding as well as the upcoming phonemes • Asynchrony: facial motion may precede the corresponding sound
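To make the first challenge concrete, here is a minimal sketch of the naive baseline the slide warns against: each phoneme is mapped to one fixed mouth shape that is held for the phoneme's whole duration. The one-dimensional "mouth opening" values, the 30 fps frame rate, and the function names are all illustrative assumptions, not values from the talk.

```python
# Hypothetical one-dimensional "mouth opening" target per phoneme
# (illustrative numbers only, not taken from the talk).
MOUTH_OPENING = {"B": 0.0, "AA": 1.0, "K": 0.3, "IY": 0.4}


def static_trajectory(segments, fps=30):
    """Hold one fixed mouth shape for the duration of each phoneme.

    segments: list of (phoneme, start_sec, end_sec) tuples.
    Returns one mouth-opening value per video frame.
    """
    frames = []
    for phoneme, start, end in segments:
        n_frames = round((end - start) * fps)
        frames.extend([MOUTH_OPENING[phoneme]] * n_frames)
    return frames


def max_jump(frames):
    """Largest change between two consecutive frames."""
    return max(abs(b - a) for a, b in zip(frames, frames[1:]))


segments = [("B", 0.0, 0.1), ("AA", 0.1, 0.3), ("K", 0.3, 0.4)]
frames = static_trajectory(segments)
print(len(frames), max_jump(frames))
```

The mouth goes from fully closed to fully open within a single frame at the /B/ to /AA/ boundary, which is the jerkiness the slide describes. The static table also cannot express co-articulation, because the shape for /AA/ is the same regardless of its neighbors.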
Related Work • Avatars from video sequences: Bregler et al. 1997, Ezzat et al. 2002, etc. • 2D avatars from photographs: Blanz et al. 2003, CrazyTalk™, MotionPortrait™
Generic Facial Motion Model • Prototype surface • Bitouk 2006 • Deformed surface • Facial motion parameters
Generic Facial Motion Model
Facial Motion Transfer • Prototype face • Bitouk 2006 • Novel faces
Hidden Markov Models • Phonemes: /B/, /K/, /AA/, /IY/, etc. • With lexical stress: /B/, /K/, /AA0/, /AA1/, /IY0/, /IY1/, etc. • Triphones: s1 s2 • Facial motion parameters
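As an illustration of the triphone units mentioned above, the sketch below expands a stress-marked phoneme sequence into context-dependent labels. The `left-center+right` label format and the `sil` boundary symbol follow a common speech-recognition convention and are assumptions here; the slides do not say which notation the authors used.

```python
from typing import List


def to_triphones(phonemes: List[str], boundary: str = "sil") -> List[str]:
    """Label each phoneme with its left and right neighbors.

    The "left-center+right" format and the "sil" padding symbol are
    assumed conventions, not taken from the talk.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# /B/ /IY1/ /K/: a stressed vowel between two consonants.
print(to_triphones(["B", "IY1", "K"]))
```

Because each label carries both neighbors, a model trained per triphone can learn a different mouth trajectory for the same phoneme in different contexts, which is how this unit addresses co-articulation.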
Training Hidden Markov Models • Training set consists of motion capture data • Baum-Welch embedded re-estimation • Cluster triphone states to predict triphones not seen in the training set
Facial Motion Synthesis from Text-to-Speech Engine: Speech and time-labeled phonemes → Hidden Markov Models → Facial motion parameters
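One step this pipeline implies is turning time-labeled phonemes into a per-frame sequence of HMM states, from which facial motion parameters can then be generated. The sketch below shows a deliberately simple version that splits each phoneme's frames evenly across a fixed number of states. The even split, the three states per model, and the 30 fps rate are assumptions for illustration; the slides do not describe how the authors align states to frames.

```python
def frames_per_state(start, end, n_states, fps=30):
    """Split the frames of one time-labeled phoneme evenly across states.

    Rounding the start and end times separately keeps neighboring
    phonemes from leaving gaps or overlapping at shared boundaries.
    """
    total = round(end * fps) - round(start * fps)
    base, extra = divmod(total, n_states)
    # Earlier states absorb any frames left over by the division.
    return [base + (1 if i < extra else 0) for i in range(n_states)]


def state_sequence(segments, n_states=3, fps=30):
    """Expand (phoneme, start_sec, end_sec) tuples into per-frame
    (phoneme, state_index) pairs, with state indices starting at 1."""
    sequence = []
    for phoneme, start, end in segments:
        counts = frames_per_state(start, end, n_states, fps)
        for state, count in enumerate(counts, start=1):
            sequence.extend([(phoneme, state)] * count)
    return sequence


segments = [("B", 0.0, 0.1), ("AA", 0.1, 0.3)]
sequence = state_sequence(segments)
print(sequence)
```

Emitting each state's mean parameter vector along this sequence would give a first, still piecewise-constant trajectory; a practical system would smooth it or use the trained models' dynamics instead.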
Fitting the Prototype Model to an Image • 2D prototype face • Photograph
Facial Motion Synthesis
Eye Motion Synthesis
Eyeball Texture Synthesis • Eye image • Synthesized eyeball texture
Eye Motion Synthesis • Eye motion geometry
Eye Motion and Blinking
Visual Text-to-Speech Synthesis
Facial Motion Synthesis from Speech Recognition: Time-labeled phonemes → Hidden Markov Models → Facial motion parameters
Facial Motion Synthesis from Speech
3D Avatars • Captured stereo image: mirror view, direct view • Gluckman & Nayar, 2001
3D Avatars • Rectified images: mirror view, direct view • 3D model
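Once the mirror view and the direct view are rectified, depth can be recovered from the horizontal disparity between matching points. The sketch below uses the standard rectified-stereo relation Z = f · B / d. Treating the mirror view as a second, virtual camera with a known baseline is an assumption made for illustration, and the numbers are made up; the slides do not give the calibration or the reconstruction method.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth of a point seen in a rectified stereo pair.

    disparity_px: horizontal shift of the point between the two views, in pixels.
    focal_px:     focal length in pixels (assumed equal for both views).
    baseline_m:   distance between the real and the virtual camera, in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Illustrative values only: 800 px focal length, 10 cm baseline.
print(depth_from_disparity(20.0, 800.0, 0.1))
```

Larger disparities correspond to closer points, so facial features such as the nose tip shift more between the two views than the ears do.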
3D Avatars • Point cloud engraved inside a glass cube • Digital projector • Nayar & Anand, 2007
3D Avatars
Limitations and Future Work • Automatic facial feature detection • Synthesis of rigid head motion • Expressive speech • Web demo of our system will be available in early April: www.cs.columbia.edu/CAVE/
The End