Creating a Speech Enabled Avatar from a Single Photograph
Dmitri Bitouk, Shree K. Nayar
Columbia University

Speech Enabled Avatar
[Figure: input photograph and the resulting avatar]
Applications:
• mobile messaging and video conferencing
• news reporting and information kiosks
• novel user interfaces

Facial Motion Synthesis Challenges
• Mapping phonemes to static mouth shapes produces unrealistic, jerky animations
• Co-articulation: facial articulations can be dominated by the preceding as well as the upcoming phonemes
• Asynchrony: facial motion may precede the corresponding sound
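
The first two challenges can be illustrated with a minimal sketch. It compares a static phoneme-to-shape lookup with a trajectory in which each frame also depends on preceding and upcoming phonemes. The one-dimensional "mouth opening" targets in `TARGET` and the moving-average blend are assumptions made for illustration; they are a stand-in, not the HMM-based method presented in this talk.

```python
# Hypothetical 1-D "mouth opening" target per phoneme (illustrative values only).
TARGET = {"B": 0.0, "AA": 1.0, "K": 0.3, "IY": 0.4}


def static_trajectory(frame_labels):
    """One fixed mouth shape per frame: piecewise constant, jumps at boundaries."""
    return [TARGET[p] for p in frame_labels]


def blended_trajectory(frame_labels, radius=2):
    """Average the targets over a window covering past and future frames,
    so each frame is influenced by preceding and upcoming phonemes."""
    static = static_trajectory(frame_labels)
    out = []
    for i in range(len(static)):
        lo = max(0, i - radius)
        hi = min(len(static), i + radius + 1)
        window = static[lo:hi]
        out.append(sum(window) / len(window))
    return out


def max_jump(values):
    """Largest frame-to-frame change, a crude measure of jerkiness."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Four frames each of /B/, /AA/, /K/.
frames = ["B"] * 4 + ["AA"] * 4 + ["K"] * 4
static = static_trajectory(frames)
blended = blended_trajectory(frames)
```

Here `max_jump(static)` is the full difference between neighbouring phoneme targets, while `max_jump(blended)` is smaller because the transition is spread over several frames.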

Related Work
• Avatars from video sequences: Bregler et al. 1997, Ezzat et al. 2002, etc.
• 2D avatars from photographs: Blanz et al. 2003, CrazyTalk™, MotionPortrait™

Generic Facial Motion Model
[Figure: prototype surface (Bitouk 2006), deformed surface, facial motion parameters]

Generic Facial Motion Model

Facial Motion Transfer
[Figure: prototype face (Bitouk 2006) and novel faces]

Hidden Markov Models
• Phonemes: /B/, /K/, /AA/, /IY/, etc.
• With lexical stress: /B/, /K/, /AA0/, /AA1/, /IY0/, /IY1/, etc.
• Triphones
[Figure: HMM states s1, s2; facial motion parameters]
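
A minimal sketch of how a phoneme sequence might be expanded into triphone labels, each naming a phoneme together with its left and right neighbours. The "left-center+right" label format and the `sil` boundary symbol follow a convention common in speech toolkits and are assumptions here; the slides do not specify a notation.

```python
def to_triphones(phonemes):
    """Expand a phoneme sequence into context-dependent triphone labels.

    Labels use the "left-center+right" form (assumed convention), with
    "sil" standing in for the context at utterance boundaries.
    """
    padded = ["sil"] + list(phonemes) + ["sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# Phonemes with lexical stress markers, as on the slide.
labels = to_triphones(["B", "AA1", "K"])
```

Each label would then select its own HMM, so the same phoneme can produce different facial motion depending on its neighbours.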

Training Hidden Markov Models
• Training set consists of motion capture data
• Baum-Welch embedded re-estimation
• Cluster triphone states to predict triphones not seen in the training set
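
The last bullet can be sketched as a lookup with a fallback: an unseen triphone borrows the model of a seen triphone whose contexts fall into the same group. This is a simplification under stated assumptions. It ties whole models rather than individual states, the `BROAD_CLASS` table is hand-written and hypothetical, and the models are placeholder strings; the clustering procedure actually used in this work is not described on the slide.

```python
# Hypothetical broad phonetic classes used to group left/right contexts.
BROAD_CLASS = {
    "B": "bilabial", "P": "bilabial", "M": "bilabial",
    "K": "velar", "G": "velar",
    "AA": "open_vowel", "AE": "open_vowel",
    "IY": "close_vowel", "IH": "close_vowel",
    "sil": "silence",
}


def cluster_key(triphone):
    """Reduce a "left-center+right" label to (left class, center, right class)."""
    left, rest = triphone.split("-")
    center, right = rest.split("+")
    return (BROAD_CLASS[left], center, BROAD_CLASS[right])


def build_tied_models(trained):
    """Map each cluster key to the first trained model that falls into it."""
    tied = {}
    for triphone, model in trained.items():
        tied.setdefault(cluster_key(triphone), model)
    return tied


def lookup(triphone, trained, tied):
    """Return the trained model if one exists, else the cluster's shared model."""
    if triphone in trained:
        return trained[triphone]
    return tied[cluster_key(triphone)]  # KeyError if no cluster covers it


# Placeholder strings stand in for trained HMMs.
trained = {"B-AA+K": "model_1", "sil-B+AA": "model_2"}
tied = build_tied_models(trained)
unseen = lookup("P-AA+G", trained, tied)
```

Because /P/ and /B/ share a class, as do /G/ and /K/, the unseen triphone `P-AA+G` resolves to the model trained for `B-AA+K`.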

Facial Motion Synthesis from Text
Text-to-Speech Engine → Speech, Time-labeled phonemes → Hidden Markov Models → Facial Motion Parameters
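
A minimal sketch of the step between the time-labeled phonemes and the HMMs: assigning one phoneme label to each animation frame. The `(phoneme, start_sec, end_sec)` tuple layout and the `fps` parameter are assumptions for illustration; the HMM stage that turns these labels into facial motion parameters is not shown.

```python
def phonemes_to_frames(segments, fps=30.0):
    """Return one phoneme label per video frame.

    segments: list of (phoneme, start_sec, end_sec), contiguous and sorted
    by time (assumed layout). Frame k is sampled at time k / fps.
    """
    if not segments:
        return []
    total = segments[-1][2]
    n_frames = int(round(total * fps))
    labels = []
    idx = 0
    for k in range(n_frames):
        t = k / fps
        # Advance to the segment that contains time t.
        while idx < len(segments) - 1 and t >= segments[idx][2]:
            idx += 1
        labels.append(segments[idx][0])
    return labels


# Hypothetical timing for the phonemes /B/, /AA1/, /K/.
segments = [("B", 0.0, 0.1), ("AA1", 0.1, 0.3), ("K", 0.3, 0.4)]
frames = phonemes_to_frames(segments, fps=10.0)
```

The resulting per-frame labels indicate which phoneme model is active at each point in time, which is the information a model-based synthesis stage needs to stay synchronized with the audio.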

Fitting the Prototype Model to an Image
[Figure: 2D prototype face and photograph]

Facial Motion Synthesis

Eye Motion Synthesis

Eyeball Texture Synthesis
[Figure: eye image and synthesized eyeball texture]

Eye Motion Synthesis
[Figure: eye motion geometry]

Eye Motion and Blinking

Visual Text-to-Speech Synthesis

Facial Motion Synthesis from Speech
Speech Recognition → Time-labeled phonemes → Hidden Markov Models → Facial Motion Parameters

Facial Motion Synthesis from Speech

3D Avatars
[Figure: captured stereo image with mirror view and direct view (Gluckman & Nayar, 2001)]

3D Avatars
[Figure: rectified images (mirror view, direct view) and 3D model]
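
Once the two views are rectified, depth can be recovered from horizontal disparity with the standard pinhole stereo relation Z = f · B / d. The sketch below assumes the direct and mirror views can be treated as an ordinary rectified pair; the focal length, baseline, and disparity values are hypothetical, and how the baseline is obtained for the mirror setup is not covered on the slide.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (metres) of a point seen in a rectified stereo pair.

    disparity_px: horizontal pixel offset of the point between the two views
    focal_px:     focal length in pixels (assumed known from calibration)
    baseline_m:   distance between the two viewpoints in metres (assumed known)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 800 px focal length, 10 cm baseline, 40 px disparity.
z = depth_from_disparity(disparity_px=40.0, focal_px=800.0, baseline_m=0.1)
```

Applying this to every matched facial point yields a set of 3D points from which a face model can be built.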

3D Avatars
[Figure: point cloud engraved inside a glass cube; digital projector (Nayar & Anand, 2007)]

3D Avatars

Limitations and Future Work
• Automatic facial feature detection
• Synthesis of rigid head motion
• Expressive speech
• A web demo of our system will be available in early April: www.cs.columbia.edu/CAVE/

The End