Centre of Research and Technology Hellas
INFORMATICS & TELEMATICS INSTITUTE
Artificial Intelligence & Information Analysis Group (AIIA)
Worlds Studio 2nd Review
Paris, 26-27/11/2002

OUTLINE
• Ongoing R&D activities
• CERTH WORLDS STUDIO MODEL
• Development issues
  – MAYA plug-in to export MPEG-4 compliant facial animation
  – Text-driven dialog management, text-to-speech synthesis
  – Creation of MPEG-4 compliant audiovisual content
• Demonstrations
• Research issues
• Problems/Future work

Ongoing R&D activities
§ MAYA plug-in to export
  § MPEG-4 compliant face content
  § VRML files
  § Mesh face model
§ MPEG-4 player into which mesh face models can be imported and their animation displayed

Ongoing R&D activities
§ Statistical facial video analysis and modeling
  § Interactive/automatic face wireframe placement and derivation of FDPs
  § Statistical facial grid analysis for expressions and visemes
  § Statistical texture analysis
§ Tools for deploying talking heads
  § Speech synthesis (visemes, synthetic audio output)
  § Interactive talking heads/dialog management
  § Tourist data/knowledge base for Toulouse

CERTH WORLDS STUDIO MODEL
[Block diagram. Labels: COPYRIGHT PROTECTION API · RAPID PROTOTYPING · VIDEO · MAYA PLUG-IN · TEXT · FACIAL VIDEO ANALYSIS API · MPEG-4 STREAM · SPEECH API · DIALOG MANAGER · KNOWLEDGE API · MPEG-4 PLAYER · TALKING HEAD]

Facial Video Analysis
[Block diagram. Labels: Face Tracking · Facial MPEG-4 Grid Placement · Statistical Grid Analysis · Statistical Texture Analysis · Face Detection · VIDEO · FDPs · Statistical FAP DB (Expressions) · MPEG-4 Animated Face Model · Statistical Texture DB (Expressions) · Novel MPEG-4 Animator · Standard MPEG-4 Player · Talking head]

MAYA/IPR/Prototyping
[Block diagram. Labels: IPR API · VRML Converter · Volume SCULPTOR · VRML to MPEG-4 Converter · FDP · FAP · FDP · MPEG-4 Animated Face Model · MAYA/MPEG-4 Plug-In · VRML file · VRML Player · MAYA Environment]

Worlds Studio - Speech Synthesis
[Block diagram. Labels: Viseme Synthesis · MPEG-4 Video · MPEG-4 Audio · WAV · Text · Keyboard · Viseme labels · Speech Synthesis API · text string · TEXT/ALICEBOT API · Reply · Query · ALICEBOT Engine · Standard MPEG-4 Player · TOURIST knowledge base]

Development issues
§ MAYA plug-in able to
  § export MPEG-4 compliant face content
    § Animation (FAP file & VRML)
    § Face model (FDP file/mesh & VRML)
  § export H-Anim/MPEG-4 compliant body animation (VRML)
§ FAP file generation
  – The FDPs of the Maya model are predefined by the programmer.
  – Their displacement is tracked over a predefined time interval.
  – FDPs for the hair, eyeballs, tongue and teeth have not been included in the animation.
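The FAP file generation steps above can be sketched as follows. This is a minimal illustration, not the plug-in's code: the feature point name, the `track_displacements` helper and the `unit` normalisation (standing in for an MPEG-4 FAPU) are assumptions, and the result is a list of per-frame displacements rather than real FAP file syntax.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

def track_displacements(frames: List[Dict[str, Point]],
                        unit: float = 1.0) -> List[Dict[str, Point]]:
    """Displacement of every predefined feature point relative to frame 0.

    frames[0] is taken as the neutral pose (an assumption of this sketch);
    each displacement is divided by `unit` so values are scale-independent.
    """
    neutral = frames[0]
    result = []
    for frame in frames:
        result.append({
            name: tuple((c - n) / unit for c, n in zip(frame[name], neutral[name]))
            for name in neutral
        })
    return result

# Hypothetical feature point sampled at two instants of the time interval.
frames = [
    {"lip_corner_left": (1.0, 0.0, 0.0)},   # neutral pose
    {"lip_corner_left": (1.5, 0.25, 0.0)},  # later frame
]
displacements = track_displacements(frames, unit=0.5)
```

Feature points that are not animated (hair, eyeballs, tongue and teeth in the slides) would simply be left out of the predefined set.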

MAYA plug-in: FAP file animating an arbitrary face model (through an MPEG-4 player)

MAYA plug-in: VRML file of the face surface

MAYA plug-in: mesh face model & animation file

Development issues
• WEB-BASED CONVERSATIONAL AGENT
• Aim: to add a multimedia-based natural language interface to web pages
• Features:
  – User text input in natural language
  – Audio(-video) response
  – Web navigation according to user demands

WEB-based Conversational agent
• Natural Language Interface
  – ALICE Bot module
    • using an AIML set (XML)
• Multimedia Interface (Text to Speech module)
  – using MS Speech SDK 5.1
    • SAPI runtime (high-level interface)
    • TTS engine (low-level operations)
• A plug-in for web browsers
  – a plug-in for Netscape 4.x on MS Windows using
    • Netscape’s Plug-In SDK
    • Netscape LiveConnect to connect to the JVM and the JavaScript interpreter of the web browser
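To make the AIML-based natural language interface more concrete, here is a small sketch of how pattern/template categories can drive a reply. It is an assumption-laden toy, not the ALICE engine: the two categories and their replies are invented, only the `*` wildcard is handled, and real AIML matching has more priority rules than the simple "fewest wildcards first" ordering used here.

```python
import re
import xml.etree.ElementTree as ET

# Invented example categories; a real AIML set is far larger.
AIML = """
<aiml>
  <category><pattern>HELLO</pattern><template>Welcome to Toulouse.</template></category>
  <category><pattern>WHERE IS *</pattern><template>Let me open the map page.</template></category>
</aiml>
"""

def load_categories(xml_text):
    """Parse <category> elements into (wildcard_count, regex, template) tuples."""
    root = ET.fromstring(xml_text)
    categories = []
    for cat in root.iter("category"):
        pattern = cat.findtext("pattern").strip()
        template = cat.findtext("template").strip()
        # '*' stands for one or more words; every other word must match literally.
        parts = ["(.+)" if w == "*" else re.escape(w) for w in pattern.split()]
        regex = re.compile("^" + r"\s+".join(parts) + "$")
        categories.append((pattern.count("*"), regex, template))
    categories.sort(key=lambda c: c[0])  # try exact patterns before wildcard ones
    return categories

def reply(categories, text):
    """Normalise the user input (upper case, no punctuation) and return a template."""
    normalized = " ".join(re.sub(r"[^A-Za-z0-9 ]", " ", text).upper().split())
    for _, regex, template in categories:
        if regex.match(normalized):
            return template
    return "I do not understand."

cats = load_categories(AIML)
answer = reply(cats, "Where is the Capitole?")
```

In the setup described on the slide, the returned text string would then be handed to the text-to-speech module.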

WEB-based Conversational agent
Interface added to Toulouse’s official web page:
– a text field for user input in a separate window
– pages about Toulouse in the main window
– audio speech output
  – audio output stored as a WAV file
  – visual speech content stored as an MPEG-4 FAP file

WEB-based Conversational agent

Research issues
• Face analysis
  – FDP extraction with less human intervention
    • Grids
    • Wireframes
• Statistical analysis of FDPs and FAPs
  – To assist content creation (synthesis)
  – To control quality by defining new interpolation schemes (deformation, animation, coding)
• Statistical analysis of texture
• Quality assessment

Physics-Based Grid Adaptation on Face
§ Our goal is to adapt the mesh to a human face.
§ We assume that the mesh is deformed only along the Z axis.
§ The deformation depends only on the image luminance.
§ The mesh is deformed on faces.
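The two assumptions above (deformation along Z only, driven only by luminance) can be illustrated with a very small sketch. The linear luminance-to-depth mapping, the `adapt_grid_z` name and the `step`/`z_scale` parameters are assumptions made for illustration; the slides do not state the actual physical model.

```python
def adapt_grid_z(luminance, step, z_scale=1.0):
    """Place a regular grid over a grey-level image and displace nodes along Z.

    luminance: 2D list of values in 0..255, indexed as luminance[y][x].
    The x and y coordinates of each node stay fixed; only z changes, and it
    depends on nothing but the luminance under the node.
    """
    nodes = []
    for y in range(0, len(luminance), step):
        row = []
        for x in range(0, len(luminance[0]), step):
            row.append((x, y, z_scale * luminance[y][x] / 255.0))
        nodes.append(row)
    return nodes

# Tiny synthetic image: one bright pixel, one mid-grey pixel.
image = [
    [0, 0, 255, 0],
    [0, 0, 0, 0],
    [128, 0, 0, 0],
    [0, 0, 0, 0],
]
grid = adapt_grid_z(image, step=2, z_scale=2.0)
```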

Model Fitting on Face Images
§ ICP for fitting a face model to face images using already defined points.
§ Mass-spring models to fit the face model to the face image.

I(terative) C(losest) P(oint) algorithm
§ ICP is based on the Closest Set of Points
§ Closest Set of Points leads to Quaternions
§ Quaternion
  § an easily handled vector
  § the basis of the ICP transformations
  § similar to the rotation and translation matrices
§ Convergence
  § Monotonic, to the nearest local minimum
  § Rapid during the first few iterations
  § The global solution depends on the initial parameters
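The loop described above (closest points, then a quaternion-based rigid transform, repeated) can be sketched in plain Python. This is a hedged illustration on 3D point lists, not the group's implementation: the brute-force closest-point search, the shifted power iteration used to find the quaternion, the fixed iteration counts and all function names are assumptions of this sketch.

```python
import math

def centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]

def best_rigid_transform(src, dst):
    """Return (R, t) minimising sum |R*src_i + t - dst_i|^2 for paired points."""
    mu_s, mu_d = centroid(src), centroid(dst)
    n = len(src)
    # Cross-covariance between the centred source and destination points.
    S = [[sum((p[a] - mu_s[a]) * (x[b] - mu_d[b]) for p, x in zip(src, dst)) / n
          for b in range(3)] for a in range(3)]
    trace = S[0][0] + S[1][1] + S[2][2]
    delta = [S[1][2] - S[2][1], S[2][0] - S[0][2], S[0][1] - S[1][0]]
    # Symmetric 4x4 matrix whose dominant eigenvector is the unit quaternion.
    Q = [[trace] + delta]
    for a in range(3):
        Q.append([delta[a]] + [S[a][b] + S[b][a] - (trace if a == b else 0.0)
                               for b in range(3)])
    # Power iteration on Q + shift*I (the shift makes all eigenvalues non-negative).
    shift = math.sqrt(sum(v * v for row in Q for v in row)) + 1e-12
    q = [1.0, 0.5, 0.25, 0.125]
    for _ in range(200):
        q = [sum(Q[i][j] * q[j] for j in range(4)) + shift * q[i] for i in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    q0, q1, q2, q3 = q
    R = [[q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
         [2*(q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
         [2*(q1*q3 - q0*q2), 2*(q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3]]
    t = [mu_d[i] - sum(R[i][k] * mu_s[k] for k in range(3)) for i in range(3)]
    return R, t

def closest(point, candidates):
    return min(candidates, key=lambda c: math.dist(point, c))

def icp(source, target, iterations=20):
    """Move `source` onto `target`; returns (aligned points, mean squared error)."""
    current = [list(p) for p in source]
    for _ in range(iterations):
        matches = [closest(p, target) for p in current]
        R, t = best_rigid_transform(current, matches)
        current = [[sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]
                   for p in current]
    error = sum(math.dist(p, closest(p, target)) ** 2 for p in current) / len(current)
    return current, error

# Synthetic check: a slightly rotated and shifted copy of the target points.
target = [[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 2, 0], [1, 0, 3]]
a = math.radians(5)
source = [[math.cos(a) * x - math.sin(a) * y + 0.05,
           math.sin(a) * x + math.cos(a) * y - 0.03,
           z + 0.02] for x, y, z in target]
aligned, mse = icp(source, target)
```

The example starts from a small misalignment on purpose: as the slide notes, ICP converges to the nearest local minimum, so a poor initial pose can yield a wrong fit.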

Mass-Spring Models
§ FEM restricted models
§ Simulate models as masses connected with springs
§ Physics-based simulation
[Figure: four masses connected among themselves with uniform springs]
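A system like the one in the figure, four masses joined pairwise by springs of equal stiffness, can be simulated in a few lines. The sketch below is only illustrative: it works in 2D, takes the spring rest lengths from a unit square, adds viscous damping and uses semi-implicit Euler integration, all of which are assumptions rather than details given in the slides.

```python
import math

def simulate(positions, springs, k=10.0, mass=1.0, damping=2.0, dt=0.01, steps=3000):
    """Integrate a damped 2D mass-spring system and return the final positions.

    springs: list of (i, j, rest_length); every spring has the same stiffness k.
    """
    pos = [list(p) for p in positions]
    vel = [[0.0, 0.0] for _ in pos]
    for _ in range(steps):
        # Start from the damping force, then accumulate spring forces.
        force = [[-damping * v[0], -damping * v[1]] for v in vel]
        for i, j, rest in springs:
            dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
            length = math.hypot(dx, dy)
            f = k * (length - rest)  # Hooke's law: stretched springs pull inwards
            fx, fy = f * dx / length, f * dy / length
            force[i][0] += fx
            force[i][1] += fy
            force[j][0] -= fx
            force[j][1] -= fy
        # Semi-implicit Euler: update velocities first, then positions.
        for p, v, f in zip(pos, vel, force):
            v[0] += dt * f[0] / mass
            v[1] += dt * f[1] / mass
            p[0] += dt * v[0]
            p[1] += dt * v[1]
    return pos

side, diagonal = 1.0, math.sqrt(2.0)
springs = [(0, 1, side), (1, 2, side), (2, 3, side), (3, 0, side),
           (0, 2, diagonal), (1, 3, diagonal)]
# One corner of the unit square is displaced; the springs pull it back into shape.
start = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.2, 1.2)]
final = simulate(start, springs)
```

For model fitting, external forces that pull selected masses toward image features would be added to `force` inside the loop; that part is not shown here.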

Examples
§ Random initial positioning of the face model
§ Interactive definition of points on the face image

Examples
§ Fit of the model by applying the ICP algorithm
§ Fit of the model by applying the Mass-Spring Model

Facial Feature Extraction Based on Luminance Processing
§ Calculation of the sum of the luminance of the image along the columns and rows.
§ Estimation of these values on images of the AR database.
§ Localization of the face and its features in the image, according to the local maxima and minima of the luminance.
[Figures: luminance values estimated along the x-axis; luminance values estimated along the y-axis]
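The row and column luminance sums and their local extrema can be computed as in the sketch below. The synthetic image, the function names and the rule used for plateaus are assumptions for illustration; the slides do not specify how the extrema are mapped to individual facial features.

```python
def luminance_profiles(image):
    """Sum the luminance along each row and along each column of a 2D image."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums

def local_minima(profile):
    """Indices whose value is below the previous one and not above the next one."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i - 1] > profile[i] <= profile[i + 1]]

def local_maxima(profile):
    """Indices whose value is above the previous one and not below the next one."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i - 1] < profile[i] >= profile[i + 1]]

# Synthetic 7x5 "face": bright skin with two dark horizontal bands,
# standing in for the eye line (row 2) and the mouth line (row 5).
image = [[200] * 5 for _ in range(7)]
image[2] = [50] * 5
image[5] = [80] * 5

row_sums, col_sums = luminance_profiles(image)
dark_rows = local_minima(row_sums)
```

On real images the profiles are noisy, so some smoothing before searching for extrema is likely to be needed.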

Face Feature Extraction Based on Luminance
Localization of the face’s ground truth and the areas of the eyebrows, eyes, nose and mouth.

Corner Detection in Face Images
§ Use of a corner-detection algorithm to localize facial feature points.

Corner Detection in Face Images
§ Selection of the facial feature points with respect to the luminance profiles along the rows and columns of the image.

Text To Speech (TTS) Synthesis
§ articulatory
  – computational biomechanical models of speech production
§ rule-based (models units & transitions)
  – target positions for units, smoothing in-between
  • formant synthesis (highly simplified source-filter model)
§ concatenative (storage of units & transitions)
  – corpus-driven unit selection
Coding methods (enabling modification to units)
• production (human speech production)
  – LPC (robotic, degrades under modification)
  – RELP (Residual Excited LP)
  – formant synthesis
• phenomenological
  – H/S (Harmonic/Stochastic)
  – TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add)
  – MBROLA
  – HNM (Harmonic Plus Noise Model)
• no coding (no modification, huge corpus)

MPEG-4 FBA players: Myths and reality (1)
• Publicly available MPEG-4 FBA players (IST Momusys or FAE demo) can read only FAP files.
• Some players cannot interpret high-level FAPs (expressions, visemes).
• Yet another player is required if one needs to download a proprietary model.
  – Choices:
    • read FDP files & mesh face models
    • read Facial Animation Table information

MPEG-4 FBA players: Myths and reality (2)
• No mechanism to check whether the content is compliant with the standard.
• Need for a reference player to test compliance.
• Need to move progressively from the so-called MPEG-4 Audio-visual approach toward the MPEG-4 Systems approach and open-systems development.

Future work
• Study of tools/issues for copyright protection of 3D models/animation
• Business model/issues
• Framework/interface issues
  – audio+FAP content MPEG-4 stream
  – application: interactive MPEG-4 talking head
  – possible extension to a speech interface using speech recognition
  – advanced discourse management