Speech Synthesis Markup Language: Aim at Extension
Dr. Jianhua Tao
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences

Brief Introduction to the Evolution of SSML
- The original SSML (not W3C SSML)
- STML
- JSML
- SABLE
- W3C SSML
- …

The original SSML
- Mark phrase boundaries
- Emphasize words
- Specify pronunciations
- Include other sound files

STML
- Developed by Edinburgh and Bell Labs
- Based on the original SSML
- Aimed at giving listeners the same basic impression, not at sounding identical on different systems

JSML
- Developed by Sun
- XML based
- Includes
  - Elements to mark paragraphs and sentences
  - Elements to control pronunciation
  - Elements to represent markers

SABLE
- Developed by Edinburgh and Bell Labs
- Based on STML and JSML
- The stated aims
  - Synthesizer control
    - Text structure
    - Speech pronunciation
  - Multilinguality
  - Ease of use
  - Portability
  - Extensibility

W3C SSML
- Key design criteria
  - Consistency
  - Interoperability
  - Generality
  - Internationalization
  - Generation and Readability
  - Implementable

What we want from a markup language
- Controlling
- Sharing
- Extended to multimedia

Which level should we focus on?
- Text analysis module
- Prosody module
- Acoustic module

Sharing
[Diagram: two systems, Sys 1 and Sys 2, each with Text-analysis, Prosody-analysis, and acoustic stages. Other labels: Data Structure 1, Data Structure 2, SSML.]

Text level for Mandarin
- Word boundary
- Pronunciation with tone
- POS
- Dialect?

Prosody level for Mandarin
- Tone sandhi
- Rhythm
- ?
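
As one illustration of what a prosody-level annotation for Mandarin would have to capture, the sketch below applies the familiar third-tone sandhi rule (a third tone directly before another third tone is realized as a second tone) to a list of syllable tones. The function name `apply_third_tone_sandhi` and the integer tone encoding (1 to 4, with 5 for the neutral tone) are assumptions made for this example and do not come from the slides. Real systems apply the rule according to prosodic grouping, which this pairwise pass ignores.

```python
def apply_third_tone_sandhi(tones):
    """Return surface tones after a simplified third-tone sandhi pass.

    `tones` is a list of underlying tones (hypothetical encoding: 1-4,
    5 = neutral). A 3 becomes 2 whenever the next underlying tone is 3.
    """
    surface = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            surface[i] = 2
    return surface


# "ni3 hao3" is pronounced "ni2 hao3".
print(apply_third_tone_sandhi([3, 3]))  # -> [2, 3]
```

For runs of three or more third tones the actual outcome depends on how the syllables group into words and phrases, which is one reason a markup extension at this level would also need rhythm or phrasing information.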

Extensions to expressive synthesis
- Emotion and Style
- Others

Current elements related to prosody and style in SSML
- 3.2.1 "voice" Element
- 3.2.2 "emphasis" Element
- 3.2.3 "break" Element
- 3.2.4 "prosody" Element
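
To show how the four elements listed above fit together, here is a minimal sketch that builds a small W3C SSML document with Python's standard `xml.etree.ElementTree`. The function name `build_demo_ssml`, the sample sentences, and the chosen attribute values are illustrative assumptions. Only the element names and the attributes `gender`, `level`, `time`, `rate`, and `pitch` come from SSML itself.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_demo_ssml():
    """Build a tiny SSML document that uses voice, emphasis, break, prosody."""
    speak = ET.Element(
        "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"}
    )
    # voice: request a speaker with particular characteristics
    voice = ET.SubElement(speak, "voice", {"gender": "female"})
    voice.text = "This is "
    # emphasis: stress the enclosed text
    emphasis = ET.SubElement(voice, "emphasis", {"level": "strong"})
    emphasis.text = "important"
    emphasis.tail = "."
    # break: insert a pause
    ET.SubElement(voice, "break", {"time": "500ms"})
    # prosody: control rate and pitch of the enclosed text
    prosody = ET.SubElement(voice, "prosody", {"rate": "slow", "pitch": "high"})
    prosody.text = "Please listen carefully."
    return ET.tostring(speak, encoding="unicode")


print(build_demo_ssml())
```

None of these elements names an emotion or a speaking style directly, which is the gap the following slides address.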

Emotion and Style
- Emotion
  - Anger, happiness, surprise, sadness, fear, …
  - Depends on the speaker's psychological and physical states
  - Local effects on prosody
- Style
  - News, comments, …
  - Depends on the semantics of sentences
  - Global effects on prosody

Personalized Voice
- Element: voice
  - "gender":
  - "age":
  - "name":
  - "variant":
- Sample: 他说：<voice gender="male">“什么意思？”</voice> 她回答：<voice gender="female">“没什么意思。”</voice>
  (English: He said: "What do you mean?" She replied: "Nothing.")
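
The voice sample above can be read back programmatically. The sketch below parses it with Python's standard `xml.etree.ElementTree` and lists which gender is requested for each quoted segment. The enclosing `<speak>` root and the helper name `voice_segments` are assumptions added so that the fragment parses as a single document. The slide shows only the two `voice` elements.

```python
import xml.etree.ElementTree as ET

# The slide's sample, wrapped in an assumed <speak> root element.
SAMPLE = (
    '<speak>他说：<voice gender="male">“什么意思？”</voice>'
    '她回答：<voice gender="female">“没什么意思。”</voice></speak>'
)


def voice_segments(ssml):
    """Return (gender, text) pairs for every voice element, in document order."""
    root = ET.fromstring(ssml)
    return [(v.get("gender"), v.text) for v in root.iter("voice")]


for gender, text in voice_segments(SAMPLE):
    print(gender, text)
```

The narration outside the `voice` elements ("他说：", "她回答：") carries no voice request here, so a synthesizer would presumably render it with its default voice.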

Extension?
- To make it more expressive
  - Background music
- VTTS
  - Combined with a talking head and some other media information
  - …
- We can only see the element "mark"

Thanks!

Element: <Structure>
- Level: 0 -. . ; paragraph, phrase,
- POS:
- Nesting:
  <Structure: level=paragraph>
    <Structure: level=sentence>
      <Structure: level=phrase>
        <Structure: level=word>
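
The nesting proposed above could be serialized along the following lines. This is only a sketch under stated assumptions. The slide writes the element as `<Structure: level=...>`, which is not well-formed XML, so the code uses an ordinary `level` attribute instead. The helper name `nest_structure`, the `POS` attribute spelling, and the tag value "n" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Levels from outermost to innermost, as listed on the slide.
LEVELS = ["paragraph", "sentence", "phrase", "word"]


def nest_structure(word, pos=None):
    """Wrap one word in nested Structure elements, one per level."""
    root = ET.Element("Structure", {"level": LEVELS[0]})
    node = root
    for level in LEVELS[1:]:
        node = ET.SubElement(node, "Structure", {"level": level})
    if pos is not None:
        node.set("POS", pos)  # hypothetical part-of-speech attribute
    node.text = word
    return ET.tostring(root, encoding="unicode")


print(nest_structure("意思", pos="n"))
```

A real document would hold many sentences, phrases, and words per paragraph. The single chain here only demonstrates the containment order.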