SSML Extensions Aimed To Improve Asian Language TTS

SSML Extensions Aimed To Improve Asian Language TTS Rendering
W3C Workshop on Internationalizing the Speech Synthesis Markup Language, Beijing, China, November 2-3, 2005
Jilei Tian*, Xia Wang+, Jani Nurminen*
Multimedia Technologies Laboratory
*Nokia Research Center, Finland
+Nokia Research Center, China
Company Confidential © 2005 Nokia

Outline
1. SSML Overview And Status
2. Peculiarities In Asian Languages
3. Proposal To SSML Extension
4. Examples With Proposed SSML

SSML Overview And Status
• Speech Synthesis Markup Language (SSML) is a standard way of producing content to be spoken by a speech synthesis system.
• The current SSML is designed mainly for English and other European languages.
• SSML should be improved and extended to render Asian languages such as Chinese.
• Our TTS development status:
  • Core engine: formant-based and concatenative (acoustic-unit-based) TTS systems
  • Structure: text processing, prosody processing and acoustic processing modules
  • Languages: Mandarin Chinese, English, etc.
  • Interfaces between modules:
    • Input to the text processing module: SSML
    • Output from the acoustic module: waveform
    • All other interfaces: a revised ECESS XML format
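
A minimal sketch, using Python's standard `xml.etree.ElementTree`, of the kind of standard SSML 1.0 document the text processing module receives as input. The helper name `build_ssml` is hypothetical. SSML 1.0 itself offers no dedicated markup for syllables, tones or word boundaries, which is the gap this proposal targets.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang

# Serialize SSML elements in the default namespace instead of an ns0: prefix.
ET.register_namespace("", SSML_NS)


def build_ssml(text: str, lang: str = "zh-CN") -> str:
    """Wrap plain text in a minimal SSML 1.0 <speak> document with one sentence."""
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: lang})
    sentence = ET.SubElement(speak, f"{{{SSML_NS}}}s")
    sentence.text = text
    return ET.tostring(speak, encoding="unicode")
```

Everything below the `<s>` level is plain text, so the engine must work out segmentation, syllables and tones on its own.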

Peculiarities in Asian Languages
• Many Asian languages are:
  • tonal;
  • syllabic;
  • written without word markers or breaks.
• Features
  • Tone: a suprasegmental feature, controlled either in the suprasegmental layer or within the basic unit, e.g. at the syllable or final level.
  • Tone sandhi: a tone may change inside a word depending on its context, so the realized tone can differ from the lexical tone. A word may be perceived as a completely different word if tone sandhi is not taken into account.
  • Acoustic unit: syllables or sub-syllabic units such as initials and finals, rather than the phonemes used for Western languages. SSML needs more flexibility here to support different languages.
  • Word segmentation: there is no explicit word marker, so segmentation is an essential part of text processing.
  • Multilinguality in SSML: needed for loan words, URLs, e-mail addresses, etc.
  • Written vs. spoken language; dialect vs. accent (personalization).
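
To illustrate why word segmentation and tone sandhi are linked, here is a minimal sketch, not the authors' method. It uses forward maximum matching (a classic dictionary-based segmenter, swapped in for illustration) over a hypothetical toy lexicon, then applies a simplified Mandarin third-tone sandhi rule (a tone-3 syllable before another lexical tone-3 syllable is realized as tone 2) only within each word. `LEXICON`, `segment`, `third_tone_sandhi` and `pronounce` are hypothetical names; real sandhi also depends on prosodic structure beyond the word.

```python
# Hypothetical toy lexicon: word -> lexical tone-numbered Pinyin syllables.
LEXICON = {
    "你好": ["ni3", "hao3"],
    "展览馆": ["zhan3", "lan3", "guan3"],
    "展览": ["zhan3", "lan3"],
    "展": ["zhan3"],
    "览": ["lan3"],
    "馆": ["guan3"],
}


def segment(text, lexicon, max_len=4):
    """Forward maximum matching: repeatedly take the longest lexicon entry."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            # Fall back to a single character when nothing longer matches.
            if candidate in lexicon or n == 1:
                words.append(candidate)
                i += n
                break
    return words


def third_tone_sandhi(syllables):
    """Simplified rule: tone 3 before a lexical tone 3 is realized as tone 2."""
    realized = list(syllables)
    for k in range(len(syllables) - 1):
        if syllables[k].endswith("3") and syllables[k + 1].endswith("3"):
            realized[k] = syllables[k][:-1] + "2"
    return realized


def pronounce(text, max_len=4):
    """Segment text, then apply sandhi only inside each word."""
    return [third_tone_sandhi(LEXICON.get(word, [word]))
            for word in segment(text, LEXICON, max_len)]
```

With the longest match, 展览馆 ("exhibition hall") comes out as zhan2 lan2 guan3; forcing single-character segmentation (`max_len=1`) leaves every lexical tone 3 unchanged, showing how the segmentation decision alone changes the output tones.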

Proposal to SSML Extension (1)
• SYLLABLE element <syllable>
  • It is natural to use syllables or tonal syllables as the basic units of a TTS system for a language such as Mandarin Chinese.
  • The orthographic form plus a romanized form or transliteration would be a better representation for SSML, for example Hanyu Pinyin for Mandarin Chinese, Jyutping for Cantonese, and other transliteration systems for Thai, Vietnamese, Hindi, Urdu, etc.
• WORD element
  • Important in languages that do not mark word boundaries (e.g. Thai, Chinese, Vietnamese).
  • Crucial for tone sandhi, since many tone changes happen within a word.
  • Enhances word segmentation and tone sandhi handling when automatic word segmentation fails or when the user forces the system to use a particular segmentation. Its attribute is the segmented word.
  • Could be optional, and used to determine the pronunciation, tones and stress level of the given word.
  • Can be used to define the break strength at a boundary (character, word, prosodic phrase or sentence boundary, etc.).
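
A minimal sketch of how the proposed elements might be serialized, using Python's `xml.etree.ElementTree`. Only the SYLLABLE and WORD elements come from the proposal; the element name `<word>`, the attributes `orth` and `translit`, the function `words_to_ssml` and the boundary-to-strength mapping are all hypothetical. `<break strength="...">` is standard SSML 1.0.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical mapping from boundary type to a standard SSML 1.0 break strength.
BREAK_STRENGTH = {"word": "weak", "phrase": "medium", "sentence": "strong"}


def words_to_ssml(words, lang="zh-CN"):
    """Serialize pre-segmented words into SSML-like markup.

    words: list of (word_text, [(character, pinyin), ...], boundary_after),
    where boundary_after is a key of BREAK_STRENGTH or None.
    """
    speak = ET.Element("speak", {"version": "1.0", XML_LANG: lang})
    for text, syllables, boundary in words:
        # Hypothetical <word> element: fixes the segmentation the engine must use.
        word = ET.SubElement(speak, "word", {"orth": text})
        for char, pinyin in syllables:
            # Proposed <syllable>: orthographic character plus romanized form.
            syllable = ET.SubElement(word, "syllable", {"translit": pinyin})
            syllable.text = char
        if boundary in BREAK_STRENGTH:
            ET.SubElement(speak, "break", {"strength": BREAK_STRENGTH[boundary]})
    return ET.tostring(speak, encoding="unicode")
```

Keeping lexical tones in `translit` and letting the word element mark the sandhi domain is one design choice; another would be to write the realized (post-sandhi) tone directly into each syllable.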

Proposal to SSML Extension (2)
• PROSODY element <prosody> update
  • Pitch contours play a very important role in tonal languages: a different tone leads to a completely different meaning.
  • We propose to enhance the prosodic features, particularly pitch. Prosodic features are given in (time, value) format, which makes it possible to cover any prosodic need.
  • As shown in the example SSML for Mandarin Chinese, one element is used to define the given character, and further elements are introduced to describe its prosodic features, pitch and volume, in (time, value) format, giving a better representation capability for prosodic features.
• Practical issues
  • Absolute vs. relative frequency for pitch.
  • Define tone contours outside the word level and reuse those definitions at the word level, so frequencies need not be specified for each word.
  • For parameter-based synthesis, tones are defined as another suprasegmental feature, like duration and stress.
• Multilingual extensions to SSML
  • Mixed multilingual texts (e.g. IBM中国研究中心, "IBM China Research Center"; MP3 播放器, "MP3 player"); pronunciations of loan words; URLs, e-mail addresses, etc.
  • Abbreviations: asap, cu
  • Script language / spoken language / spoken area (e.g. English / US, UK, Australia, etc.; Chinese / Cantonese / HK; Spanish / US)
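
SSML 1.0 already defines a `contour` attribute on `<prosody>` that takes (time position, pitch target) pairs such as `(0%,+20Hz) (10%,+30%)`. The sketch below illustrates the "define tone contours once, reuse them per syllable" idea under stated assumptions: Chao tone letters for the four Mandarin tones (55, 35, 214, 51) and a hypothetical linear mapping of pitch levels onto a speaker's F0 range in absolute Hz. `TONE_TEMPLATES`, `level_to_hz`, `tone_contour` and `syllable_ssml` are hypothetical names.

```python
from xml.sax.saxutils import escape

# Chao tone letters for the four Mandarin lexical tones (1 = lowest, 5 = highest).
TONE_TEMPLATES = {1: (5, 5), 2: (3, 5), 3: (2, 1, 4), 4: (5, 1)}


def level_to_hz(level, f0_min, f0_max):
    """Assumed linear mapping of a Chao level (1..5) onto the speaker's F0 range."""
    return f0_min + (level - 1) * (f0_max - f0_min) / 4


def tone_contour(tone, f0_min=100.0, f0_max=200.0):
    """Build an SSML 1.0 prosody contour of (time, value) pairs for one tone."""
    levels = TONE_TEMPLATES[tone]
    points = []
    for i, level in enumerate(levels):
        time_pct = round(100 * i / (len(levels) - 1))  # spread points evenly
        hz = level_to_hz(level, f0_min, f0_max)
        points.append(f"({time_pct}%,{hz:.0f}Hz)")
    return " ".join(points)


def syllable_ssml(char, tone, f0_min=100.0, f0_max=200.0):
    """Wrap one character in <prosody> using the shared tone template."""
    contour = tone_contour(tone, f0_min, f0_max)
    return f'<prosody contour="{contour}">{escape(char)}</prosody>'
```

For example, `syllable_ssml("下", 4)` (xia4) yields `<prosody contour="(0%,200Hz) (100%,100Hz)">下</prosody>`. Emitting relative targets (such as `-20%`) instead of absolute Hz would be the other side of the absolute-vs-relative trade-off.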

Examples With Proposed SSML
[Example SSML markup for Mandarin Chinese with tone-numbered Pinyin syllables, e.g. xia 4, wu 32]

Thank you! Questions and Comments
