Why a Rising Tone is Falling in Mandarin

  • Slides: 35

Why a Rising Tone is Falling in Mandarin Sentences
Word Accents and Tones in Sentence Perspective: A symposium in conjunction with the 60th birthday of Professor Gösta Bruce
Chilin Shih, University of Illinois at Urbana-Champaign
January 10, 2007, Lund, Sweden

Generated by WordsEye from text description. Under development at Semantic Light, Inc.

Outline
• What we know
  – Chinese is a lexical tone language.
• Surprise!
  – Tones in sentences may deviate considerably from their lexical specifications.
• Research question
  – Explain the difference between lexical tones and the observed sentence production.
• Implication
  – A simulation model linking phonology to phonetics.

Chinese Lexical Tones
Tone shapes differentiate lexical meaning.
ma1: mother
ma2: hemp
ma3: horse
ma4: to scold

Chinese Sentences
Ma1-ma0 ma4 ma3. Mother scolds the horse.
Ma3 ma4 ma1-ma0. The horse scolds mother.

Chinese Intonation Types (data from Jiahong Yuan)
Li3 bai4 wu3 Luo2 yan4 yao4 mai3 lu4. On Friday Luoyan wants to buy a deer.
Panels: Statement, Question

Classification of Tone Shapes
Tone 1: High level
Tone 2: Rising
Tone 3: Low falling
Tone 4: High falling

Cause of Tonal Distortion
• Ease of articulatory effort
• Balancing articulatory effort and communication need

Physiological constraints:
Communication errors:
• When you say what you think you are saying:
• When you are not saying what you think you are saying:

Ease of Articulatory Effort—I

Ease of Articulatory Effort—II

Ease of Articulatory Effort—III

Production of Rising and Falling Tones

Severe Tonal Distortion—I

People Talk Nearly As Fast As Possible

Severe Tonal Distortion—II

Local distortion is predictable from global optimization

A Racing Game

Adjusting the Best Path

Best Path in Tonal Production
Values shown in the figure: 1.0, 0.5, 1.0, 0.0, 1.0

Stem-ML
• The prosodic modeling is based on Stem-ML (Soft Template Mark-up Language).
• Stem-ML consists of a set of mathematically defined tags with value attributes. For example: tone, prosodic strength.
• It allows user-defined accent shapes, phrase curves, and other speaker-specific parameters.
Kochanski and Shih (2003), Prosody modeling with soft templates, Speech Communication, vol. 39.
Shih (in preparation), Prosody Learning and Generation, Springer.

Basic Assumptions
• Pre-planning.
• Balance articulatory effort and communication needs (Lindblom, Ohala).
• A dynamical model for the muscles that control f0 (Hill).

We further propose:
• Speakers shift weights dynamically as they speak.
• This weight is the prosodic strength, which reflects the articulatory effort.

Linking Phonology and Phonetics
• A model is a sequence of templates (i.e., points representing tone/accent shapes). The templates encode phonological information.
• For tone languages, there is one template per tone. Templates are stretched to fit duration.
• Each template has a strength. The strength value determines phonetic variation.
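A minimal sketch of "one template per tone, stretched to fit duration" may help here. The point values in `TEMPLATES`, the frame-based durations, and the function names `stretch` and `targets_for` are all assumptions made for illustration; they loosely follow the four tone shapes listed earlier and are not the templates used in the actual model.

```python
import numpy as np

# Hypothetical templates: relative pitch (0 = low, 1 = high) at evenly
# spaced points. The numbers are illustrative, not fitted values.
TEMPLATES = {
    1: [1.0, 1.0, 1.0],  # Tone 1: high level
    2: [0.3, 0.5, 1.0],  # Tone 2: rising
    3: [0.4, 0.1, 0.0],  # Tone 3: low falling
    4: [1.0, 0.6, 0.0],  # Tone 4: high falling
}

def stretch(template, n_frames):
    """Linearly resample a template so that it spans n_frames frames."""
    src = np.linspace(0.0, 1.0, len(template))
    dst = np.linspace(0.0, 1.0, n_frames)
    return np.interp(dst, src, template)

def targets_for(tones, durations):
    """Concatenate stretched templates for a sequence of tones.

    tones: tone numbers (1-4); durations: syllable lengths in frames.
    """
    return np.concatenate(
        [stretch(TEMPLATES[t], d) for t, d in zip(tones, durations)]
    )
```

For example, `targets_for([1, 4, 3], [8, 12, 10])` would give a 30-frame target contour for a tone 1, tone 4, tone 3 sequence. Linear resampling is only one possible way to stretch a template.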

Representation
• Surface F0 contours are coded as a set of template-strength pairs:
  Template  Strength
  T1        1.0
  T3        0.3
  T4        1.2
  T5        0.8
  T2        1.0
  T1        0.5
• Generation: Template, strength → F0
• Learning: Template, F0 → Template strength
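As a rough illustration of this representation, the sketch below stores an utterance as a list of (tone, strength) pairs. The plain-text notation `T1 1.0 T3 0.3` and the names `TaggedTone` and `parse_tags` are assumptions made for this example; they are not Stem-ML syntax.

```python
from typing import List, NamedTuple

class TaggedTone(NamedTuple):
    tone: int        # template label, e.g. 1 for T1
    strength: float  # prosodic strength attached to that template

def parse_tags(text: str) -> List[TaggedTone]:
    """Parse a hypothetical 'T<tone> <strength>' token sequence."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating template and strength tokens")
    return [
        TaggedTone(int(name.lstrip("T")), float(value))
        for name, value in zip(tokens[::2], tokens[1::2])
    ]
```

Calling `parse_tags("T1 1.0 T3 0.3 T4 1.2")` yields three pairs that a generation step could turn into an F0 contour, and that a learning step would try to recover from observed F0.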

Modeling Math (Credit to Greg Kochanski)
“Effort” is defined in terms of the muscle tension (~frequency) at time t.
Each target encodes some linguistic information; r_i is the error of the i-th target, and s_i is its importance.
In the “Error” term, y_i is the i-th pitch target, and a bar denotes an average over a target.
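The trade-off between effort and error can be sketched numerically. The code below assumes a simplified quadratic form, not the exact equations of the model: effort is approximated by squared first and second differences of the contour, and error by the squared distance to the targets, weighted by the squared strength. The function name `generate_pitch` and the `smooth` parameter are hypothetical.

```python
import numpy as np

def generate_pitch(targets, weights, smooth=1.0):
    """Find the contour e that minimizes effort plus weighted error.

    Simplified objective (an assumption, not the published formulation):
        sum_t (e[t+1] - e[t])**2
      + smooth**2 * sum_t (e[t+1] - 2*e[t] + e[t-1])**2
      + sum_t weights[t]**2 * (e[t] - targets[t])**2
    The objective is quadratic, so linear least squares solves it.
    """
    y = np.asarray(targets, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = len(y)
    d1 = np.diff(np.eye(n), axis=0)       # first-difference operator
    d2 = np.diff(np.eye(n), n=2, axis=0)  # second-difference operator
    a = np.vstack([d1, smooth * d2, np.diag(w)])
    b = np.concatenate([np.zeros(n - 1), np.zeros(n - 2), w * y])
    e, *_ = np.linalg.lstsq(a, b, rcond=None)
    return e
```

Under these assumptions, a rising target with a small weight, placed between a strongly weighted high target and a strongly weighted low target, comes out as a falling contour, because following its own shape would cost more effort than its low importance justifies. With large weights everywhere, the contour stays close to the targets.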

Representing F0 As Tone Strength

Simulation of Tonal Production—I

Simulation of Tonal Production—II

Model Fits to Mandarin Chinese
0.61 free parameters per syllable, 13 Hz RMS error.
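For reference, an RMS error in Hz between an observed and a modeled F0 track could be computed as sketched below. The function name `rms_error_hz` and the convention that unvoiced frames are marked as NaN in the observed track are assumptions for this example; the slides do not say how the reported figure was computed.

```python
import numpy as np

def rms_error_hz(observed_f0, modeled_f0):
    """Root-mean-square difference in Hz over voiced frames.

    Assumption: unvoiced frames in observed_f0 are NaN and are skipped.
    """
    observed = np.asarray(observed_f0, dtype=float)
    modeled = np.asarray(modeled_f0, dtype=float)
    voiced = ~np.isnan(observed)
    diff = observed[voiced] - modeled[voiced]
    return float(np.sqrt(np.mean(diff ** 2)))
```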

Works for English
The highest f0 is on a weak, unaccented word.
Example utterance: Uhm, I would like a flight to Seattle from Albuquerque.

Muscle Dynamics Interpolation

Discourse Functions
• Topic initialization
• Discourse structure
• Phrasing
• Emphasis
• New vs. old information
• Other communicative means

How Do They Fit Together?

Conclusion
• Speech is a communication system. Speakers balance articulatory effort and communication needs.
• We need a representation that encodes
  – Accent template
  – Articulatory effort
  – Emotional state
• We present a computational simulation model that generates surface phonetic variations from this representation.