Meaningful Intonational Variation 12/8/2020 1

Today
• Assigning variation for TTS, CTS
  • Contours
    • Accent
    • Phrasing
  • Pitch Range
  • Amplitude and timing

TTS Production Pipeline
• Orthographic input: Dr. Smith lives on Elm Dr.
• Text normalization: abbreviation expansion…
• Pronunciation modeling: POS id, WS disambiguation
• Intonation assignment: parsing, POS id, robust semantics…
• Phonetic/phonological realization: phonological parsing, phonetic analysis
• Unit selection: acoustic analysis
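
The text normalization step can be illustrated on the slide's own input. The following is a minimal sketch, not the method of any particular TTS system: the two abbreviation tables and the capitalization heuristic are assumptions made for illustration, and a real front end would use a much larger lexicon plus POS information.

```python
# Hypothetical abbreviation tables (assumptions for illustration only).
TITLES = {"Dr.": "Doctor", "Mr.": "Mister"}
STREET_SUFFIXES = {"Dr.": "Drive", "St.": "Street", "Ave.": "Avenue"}

def expand_abbreviations(text):
    """Expand ambiguous abbreviations using the capitalization of neighbors."""
    tokens = text.split()
    last = len(tokens) - 1
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i < last else ""
        prev = tokens[i - 1] if i > 0 else ""
        if tok in TITLES and nxt[:1].isupper():
            # Followed by a capitalized word: read it as a title.
            word = TITLES[tok]
        elif tok in STREET_SUFFIXES and prev[:1].isupper():
            # Preceded by a capitalized word: read it as a street suffix.
            word = STREET_SUFFIXES[tok]
        else:
            out.append(tok)
            continue
        # At the end of the sentence the abbreviation's period also closes it.
        out.append(word + "." if i == last else word)
    return " ".join(out)

print(expand_abbreviations("Dr. Smith lives on Elm Dr."))
```

The same token, "Dr.", expands two different ways depending on context, which is why normalization cannot be a plain dictionary lookup.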

Intonation Assignment: Phrasing
• Traditional: hand-built rules
  • Punctuation: 234-5682
  • Context/function word: no breaks after function word
    He went to dinner
  • Parse?
    She favors the nuts and bolts approach
• Current: statistical analysis of large labeled corpus
  • Punctuation, POS window, utterance length, …
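
A hand-built phrasing rule set of the traditional kind might look like the sketch below. The punctuation rule and the "no breaks after function word" rule come from the slide; the length threshold `max_phrase_len` that proposes candidate breaks and the tiny `FUNCTION_WORDS` list are assumptions added only so the veto has something to act on.

```python
# Hypothetical function-word list (assumption; real systems use a fuller one).
FUNCTION_WORDS = {"a", "an", "the", "to", "of", "in", "on", "and", "but",
                  "he", "she", "it", "when"}
PUNCTUATION = ",.;:?!"

def predict_breaks(words, max_phrase_len=3):
    """Return indices i such that a phrase break follows words[i]."""
    breaks = []
    since_break = 0
    for i, w in enumerate(words):
        since_break += 1
        bare = w.rstrip(PUNCTUATION).lower()
        if i == len(words) - 1:
            breaks.append(i)              # the utterance always ends a phrase
        elif w[-1] in PUNCTUATION:
            breaks.append(i)              # rule 1: break at punctuation
            since_break = 0
        elif since_break >= max_phrase_len and bare not in FUNCTION_WORDS:
            breaks.append(i)              # candidate break after a content word
            since_break = 0
        # rule 2: a function word never licenses a break, however long the phrase
    return breaks

print(predict_breaks("He went to dinner".split()))
```

With the default threshold, "He went to dinner" gets no break after "to" even though the phrase is long enough, because "to" is a function word.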

Functions of Phrasing
• Disambiguates syntactic constructions, e.g. PP attachment:
  • S: You should buy the ticket with the discount coupon.
• Disambiguates scope ambiguities, e.g. negation:
  • S: You aren’t booked through Rome because of the fare.
• Or modifier scope:
  • S: This fare is restricted to retired politicians and civil servants.

Intonation Assignment: Accent
• Hand-built rules
  • Function/content distinction
    He went out the back door / He threw out the trash
  • Complex nominals:
    • Main Street / Park Avenue
    • city hall parking lot
• Statistical procedures trained on large corpora
• Contrastive stress, given/new distinction?
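
The function/content rule can be sketched as a lookup on part-of-speech tags. This is an illustrative sketch only: it assumes the input is already tagged with Penn Treebank tags (the tags below were assigned by hand), and it assumes that the particle "out" in "threw out" is accented while the preposition "out" in "went out the back door" is not, which is one way to read the slide's contrast.

```python
# Tag prefixes treated as accentable (assumption): nouns, verbs, adjectives,
# adverbs, and particles (RP). Prepositions (IN) and determiners (DT) are not.
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB", "RP")

def assign_accents(tagged_words):
    """Map (word, tag) pairs to (word, accented?) pairs."""
    return [(word, tag.startswith(CONTENT_TAG_PREFIXES))
            for word, tag in tagged_words]

preposition = [("He", "PRP"), ("went", "VBD"), ("out", "IN"),
               ("the", "DT"), ("back", "JJ"), ("door", "NN")]
particle = [("He", "PRP"), ("threw", "VBD"), ("out", "RP"),
            ("the", "DT"), ("trash", "NN")]

for sentence in (preposition, particle):
    print(" ".join(w.upper() if acc else w
                   for w, acc in assign_accents(sentence)))
```

A word list alone cannot separate the two uses of "out"; the rule only works once a tagger has made the distinction.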

Functions of Pitch Accent
• Given/new information
  • S: Do you need a return ticket?
  • U: No, thanks, I don’t need a return.
• Contrast (narrow focus)
  • U: No, thanks, I don’t need a RETURN… (I need a time schedule, receipt, …)
• Disambiguation of discourse markers
  • S: Now let me get you the train information.
  • U: Okay (thanks) vs. Okay… (but I really want…)
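
A very rough approximation of the given/new distinction is to treat a content word as given once it has already occurred in the dialogue. The sketch below assumes exact string match as the test for givenness and uses a small hypothetical function-word list; it makes no attempt at synonyms, inference, or contrast.

```python
# Hypothetical function-word list (assumption for illustration).
FUNCTION_WORDS = {"do", "you", "a", "i", "no", "don't"}

def mark_information_status(turns):
    """Label each word of each turn as 'function', 'given', or 'new'."""
    seen = set()
    labeled_turns = []
    for turn in turns:
        labeled = []
        for token in turn.split():
            key = token.strip(",.;:?!").lower()
            if key in FUNCTION_WORDS:
                status = "function"
            elif key in seen:
                status = "given"      # candidate for deaccenting
            else:
                status = "new"        # candidate for a pitch accent
            seen.add(key)
            labeled.append((key, status))
        labeled_turns.append(labeled)
    return labeled_turns

dialogue = ["Do you need a return ticket?",
            "No, thanks, I don't need a return."]
for turn in mark_information_status(dialogue):
    print(turn)
```

In the second turn "return" comes out as given, matching the deaccented reading on the slide; the contrastive reading (RETURN) needs information this sketch does not have.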

Intonation Assignment: Contours
• Simple rules
  • ‘.’ = declarative contour
  • ‘?’ = yes-no-question contour unless wh-word present at/near front of sentence
    • Well, how did he do it? And what do you know?
  • What else might we do?
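
The simple punctuation rules can be written down directly. In this sketch the wh-word list and the `window` of three words standing in for "at/near front of sentence" are assumptions; the slide does not say which contour a wh-question should get, so the function only returns a category label.

```python
WH_WORDS = {"who", "whom", "whose", "what", "which",
            "when", "where", "why", "how"}

def choose_contour(sentence, window=3):
    """Pick a contour category from final punctuation and leading wh-words."""
    words = [w.strip(",.;:?!\"'").lower() for w in sentence.split()]
    if sentence.rstrip().endswith("?"):
        if any(w in WH_WORDS for w in words[:window]):
            return "wh-question"
        return "yes-no-question"
    return "declarative"

for s in ["He went to dinner.",
          "Do you want to fly first class?",
          "Well, how did he do it?",
          "And what do you know?"]:
    print(s, "->", choose_contour(s))
```

The window is what lets "Well, how did he do it?" be recognized even though the wh-word is not sentence-initial.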

Contours: Accent + Phrasing
• What do intonational contours ‘mean’ (Ladd ’80, Bolinger ’89)?
  • Speech acts (statements, questions, requests)
    S: That’ll be credit card? (L* H- H%)
  • Propositional attitude (uncertainty, incredulity)
    S: You’d like an evening flight. (L*+H L- H%)
  • Speaker affect (anger, happiness, love)
    U: I said four SEVEN one! (L+H* L- L%)
  • “Personality”
    S: Welcome to the Sunshine Travel System.

• Propositional attitude (uncertainty)
  Did you feed the animals?
  I fed the goldfish. (L*+H on “goldfish”, L-H%)
• Distinguish direct/indirect speech acts
  • Can you open the door?

The TTS Front End Today
• Corpus-based statistical methods instead of hand-built rule-sets
• Dictionaries instead of rules (but fall-back to rules)
• Modest attempts to infer contrast, given/new
• Text analysis tools: POS tagger, morphological analyzer, little parsing

TTS: Where are we now?
• Natural sounding speech for some utterances
  • Where good match between input and database
• Still… hard to vary prosodic features and retain naturalness
  • Yes-no questions: Do you want to fly first class?
• Context-dependent variation still hard to infer from text and hard to realize naturally:

  • Appropriate contours from text
  • Emphasis, de-emphasis to convey focus, given/new distinction: I own a cat. Or, rather, my cat owns me.
  • Variation in pitch range, rate, pausal duration to convey topic structure
• Characteristics of ‘emotional speech’ little understood, so hard to convey: …a voice that sounds friendly, sympathetic, authoritative…
• How to mimic real voices?

TTS vs. CTS
• Decisions in Text-to-Speech (TTS) depend on syntax, information status, topic structure, … information explicitly available to NLG
• Concept-to-Speech (CTS) systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how
• But… generating prosody for CTS isn’t so easy

To(nes and) B(reak) I(ndices)
• Developed by prosody researchers in four meetings over 1991-94
• Goals:
  • devise common labeling scheme for Standard American English that is robust and reliable
  • promote collection of large, prosodically labeled, shareable corpora
• ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, …

• Minimal ToBI transcription:
  • recording of speech
  • f0 contour
  • ToBI tiers:
    • orthographic tier: words
    • break-index tier: degrees of junction (Price et al. ’89)
    • tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ’80)
    • miscellaneous tier: disfluencies, non-speech sounds, etc.
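
One way to hold the four ToBI tiers in memory is a per-word record, sketched below. The class and field names are hypothetical, the f0 contour is left out, and the file name and break-index values in the example are illustrative assumptions rather than a real labeling; only the tones (L*+H L-H% on "I fed the goldfish") come from the slides.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToBIWord:
    word: str                                        # orthographic tier
    break_index: int                                 # break-index tier (juncture after the word)
    tones: List[str] = field(default_factory=list)   # tonal tier labels on this word
    misc: str = ""                                   # miscellaneous tier

@dataclass
class ToBITranscription:
    audio_path: str                                  # the recording being labeled
    words: List[ToBIWord]

    def tier(self, name):
        """Return one tier as a list with one entry per word."""
        return [getattr(w, name) for w in self.words]

example = ToBITranscription(
    audio_path="goldfish.wav",                       # hypothetical file name
    words=[
        ToBIWord("I", 1),
        ToBIWord("fed", 1),
        ToBIWord("the", 1),
        ToBIWord("goldfish", 4, ["L*+H", "L-", "H%"]),
    ],
)
print(example.tier("break_index"))
```

Keeping the tiers aligned word by word makes it straightforward to compare two labelers or to extract training data for accent and phrasing prediction.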

Sample ToBI Labeling

• Online training material, available at:
  • http://www.ling.ohio-state.edu/phonetics/ToBI/
• Evaluation
  • Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ’92, Pitrelli et al. ’94)
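
Agreement figures of this kind can be computed by comparing every pair of labelers on every word. The sketch below assumes such a pairwise metric; the cited studies may have computed their numbers differently, and the three break-index labelings are invented for illustration.

```python
from itertools import combinations

def pairwise_agreement(labelings, agree):
    """Fraction of (labeler pair, word) comparisons on which `agree` holds.

    labelings: one label sequence per labeler, all of the same length.
    agree:     function taking two labels and returning True or False.
    """
    hits = total = 0
    for a, b in combinations(labelings, 2):
        for x, y in zip(a, b):
            hits += bool(agree(x, y))
            total += 1
    return hits / total

# Invented break indices from three labelers for a four-word utterance.
break_indices = [
    [1, 3, 4, 1],
    [1, 4, 4, 2],
    [1, 1, 4, 1],
]

exact = pairwise_agreement(break_indices, lambda x, y: x == y)
within_one = pairwise_agreement(break_indices, lambda x, y: abs(x - y) <= 1)
print(f"exact: {exact:.2f}, within one level: {within_one:.2f}")
```

Passing the comparison as a function lets the same routine cover presence/absence of a tone, identity of the category label, and break indices within one level.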

Pitch Accent/Prominence in ToBI
• Which items are made intonationally prominent and how?
• Accent type:
  • H*      simple high (declarative)
  • L*      simple low (ynq)
  • L*+H    scooped, late rise (uncertainty/incredulity)
  • L+H*    early rise to stress (contrastive focus)
  • H+!H*   fall onto stress (implied familiarity)

• Downstepped accents:
  • !H*
  • L+!H*
  • L*+!H
• Degree of prominence:
  • within a phrase: HiF0
  • across phrases

Prosodic Phrasing in ToBI
• ‘Levels’ of phrasing:
  • intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
  • intonational phrase: one or more intermediate phrases + boundary tone (H% or L%)
• ToBI break-index tier
  • 0  no word boundary
  • 1  word boundary
  • 2  strong juncture with no tonal markings
  • 3  intermediate phrase boundary
  • 4  intonational phrase boundary
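
The two levels of phrasing tie the break-index tier to the tonal tier: an intermediate phrase boundary (3) should carry a phrase accent, and an intonational phrase boundary (4) a phrase accent plus a boundary tone. A minimal consistency check under that reading is sketched below; the function name is hypothetical and downstepped phrase accents are ignored for simplicity.

```python
PHRASE_ACCENTS = {"H-", "L-"}
BOUNDARY_TONES = {"H%", "L%"}

def tiers_consistent(break_index, tones):
    """Check that the tones on a word match the break index that follows it."""
    has_phrase_accent = any(t in PHRASE_ACCENTS for t in tones)
    has_boundary_tone = any(t in BOUNDARY_TONES for t in tones)
    if break_index == 4:
        # Intonational phrase boundary: phrase accent and boundary tone.
        return has_phrase_accent and has_boundary_tone
    if break_index == 3:
        # Intermediate phrase boundary: phrase accent only.
        return has_phrase_accent and not has_boundary_tone
    # Indices 0-2 carry no tonal marking of phrasing.
    return not has_phrase_accent and not has_boundary_tone

print(tiers_consistent(4, ["L*+H", "L-", "H%"]))
print(tiers_consistent(4, ["H*"]))
```

A check like this is useful for catching labeling slips before a corpus is used for training.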

[Figure: contours for the pitch accents H*, L*, L*+H, each combined with the phrase-final tones L-L%, L-H%, H-L%, H-H%]

[Figure: contours for the pitch accents L+H*, H+!H*, H*, !H*, each combined with the phrase-final tones L-L%, L-H%, H-L%, H-H%]

Contour Examples
• http://www.cs.columbia.edu/~julia/cs6998/cards/examples.html

And Other Things Contribute: Pitch Range and Timing (Rate, Pause)
• Level of speaker engagement
  Hello vs. HELLO
• Contour interpretation
  Rise/fall/rise (L*+H L-H%): Elephantiasis isn’t incurable
• Discourse/topic structure: paratones

Corpus-Based Research
• Predicting accent, phrasing, contours from large ToBI-labeled corpora
• Features:
  • Word position, POS window, word co-occurrence, punctuation, capitalization, sentence length, paragraph position, …
• Results:
  • ~80-85% correct accent prediction
  • ~92-96% correct phrase boundary prediction
  • Contours??
  • Reality…
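
Several of the listed features can be read straight off a tagged sentence. The sketch below shows one possible feature extractor for a single word; the feature names, the `PAD` value for positions outside the sentence, and the hand-assigned Penn Treebank tags are assumptions, and co-occurrence and paragraph-position features are omitted. The resulting dictionaries could be fed to any standard classifier.

```python
PUNCTUATION = ",.;:?!"

def word_features(tagged_words, i, window=1):
    """Features for predicting accent or a following break at word i."""
    word, _ = tagged_words[i]
    n = len(tagged_words)
    features = {
        "position": i,                        # word position from the start
        "from_end": n - 1 - i,                # and from the end
        "sentence_length": n,
        "capitalized": word[:1].isupper(),
        "followed_by_punct": word[-1:] in PUNCTUATION,
    }
    # POS window: tags of the word and its neighbors.
    for offset in range(-window, window + 1):
        j = i + offset
        tag = tagged_words[j][1] if 0 <= j < n else "PAD"
        features[f"pos[{offset:+d}]"] = tag
    return features

sentence = [("I", "PRP"), ("own", "VBP"), ("a", "DT"), ("cat.", "NN")]
print(word_features(sentence, 3))
```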

• This is my version of a rather long sentence which ideally should be broken into several phrases automatically by a smart system but we don't know if this will actually happen do we?
• Is a yes-no question uttered with falling intonation? Does that sound delightful? Mellifluous?
• I don’t want cereal I want toast.
• …

Next:
• Story analysis and generation (readings will be available later this week; we’ll send mail)