Dialogue and Conversational Agents Part IV Chapter 19

Dialogue and Conversational Agents (Part IV)
Chapter 19: Draft of May 18, 2005
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Daniel Jurafsky and James H. Martin
Spoken Dialogue Systems

Remaining Outline
  • Advanced: Plan-Based Dialogue Agents
  • Models of Discourse Structure
    – Do we have them? Grosz & Sidner ’86
    – What identifies discourse structure to Hearers? Textual cues, spoken cues
    – How can we produce appropriate discourse structure in TTS systems?
    – Can we identify discourse structure automatically, from speech?

Is there structure in this discourse?
A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

Is this a reasonable structure?
A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

This?
A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

What information do we use in segmenting a discourse?
  • ‘Topic’ coherence?
  • Repeated reference?
  • ‘Cue’ phrases?
  • ??
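As a rough illustration of the cue-phrase idea only, the sketch below starts a new segment whenever a sentence opens with a cue phrase. The cue list, the naive sentence splitter, and the function names are all hypothetical and are not taken from the slides. A real segmenter would also need the other cues listed above, such as topic coherence and repeated reference.

```python
import re

# Hypothetical, illustrative cue-phrase list; not taken from the slides.
CUE_PHRASES = ("well", "now", "anyway", "so", "first", "next", "finally")


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def segment_by_cue_phrases(text, cues=CUE_PHRASES):
    """Group sentences into segments, opening a new segment at each
    sentence whose first word is a cue phrase."""
    segments = []
    for sentence in split_sentences(text):
        first_word = re.match(r"[A-Za-z']+", sentence)
        starts_with_cue = bool(first_word) and first_word.group(0).lower() in cues
        if starts_with_cue or not segments:
            segments.append([sentence])      # open a new segment
        else:
            segments[-1].append(sentence)    # continue the current segment
    return segments
```

Applied to the mallard passage, this sketch places its only boundary before "Well, to recover from this horrible scene, ...", because no other sentence begins with a word in the assumed cue list.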

Structures of Discourse Structure (Grosz & Sidner ’86)
  • A leading theory of discourse structure
  • Based upon Speaker intentions and Speaker and Hearer attentional state
  • Identifies a few, general relations that hold among Speaker intentions
  • Identifies a model of attentional state
  • Three components:
    – Linguistic structure
    – Intentional structure
    – Attentional structure

Linguistic Structure
  • What is actually said or written
  • How is the linguistic structure represented?
  • Assume discourse is segmented into Discourse Segments (DS)
    – What is the basic unit of analysis?
    – Do we all segment alike?
    – Do we all use the same cues?

Linguistic Structure of Discourse
D
  S 1: A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute.
  S 2: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

Intentional Structure
  • Discourse purpose (DP): basic purpose of the Speaker in producing the discourse
  • Discourse segment purposes (DSPs): the Speaker’s purpose in producing the segment
  • Segments are related to one another by their purposes:
    – Satisfaction-precedence: DSP 1 must be satisfied before DSP 2
    – Dominance: DSP 1 dominates DSP 2 if fulfilling DSP 2 constitutes part of fulfilling DSP 1
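The two relations among DSPs can be sketched as a small data structure. This is a minimal illustration, not a formalization from Grosz & Sidner: the class name, method names, and string labels are hypothetical, and following dominance links transitively is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class IntentionalStructure:
    """Holds the two relations among discourse segment purposes (DSPs)."""
    dominates: dict = field(default_factory=dict)  # parent DSP -> set of child DSPs
    precedes: set = field(default_factory=set)     # (earlier, later) DSP pairs

    def add_dominance(self, parent, child):
        """Record that fulfilling `child` is part of fulfilling `parent`."""
        self.dominates.setdefault(parent, set()).add(child)

    def add_satisfaction_precedence(self, earlier, later):
        """Record that `earlier` must be satisfied before `later`."""
        self.precedes.add((earlier, later))

    def dominates_transitively(self, ancestor, descendant):
        """True if `descendant` is reachable from `ancestor` through
        dominance links (transitivity is an assumption of this sketch)."""
        pending = list(self.dominates.get(ancestor, ()))
        seen = set()
        while pending:
            node = pending.pop()
            if node == descendant:
                return True
            if node not in seen:
                seen.add(node)
                pending.extend(self.dominates.get(node, ()))
        return False
```

For example, after `add_dominance("DSP1", "DSP2")` and `add_dominance("DSP2", "DSP3")`, the call `dominates_transitively("DSP1", "DSP3")` returns True while the reverse query returns False.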

Linguistic Structure of Discourse
D
  DSP 1: Describe murder of dove by duck.
  S 1: A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute.
  DSP 2: Describe meeting of old friend.
  S 2: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

DSP 2: Describe recovery process.
S 2:
  DSP 3: Describe snack.
  S 3: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa.
  DSP 4: Describe meeting old friend.
  S 4: To my surprise, I ran into a friend from back home.
  DSP 5: Describe friend’s reaction.
  S 5: When I told her of my recent experience she questioned my sanity.

Attentional State: The Focus Stack
  • A stack of focus spaces, each containing objects, properties and relations salient during each DS, plus the DSP
  • State changes: transition rules controlling the addition/deletion of focus spaces
  • Information at lower levels may or may not be available at higher levels
  • Focus spaces are pushed onto the stack when
    – A new DS is begun

    – An embedded DS (e.g. a DS dominated by another DS) is begun
  • Focus spaces are popped when they are completed
  • State of focus stack models felicitous reference, coherence in discourse
    S 2: DSP 2, scene, Speaker, snack_bar, cocoa, friend, home, sanity
    S 1: DSP 1, duck, dove, Speaker, duck_dove_supply
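The push and pop behaviour described above can be sketched with an ordinary list used as a stack. The class and method names are hypothetical. Searching every space from the top down in `is_salient` is a simplifying assumption, since the slides note that information at lower levels may or may not be available at higher levels.

```python
class FocusStack:
    """Minimal focus stack: each focus space records a segment label,
    its DSP, and the entities salient while that segment is open."""

    def __init__(self):
        self._spaces = []  # the last element is the top of the stack

    def push(self, segment, dsp, entities=()):
        """Open a focus space when a new (possibly embedded) DS begins."""
        self._spaces.append({"segment": segment, "dsp": dsp,
                             "entities": set(entities)})

    def pop(self):
        """Close the top focus space when its segment is completed."""
        return self._spaces.pop()

    def top(self):
        """Return the focus space of the currently open segment, if any."""
        return self._spaces[-1] if self._spaces else None

    def is_salient(self, entity):
        """Assumption: an entity counts as salient if any open space holds it."""
        return any(entity in space["entities"]
                   for space in reversed(self._spaces))


# Rebuild the two focus spaces shown on the slide.
stack = FocusStack()
stack.push("S1", "DSP1", ["duck", "dove", "Speaker", "duck_dove_supply"])
stack.push("S2", "DSP2", ["scene", "Speaker", "snack_bar", "cocoa",
                          "friend", "home", "sanity"])
```

Once the S2 space is popped, `stack.is_salient("cocoa")` becomes False while `stack.is_salient("dove")` stays True, which is the kind of contrast the theory uses to model felicitous reference.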

Limits of the Theory
  • Assumes discourses are task-oriented
  • Assumes a single, hierarchical structure shared by S and H
  • Questions:
    – Do people really build such structures when they converse? Use them in interpreting what others say?
    – How could they do it?

How might people recognize discourse structure?
  • Linguistic markers?
    – tense and aspect
    – cue phrases
  • Inference of Speaker intentions?
  • Inference from task structure?
  • Intonational information?

Acoustic and Prosodic Cues to Discourse Structure
  • Intuition: Speakers vary acoustic and prosodic cues to convey variation in discourse structure
    – Systematic?
    – In read or spontaneous speech?
  • Evidence:
    – Observations from recorded corpora
    – Laboratory experiments
    – Machine learning of discourse structure from acoustic/prosodic features

Prosodic Correlates of Discourse/Topic Structure
  • Pitch range: Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
  • Preceding pause: Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96

  • Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
  • Amplitude: Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
  • Contour: Brown et al ’83, Woodbury ’87, Swerts et al ’92

Issues
  • Do we find significant and reliable cues to discourse structure in prosodic variation
    – When tested against an independent theory of discourse structure?
    – In spontaneous as well as read speech?
  • Are Hearers’ interpretations of discourse structure influenced by intonational variation?

Grosz & Hirschberg ’92
  • Small corpus of read AP newswire
  • Read by professional speaker
  • Labeled for discourse structure from text alone or from text and speech
  • Pre-ToBI labeled
  • Acoustic-prosodic features extracted for each intermediate (level 3) phrase
    – Pitch range and change from prior phrase
    – Intensity (rms) and change in dB from prior phrase
    – Preceding and subsequent pause
    – Speaking rate
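To make the per-phrase features concrete, here is a minimal sketch of how they might be computed. It is not the extraction procedure used in the study. The phrase dictionary keys (`samples`, `f0`, `start`, `end`, `n_syllables`) are hypothetical, pitch range is taken to be maximum minus minimum F0 over voiced frames, and speaking rate is taken to be syllables per second; both definitions are assumptions.

```python
import math


def rms_db(samples, ref=1.0):
    """RMS level of a sample sequence in dB relative to `ref`."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(rms / ref) if rms > 0 else float("-inf")


def pitch_range(f0_values):
    """Assumed definition: max minus min F0 (Hz) over voiced frames (F0 > 0)."""
    voiced = [f for f in f0_values if f > 0]
    return max(voiced) - min(voiced)


def phrase_features(phrases):
    """Compute features for each phrase, including change from the prior phrase.

    Each phrase is a dict with hypothetical keys: 'samples' (waveform values),
    'f0' (F0 track in Hz, 0 for unvoiced), 'start' and 'end' (seconds),
    and 'n_syllables'.
    """
    feats = []
    for i, p in enumerate(phrases):
        prev = phrases[i - 1] if i > 0 else None
        nxt = phrases[i + 1] if i + 1 < len(phrases) else None
        row = {
            "pitch_range": pitch_range(p["f0"]),
            "intensity_db": rms_db(p["samples"]),
            "preceding_pause": (p["start"] - prev["end"]) if prev is not None else None,
            "subsequent_pause": (nxt["start"] - p["end"]) if nxt is not None else None,
            "rate": p["n_syllables"] / (p["end"] - p["start"]),
        }
        if prev is not None:
            # Changes are measured against the immediately preceding phrase.
            row["pitch_range_change"] = row["pitch_range"] - feats[-1]["pitch_range"]
            row["intensity_change_db"] = row["intensity_db"] - feats[-1]["intensity_db"]
        else:
            row["pitch_range_change"] = None
            row["intensity_change_db"] = None
        feats.append(row)
    return feats
```

The first phrase has no prior phrase, so its change features and preceding pause are left as None rather than guessed.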

  • Analysis of phrases in different segment positions: SBEG (segment-beginning), SF (segment-final), parentheticals, quoted speech
  • ANOVAs and t-tests on means
  • Results:
    – Direct quotes: larger pitch range
    – Parentheticals: smaller range, negative change from prior phrase, negative change in dB, faster rate
    – SBEG: larger range, louder, greater preceding pause, less subsequent pause
    – SF: greater subsequent pause
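A comparison of means of this kind can be sketched with a two-sample t statistic. The slides do not say which t-test variant was used, so the Welch (unequal-variance) form below is an assumption, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples `a` and `b`."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

In use, `a` might hold a feature such as preceding pause duration for SBEG phrases and `b` the same feature for all other phrases. A large positive t would then point in the direction of the "greater preceding pause" result reported above.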

  • Machine learning experiments identified:
    – SBEG with 91.5% est. accuracy (cross-validation)
    – SF, 92.5%
    – Attributive tags, 96.9%
    – Direct quotations, 86.4%
    – Indirect quotations, 88.5%
    – Parentheticals, 89.2%
  • Conclusion: Acoustic/prosodic information is available to permit Hearers to identify discourse structure…
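An accuracy estimate by cross-validation can be sketched as follows. The slides do not name the learner on this page, so the nearest-centroid classifier here is only a stand-in, and all function names are hypothetical. The point is the evaluation loop: each fold is held out once and scored by a model trained on the remaining data.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_centroids(X, y):
    """Stand-in learner: the mean feature vector of each class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
    return min(centroids, key=dist2)


def cross_val_accuracy(X, y, k=5):
    """Estimated accuracy: fraction of items classified correctly
    when each fold is held out in turn."""
    correct = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [c for i, c in enumerate(y) if i not in held_out]
        model = train_centroids(train_X, train_y)
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)
```

Because the folds are contiguous, data sorted by class should be shuffled first; otherwise a class can be missing from a training split.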

Summary of Dialog in general
  • The Linguistics of Conversation
  • Basic Conversational Agents
    – ASR
    – NLU
    – Generation
  • Dialogue Manager Design
    – Finite State
    – Frame-based
    – Initiative: User, System, Mixed
  • VoiceXML
  • Information-State
    – Dialogue-Act Detection
    – Dialogue-Act Generation
  • Evaluation
  • Utility-based conversational agents
    – MDP, POMDP
  • Advanced: Plan-Based Dialogue Agents