Challenges in Dialogue Discourse and Dialogue CMSC 35900


Challenges in Dialogue. Discourse and Dialogue, CMSC 35900-1, October 27, 2006

Roadmap • Issues in Dialogue – Dialogue vs General Discourse – Dialogue Acts • Modeling • Recognition and Interpretation – Dialogue Management for Computational Agents

Dialogue vs General Discourse • Key contrast: Two or more speakers – Primary focus on speech • Issues in multi-party spoken dialogue – Turn-taking – who speaks next, when? – Collaboration – clarification, feedback, … – Disfluencies – Adjacency pairs, dialogue acts

Turn-Taking • Multi-party discourse – Need to trade off speaker/hearer roles • Interpret reference from sequential utterances • When? – End of sentence? • No: multi-utterance turns – Silence? • No: little silence in smooth dialogue: < 250 ms – When other starts speaking? • No: relatively little overlap face-to-face: ~5%

Turn-taking: When • Rule-governed behavior – Possibly multiple legal turn change times • Aka transition-relevance places (TRP) • Generally at utterance boundaries – Utterance not necessarily sentence – In fact, utterance/sentence boundaries not obvious in speech » Don’t necessarily pause between sentences • Automatic utterance boundary detection – Cue words (okay, so, ...); POS sequences; prosody

Turn-taking: Who & How • At each TRP in each turn (Sacks et al., 1974) – If speaker has selected A to speak, A must take floor – If speaker has selected no one to speak, anyone can – If no one else takes the turn, the speaker can • Selecting speaker A: – By explicit/implicit mention: What about it, Bob? – By gaze, function • Selecting others: questions, greetings, closing – (Traum et al., 2003)
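The three rules above can be read as a small decision procedure run at each TRP. The following is a minimal sketch in Python, not an implementation from the cited work: the function name `next_speaker` and its parameters are hypothetical, and resolving competing self-selections in favor of the first starter is an assumption added for illustration.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 volunteers: Sequence[str]) -> str:
    """Pick who holds the floor after one transition-relevance place.

    current    -- participant who is speaking now
    selected   -- participant the current speaker selected, or None
    volunteers -- participants who start speaking at the TRP, in start order
    """
    # Rule 1: if the speaker selected A, A must take the floor.
    if selected is not None:
        return selected
    # Rule 2: if no one was selected, anyone else may take the turn.
    # Assumption: the first participant to start wins.
    others = [p for p in volunteers if p != current]
    if others:
        return others[0]
    # Rule 3: if no one else takes the turn, the speaker may continue.
    return current
```

For example, `next_speaker("Ann", None, [])` returns `"Ann"`, since nobody was selected and nobody self-selected.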

Turn-taking in HCI • Human turn end: – Detected by 250 ms silence • System turn end: – Signaled by end of speech – Indicated by any human sound • Barge-in • Continued attention: – No signal
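The silence-based rule for detecting the end of a human turn can be sketched as a counter over speech/non-speech frames. This is an illustrative sketch only: the function name `detect_turn_end`, the 10 ms frame size, and the assumption that a voice-activity detector has already labeled each frame are not from the slides. Only the 250 ms threshold is.

```python
from typing import Optional, Sequence


def detect_turn_end(is_speech: Sequence[bool],
                    frame_ms: int = 10,
                    silence_ms: int = 250) -> Optional[int]:
    """Return the index of the frame where the turn is judged to end.

    is_speech holds one boolean per frame (True = speech detected).
    The turn ends once `silence_ms` of unbroken silence follows speech.
    Returns None if that never happens.
    """
    needed = silence_ms // frame_ms   # consecutive silent frames required
    heard_speech = False
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0            # any speech resets the silence counter
        elif heard_speech:            # leading silence is ignored
            silent_run += 1
            if silent_run >= needed:
                return i
    return None
```

A fixed threshold like this cuts the user off at any within-turn pause longer than 250 ms, which is one reason silence alone is a weak turn-end cue.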

Gesture, Gaze & Voice • Range of gestural signals: – head (nod, shake), shoulder, hand, leg, foot movements; facial expressions; postures; artifacts – Align with syllables • Units: phonemic clause + change • Study with recorded exchanges

Yielding the Floor • Turn change signal – Offer floor to auditor/hearer • Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop + “uh”, end clause • Likelihood of change increases with more cues • Negated by any gesticulation
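One way to read the last two bullets is as a cue count that is zeroed out by gesticulation. The sketch below assumes that reading. The cue labels are hypothetical identifiers for the six cues listed above, and the function returns a raw count, not a probability, because the slides give no numbers.

```python
from typing import Iterable

# Hypothetical labels for the six turn-yielding cues listed on the slide.
YIELD_CUES = frozenset({
    "pitch_fall", "lengthening", "but_uh",
    "end_gesture", "amplitude_drop_uh", "end_clause",
})


def yield_score(observed: Iterable[str], gesticulating: bool) -> int:
    """Count the turn-yielding cues present at a possible turn boundary.

    A higher count means a turn change is more likely.
    Any ongoing gesticulation negates the signal, so the score is 0.
    """
    if gesticulating:
        return 0
    return len(set(observed) & YIELD_CUES)
```

Unrecognized labels are ignored, so `yield_score(["pitch_fall", "smile"], False)` counts one cue.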

Taking the Floor • Speaker-state signal – Indicate becoming speaker • Occurs at beginning of turns • Cues: – Shift in head direction • AND/OR – Start of gesture

Retaining the Floor • Within-turn signal – Still speaker: Look at hearer at end of clause • Continuation signal – Still speaker: Look away after within-turn signal/back-channel • Back-channel: – “mmhm”, “okay”, etc.; nods; sentence completion; clarification request; restatement – NOT a turn: signal attention, agreement, confusion

Segmenting Turns • Speaker alone: – Within-turn signal -> end of one unit; – Continuation signal -> beginning of next unit • Joint signal: – Speaker turn signal (end); auditor -> speaker; speaker -> auditor – Within-turn + back-channel + continuation • Back-channels signal understanding – Early back-channel + continuation

Regaining Attention • Gaze & Disfluency – Disfluency: “perturbation” in speech • Silent pause, filled pause, restart – Gaze: • Conversants don’t stare at each other constantly • However, speaker expects to meet hearer’s gaze – Confirm hearer’s attention • Disfluency occurs when speaker realizes hearer is NOT attending – Pause until hearer begins gazing, or to request attention

Collaborative Communication • Speaker tries to establish and add to “common ground” – “mutual belief” – Presumed a joint, collaborative activity • Make sure both “mutually believe” the same thing – Hearer can acknowledge/accept/disagree » Clark & Schaefer: Degrees of grounding • Display, Demonstrate/Reformulate, Acknowledgement, Next relevant contribution, Continued attention

Computational Models • (Traum et al.) revised for computation – Involves both speaker and hearer • Initiate, Continue, Acknowledge, Repair, Request Repair, etc. – Common phenomena • “Back-Channel” – “uh-huh”, “okay”, etc. – Allows hearer to signal continued attention, ack » WITHOUT taking the turn • Requests for repair – common in human-human – Even more common in human-computer dialogue

Implicature & Grice’s Maxims • Inferences licensed by utterances • Grice’s Maxims – Quantity: Be as informative as required • “There are two classes per week” – not 1, or 5 – Quality: Be truthful – don’t lie – Relevance: Be relevant – Manner: “Be perspicuous” • Don’t be obscure, ambiguous, prolix, or disorderly • “Flouting” maxims: Consciously violate for effect – Humor, emphasis

Speech & Dialogue Acts • Speech Acts (Austin, Searle) – “Doing things with words” • E.g. performatives: “I dub thee Sir Lancelot” – Illocutionary acts: act of asking, answering, promising, etc. in saying an utterance • Include: Assertives: “I propose to...”, Directives: “Stop that”, Commissives: “I promise”, Expressives: “Thank you”, Declarations: “You’re fired”

Dialogue Acts • (aka Conversational moves) – Enriched set of speech acts • Capture full range of conversational functions – Adjacency pairs: Many two-part structures • E.g. Question-Answer, Greeting-Greeting, Request-Grant, etc. • Paired for speaker-hearer dyads – Contrast with rhetorical relations in monologue

DAMSL • Dialogue Act Tagging framework – Adjacency pairs + grounding + repair • Forward looking functions – Statement, info-request, commit, closing, etc. • Backward looking functions – Focus on link to prior speaker utterance • Agreement, answer, accept, etc.
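The split between forward and backward looking functions suggests a simple record per utterance. The sketch below is a hypothetical data structure, not the DAMSL annotation format: the class `TaggedUtterance`, its field names, and the sample utterances are invented for illustration, and the tag sets contain only the labels named on the slide, not the full DAMSL inventory.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Only the tags named on the slide; the full DAMSL scheme has more.
FORWARD_TAGS = {"statement", "info-request", "commit", "closing"}
BACKWARD_TAGS = {"agreement", "answer", "accept"}


@dataclass
class TaggedUtterance:
    speaker: str
    text: str
    forward: List[str] = field(default_factory=list)
    backward: List[str] = field(default_factory=list)
    # Index of the prior utterance that a backward function links to.
    responds_to: Optional[int] = None

    def __post_init__(self) -> None:
        unknown = (set(self.forward) - FORWARD_TAGS) | \
                  (set(self.backward) - BACKWARD_TAGS)
        if unknown:
            raise ValueError(f"unknown tags: {sorted(unknown)}")
        # A backward looking function must point at a prior utterance.
        if self.backward and self.responds_to is None:
            raise ValueError("backward function needs responds_to")


# Invented two-utterance exchange forming a question-answer adjacency pair.
dialogue = [
    TaggedUtterance("A", "What time does it leave?", forward=["info-request"]),
    TaggedUtterance("B", "At nine.", forward=["statement"],
                    backward=["answer"], responds_to=0),
]
```

An utterance can carry both kinds of tag at once, as the second utterance does: it answers the prior question and also asserts something new.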

Tagged Dialogue

Dialogue Act Recognition • Goal: Identify dialogue act tag(s) from surface form • Challenge: Surface form can be ambiguous – “Can you X?” – yes/no question, or info-request • “Flying on the 11th, at what time?” – check, statement • Requires interpretation by hearer – Strategies: Plan inference, cue recognition

Plan-inference-based • Classic AI (BDI) planning framework – Model Belief, Knowledge, Desire • Formal definition with predicate calculus – Axiomatization of plans and actions as well – STRIPS-style: Preconditions, Effects, Body – Rules for plan inference • Elegant, but... – Labor-intensive rule, KB, heuristic development – Effectively AI-complete
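The STRIPS-style triple of preconditions, effects, and body can be sketched with propositions as plain strings. This is an illustrative simplification, not the predicate-calculus axiomatization the slide refers to: the class `Action`, the helper names, the split of effects into add and delete sets, and the `book_flight` example are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: FrozenSet[str]             # must hold before the action
    add_effects: FrozenSet[str]               # become true afterwards
    del_effects: FrozenSet[str] = frozenset() # become false afterwards
    body: Tuple[str, ...] = ()                # names of sub-actions


def applicable(action: Action, state: FrozenSet[str]) -> bool:
    """An action is applicable when all its preconditions hold."""
    return action.preconditions <= state


def apply_action(action: Action, state: FrozenSet[str]) -> FrozenSet[str]:
    """Return the state after the action: remove deletions, add effects."""
    if not applicable(action, state):
        raise ValueError(f"preconditions of {action.name} not met")
    return (state - action.del_effects) | action.add_effects


# Hypothetical travel-domain action.
book_flight = Action(
    name="book_flight",
    preconditions=frozenset({"knows_destination", "knows_date"}),
    add_effects=frozenset({"has_booking"}),
    del_effects=frozenset({"wants_booking"}),
    body=("ask_destination", "ask_date", "confirm"),
)
```

Plan inference runs this machinery backwards: on observing an utterance that satisfies a precondition or executes a body step, the hearer hypothesizes the action, and the goal, it belongs to.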

Cue-based Interpretation • Employs sets of features to identify – Words and collocations: Please -> request – Prosody: Rising pitch -> yes/no question – Conversational structure: prior act • Example: Check: • Syntax: tag question “, right?” • Syntax + prosody: Fragment with rise • N-gram: argmax_d P(d) P(W|d) – So you, sounds like, etc. • Details later...
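The decision rule argmax_d P(d) P(W|d) can be sketched with a unigram word model and add-one smoothing. Both of those choices are assumptions made to keep the example short; the slide does not specify the N-gram order or the smoothing. The function names and the four training utterances are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect counts from (word_list, dialogue_act) pairs."""
    act_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, act in examples:
        act_counts[act] += 1
        word_counts[act].update(words)
        vocab.update(words)
    return act_counts, word_counts, vocab


def classify(words, model):
    """Return argmax over acts d of log P(d) + sum of log P(w|d)."""
    act_counts, word_counts, vocab = model
    total_acts = sum(act_counts.values())
    v = len(vocab) + 1                    # +1 bucket for unseen words
    best_act, best_logp = None, -math.inf
    for act in sorted(act_counts):        # sorted for deterministic ties
        logp = math.log(act_counts[act] / total_acts)      # prior P(d)
        n = sum(word_counts[act].values())
        for w in words:                   # unigram likelihood P(W|d)
            logp += math.log((word_counts[act][w] + 1) / (n + v))
        if logp > best_logp:
            best_act, best_logp = act, logp
    return best_act


# Invented toy training data.
model = train([
    ("so you want the early flight".split(), "check"),
    ("sounds like you prefer monday".split(), "check"),
    ("please book the flight".split(), "request"),
    ("please send the ticket".split(), "request"),
])
```

With this toy model, `classify("please book".split(), model)` returns `"request"` and `classify("so you leave monday".split(), model)` returns `"check"`. A practical recognizer would combine such word evidence with the prosodic and structural features listed above.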

From Human to Computer • Conversational agents – Systems that (try to) participate in dialogues – Examples: Directory assistance, travel info, weather, restaurant and navigation info • Issues: – Limited understanding: ASR errors, interpretation – Computational costs: • broader coverage -> slower, less accurate

Dialogue Manager Tradeoffs • Flexibility vs Simplicity/Predictability – System vs User vs Mixed Initiative – Order of dialogue interaction – Conversational “naturalness” vs Accuracy – Cost of model construction, generalization, learning, etc • Models: FST, Frame-based, HMM, BDI • Evaluation frameworks
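Of the models listed, the frame-based one is the easiest to sketch: the system holds a frame of slots and asks about whichever is still empty. The example below is a minimal sketch under stated assumptions. The slot names, prompts, and function names are hypothetical, and it assumes some upstream component has already parsed the user's utterance into slot-value pairs.

```python
from typing import Dict, Optional

# Hypothetical slots and prompts for a travel-information system.
PROMPTS: Dict[str, str] = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day do you want to travel?",
}


def update(frame: Dict[str, Optional[str]],
           parsed: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return a copy of the frame with every recognized slot filled.

    Accepting any slot the user mentions, not just the one asked about,
    is what gives the user some initiative.
    """
    new_frame = dict(frame)
    for slot, value in parsed.items():
        if slot in PROMPTS:               # ignore slots the frame lacks
            new_frame[slot] = value
    return new_frame


def next_prompt(frame: Dict[str, Optional[str]]) -> Optional[str]:
    """Ask about the first unfilled slot, or return None when done."""
    for slot, prompt in PROMPTS.items():
        if frame.get(slot) is None:
            return prompt
    return None
```

Restricting `update` to the slot that was just prompted would turn this into a strictly system-initiative manager: more predictable, but less flexible, which is the tradeoff the slide names.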