Adapting and Learning Dialogue Models Discourse & Dialogue CMSC 35900-1 November 19, 2006
Roadmap • The Problem: Portability • Task domain: Call-routing • Porting: – Speech recognition – Call-routing – Dialogue management • Conclusions • Learning DM strategies – HMMs and POMDPs
SLS Portability • Spoken language system (SLS) design – Record or simulate user interactions – Collect vocabulary, sentence style, sequence • Transcribe/label – Expert creates vocabulary, language model, dialogue model • Problem: costly, time-consuming, requires expert knowledge
Call-routing • Goal: Given an utterance, identify its type – Dispatch to the right operator • Classification task: – Manual rules or data-driven methods • Feature-based classification (Boosting) – Pre-defined types, e.g.: • Hello? -> Hello; I have a question -> request(info) • I would like to know my balance. -> request(balance)
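The manual-rules option can be sketched as a first-match pattern table. This is a minimal illustration, assuming a hypothetical rule list (`RULES`) and a hypothetical fallback label `other`; a data-driven classifier would replace the hand-written patterns with learned features.

```python
import re

# Hypothetical hand-written rules: (pattern, call type). The first match wins.
RULES = [
    (re.compile(r"\bbalance\b"), "request(balance)"),
    (re.compile(r"\bquestion\b"), "request(info)"),
    (re.compile(r"^\s*(hello|hi)\b"), "Hello"),
]


def route(utterance: str, default: str = "other") -> str:
    """Return the call type of the first rule matching the lowercased utterance."""
    text = utterance.lower()
    for pattern, call_type in RULES:
        if pattern.search(text):
            return call_type
    return default  # no rule fired: hand off to a fallback (e.g. a human operator)


for u in ["Hello?", "I have a question", "I would like to know my balance."]:
    print(u, "->", route(u))
```

Rule order matters: a specific rule like `balance` is listed before broader ones so it wins when several patterns match.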
Dialogue Management • Flow Controller – Pluggable dialogue strategy modules • ATN (augmented transition network): call-flow, easy to augment, manages context – Inputs: context, semantic representation of utterance • ASR – Language models • Trigrams, in a probabilistic framework
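A trigram model conditions each word on the two preceding words. The sketch below assumes add-k smoothing purely for brevity (the slides do not say which smoothing was used), whitespace tokenization, and `<s>`/`</s>` sentence padding; the class name `TrigramLM` is hypothetical.

```python
import math
from collections import Counter


class TrigramLM:
    """Add-k smoothed trigram language model (a minimal sketch)."""

    def __init__(self, sentences, k=1.0):
        self.k = k
        self.tri = Counter()   # counts of (w_{i-2}, w_{i-1}, w_i)
        self.hist = Counter()  # counts of the two-word history (w_{i-2}, w_{i-1})
        self.vocab = {"</s>"}  # every word that can be predicted
        for sent in sentences:
            words = sent.split()
            self.vocab.update(words)
            toks = ["<s>", "<s>"] + words + ["</s>"]
            for i in range(2, len(toks)):
                self.tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
                self.hist[(toks[i - 2], toks[i - 1])] += 1

    def prob(self, w2, w1, w):
        """P(w | w2 w1) with add-k smoothing over the vocabulary."""
        V = len(self.vocab)
        return (self.tri[(w2, w1, w)] + self.k) / (self.hist[(w2, w1)] + self.k * V)

    def logprob(self, sentence):
        toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(toks[i - 2], toks[i - 1], toks[i]))
                   for i in range(2, len(toks)))


lm = TrigramLM(["i want my balance", "i have a question"])
print(lm.logprob("i want my balance"), lm.logprob("balance my want i"))
```

A word order seen in training scores higher than a scrambled one, which is what lets the recognizer prefer plausible word sequences.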
Adaptation: ASR • ASR: Language models – Usually trained from in-domain transcriptions • Here: out-of-domain transcriptions – Switchboard, spoken dialogue (telecomm, insurance) – In-domain web pages • New domain: pharmaceuticals • Style differences: SLS: pronouns; OOV: med best • Best accuracy: spoken dialogue + web – Switchboard (SWBD) too big/slow
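The slide reports that spoken dialogue plus web data gave the best accuracy but does not say how the sources were combined. One standard option, shown here as an assumption, is linear interpolation of per-source language models, with the weight λ chosen to minimize perplexity on held-out in-domain text. Unigram models, a closed vocabulary, and the toy corpora are simplifications for illustration only.

```python
import math
from collections import Counter


def unigram(corpus, vocab, k=1.0):
    """Add-k smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(w for line in corpus for w in line.split())
    total = sum(counts.values())
    return {w: (counts[w] + k) / (total + k * len(vocab)) for w in vocab}


def perplexity(probs, heldout):
    words = [w for line in heldout for w in line.split()]
    return math.exp(-sum(math.log(probs[w]) for w in words) / len(words))


def interpolate(p_a, p_b, lam):
    """P(w) = lam * P_a(w) + (1 - lam) * P_b(w)."""
    return {w: lam * p_a[w] + (1 - lam) * p_b[w] for w in p_a}


def best_lambda(p_a, p_b, heldout, steps=11):
    """Grid-search lam in [0, 1] for the lowest held-out perplexity."""
    grid = [i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda lam: perplexity(interpolate(p_a, p_b, lam), heldout))


# Hypothetical toy corpora standing in for out-of-domain dialogue and in-domain web text.
dialogue = ["i want to check my account", "yes please", "no thanks"]
web = ["refill your prescription online", "prescription drug information"]
heldout = ["i want to refill my prescription"]
# Closed vocabulary (includes held-out words) to keep the sketch short.
vocab = {w for text in (dialogue, web, heldout) for line in text for w in line.split()}

p_dlg, p_web = unigram(dialogue, vocab), unigram(web, vocab)
lam = best_lambda(p_dlg, p_web, heldout)
print(lam, perplexity(interpolate(p_dlg, p_web, lam), heldout))
```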
Adaptation: Call-routing • Manual tagging: Slow, expensive • Here: Existing out-of-domain labeled data – Library of meta call-types: • Generic: all apps • Reusable: in-domain, already exist • Specific: only this app – Grouping done by experts • Bootstrap: Start with generic, reusable
Call-type Classification • BoosTexter: word n-gram features; 1, 100 iter – ASR output as basis • Telecomm-based call-type library • Two classifications: reject-yn; classification – In-domain: true transcriptions: 78%; ASR: 62% – Generic: test on generic: 95%; 91% – Bootstrap: generic + reusable + rules: 79%, 68%
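BoosTexter performs multi-class boosting over text features. The sketch below is a deliberately simplified binary AdaBoost, not BoosTexter itself, whose weak learners are stumps testing whether a word unigram or bigram is present. It shows how boosting selects informative n-grams. The toy utterances and the balance-vs-other labels are hypothetical.

```python
import math


def ngrams(text, n_max=2):
    """Set of word uni- and bigrams (the presence features)."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(toks) - n + 1)}


def stump(feature, sign, feats):
    """Weak learner: predict `sign` if the n-gram is present, else `-sign`."""
    return sign if feature in feats else -sign


def train_adaboost(texts, labels, rounds=10):
    """Discrete AdaBoost; labels are +1 / -1."""
    X = [ngrams(t) for t in texts]
    vocab = sorted(set().union(*X))
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        err, feat, sign = min(
            (sum(wi for xi, yi, wi in zip(X, labels, w) if stump(f, s, xi) != yi), f, s)
            for f in vocab for s in (1, -1))
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero / log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feat, sign))
        # Re-weight: misclassified examples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump(feat, sign, xi))
             for xi, yi, wi in zip(X, labels, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def predict(model, text):
    x = ngrams(text)
    return 1 if sum(a * stump(f, s, x) for a, f, s in model) >= 0 else -1


texts = ["i would like to know my balance", "what is my balance", "balance please",
         "hello", "i have a question", "help me please"]
labels = [1, 1, 1, -1, -1, -1]  # +1 = request(balance), -1 = anything else
model = train_adaboost(texts, labels)
print(predict(model, "check my balance"), predict(model, "hello there"))
```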
Dialogue Model • Build dialogue strategy templates – Based on call-type classification • Generic: – E.g.: yes, no, hello, repeat, help • Trigger a generic, context-dependent reply • Tag as vague/concrete: – Vague: “I have a question” -> clarification – Concrete: clear routing, attributes – sub-dialogs
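The template idea can be sketched as a dispatch on the classifier's output. Generic call types get a context-dependent generic reply, vague ones trigger clarification, and concrete ones are routed, possibly through a sub-dialog. The function name `next_action`, the action strings, and the `context` argument are hypothetical placeholders.

```python
GENERIC = {"yes", "no", "hello", "repeat", "help"}


def next_action(call_type: str, concrete: bool, context: str) -> str:
    """Pick a dialogue strategy template from a call-type classification."""
    if call_type in GENERIC:
        # Generic call types: the reply depends on where we are in the dialogue.
        return f"generic_reply({call_type}, context={context})"
    if not concrete:
        # Vague requests such as "I have a question" need clarification first.
        return "clarify"
    # Concrete requests: route directly, or start a sub-dialog to collect attributes.
    return f"route({call_type})"


print(next_action("help", False, "main_menu"))
print(next_action("request(info)", False, "main_menu"))
print(next_action("request(balance)", True, "main_menu"))
```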
Dialogue Model Porting • Evaluation: – Compare to original transcribed dialogue • Task 1: DM category: 32 clusters of calls – Bootstrap 16 categories – 70% of instances • Using call-type classifiers: get class, confidence, concrete? • If confident/concrete/correct -> correct – If incorrect -> error • Also classify vague/generic • 67-70% accuracy for DM, routing task
Conclusions • Portability: – Bootstrapping of ASR, Call-type, DM – Generally effective • Call-type success high • Others (ASR, DM): show potential
Learning DM Strategies • Prior approaches: – Hand-coded: state-, frame- or agent-based – Adaptation bootstraps from existing structure • Alternative: – Capture prior interaction patterns – Learn dialogue structure and management
Training HMM DM • Construct training corpus – E.g., record human-human interactions – Identify and label states • Train HMM dialogue management – Use tagged sequences to learn • Correspondences between utterances and states • State transition probabilities • Effective, but still requires initial tagging
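In the simplest case, learning from tagged sequences reduces to counting. Transition probabilities come from adjacent state pairs, and emission probabilities come from the words observed in each state. This sketch assumes a bag-of-words emission model, add-k smoothing, and hypothetical state labels (`greet`, `request`, `close`), and uses Viterbi decoding to label a new dialogue. It is one simple realization, not the specific model from the slides.

```python
import math
from collections import Counter, defaultdict


def train(dialogues, k=0.1):
    """dialogues: list of [(utterance, state_label), ...] from a hand-tagged corpus."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    vocab, states = set(), set()
    for dlg in dialogues:
        prev = "<start>"
        for utt, state in dlg:
            words = utt.lower().split()
            trans[prev][state] += 1      # count state-to-state transitions
            emit[state].update(words)    # count words observed in each state
            vocab.update(words)
            states.add(state)
            prev = state
    return {"trans": trans, "emit": emit, "vocab": vocab, "states": sorted(states), "k": k}


def log_trans(m, prev, state):
    row = m["trans"][prev]
    return math.log((row[state] + m["k"]) / (sum(row.values()) + m["k"] * len(m["states"])))


def log_emit(m, state, utt):
    row = m["emit"][state]
    # Bag-of-words emission; one extra vocabulary slot absorbs unseen words.
    total = sum(row.values()) + m["k"] * (len(m["vocab"]) + 1)
    return sum(math.log((row[w] + m["k"]) / total) for w in utt.lower().split())


def viterbi(m, utterances):
    """Most likely state sequence for a new, untagged dialogue."""
    best = {s: (log_trans(m, "<start>", s) + log_emit(m, s, utterances[0]), [s])
            for s in m["states"]}
    for utt in utterances[1:]:
        step = {}
        for s in m["states"]:
            score, path = max((best[p][0] + log_trans(m, p, s), best[p][1])
                              for p in m["states"])
            step[s] = (score + log_emit(m, s, utt), path + [s])
        best = step
    return max(best.values())[1]


corpus = [
    [("hello", "greet"), ("what is my balance", "request"), ("thanks bye", "close")],
    [("hi there", "greet"), ("i want my balance please", "request"), ("bye", "close")],
]
model = train(corpus)
print(viterbi(model, ["hello", "my balance", "bye"]))
```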
Reinforcement Learning • Model dialogues with (partially observable) Markov decision processes – Users form a stochastic environment – Actions are system utterances – State is the dialogue so far • Goal: maximize some utility measure • Task completion/user satisfaction • Learn policy – Action to take in each state – That optimizes the utility measure
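Policy learning can be illustrated on a tiny, fully observable MDP. The partially observable case needs belief states and is omitted here. The states, actions, transition probabilities, and rewards below are invented for illustration: a cost of 1 per turn, +20 for completing the task with a confirmed slot, and a risky shortcut of submitting an unconfirmed slot. Value iteration then recovers the policy that maximizes expected discounted reward.

```python
# Toy dialogue MDP (hypothetical numbers). P[s][a] = [(prob, next_state), ...].
P = {
    "empty": {"ask": [(0.8, "filled"), (0.2, "empty")], "submit": [(1.0, "done")]},
    "filled": {
        "ask": [(0.8, "filled"), (0.2, "empty")],
        "confirm": [(0.9, "confirmed"), (0.1, "empty")],  # user may reject the value
        "submit": [(1.0, "done")],
    },
    "confirmed": {"submit": [(1.0, "done")]},
    "done": {},  # terminal
}
# R[s][a]: -1 per turn; submitting a confirmed slot earns 20 - 1 = 19; submitting an
# unconfirmed one succeeds 70% of the time (0.7*20 + 0.3*(-20) = 8, minus 1 = 7);
# submitting with nothing filled always fails (-20).
R = {
    "empty": {"ask": -1.0, "submit": -20.0},
    "filled": {"ask": -1.0, "confirm": -1.0, "submit": 7.0},
    "confirmed": {"submit": 19.0},
    "done": {},
}


def q_value(s, a, V, gamma):
    """Expected return of taking action a in state s, then following V."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(gamma=0.95, tol=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:  # terminal state keeps value 0
                continue
            best = max(q_value(s, a, V, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(acts, key=lambda a, s=s: q_value(s, a, V, gamma))
              for s, acts in P.items() if acts}
    return V, policy


V, policy = value_iteration()
print(policy)
```

With these numbers, confirming before submitting is worth more than the one-turn shortcut, so the learned policy asks, confirms, then submits.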
Applications • TOOT – train information • Litman, Kearns, et al. • Learned different initiative/confirmation strategies • Air travel bookings (Young et al. 2006) – Problem: huge number of possible states • More airports, dramatically more possible utterances – Approach: Collapse all alternative slot fillers • Represent with single default
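Counting states makes the explosion and the collapsing trick concrete. This sketch assumes, hypothetically, that a dialogue state is just a tuple of slot values, each either unset or one of n airports. With k slots that gives (n+1)^k states, while recording only whether each slot is filled leaves 2^k. This simplifies the idea on the slide and is not the cited system's actual state representation.

```python
from itertools import product

AIRPORTS = ["BOS", "ORD", "SFO", "LHR", "NRT"]  # hypothetical toy list
SLOTS = ["origin", "destination"]


def full_states(values, n_slots):
    """Every combination of 'unset' (None) or a concrete filler for each slot."""
    return set(product([None] + values, repeat=n_slots))


def collapse(state):
    """Replace any concrete filler with a single default token."""
    return tuple(None if v is None else "<filled>" for v in state)


full = full_states(AIRPORTS, len(SLOTS))
collapsed = {collapse(s) for s in full}
print(len(full), len(collapsed))  # (5+1)**2 = 36 vs 2**2 = 4
```

Adding airports grows `full` but leaves `collapsed` unchanged, which is the point of representing all alternative fillers with one default.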
Turn-taking Discourse and Dialogue CS 35900-1 November 16, 2004
Agenda • Motivation – Silence in Human-Computer Dialogue • Turn-taking in human-human dialogue – Turn-change signals – Back-channel acknowledgments – Maintaining contact • Exploiting to improve HCC – Automatic identification of disfluencies, jump-in points, and jump-ins
Turn-taking in HCI • Human turn end: – Detected by 250 ms silence • System turn end: – Signaled by end of speech – Indicated by any human sound • Barge-in • Continued attention: – No signal
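The 250 ms end-of-turn rule can be sketched as a silence counter over per-frame energies. The function name `detect_turn_end`, the 10 ms frame size, the energy threshold, and the input format are assumptions; deployed endpointers use more robust voice-activity detection.

```python
def detect_turn_end(energies, frame_ms=10, silence_ms=250, threshold=0.01):
    """Return the time (ms) at which the user's turn is declared over, or None.

    energies: per-frame energy values; frames below `threshold` count as silence.
    """
    needed = silence_ms // frame_ms  # consecutive silent frames required
    heard_speech = False
    run = 0
    for i, e in enumerate(energies):
        if e >= threshold:
            heard_speech = True
            run = 0  # any speech resets the silence counter
        elif heard_speech:
            run += 1
            if run >= needed:
                return (i + 1) * frame_ms
    return None  # silence never lasted long enough after speech


# 50 ms leading silence, 300 ms speech, then 300 ms silence.
print(detect_turn_end([0.0] * 5 + [0.5] * 30 + [0.0] * 30))
```

A fixed threshold like this is exactly what makes short within-turn pauses risky: a speaker who pauses longer than 250 ms mid-thought loses the floor.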
Yielding & Taking the Floor • Turn change signal – Offer floor to auditor/hearer – Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop+’uh’, end clause • Likelihood of change increases with more cues • Negated by any gesticulation • Speaker-state signal: • Shift in head direction AND/OR Start of gesture
Retaining the Floor • Within-turn signal – Still speaker: Look at hearer at end of clause • Continuation signal – Still speaker: Look away after within-turn signal/back-channel • Back-channel: – ‘mmhm’/okay/etc.; nods • Sentence completion; clarification request; restatement – NOT a turn: signal attention, agreement, confusion
Improving Human-Computer Turn-taking • Identifying cues to turn change and turn start • Meeting conversations: – Recorded, natural research meetings – Multi-party – Overlapping speech – Units = “Spurts” between 500 ms silence
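Splitting one speaker's words into spurts can be sketched from word timestamps. The function name `spurts`, the input format of (token, start, end) in seconds, and the reading of "500 ms silence" as "a pause of at least 0.5 s" are all assumptions.

```python
def spurts(words, min_pause=0.5):
    """Group one speaker's time-sorted words into spurts.

    words: list of (token, start_s, end_s); a pause of at least `min_pause`
    seconds between consecutive words starts a new spurt.
    """
    result, current, last_end = [], [], None
    for tok, start, end in words:
        if current and start - last_end >= min_pause:
            result.append(current)
            current = []
        current.append(tok)
        last_end = end
    if current:
        result.append(current)
    return result


words = [("so", 0.0, 0.2), ("we", 0.25, 0.4), ("um", 1.2, 1.4), ("right", 1.5, 1.8)]
print(spurts(words))
```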
Tasks • Sentence/disfluency/non-boundary ID – End of sentence, break off, continue • Jump-in points – Times when others “jump in” • Jump-in words – Interruption vs. start from silence • Offline and online • Language model and/or prosodic cues
Text + Prosody • Text sequence: – Modeled as n-gram language model • Hidden event prediction – e.g., boundary as hidden state – Implement as HMM • Prosody: – Duration, Pitch, Pause, Energy – Decision trees: classify + probability • Integrate LM + DT
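One simple way to integrate the two knowledge sources is to linearly interpolate the LM and decision-tree posteriors for each inter-word boundary and choose the most probable event. This approach is an assumption and is not taken from the slides; the HMM-based integration mentioned on the slide is the alternative. The posterior values, the label names, and the weight `lam` are hypothetical.

```python
LABELS = ("sentence_end", "disfluency", "continuation")


def combine(p_lm, p_dt, lam=0.5):
    """Linearly interpolate two posterior distributions over LABELS for one boundary."""
    combined = {y: lam * p_lm[y] + (1 - lam) * p_dt[y] for y in LABELS}
    return max(combined, key=combined.get), combined


# Hypothetical posteriors for a single inter-word position.
p_lm = {"sentence_end": 0.5, "disfluency": 0.1, "continuation": 0.4}  # from the LM
p_dt = {"sentence_end": 0.2, "disfluency": 0.1, "continuation": 0.7}  # from prosody
label, combined = combine(p_lm, p_dt)
print(label, combined)
```

Here the text alone slightly favors a sentence end, while the prosodic evidence (for example, a short pause) tips the combined decision to continuation. The weight `lam` would normally be tuned on held-out data.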
Interpreting Breaks • For each inter-word position: – Is it a disfluency, sentence end, or continuation? • Key features: – Pause duration, vowel duration • 62% accuracy wrt 50% chance baseline – ~90% overall • Best combines LM & DT
Jump-in Points • (Used) Possible turn changes – Points WITHIN spurt where new speaker starts • Key features: – Pause duration, low energy, pitch fall – No lexical/punctuation features used – Forward features useless • Look like sentence boundaries (SB) but aren’t • Accuracy: 65% wrt 50% baseline • Performance depends only on preceding prosodic features
Jump-in Features • Do people speak differently when jump-in? – Differ from regular turn starts? • Examine only first words of turns – No LM • Key features: – Raised pitch, raised amplitude • Accuracy: 77% wrt 50% baseline – Prosody only
Summary • Prosodic features signal conversational moves – Pause and vowel duration distinguish sentence end, disfluency, or fluent continuation – Jump-ins occur at locations that sound like sent. ends – Raise voice when jump in