Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems

Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems
Thesis Proposal
Dan Bohus
Carnegie Mellon University, January 2004

Thesis Committee: Alex Rudnicky (Chair), Roni Rosenfeld, Jeff Schneider, Eric Horvitz (Microsoft Research)

Problem
Lack of robustness when faced with understanding errors
§ Spans most domains and interaction types
§ Has a significant impact on performance

An example
S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]

Some Statistics …
§ Semantic error rates
  § CMU Communicator [CMU]: 32%
  § CU Communicator [CU]: 27%
  § How May I Help You? [AT&T]: 36%
  § Jupiter [MIT]: 28%
  § SpeechActs [SRI]: 25%
§ Corrections [Krahmer, Swerts, Litman, Levow]
  § 30% of utterances correct system mistakes
  § 2-3 times more likely to be misrecognized

Significant Impact on Interaction
§ CMU Communicator
  [Chart: share of sessions; labels "Failed" and "Contain understanding errors"; values 26%, 40%, 33%]
§ Multi-site Communicator Corpus [Shin et al]
  [Chart: share of sessions; label "Failed"; values 37%, 63%]

Outline
§ Problem
Ø Approach
§ Infrastructure
§ Research Program
§ Timeline & Summary
problem : approach : infrastructure : indicators : strategies : decision process : summary

Increasing Robustness …
§ Increase the accuracy of speech recognition
§ Assume recognition is unreliable, and create the mechanisms for acting robustly at the dialogue management level
  § ASR performance increases / demands increase
  § More general

Snapshot of Existing Work: Slide 1
§ Theoretical models of grounding
  § Contribution Model [Clark], Grounding Acts [Traum]
  Analytical/descriptive, not decision oriented
§ Practice: heuristic rules
  § Misunderstandings
    § Threshold(s) on confidence scores
  § Non-understandings
  Ad-hoc, lack generality, not easy to extend
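
The threshold heuristic for misunderstandings can be illustrated with a minimal Python sketch. The function name `choose_action`, the action labels, and both threshold values are hypothetical and chosen only for illustration; the slides give no concrete values.

```python
def choose_action(confidence, reject_below=0.3, accept_above=0.8):
    """Map a recognition confidence score to an error handling action.

    Both thresholds are illustrative assumptions, not values from the slides.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < reject_below:
        return "reject"            # too unreliable: discard and ask again
    if confidence < accept_above:
        return "explicit_confirm"  # uncertain: verify with the user
    return "accept"                # reliable enough: use the value as is


for score in (0.15, 0.55, 0.92):
    print(score, choose_action(score))
```

A rule of this kind is easy to write, but the thresholds have to be hand-tuned for each system, which is one face of the "ad-hoc, lack generality" criticism on the slide.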

Snapshot of Existing Work: Slide 2
§ Conversation as Action under Uncertainty [Paek and Horvitz]
  § Belief networks to model uncertainties
  § Decisions based on expected utility, VOI-analysis
§ Reinforcement learning for dialogue control policies [Singh, Kearns, Litman, Walker, Levin, Pieraccini, Young, Scheffler, etc]
  § Formulate dialogue control as an MDP
  § Learn optimal control policy from data
Do not scale up to complex, real-world domains
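
Decisions based on expected utility can be sketched generically as follows. This is plain expected-utility maximization over a two-state belief, not a reconstruction of the Paek and Horvitz models; the utility table and the action names are invented for illustration.

```python
def expected_utility(action, belief, utility):
    """Expected utility of an action under a belief over hidden states."""
    return sum(p * utility[(action, state)] for state, p in belief.items())


def best_action(belief, utility, actions):
    """Pick the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, belief, utility))


# Hidden state: was the recognized value correct? (hypothetical utilities)
utility = {
    ("accept", "correct"): 1.0,   ("accept", "incorrect"): -2.0,
    ("confirm", "correct"): 0.5,  ("confirm", "incorrect"): -0.3,
    ("reject", "correct"): -0.5,  ("reject", "incorrect"): 0.0,
}
actions = ["accept", "confirm", "reject"]

for p_correct in (0.95, 0.6, 0.1):
    belief = {"correct": p_correct, "incorrect": 1.0 - p_correct}
    print(p_correct, best_action(belief, utility, actions))
```

Unlike a fixed threshold rule, the switching points between actions here follow from the utilities, so changing the cost of an error changes the behavior without retuning thresholds by hand.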

Research Program: Goals & Approach
A task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
Approach:
§ Decision making under uncertainty

Three Components
0. Infrastructure
1. Error awareness
   Develop indicators that …
   § Assess reliability of information
   § Assess how well the dialogue is advancing
2. Error recovery strategies
   Develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process
   Develop a scalable reinforcement-learning based approach for error recovery in spoken dialogue systems

Infrastructure
§ RavenClaw (completed)
  § Modern dialog management framework for complex, task-oriented domains
§ RavenClaw spoken dialogue systems (completed)
  § Test-bed for evaluation

RavenClaw
[Figure: RavenClaw architecture, RoomLine example]
Dialogue Task (Specification)
§ Task tree nodes: RoomLine, Login, Welcome, GreetUser, AskRegistered, AskName, GetQuery, DateTime, Location, Network, Properties, Projector, Whiteboard, GetResults, DiscussResults
§ Concepts: registered, user_name, query, results
Domain-Independent Dialogue Engine
§ Dialogue Stack: ExplicitConfirm, AskRegistered, Login, RoomLine
§ Expectation Agenda:
  registered: [No] -> false, [Yes] -> true
  user_name: [UserName]
  query.date_time: [DateTime]
  query.location: [Location]
  query.network: [Network]
§ Error Handling Decision Process (Indicators, Strategies)

RavenClaw-based Systems
§ RoomLine
§ CMU Let’s Go!! Bus Information System
§ LARRI [Symphony]
§ TeamTalk [11-741]
§ Eureka [11-743]

Existing Work
§ Confidence Annotation
  § Traditionally focused on speech recognizer [Bansal, Chase, Cox, and others]
  § Recently, multiple sources of knowledge [San-Segundo, Walker, Bosch, Bohus, and others]
    § Recognition, parsing, dialogue management
  § Detect misunderstandings: ~80-90% accuracy
§ Correction and Aware Site Detection [Swerts, Litman, Levow and others]
  § Multiple sources of knowledge
  § Detect corrections: ~80-90% accuracy

Proposed: Belief Updating
§ Continuously assess beliefs in light of initial confidence and subsequent events
§ An example:
  S: Where are you flying from?
  U: [CityName={Aspen/0.6; Austin/0.2}]
  S: Did you say you wanted to fly out of Aspen?
  U: [No] [CityName={Boston/0.8}]
  [CityName={Aspen/?; Austin/?; Boston/?}]
initial belief + system action + user response -> updated belief
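
To make the Aspen/Austin/Boston example concrete, here is a hand-written heuristic update in Python. It is only a sketch under stated assumptions: the proposal intends a learned dynamic belief network, whereas this code uses fixed, invented likelihoods (`p_yes_if_correct`, `p_yes_if_wrong`) and a simple interpolation for newly heard values. The function name and the numbers it produces are illustrative and do not fill in the "?" values on the slide.

```python
def update_after_confirmation(belief, confirmed, said_yes, new_values=None,
                              p_yes_if_correct=0.9, p_yes_if_wrong=0.1):
    """Heuristic Bayes-style update of a concept belief.

    belief:     dict value -> probability; missing mass is kept under None ("other").
    confirmed:  the value the system explicitly confirmed.
    said_yes:   True if the user answered yes, False if no.
    new_values: dict value -> confidence heard in the same turn (sum assumed <= 1).
    The likelihoods are assumed numbers, not learned parameters.
    """
    prior = dict(belief)
    prior[None] = prior.get(None, 0.0) + max(0.0, 1.0 - sum(belief.values()))

    # Step 1: weight every hypothesis by the likelihood of the yes/no answer.
    posterior = {}
    for value, p in prior.items():
        p_yes = p_yes_if_correct if value == confirmed else p_yes_if_wrong
        posterior[value] = p * (p_yes if said_yes else 1.0 - p_yes)
    total = sum(posterior.values())
    posterior = {v: p / total for v, p in posterior.items()}

    # Step 2: mix in values recognized in the same turn (simple interpolation).
    if new_values:
        keep = max(0.0, 1.0 - sum(new_values.values()))
        posterior = {v: p * keep for v, p in posterior.items()}
        for value, conf in new_values.items():
            posterior[value] = posterior.get(value, 0.0) + conf
    return posterior


updated = update_after_confirmation(
    {"Aspen": 0.6, "Austin": 0.2}, confirmed="Aspen", said_yes=False,
    new_values={"Boston": 0.8})
for city, p in sorted(updated.items(), key=lambda kv: -kv[1]):
    print(city, round(p, 3))
```

With these assumed likelihoods, the "No" answer pushes Aspen below Austin, and Boston ends up as the top hypothesis. A learned model would replace both hand-set steps with parameters estimated from data.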

Belief Updating: Approach
§ Model the update in a dynamic belief network
[Figure: dynamic belief network over time slices t and t+1. Concept node C holds the initial belief at t and the updated belief at t+1. Other nodes: system action; contents and confidence (CurrentTop, Current2nd, Current3rd, Confidence); user response features (Yes, No, correction, Utterance Length, Positive Markers, Negative Markers)]

Is the Dialogue Advancing Normally?
Locally, turn-level:
§ Non-understanding indicators
  § Non-understanding flag directly available
  § Develop additional indicators
    § Recognition, Understanding, Interpretation
Globally, discourse-level:
§ Dialogue-on-track indicators
  § Summary statistics of non-understanding indicators
  § Rate of dialogue advance
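
The discourse-level indicators can be sketched as running statistics over the turn-level non-understanding flag. The class below is an assumption-laden illustration: the name `OnTrackIndicators`, the window size, and the definition of "rate of dialogue advance" as concepts acquired per turn are all hypothetical choices, not definitions from the slides.

```python
from collections import deque


class OnTrackIndicators:
    """Running discourse-level statistics over turn-level observations."""

    def __init__(self, window=5):
        self.recent = deque(maxlen=window)  # last `window` non-understanding flags
        self.turns = 0
        self.non_understandings = 0
        self.concepts_acquired = 0

    def observe(self, non_understanding, new_concepts=0):
        """Record one user turn."""
        self.turns += 1
        self.non_understandings += int(non_understanding)
        self.concepts_acquired += new_concepts
        self.recent.append(bool(non_understanding))

    def summary(self):
        """Return the current indicator values."""
        consecutive = 0  # trailing non-understandings, capped at the window size
        for flag in reversed(self.recent):
            if not flag:
                break
            consecutive += 1
        return {
            "overall_rate": self.non_understandings / self.turns if self.turns else 0.0,
            "recent_rate": sum(self.recent) / len(self.recent) if self.recent else 0.0,
            "consecutive": consecutive,
            "advance_rate": self.concepts_acquired / self.turns if self.turns else 0.0,
        }


ind = OnTrackIndicators(window=5)
ind.observe(False, new_concepts=1)  # understood, one concept acquired
ind.observe(True)                   # non-understanding
ind.observe(True)                   # non-understanding
stats = ind.summary()
print(stats)
```

Values such as `consecutive` and `advance_rate` are the sort of features a decision process could consult when choosing between local repairs and global strategies.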

Error Recovery Strategies
§ Identify and define an extended set of error handling strategies
§ Implement
  § Construct task-decoupled implementations of a large number of strategies
§ Evaluate performance and bring further refinements

List of Error Recovery Strategies

User Initiated
§ Help
§ Where are we?
§ Start over
§ Scratch concept value
§ Go back
§ Channel establishment
§ Suspend/Resume
§ Repeat
§ Summarize
§ Quit

System Initiated
§ Ensure that the system has reliable information (misunderstandings)
  § Explicit confirmation
  § Implicit confirmation
  § Disambiguation
  § Ask repeat concept
  § Reject concept
§ Ensure that the dialogue is on track
  § Local problems (non-understandings)
    § Switch input modality
    § SNR repair
    § Ask repeat turn
    § Ask rephrase turn
    § Notify non-understanding
    § Explicit confirm turn
    § Targeted help
    § WH-reformulation
    § Keep-a-word reformulation
    § Generic help
    § You can say
  § Global problems (compounded, discourse-level problems)
    § Restart subtask plan
    § Select alternative plan
    § Start over
    § Terminate session / Direct to operator

Error Recovery Strategies: Evaluation
§ Reusability
  § Deploy in different spoken dialogue systems
§ Efficiency of non-understanding strategies
  § Simple metric: Is the next utterance understood?
  § Efficiency depends on decision process
  § Construct upper and lower bounds for efficiency
    § Lower bound: decision process which chooses uniformly
    § Upper bound: human performs decision process (WOZ)
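
The simple metric above ("is the next utterance understood?") could be computed from interaction logs roughly as follows. The log format, the strategy labels, and the entries themselves are invented for illustration; no real data is shown.

```python
from collections import defaultdict


def recovery_rates(log):
    """Fraction of times the utterance following each strategy was understood.

    log: iterable of (strategy, next_utterance_understood) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # strategy -> [recovered, total]
    for strategy, understood in log:
        counts[strategy][0] += int(understood)
        counts[strategy][1] += 1
    return {s: recovered / total for s, (recovered, total) in counts.items()}


# Invented log entries, for illustration only.
log = [
    ("ask_repeat", True), ("ask_repeat", False),
    ("ask_rephrase", True), ("ask_rephrase", True),
    ("you_can_say", False),
]
rates = recovery_rates(log)
print(rates)
```

Running the same computation on logs from a uniformly random chooser and from a Wizard-of-Oz study would give the lower and upper bounds named on the slide.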

Previous Reinforcement Learning Work
§ Dialogue control ~ Markov Decision Process
  § States
  § Actions
  § Rewards
  [Figure: state-transition diagram with states S1, S2, S3 and action A]
§ Previous work: successes in small domains
  § NJFun [Singh, Kearns, Litman, Walker et al]
§ Problems
  § Lack of scalability
  § Once learned, policies are not reusable
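
The MDP formulation can be illustrated on a toy problem. The sketch below uses value iteration over a fully specified model, which is a swapped-in textbook technique rather than the data-driven learning used in the cited work; the three states, the transition probabilities, and the rewards are all invented.

```python
def value_iteration(states, actions, transitions, rewards, gamma=0.9, tol=1e-6):
    """Compute an optimal policy for a small, fully specified MDP.

    transitions[(s, a)] is a list of (probability, next_state) pairs;
    rewards[(s, a)] is the immediate reward. States with no actions are terminal.
    """
    values = {s: 0.0 for s in states}

    def q(s, a):
        return rewards[(s, a)] + gamma * sum(
            p * values[s2] for p, s2 in transitions[(s, a)])

    while True:
        delta = 0.0
        for s in states:
            if not actions[s]:
                continue
            best = max(q(s, a) for a in actions[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {s: max(actions[s], key=lambda a: q(s, a)) for s in states if actions[s]}
    return values, policy


# Toy confirmation problem (hypothetical numbers).
states = ["uncertain", "confirmed", "done"]
actions = {"uncertain": ["accept", "confirm"], "confirmed": ["accept"], "done": []}
transitions = {
    ("uncertain", "accept"): [(1.0, "done")],
    ("uncertain", "confirm"): [(0.9, "confirmed"), (0.1, "uncertain")],
    ("confirmed", "accept"): [(1.0, "done")],
}
rewards = {
    ("uncertain", "accept"): -0.5,   # expected cost of acting on a shaky value
    ("uncertain", "confirm"): -0.1,  # cost of an extra turn
    ("confirmed", "accept"): 1.0,
}
values, policy = value_iteration(states, actions, transitions, rewards)
print(policy)
```

Even this toy needs a full table of transitions and rewards. Encoding a whole task-oriented dialogue in one such model makes the state space grow quickly, which is the scalability problem the slide points to.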

Proposed Approach
Overcome previous shortcomings:
1. Focus learning only on error handling
   § Reduces the size of the learning problem
   § Favors reusability of learned policies
   § Lessens the system development effort
2. Use a “divide-and-conquer” approach
   § Leverage independences in dialogue

Gated Markov Decision Processes
[Figure: the RoomLine task tree (RoomLine, Login, Welcome, GreetUser, AskRegistered, AskName) with Topic-MDPs attached to topics and Concept-MDPs attached to the concepts registered and user_name. Each MDP proposes an action (No Action, Explicit Confirm) and a Gating Mechanism selects among the proposals]
§ Small-size models
§ Parameters can be tied across models
§ Easy to design initial policies
§ Decoupling favors reusability of policies
§ Accommodate dynamic task generation
Independence assumption

Reward structure & learning
Global, post-gate rewards
[Figure: MDPs feed the Gating Mechanism, which selects the Action; a single Reward follows the gate]
§ Rewards based on any dialogue performance metric
§ Atypical, multi-agent reinforcement learning setting
Local rewards
[Figure: each MDP receives its own Reward]
§ Multiple, standard RL problems
§ Model-based approaches

Evaluation
§ Performance
  § Compare learned policies with initial heuristic policies
  § Metrics
    § Task completion
    § Efficiency
    § Number and lengths of error segments
    § User satisfaction
§ Scalability
  § Deploy in a system operating with a sizable task
  § Theoretical analysis

Outline
§ Problem
§ Approach
§ Infrastructure
§ Research Program
§ Summary & Timeline

Summary of Contributions
§ Overall goal: develop a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
§ Modern dialogue management framework
§ Belief updating framework
§ Investigation of an extended set of error handling strategies
§ Scalable data-driven approach for learning error handling policies

Timeline
[Figure: timeline chart. Time axis: now, end of year 4, end of year 5, 5.5 years. Milestones: proposal, milestone 1, milestone 2, milestone 3, defense]
§ Data
  § Data collection for belief updating and WOZ study
  § Data collection for RL training
  § Data collection for RL evaluation
  § Contingency data collection efforts
§ Indicators
  § Develop and evaluate the belief updating models
  § Implement dialogue-on-track indicators
§ Strategies
  § Misunderstanding and non-understanding strategies
  § Evaluate non-understanding strategies; develop remaining strategies
§ Decisions
  § Investigate theoretical aspects of proposed reinforcement learning model
  § Error handling decision process: reinforcement learning experiments
  § Additional experiments: extensions or contingency work

Thank You!
Questions & Comments
committee members, then floor

Indicators: Goals
§ Goal: Increase awareness and capacity to detect problems
§ Develop indicators which can inform the error handling process about potential problems
[Figure: outcomes of the understanding process]
§ System acquires information
  § System acquires correct information: OK
  § System acquires incorrect information: Misunderstanding
§ System does not acquire information: Non-understanding

Three Desired Properties
§ Task-Independence
  § Reuse the proposed architecture across different spoken dialogue systems with a minimal amount of authoring effort
§ Adaptability
  § Learn from experience how to adapt to the characteristics of various domains
§ Scalability
  § Applicable in spoken dialogue systems operating with large, practical tasks

[Figure: MDP diagram with states LC, MC, HC and 0, and actions ExplConf, ImplConf and NoAct]

Belief Updating: Approach
§ Model the update in a dynamic belief network
  § Top-N values
  § Fixed structure
  § Learn parameters
§ Data collection
§ Evaluation
  § Accuracy
  § Soft-error
[Figure: dynamic belief network over time slices t and t+1. Concept node C at t and t+1; System Action; CurrentTop, Current2nd, Current3rd, Confidence; user response features (Yes, No, Utterance Length, Positive Markers, Negative Markers)]

Gated Markov Decision Processes
[Figure: the RoomLine task tree (RoomLine, Login, Welcome, GreetUser, AskRegistered, AskName) with Topic-MDPs attached to topics and Concept-MDPs attached to the concepts registered and user_name. Each MDP proposes an action (No Action, Explicit Confirm) and a Gating Mechanism selects among the proposals]
Issues:
§ Structure of individual MDPs
§ Gating mechanism
§ Reward structure and learning

Structure for individual MDPs
§ State-space
  § Informative subset of corresponding indicators
  § Concept-MDPs: confidence / beliefs
  § Topic-MDPs: non-understanding, dialogue-on-track indicators
§ Action-space
  § Corresponding system-initiated error handling strategies

Gating Mechanism
§ Heuristic derived from domain-independent dialogue principles
  § Give priority to topics over concepts
  § Give priority to entities closer to the conversational focus
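
The two priority rules can be sketched as a small selection function. The proposal format, the `distance` field, and the choice to apply the rules lexicographically (topic before concept, then closeness to focus) are assumptions made for illustration; the slides do not specify how the two priorities interact.

```python
def gate(proposals):
    """Select one action among those proposed by the individual MDPs.

    proposals: list of dicts with keys
      "name"     - the topic or concept the MDP is attached to
      "kind"     - "topic" or "concept"
      "distance" - distance from the conversational focus (0 = in focus)
      "action"   - proposed strategy, or None for "No Action"
    Returns the winning proposal, or None if every MDP proposed No Action.
    """
    active = [p for p in proposals if p["action"] is not None]
    if not active:
        return None
    # Topics sort before concepts; within a kind, closer to the focus first.
    return min(active, key=lambda p: (p["kind"] != "topic", p["distance"]))


proposals = [
    {"name": "RoomLine", "kind": "topic", "distance": 2, "action": None},
    {"name": "Login", "kind": "topic", "distance": 1, "action": None},
    {"name": "registered", "kind": "concept", "distance": 0, "action": "ExplicitConfirm"},
    {"name": "user_name", "kind": "concept", "distance": 1, "action": "ExplicitConfirm"},
]
chosen = gate(proposals)
print(chosen["name"], chosen["action"])
```

In this example both Topic-MDPs propose No Action, so the concept closest to the focus wins. Had the Login Topic-MDP proposed a strategy, it would take precedence over either concept.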