Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems

Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems
Thesis Proposal
Dan Bohus, Carnegie Mellon University, January 2004
Thesis Committee: Alex Rudnicky (Chair), Roni Rosenfeld, Jeff Schneider, Eric Horvitz (Microsoft Research)

Problem
Lack of robustness when faced with understanding errors:
§ Spans most domains and interaction types
§ Has a significant impact on performance

An example
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40 pm, arrives Seoul at 5 pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th… I have a flight departing Chicago at 1:40 pm, arrives Seoul at …

Some Statistics…
Semantic error rates: ~25-35%
§ SpeechActs [SRI]: 25%
§ CU Communicator [CU]: 27%
§ Jupiter [MIT]: 28%
§ CMU Communicator [CMU]: 32%
§ How May I Help You? [AT&T]: 36%
Corrections [Krahmer, Swerts, Litman, Levow]:
§ 30% of utterances correct system mistakes
§ Corrections are 2-3 times more likely to be misrecognized

Significant Impact on Interaction
§ CMU Communicator: [pie chart relating failed sessions and sessions containing understanding errors; slice labels: 26%, 40%, 33%]
§ Multi-site Communicator Corpus [Shin et al]: [pie chart: 37% failed sessions, 63% other sessions]

Outline
§ Problem
Ø Approach
§ Infrastructure
§ Research Program
§ Summary & Timeline

Increasing Robustness…
§ Increase the accuracy of speech recognition
§ Assume recognition is unreliable, and create the mechanisms for acting robustly at the dialogue management level

Snapshot of Existing Work (1)
§ Theoretical models of grounding: Contribution Model [Clark], Grounding Acts [Traum]
  § Analytical/descriptive, not decision oriented
§ Practice: heuristic rules
  § Misunderstandings: threshold(s) on confidence scores
  § Non-understandings
  § Ad-hoc, lack generality, not easy to extend

Snapshot of Existing Work (2)
§ Conversation as Action under Uncertainty [Paek and Horvitz]
  § Belief networks to model uncertainties
  § Decisions based on expected utility, VOI analysis
§ Reinforcement learning for dialogue control policies [Singh, Kearns, Litman, Walker, Levin, Pieraccini, Young, Scheffler, etc.]
  § Formulate dialogue control as an MDP
  § Learn the optimal control policy from data
§ These approaches do not scale up to complex, real-world tasks

Thesis Statement
Develop a task-independent, adaptive, and scalable framework for error recovery in task-oriented spoken dialogue systems.
Approach: decision making under uncertainty

Three Components
0. Infrastructure
1. Error awareness: develop indicators that assess the reliability of information and how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable, reinforcement-learning-based architecture for making error handling decisions

Infrastructure
§ RavenClaw (completed): a modern dialogue management framework for complex, task-oriented domains
§ RavenClaw spoken dialogue systems (completed): the test-bed for evaluation

RavenClaw
[Architecture diagram. Left: the domain-specific Dialogue Task Specification, a task tree for RoomLine: Login (GreetUser, AskRegistered, AskName), GetQuery (DateTime, Location, Network, Properties, Projector, Whiteboard), GetResults, DiscussResults. Right: the domain-independent Dialogue Engine, with the Dialogue Stack (RoomLine > Login > AskRegistered > ExplicitConfirm), the Expectation Agenda (registered: [No] -> false, [Yes] -> true; user_name: [UserName]; query.date_time: [DateTime]; query.location: [Location]; query.network: [Network]), the Error Indicators, the Error Handling Decision Process, and the error handling Strategies.]

RavenClaw-based Systems

System                                      | Domain
RoomLine                                    | Information access
CMU Let's Go! Bus Information System        | Information access
LARRI [Symphony]                            | Guidance through procedures
Intelligent Procedure Assistant [NASA Ames] | Guidance through procedures
TeamTalk [11-754]                           | Command-and-control
Eureka [11-743]                             | Web access

Research Plan
0. Infrastructure
1. Error awareness: develop indicators that assess the reliability of information and how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable, reinforcement-learning-based architecture for making error handling decisions

Existing Work
§ Confidence annotation
  § Traditionally focused on the speech recognizer [Bansal, Chase, Cox, and others]
  § Recently, multiple sources of knowledge: recognition, parsing, dialogue management [San-Segundo, Walker, Bosch, Bohus, and others]
  § Detects misunderstandings with ~80-90% accuracy
§ Correction and aware-site detection [Swerts, Litman, Levow, and others]
  § Multiple sources of knowledge
  § Detects corrections with ~80-90% accuracy
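To make the multi-source idea concrete, here is a minimal sketch of a confidence annotator as a logistic regression over features from recognition, parsing, and dialogue management. The feature names, weights, and bias are invented for illustration; in the work cited above such a classifier is trained on labeled data.

```python
import math

def misunderstanding_confidence(features, weights, bias=0.0):
    """Logistic-regression-style confidence score in [0, 1]."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Features drawn from three knowledge sources (all values made up):
features = {"acoustic_score": 0.4,        # recognition
            "parse_coverage": 0.8,        # parsing
            "matches_expectation": 1.0}   # dialogue management
weights = {"acoustic_score": 2.0, "parse_coverage": 1.5,
           "matches_expectation": 1.2}
print(misunderstanding_confidence(features, weights, bias=-2.0))  # ~0.77
```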

Proposed: Belief Updating
§ Continuously reassess beliefs in light of the initial confidence and subsequent events
§ An example:
S: Where are you flying from?
U: [CityName = {Aspen/0.6; Austin/0.2}]            (initial belief)
S: Did you say you wanted to fly out of Aspen?     (system action)
U: [No/0.6] [CityName = {Boston/0.8}]              (user response)
   [CityName = {Aspen/?; Austin/?; Boston/?}]      (updated belief)
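A minimal sketch of the kind of update the example calls for, assuming a hand-written likelihood-weighting rule; the proposal itself learns the update from data (next slide), so treat this only as an illustration of the inputs and outputs.

```python
# Hand-written stand-in for the learned belief update; all weighting
# choices below are assumptions made for illustration.

def update_belief(belief, confirmed, no_conf, new_hyps):
    """belief: {value: prob}; confirmed: value the system asked about;
    no_conf: confidence of the user's 'No'; new_hyps: {value: conf}."""
    updated = {}
    for value, prob in belief.items():
        # A 'No' answer argues against the confirmed value and
        # (weakly) for the competing hypotheses.
        likelihood = (1.0 - no_conf) if value == confirmed else (1.0 + no_conf)
        updated[value] = prob * likelihood
    for value, conf in new_hyps.items():   # newly heard values join the belief
        updated[value] = updated.get(value, 0.0) + conf
    total = sum(updated.values())
    return {v: p / total for v, p in updated.items()}

belief = {"Aspen": 0.6, "Austin": 0.2}
print(update_belief(belief, confirmed="Aspen", no_conf=0.6,
                    new_hyps={"Boston": 0.8}))
# Boston becomes the top hypothesis (~0.59) and Aspen is demoted (~0.18).
```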

Belief Updating: Approach
§ Model the update in a dynamic belief network
[Network diagram: the user concept at time t and the system action determine the user concept at time t+1; the observed evidence comprises the user response contents, the confidence scores of the 1st, 2nd, and 3rd hypotheses, yes/no answers, a correction flag, utterance length, and positive/negative marker features.]
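In equation form, such a network supports a standard filtering update. The notation below is mine (c_t is the user's concept value at turn t, a_t the system action, o_t the vector of observed evidence), and the proposal's exact parameterization may differ:

```latex
P(c_{t+1} \mid a_t, o_t) \;\propto\;
\sum_{c_t} \underbrace{P(c_t)}_{\text{initial belief}}
\; P(c_{t+1} \mid c_t, a_t)
\; \underbrace{P(o_t \mid c_{t+1}, a_t)}_{\text{evidence model}}
```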

Is the Dialogue Advancing Normally?
Locally, at the turn level: non-understanding indicators
§ The non-understanding flag is directly available
§ Develop additional indicators: recognition, understanding, interpretation
Globally, at the discourse level: dialogue-on-track indicators
§ Counts and averages of the non-understanding indicators
§ Rate of dialogue advance
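A hypothetical sketch of how the discourse-level indicators could be accumulated from turn-level flags; the class name, window size, and the "concepts acquired per turn" proxy for rate of advance are my assumptions, not RavenClaw's actual interfaces.

```python
class DialogueOnTrackMonitor:
    def __init__(self, window=5):
        self.window = window
        self.non_understandings = []   # per-turn non-understanding flags
        self.concepts_acquired = []    # per-turn counts of newly grounded concepts

    def observe_turn(self, non_understanding, new_concepts):
        self.non_understandings.append(bool(non_understanding))
        self.concepts_acquired.append(new_concepts)

    def non_understanding_rate(self):
        recent = self.non_understandings[-self.window:]
        return sum(recent) / len(recent) if recent else 0.0

    def advance_rate(self):
        # Rate of dialogue advance: concepts acquired per recent turn.
        recent = self.concepts_acquired[-self.window:]
        return sum(recent) / len(recent) if recent else 0.0

monitor = DialogueOnTrackMonitor()
for flag, concepts in [(False, 2), (True, 0), (True, 0), (False, 1)]:
    monitor.observe_turn(flag, concepts)
print(monitor.non_understanding_rate(), monitor.advance_rate())  # 0.5 0.75
```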

Error Recovery Strategies
§ Identify and define an extended set of error handling strategies
§ Implement: construct task-decoupled implementations of a large number of strategies
§ Evaluate their performance and make further refinements

List of Error Recovery Strategies

User initiated:
§ Help, Where are we?, Start over, Go back, Channel establishment, Suspend/Resume, Repeat, Summarize, Quit

System initiated:
§ Ensure that the system has reliable information (misunderstandings): Scratch concept value, Explicit confirmation, Implicit confirmation, Disambiguation, Ask repeat concept, Reject concept
§ Ensure that the dialogue is on track:
  § Local problems (non-understandings): SNR repair, Ask repeat turn, Ask rephrase turn, Notify non-understanding, Explicit confirm turn, Targeted help, Generic help, WH-reformulation, Keep-a-word reformulation, You-can-say
  § Global problems (compounded, discourse-level problems): Switch input modality, Restart subtask plan, Select alternative plan, Start over, Terminate session / Direct to operator

Error Recovery Strategies: Evaluation
§ Reusability: deploy in different spoken dialogue systems
§ Efficiency of non-understanding strategies
  § Simple metric: is the next utterance understood?
  § Efficiency depends on the decision process
  § Construct upper and lower bounds for efficiency (see the sketch below):
    § Lower bound: a decision process that chooses uniformly at random
    § Upper bound: a human performs the decision process (Wizard-of-Oz study)
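The metric is simple enough to state in a few lines of Python; the log format here is hypothetical.

```python
# Per-strategy efficiency: fraction of cases where the utterance that
# follows the recovery strategy is understood.

from collections import defaultdict

def strategy_efficiency(log):
    """log: iterable of (strategy_name, next_utterance_understood)."""
    wins, counts = defaultdict(int), defaultdict(int)
    for strategy, understood in log:
        counts[strategy] += 1
        wins[strategy] += int(understood)
    return {s: wins[s] / counts[s] for s in counts}

log = [("AskRephrase", True), ("AskRephrase", False),
       ("TargetedHelp", True), ("NotifyNonUnderstanding", False)]
print(strategy_efficiency(log))
# {'AskRephrase': 0.5, 'TargetedHelp': 1.0, 'NotifyNonUnderstanding': 0.0}
# The lower bound comes from logs collected under a uniformly random
# policy; the upper bound from logs where a wizard picked the strategy.
```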

Previous Reinforcement Learning Work
§ Dialogue control as a Markov Decision Process: states, actions, rewards
[Diagram: states S1, S2, S3 linked by actions A, with rewards R]
§ Previous work: successes in small domains, e.g., NJFun [Singh, Kearns, Litman, Walker et al.]
§ Problems
  § The approach does not scale
  § Once learned, policies are not reusable
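A toy illustration of the MDP formulation: states, actions, rewards, and a policy computed by value iteration. The transition model and reward numbers below are invented for illustration; they are not NJFun's model.

```python
STATES = ["S1", "S2", "S3"]            # S3 ~ task completed (absorbing)
ACTIONS = ["confirm", "move_on"]
GAMMA = 0.9

# T[s][a] = list of (next_state, probability, reward); 'confirm' costs a
# turn but advances reliably, 'move_on' is free but may fail.
T = {
    "S1": {"confirm": [("S2", 1.0, -1.0)],
           "move_on": [("S2", 0.7, 0.0), ("S1", 0.3, -2.0)]},
    "S2": {"confirm": [("S3", 1.0, 9.0)],
           "move_on": [("S3", 0.7, 10.0), ("S2", 0.3, -2.0)]},
    "S3": {"confirm": [("S3", 1.0, 0.0)],
           "move_on": [("S3", 1.0, 0.0)]},
}

def q(s, a, V):
    return sum(p * (r + GAMMA * V[s2]) for s2, p, r in T[s][a])

V = {s: 0.0 for s in STATES}
for _ in range(200):                    # value iteration to convergence
    V = {s: max(q(s, a, V) for a in ACTIONS) for s in STATES}

policy = {s: max(ACTIONS, key=lambda a: q(s, a, V)) for s in STATES}
print(policy)  # with these numbers, confirming beats risking an error
```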

Proposed Approach
Overcome previous shortcomings:
1. Focus learning only on error handling
   § Reduces the size of the learning problem
   § Favors reusability of learned policies
   § Lessens the system development effort
2. Use a "divide-and-conquer" approach
   § Leverage independences in the dialogue

Decision Process Architecture
[Diagram: a gating mechanism arbitrates among many small MDPs attached to the RoomLine task tree: topic MDPs (e.g., for RoomLine and Login, choosing between No Action and Explicit Confirmation) and concept MDPs (e.g., for registered and user_name, choosing between No Action and Explicit Confirm).]
§ Small-size models
§ Parameters can be tied across models
§ Accommodates dynamic task generation
§ Favors reusability of policies
§ Initial policies can be easily handcrafted
§ Caveat: rests on an independence assumption
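A hypothetical rendering of this architecture: one small decision model per concept and per topic, with a gate that lets a single error handling action through per turn. The class names, policy tables, and tie-breaking details are my assumptions; the gate's ordering follows the "Gating Mechanism" slide further below.

```python
class ErrorHandlingMDP:
    def __init__(self, name, policy, focus_distance, is_topic):
        self.name, self.policy = name, policy   # policy: state -> action
        self.focus_distance = focus_distance    # hops from conversational focus
        self.is_topic = is_topic

    def propose(self, state):
        return self.policy.get(state, "NoAction")

def gate(mdps, states):
    # Heuristic gate: prefer entities closer to the conversational
    # focus, and topics over concepts (topics sort before concepts).
    proposals = [(m.focus_distance, 0 if m.is_topic else 1,
                  m.name, m.propose(states[m.name])) for m in mdps]
    proposals = [p for p in proposals if p[3] != "NoAction"]
    if not proposals:
        return None, "NoAction"
    _, _, name, action = min(proposals)
    return name, action

login = ErrorHandlingMDP("Login", {"non_understanding": "AskRephrase"}, 0, True)
registered = ErrorHandlingMDP("registered",
                              {"low_confidence": "ExplicitConfirm"}, 0, False)
print(gate([login, registered],
           {"Login": "ok", "registered": "low_confidence"}))
# ('registered', 'ExplicitConfirm'): the only non-trivial proposal wins
```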

Reward Structure & Learning
[Diagram: the gating mechanism stands between the MDPs and the action stream; local rewards flow directly to individual MDPs, while global rewards are assigned after the gate.]
§ Global, post-gate rewards
  § Based on any dialogue performance metric
  § An atypical, multi-agent reinforcement learning setting
§ Local rewards
  § Multiple, standard RL problems
  § Model-based approaches
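A sketch of the two reward routes in the diagram: local rewards go straight to individual MDPs, while the global, post-gate reward (any dialogue performance metric) is credited to the MDP whose action the gate let through. This particular credit-assignment rule is my assumption, not a quote from the proposal.

```python
def assign_rewards(gated_mdp, local_rewards, global_reward):
    """local_rewards: {mdp_name: reward}; returns total reward per MDP."""
    rewards = dict(local_rewards)
    # Credit the global reward to the MDP that actually acted this turn.
    rewards[gated_mdp] = rewards.get(gated_mdp, 0.0) + global_reward
    return rewards

print(assign_rewards("registered", {"registered": -1.0, "Login": 0.0}, 5.0))
# {'registered': 4.0, 'Login': 0.0}
```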

Evaluation
§ Performance
  § Compare learned policies with the initial heuristic policies
  § Metrics: task completion; efficiency; number and length of error segments; user satisfaction
§ Scalability
  § Deploy in a system operating with a sizable task
  § Theoretical analysis

Outline
§ Problem
§ Approach
§ Infrastructure
§ Research Program
Ø Summary & Timeline

Summary of Anticipated Contributions
§ Goal: develop a task-independent, adaptive, and scalable framework for error recovery in task-oriented spoken dialogue systems
§ A modern dialogue management framework
§ A belief updating framework
§ An investigation of an extended set of error handling strategies
§ A scalable, data-driven approach for learning error handling policies

Timeline
Milestones: proposal (now, February 2004); milestone 1 (September 2004, end of year 4); milestone 2 (January 2005); milestone 3 (September 2005, end of year 5); defense (December 2005, 5.5 years)
§ Data: data collection for belief updating and the WOZ study; data collection for RL training; data collection for RL evaluation; contingency data collection efforts
§ Indicators: develop and evaluate the belief updating models; implement the dialogue-on-track indicators
§ Strategies: misunderstanding and non-understanding strategies; evaluate the non-understanding strategies and develop the remaining strategies
§ Decisions: investigate theoretical aspects of the proposed reinforcement learning model; error handling decision process: reinforcement learning experiments; additional experiments: extensions or contingency work

Thank You! Questions & Comments

Additional Slides

Errors in Spoken Dialogue Systems
[Diagram: the understanding process (recognition, parsing, contextual interpretation) has three possible outcomes: the system acquires correct information (OK); the system acquires incorrect information (a misunderstanding, addressed by belief updating and concept-level strategies); or the system does not acquire information (a non-understanding, addressed by the non-understanding indicators and turn-level strategies).]

Structure of Individual MDPs
§ Concept MDPs
  § State space: belief indicators
  § Action space: concept-scoped system actions
  [Diagram: states 0, LC, MC, HC (empty, low, medium, high confidence), with the actions ExplConf, ImplConf, and NoAct attached to the confidence states]
§ Topic MDPs
  § State space: non-understanding indicators, dialogue-on-track indicators
  § Action space: non-understanding actions, topic-level actions
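A guess at how a concept MDP's spaces might be enumerated, based on the diagram above; which actions are legal in which state, and the initial policy, are my assumptions for illustration.

```python
CONCEPT_STATES = ["0", "LC", "MC", "HC"]   # empty, low/medium/high confidence

CONCEPT_ACTIONS = {
    "0":  ["NoAct"],                         # nothing acquired yet
    "LC": ["ExplConf", "ImplConf"],          # low confidence: confirm
    "MC": ["ExplConf", "ImplConf", "NoAct"],
    "HC": ["NoAct", "ImplConf"],             # high confidence: usually accept
}

# A handcrafted initial policy (one action per state), later refined by RL:
initial_policy = {"0": "NoAct", "LC": "ExplConf",
                  "MC": "ImplConf", "HC": "NoAct"}
```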

Gating Mechanism
§ A heuristic derived from domain-independent dialogue principles:
  § Give priority to entities closer to the conversational focus
  § Give priority to topics over concepts

Task-Independence / Reusability
§ Argument: the architecture [RavenClaw architecture diagram, repeated from the Infrastructure section: a domain-specific Dialogue Task Specification over a domain-independent Dialogue Engine with the Dialogue Stack, Expectation Agenda, Error Indicators, Error Handling Decision Process, and Strategies]
§ Proof: deployment across multiple RavenClaw systems

Adaptable
§ Argument: the reinforcement learning approach [decision process architecture diagram, repeated: topic and concept MDPs coordinated by the gating mechanism]
§ Proof: longer-term evaluation of adaptability (extension work item)

Scalable
§ Argument: the architecture [decision process architecture diagram, repeated]
§ Proof: deployment of, and experiments with, systems with large tasks

Scalability of Reinforcement Learning
§ NJFun: 3 concepts, 7 state variables, 62 states; learned a policy from 311 dialogues
§ Now consider 12 concepts (RoomLine has 20, CMU Let's Go! has 27) and 242 states
  § The state space grows ~4 times (242/62 ≈ 3.9)
  § The number of parameters grows ~16 times (transition parameters scale roughly with the square of the state-space size: 3.9² ≈ 15)

Extension Work Items
§ Portability of the confidence annotation and belief updating schemes
  § Use domain-independent features
  § Train in one domain, test in another
  § Use unlabeled data: self-training and co-training to improve performance in a new domain
§ Evaluation of adaptability/reusability
  § Reusability: migrate policies from one domain to another
  § Adaptability: monitor model changes and system behavior throughout a period of extended use

Study by [Shin et al]
§ Labeled error segments in 141 dialogues from multiple Communicator systems
§ 1.66 error segments per session
§ 22% of the error segments never get back on track
§ 37% of sessions have breakdowns in interaction

Error Indicators, Strategies & Decision Process
§ Error indicators
§ Strategies
§ Decision process
[Decision process architecture diagram, repeated: topic and concept MDPs coordinated by the gating mechanism]

An example
S: Where do you want to go to?
U: I wanna go to Grand Rapids Michigan on December first [I WANNA GO TO GRAND RAPIDS MICHIGAN FOLLOWING DECEMBER BOSTON]
S: traveling to Boston… departing Pittsburgh on what date?
U: traveling to Grand Rapids Michigan [TRAVELING TO GRAND RAPIDS MICHIGAN]
S: traveling to Grand Rapids… on what date would you like to travel?
U: December first [DECEMBER FOR SHIT]
S: sorry, I'm not sure I understood what you said. What day did you need to travel?
U: Friday December first [FRIDAY DECEMBER FOURTH]
S: a flight on Monday, December fourth… and what time would you like to leave Pittsburgh Monday, December fourth?
U: traveling on Friday December first [COMING FRIDAY DECEMBER]
S: the date you gave was ambiguous. Did you mean November twenty-third or November twenty-fourth?
U: traveling on Friday December first [TRAVELING ON FRIDAY DECEMBER]
S: a flight on Friday, November twenty-fourth… and departing Pittsburgh at what time?
U: traveling the following week on Friday [I'M TRAVELING FOLLOWING WEEK ON FRIDAY]