Error Awareness and Recovery in Task-Oriented Spoken Dialogue
Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems Thesis Proposal Dan Bohus Carnegie Mellon University, January 2004 Thesis Committee Alex Rudnicky (Chair) Roni Rosenfeld Jeff Schneider Eric Horvitz (Microsoft Research)
Problem
Lack of robustness when faced with understanding errors
§ Spans most domains and interaction types
§ Has a significant impact on performance
An example
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40 pm, arrives Seoul at 5 pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th… I have a flight departing Chicago at 1:40 pm arrives Seoul at ………
Some Statistics …
§ Semantic error rates: ~25-35%
  SpeechActs [SRI]: 25%
  CU Communicator [CU]: 27%
  Jupiter [MIT]: 28%
  CMU Communicator [CMU]: 32%
  How May I Help You? [AT&T]: 36%
§ Corrections [Krahmer, Swerts, Litman, Levow]
  30% of utterances correct system mistakes
  2-3 times more likely to be misrecognized
Significant Impact on Interaction
§ CMU Communicator: 26% of sessions failed; 40% contain understanding errors; 33% proceed without errors
§ Multi-site Communicator Corpus [Shin et al]: 37% of sessions failed; 63% succeeded
Outline § Problem Ø Approach § Infrastructure § Research Program § Summary & Timeline 6 problem : approach : infrastructure : indicators : strategies : decision process : summary
Increasing Robustness …
§ Increase the accuracy of speech recognition
§ Assume recognition is unreliable, and create the mechanisms for acting robustly at the dialogue management level
Snapshot of Existing Work (1 of 2)
§ Theoretical models of grounding: Contribution Model [Clark], Grounding Acts [Traum]
  Analytical/descriptive, not decision-oriented
§ Practice: heuristic rules
  Misunderstandings: threshold(s) on confidence scores
  Non-understandings: ad hoc, lack generality, not easy to extend
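The threshold-based handling of misunderstandings mentioned above usually amounts to a small decision rule. A minimal sketch, with illustrative threshold values (0.3 and 0.8 are not taken from the proposal):

```python
def confirmation_action(confidence, reject_threshold=0.3, accept_threshold=0.8):
    """Map a concept's confidence score to an error handling action
    using fixed thresholds -- the heuristic practice described above."""
    if confidence < reject_threshold:
        return "reject"            # treat the value as a non-understanding
    elif confidence < accept_threshold:
        return "explicit_confirm"  # ask "Did you say X?"
    else:
        return "accept"            # use the value without confirmation
```

For example, `confirmation_action(0.55)` returns `"explicit_confirm"`. The brittleness of this scheme (fixed, hand-tuned thresholds, no account of dialogue context) is exactly what motivates the decision-theoretic approach proposed later.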
Snapshot of Existing Work (2 of 2)
§ Conversation as Action under Uncertainty [Paek and Horvitz]
  Belief networks to model uncertainties
  Decisions based on expected utility, VOI analysis
§ Reinforcement learning for dialogue control policies [Singh, Kearns, Litman, Walker, Levin, Pieraccini, Young, Scheffler, etc.]
  Formulate dialogue control as an MDP
  Learn the optimal control policy from data
  Do not scale up to complex, real-world tasks
Thesis Statement
Develop a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
Approach: decision making under uncertainty
Three Components
0. Infrastructure
1. Error awareness: develop indicators that assess the reliability of information and how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable reinforcement-learning based architecture for making error handling decisions
Infrastructure
§ Completed: RavenClaw, a modern dialogue management framework for complex, task-oriented domains
§ Completed: RavenClaw spoken dialogue systems, a test-bed for evaluation
RavenClaw
[Architecture diagram: a domain-specific Dialogue Task Specification -- a tree of agents (RoomLine → Login → Welcome, GreetUser, AskRegistered, AskName; GetQuery → DateTime, Location, Network, Properties, Projector, Whiteboard; GetResults; DiscussResults) with concept bindings such as registered: [No] -> false, [Yes] -> true -- executed by a Domain-Independent Dialogue Engine, which maintains a Dialogue Stack and an Expectation Agenda and hosts the Error Indicators, the Error Handling Decision Process, and the error handling Strategies (e.g. ExplicitConfirm).]
RavenClaw-based Systems
System — Domain
RoomLine — Information access
CMU Let's Go! Bus Information System — Information access
LARRI [Symphony] — Guidance through procedures
Intelligent Procedure Assistant [NASA Ames] — Guidance through procedures
TeamTalk [11-754] — Command-and-control
Eureka [11-743] — Web access
Research Plan
0. Infrastructure
1. Error awareness: develop indicators that assess the reliability of information and how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable reinforcement-learning based architecture for making error handling decisions
Existing Work
§ Confidence annotation
  Traditionally focused on the speech recognizer [Bansal, Chase, Cox, and others]
  Recently, multiple sources of knowledge: recognition, parsing, dialogue management [San-Segundo, Walker, Bosch, Bohus, and others]
  Detect misunderstandings: ~80-90% accuracy
§ Correction and aware site detection [Swerts, Litman, Levow and others]
  Multiple sources of knowledge
  Detect corrections: ~80-90% accuracy
Proposed: Belief Updating
§ Continuously assess beliefs in light of initial confidence and subsequent events
§ An example:
S: Where are you flying from?
U: [CityName={Aspen/0.6; Austin/0.2}]            ← initial belief
S: Did you say you wanted to fly out of Aspen?   ← system action
U: [No/0.6] [CityName={Boston/0.8}]              ← user response
[CityName={Aspen/?; Austin/?; Boston/?}]         ← updated belief
Belief Updating: Approach
§ Model the update in a dynamic belief network
[Network diagram: the user concept at time t, together with the system action, determines the user concept at time t+1; the observed features of the user response include its contents, the 1st/2nd/3rd hypothesis confidences, yes/no answers, a correction flag, utterance length, and positive/negative markers.]
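The proposal leaves the network parameters to be learned from data. As a back-of-the-envelope illustration of what the update does, a plain Bayesian update over the concept hypotheses behaves as follows; all the numbers below (the residual prior mass and the response likelihoods) are invented for the example, not taken from a trained model:

```python
def update_belief(prior, likelihood, floor=1e-6):
    """Posterior over concept hypotheses: P(h | evidence) proportional to
    P(h) * P(evidence | h), renormalized over all hypotheses."""
    hyps = set(prior) | set(likelihood)
    post = {h: max(prior.get(h, 0.0), floor) * max(likelihood.get(h, 0.0), floor)
            for h in hyps}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

# Initial recognition gave Aspen/0.6, Austin/0.2; the residual 0.2 of
# probability mass covers hypotheses not yet observed (assigned to Boston
# once it is heard in the user's answer).
prior = {"Aspen": 0.6, "Austin": 0.2, "Boston": 0.2}
# Illustrative likelihoods of the user's response ("No" + Boston/0.8)
# under each hypothesis.
evidence = {"Aspen": 0.1, "Austin": 0.3, "Boston": 0.9}
belief = update_belief(prior, evidence)  # Boston becomes the top hypothesis
```

The dynamic belief network in the slide plays the role of `evidence` here: it converts the system action and the observed response features into per-hypothesis likelihoods, rather than having them supplied by hand.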
Research Plan
0. Infrastructure
1. Error awareness: develop indicators that assess the reliability of information and how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable reinforcement-learning based architecture for making error handling decisions
Is the Dialogue Advancing Normally?
Locally, turn-level: non-understanding indicators
§ Non-understanding flag directly available
§ Develop additional indicators: recognition, understanding, interpretation
Globally, discourse-level: dialogue-on-track indicators
§ Counts, averages of non-understanding indicators
§ Rate of dialogue advance
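A discourse-level indicator of the kind listed above can be as simple as a windowed non-understanding rate. A minimal sketch; the window size and alarm threshold are illustrative choices, not values from the proposal:

```python
from collections import deque

class OnTrackMonitor:
    """Tracks the rate of non-understandings over the last `window` turns."""

    def __init__(self, window=5, alarm=0.5):
        self.recent = deque(maxlen=window)  # True = turn was understood
        self.alarm = alarm

    def observe(self, understood):
        self.recent.append(bool(understood))

    def nonunderstanding_rate(self):
        if not self.recent:
            return 0.0
        return 1 - sum(self.recent) / len(self.recent)

    def off_track(self):
        """Signal a discourse-level problem when the recent
        non-understanding rate reaches the alarm threshold."""
        return self.nonunderstanding_rate() >= self.alarm

monitor = OnTrackMonitor(window=5, alarm=0.5)
for understood in [True, False, False, False, True]:
    monitor.observe(understood)
# 3 of the last 5 turns were non-understandings -> off track
```

Crossing the alarm would license the global, discourse-level strategies (restart subtask, start over, direct to operator) rather than another turn-level repair.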
Research Plan
0. Infrastructure
1. Error awareness: develop indicators that assess the reliability of information and how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable reinforcement-learning based architecture for making error handling decisions
Error Recovery Strategies
§ Identify and define an extended set of error handling strategies
§ Implement: construct task-decoupled implementations of a large number of strategies
§ Evaluate performance and bring further refinements
List of Error Recovery Strategies
User-initiated:
§ Help / Where are we?
§ Start over
§ Go back
§ Channel establishment
§ Suspend / Resume
§ Repeat
§ Summarize
§ Quit
System-initiated, to ensure that the system has reliable information (misunderstandings):
§ Scratch concept value
§ Explicit confirmation
§ Implicit confirmation
§ Disambiguation
§ Ask repeat concept
§ Reject concept
System-initiated, to ensure that the dialogue is on track:
Local problems (non-understandings):
§ SNR repair
§ Ask repeat turn
§ Ask rephrase turn
§ Notify non-understanding
§ Explicit confirm turn
§ Targeted help: WH-reformulation, keep-a-word reformulation
§ Generic help
§ You can say
Global problems (compounded, discourse-level problems):
§ Switch input modality
§ Restart subtask plan
§ Select alternative plan
§ Start over
§ Terminate session / Direct to operator
Error Recovery Strategies: Evaluation
§ Reusability: deploy in different spoken dialogue systems
§ Efficiency of non-understanding strategies
  Simple metric: is the next utterance understood?
  Efficiency depends on the decision process
  Construct upper and lower bounds for efficiency:
    Lower bound: a decision process which chooses uniformly
    Upper bound: a human performs the decision process (WOZ)
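The simple efficiency metric above can be computed directly from session logs. A sketch, assuming each turn in the log is reduced to a flag saying whether it was understood (the log format is my own simplification):

```python
def recovery_rate(turns):
    """Fraction of non-understandings whose next utterance is understood.
    turns: list of booleans, True = the turn was understood."""
    followed, recovered = 0, 0
    for cur, nxt in zip(turns, turns[1:]):
        if not cur:               # a non-understanding occurred
            followed += 1
            recovered += nxt      # did the next utterance get through?
    return recovered / followed if followed else None

# Two non-understandings with a following turn; one recovery -> 0.5
rate = recovery_rate([True, False, False, True, True])
```

Computed per strategy, this statistic is what gets compared against the uniform-choice lower bound and the Wizard-of-Oz upper bound.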
Research Plan
0. Infrastructure
1. Error awareness: develop indicators that assess the reliability of information and how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable reinforcement-learning based architecture for making error handling decisions
Previous Reinforcement Learning Work
§ Dialogue control ~ Markov Decision Process: states, actions, rewards
[Diagram: states S1, S2, S3 connected by actions A with rewards R]
§ Previous work: successes in small domains, e.g. NJFun [Singh, Kearns, Litman, Walker et al]
§ Problems
  The approach does not scale
  Once learned, policies are not reusable
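For concreteness, once dialogue control is cast as an MDP it can be solved with standard value iteration. The toy model below is my own illustration (states, transition probabilities, and rewards are invented): a two-state confirmation problem where accepting a low-confidence value is risky and confirming costs a turn.

```python
# MDP as {state: {action: [(prob, next_state, reward), ...]}}
MDP = {
    "low_conf": {
        "accept":  [(0.3, "done", 1.0), (0.7, "done", -1.0)],  # risky
        "confirm": [(1.0, "high_conf", -0.2)],                 # costs a turn
    },
    "high_conf": {
        "accept":  [(0.9, "done", 1.0), (0.1, "done", -1.0)],
        "confirm": [(1.0, "high_conf", -0.2)],
    },
    "done": {},  # terminal
}

def value_iteration(mdp, gamma=0.95, eps=1e-6):
    """Standard value iteration; returns state values and a greedy policy."""
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if not actions:       # terminal state
                continue
            v = max(sum(p * (r + gamma * V[ns]) for p, ns, r in outs)
                    for outs in actions.values())
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < eps:
            break
    policy = {s: max(acts, key=lambda a: sum(p * (r + gamma * V[ns])
                                             for p, ns, r in acts[a]))
              for s, acts in mdp.items() if acts}
    return V, policy

V, policy = value_iteration(MDP)
# Learned behavior: confirm when confidence is low, accept when high.
```

The scaling problem the slide points at is visible even here: the state space is a cross-product of feature values, so adding concepts multiplies the number of states and the data needed to estimate the model.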
Proposed Approach
Overcome previous shortcomings:
1. Focus learning only on error handling
  § Reduces the size of the learning problem
  § Favors reusability of learned policies
  § Lessens the system development effort
2. Use a "divide-and-conquer" approach
  § Leverage independences in dialogue
Decision Process Architecture
[Diagram: a gating mechanism arbitrates among small MDPs attached to the dialogue task tree -- Topic-MDPs for agents such as RoomLine and Login, Concept-MDPs for concepts such as registered and user_name -- each proposing actions such as No Action or Explicit Confirmation.]
§ Small-size models
§ Parameters can be tied across models
§ Accommodates dynamic task generation
§ Favors reusability of policies
§ Initial policies can be easily handcrafted
− Caveat: independence assumption
Reward Structure & Learning
[Diagram: the gating mechanism channels global, post-gate rewards and local rewards to the individual MDPs.]
§ Rewards based on any dialogue performance metric
§ Atypical, multi-agent reinforcement learning setting
§ Multiple, standard RL problems
§ Model-based approaches
Evaluation
§ Performance
  Compare learned policies with initial heuristic policies
  Metrics: task completion, efficiency, number and lengths of error segments, user satisfaction
§ Scalability
  Deploy in a system operating with a sizable task
  Theoretical analysis
Outline
§ Problem
§ Approach
§ Infrastructure
§ Research Program
§ Summary & Timeline
Summary of Anticipated Contributions
§ Goal: develop a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
§ Modern dialogue management framework
§ Belief updating framework
§ Investigation of an extended set of error handling strategies
§ Scalable data-driven approach for learning error handling policies
Timeline
Key dates: now (February 2004, proposal) → September 2004 (end of year 4, milestone 1) → January 2005 (milestone 2) → September 2005 (end of year 5, milestone 3) → December 2005 (defense); 5.5 years in total.
§ Data: collection for belief updating and the WOZ study; collection for RL training; collection for RL evaluation; contingency data collection efforts
§ Indicators: develop and evaluate the belief updating models; implement dialogue-on-track indicators
§ Strategies: misunderstanding and non-understanding strategies; evaluate non-understanding strategies; develop remaining strategies
§ Decisions: investigate theoretical aspects of the proposed reinforcement learning model; error handling decision process: reinforcement learning experiments; additional experiments: extensions or contingency work
Thank You! Questions & Comments 35
Additional Slides
Errors in spoken dialogue systems
[Diagram: the understanding process (recognition → parsing → contextual interpretation) has three outcomes.]
§ System acquires correct information: OK
§ System acquires incorrect information: misunderstanding → belief updating / concept-level strategies
§ System does not acquire information: non-understanding → indicators / turn-level strategies
Structure of Individual MDPs
§ Concept MDPs
  State-space: belief indicators
  Action-space: concept-scoped system actions
  [Diagram: states LC, MC, HC with actions ExplConf, ImplConf, NoAct]
§ Topic MDPs
  State-space: non-understanding indicators, dialogue-on-track indicators
  Action-space: non-understanding actions, topic-level actions
Gating Mechanism
§ Heuristic derived from domain-independent dialogue principles
§ Give priority to entities closer to the conversational focus
§ Give priority to topics over concepts
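A minimal sketch of such a gate, assuming each MDP proposes an action annotated with its agent's distance from the conversational focus; the candidate representation and ranking scheme are my own illustration of the two principles above:

```python
def gate(candidates):
    """Pick one proposed error-handling action.
    candidates: list of dicts with 'agent', 'kind' ('topic' | 'concept'),
    and 'focus_distance' (0 = at the current conversational focus)."""
    kind_rank = {"topic": 0, "concept": 1}  # topics outrank concepts on ties
    return min(candidates,
               key=lambda c: (c["focus_distance"], kind_rank[c["kind"]]))

proposals = [
    {"agent": "Login",     "kind": "topic",   "focus_distance": 1},
    {"agent": "user_name", "kind": "concept", "focus_distance": 0},
    {"agent": "RoomLine",  "kind": "topic",   "focus_distance": 2},
]
winner = gate(proposals)  # user_name wins: closest to the focus
```

Because the gate is a fixed, domain-independent heuristic, the individual MDPs stay small and their learned policies remain portable across tasks.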
Task-independence / Reusability
§ Argument: architecture -- the domain-independent dialogue engine and the error handling components are decoupled from the dialogue task specification
§ Proof: deployment across multiple RavenClaw systems
Adaptable
§ Argument: reinforcement learning approach
§ Proof: longer-term evaluation of adaptability (extension work item)
Scalable
§ Argument: architecture
§ Proof: deployment and experiments with systems with large tasks
Scalability of Reinforcement Learning
§ NJFun: 3 concepts, 7 state variables, 62 states; learned a policy from 311 dialogues
§ Consider 12 concepts (RoomLine/20, CMU Let's Go!/27): 242 states
  State-space grows 4 times; number of parameters grows 16 times
Extension Work Items
§ Portability of confidence annotation and belief updating schemes
  Use domain-independent features
  Train in one domain, test in another
  Use of unlabeled data: self-training and co-training to improve performance in a new domain
§ Evaluation of adaptability/reusability
  Reusability: migrate policies from one domain to another
  Adaptability: monitor model changes and system behavior throughout a period of extended use
Study by [Shin et al]
§ Labeled error segments in 141 dialogs from multiple Communicator systems
§ 1.66 error segments / session
§ 22% of the error segments never get back on track
§ 37% of sessions have breakdowns in interaction
Error Indicators, Strategies & Decision Process
[Diagram: the error indicators feed the error handling decision process (the gating mechanism over topic- and concept-MDPs), which selects among the strategies.]
An example S: Where do you want to go to? U: I wanna go to Grand Rapids Michigan on December first [I WANNA GO TO GRAND RAPIDS MICHIGAN FOLLOWING DECEMBER BOSTON] S: traveling to Boston… departing Pittsburgh on what date? U: traveling to Grand Rapids Michigan [TRAVELING TO GRAND RAPIDS MICHIGAN] S: traveling to Grand Rapids… on what date would you like to travel? U: December first [DECEMBER FOR SHIT] S: sorry, I’m not sure I understood what you said. What day did you need to travel? U: Friday December first [FRIDAY DECEMBER FOURTH] S: a flight on Monday, December fourth… and what time would you like to leave Pittsburgh Monday, December fourth U: traveling on Friday December first [COMING FRIDAY DECEMBER] S: the date you gave was ambiguous. Did you mean November twenty-third or November twenty fourth? U: traveling on Friday December first [TRAVELING ON FRIDAY DECEMBER] S: a flight on Friday, November twenty-fourth… and departing Pittsburgh at what time? U: traveling the following week on Friday [I’M TRAVELING FOLLOWING WEEK ON FRIDAY] 48