Belief Updating in Spoken Dialog Systems
Dan Bohus
www.cs.cmu.edu/~dbohus
dbohus@cs.cmu.edu
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15217
problem
spoken language interfaces lack robustness when faced with understanding errors.
§ stems mostly from speech recognition
§ spans most domains and interaction types
more concretely …
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry I’m not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40 pm, arrives Seoul at 5 pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40 pm arrives Seoul at …
non- and misunderstandings (same dialog as on the previous slide)
§ NON-understanding: the system fails to obtain any interpretation of the user's turn (the two "Urbana Champaign" turns)
§ MIS-understanding: the system obtains an incorrect interpretation and acts on it (e.g., "Huntsville" recognized as SEOUL)
approaches for increasing robustness
§ fix recognition
§ gracefully handle errors through interaction
§ detect the problems
§ develop a set of recovery strategies
§ know how to choose between them (policy)
six not-so-easy pieces …
§ two error types: misunderstandings, non-understandings
§ three problems for each: detection, strategies, policy
belief updating (misunderstandings: detection)
§ construct more accurate beliefs by integrating information over multiple turns
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
belief updating: problem statement
§ given:
  - an initial belief P_initial(C) over concept C
  - a system action SA
  - a user response R
§ construct an updated belief:
  - P_updated(C) ← f(P_initial(C), SA, R)
example:
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
outline
§ related work
§ a restricted version
§ data
§ user response analysis
§ experiments and results
§ some caveats and future work
confidence annotation + heuristic updates
§ confidence annotation
  - traditionally focused on word-level errors [Chase, Cox, Bansal, Ravishankar]
  - more recently: semantic confidence annotation [Walker, San-Segundo, Bohus]
  - machine learning approach
  - results fairly good, but not perfect
§ heuristic updates
  - explicit confirmation: no → don’t trust; yes → trust
  - implicit confirmation: no → don’t trust; otherwise → trust
  - suboptimal for several reasons
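The heuristic update rules on this slide can be sketched as a small function. This is only an illustration: the function name, the string labels, and the use of 0.0 / 1.0 for "don't trust" / "trust" are assumptions, and the slide does not say what happens after an explicit confirmation when the user answers neither yes nor no, so the sketch simply keeps the initial confidence in that case.

```python
def heuristic_update(initial_conf: float, action: str, response: str) -> float:
    """Heuristic belief update after a confirmation action (illustrative sketch).

    action:   "explicit" or "implicit" (hypothetical labels)
    response: "yes", "no" or "other"   (hypothetical labels)
    """
    if action not in ("explicit", "implicit"):
        raise ValueError(f"unknown confirmation action: {action!r}")
    if response == "no":
        return 0.0  # user rejected the value: don't trust
    if action == "explicit":
        if response == "yes":
            return 1.0  # user confirmed the value: trust
        # ASSUMPTION: not specified on the slide; keep the belief unchanged.
        return initial_conf
    # Implicit confirmation: anything other than "no" counts as acceptance.
    return 1.0


print(heuristic_update(0.65, "explicit", "yes"))    # 1.0
print(heuristic_update(0.65, "implicit", "other"))  # 1.0
print(heuristic_update(0.65, "implicit", "no"))     # 0.0
```

Note how the implicit-confirmation rule trusts any response that is not a "no", which is one place where such a rule can go wrong when users ignore an error.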
correction detection
§ detect if the user is trying to correct the system [Litman, Swerts, Hirschberg, Krahmer, Levow]
§ machine learning approach
§ features from different knowledge sources in the system
§ results fairly good, but not perfect
integration
§ confidence annotation and correction detection are useful tools
§ but separately, neither solves the problem
§ bridge together in a unified approach to accurately track beliefs
belief updating: general form
§ given:
  - an initial belief P_initial(C) over concept C
  - a system action SA
  - a user response R
§ construct an updated belief:
  - P_updated(C) ← f(P_initial(C), SA, R)
restricted version: 2 simplifications
1. compact belief
§ system unlikely to “hear” more than 3 or 4 values
  - single vs. multiple recognition results
  - in our data: max = 3 values, only 6.9% have >1 value
§ confidence score of top hypothesis
2. updates after confirmation actions
§ reduced problem
  - ConfTop_updated(C) ← f(ConfTop_initial(C), SA, R)
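As a minimal sketch of the first simplification, a belief over several recognized values can be collapsed to the top hypothesis and its confidence score. The dictionary representation and the function name `conf_top` are assumptions made for illustration; the slides do not describe the system's data structures.

```python
from typing import Dict, Tuple


def conf_top(belief: Dict[str, float]) -> Tuple[str, float]:
    """Reduce a belief {value: confidence} to its top hypothesis.

    Illustrative only: the restricted version tracks just the confidence
    score of the top hypothesis for a concept.
    """
    if not belief:
        raise ValueError("belief must contain at least one hypothesis")
    value = max(belief, key=belief.get)  # hypothesis with the highest confidence
    return value, belief[value]


print(conf_top({"seoul": 0.65}))                # ('seoul', 0.65)
print(conf_top({"seoul": 0.4, "berlin": 0.6}))  # ('berlin', 0.6)
```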
data
§ collected with RoomLine
  - a phone-based mixed-initiative spoken dialog system
  - conference room reservation
  - search and negotiation
§ explicit and implicit confirmations
§ confidence threshold model (+ some exploration)
§ unplanned implicit confirmations
  - S: I found 10 rooms for Friday between 1 and 3 p.m. Would you like a small room or a large one?
corpus
§ user study
  - 46 participants (naïve users)
  - 10 scenario-based interactions each
  - compensated per task success
§ corpus
  - 449 sessions, 8848 user turns
  - orthographically transcribed
  - rich annotation: correct concepts, corrections, etc.
user response types
§ following Krahmer and Swerts
  - study on a Dutch train timetable information system
§ 3 user response types
  - YES: yes, right, that’s right, correct, etc.
  - NO: no, wrong, etc.
  - OTHER
§ cross-tabulated against correctness of confirmations
user responses to explicit confirmations
§ from transcripts [numbers in brackets from Krahmer & Swerts]
             YES          NO           Other
CORRECT      94% [93%]    0% [0%]      5% [7%]
INCORRECT    1% [6%]      72% [57%]    27% [37%]
(slide annotation: ~10%)
§ from decoded
             YES     NO      Other
CORRECT      87%     1%      12%
INCORRECT    1%      61%     38%
other responses to explicit confirmations
§ ~70% users repeat the correct value
§ ~15% users don’t address the question
  - attempt to shift conversation focus
             User does not correct    User corrects
CORRECT      1159                     0
INCORRECT    29 [10% of incor]        250 [90% of incor]
user responses to implicit confirmations
§ from transcripts [numbers in brackets from Krahmer & Swerts]
             YES         NO           Other
CORRECT      30% [0%]    7% [0%]      63% [100%]
INCORRECT    6% [0%]     33% [15%]    61% [85%]
§ from decoded
             YES     NO      Other
CORRECT      28%     5%      67%
INCORRECT    7%      27%     66%
ignoring errors in implicit confirmations
             User does not correct    User corrects
CORRECT      552                      2
INCORRECT    118 [51% of incor]       111 [49% of incor]
§ users correct later (40% of 118)
§ users interact strategically
  - correct only if essential
             ~correct later    correct later
~critical    55                2
critical     14                47
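The claim that users correct strategically can be checked with a little arithmetic on the 118 uncorrected errors. The sketch below assumes the breakdown on the slide reads as rows "~critical" / "critical" and columns "does not correct later" / "corrects later"; the variable names are illustrative.

```python
# Breakdown of the 118 implicit-confirmation errors the user did not correct
# immediately. ASSUMPTION: first count = never corrected, second = corrected later.
counts = {
    "non-critical": {"not_later": 55, "later": 2},
    "critical": {"not_later": 14, "later": 47},
}


def later_correction_rate(row):
    """Fraction of errors in this row that the user corrected later."""
    return row["later"] / (row["later"] + row["not_later"])


total = sum(sum(row.values()) for row in counts.values())
overall = sum(row["later"] for row in counts.values()) / total

print(total)                                                    # 118
print(f"{overall:.1%}")                                         # 41.5%
print(f"{later_correction_rate(counts['critical']):.1%}")       # 77.0%
print(f"{later_correction_rate(counts['non-critical']):.1%}")   # 3.5%
```

Under this reading, roughly 40% of the ignored errors are corrected later overall, and almost all of those later corrections concern errors that are critical to the task.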
machine learning approach
§ need good probability outputs
  - low cross-entropy between model predictions and reality
  - cross-entropy = negative average log posterior
§ logistic regression
  - sample efficient
  - stepwise approach → feature selection
§ logistic model tree for each action
  - root splits on response-type
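To make "cross-entropy = negative average log posterior" concrete, here is a small sketch that scores predicted probabilities against binary correctness labels. Reading the "soft error" and "hard error" of the results slides as cross-entropy and thresholded classification error is an assumption, as are the function names and the 0.5 threshold.

```python
import math


def soft_error(probs, labels, eps=1e-12):
    """Cross-entropy: negative average log probability assigned to the truth.

    probs[i]  = predicted probability that the value is correct
    labels[i] = 1 if the value was actually correct, else 0
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += math.log(p if y else 1.0 - p)
    return -total / len(labels)


def hard_error(probs, labels, threshold=0.5):
    """Fraction of cases where the thresholded prediction is wrong."""
    wrong = sum((p >= threshold) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(labels)


probs = [0.9, 0.2, 0.6]
labels = [1, 0, 0]
print(hard_error(probs, labels))
print(soft_error(probs, labels))
```

A model can have the same hard error as a baseline yet a much lower soft error if its probabilities are better calibrated, which is why the slide asks for good probability outputs.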
features. target.
§ initial situation
  - initial confidence score
  - concept identity, dialog state, turn number
§ system action
  - other actions performed in parallel
§ features of the user response
  - acoustic / prosodic features
  - lexical features
  - grammatical features
  - dialog-level features
§ target: was the value correct?
baselines
§ initial baseline
  - accuracy of system beliefs before the update
§ heuristic baseline
  - accuracy of heuristic rule currently used in the system
§ oracle baseline
  - accuracy if we knew exactly when the user is correcting the system
results: explicit confirmation
[figure: hard error (%) and soft error]
results: implicit confirmation
[figure: hard error (%) and soft error]
results: unplanned implicit confirmation
[figure: hard error (%) and soft error]
informative features
§ initial confidence score
§ prosody features
§ barge-in
§ expectation match
§ repeated grammar slots
§ concept id
eliminate simplification 1
§ current restricted version
  - belief = confidence score of top hypothesis
  - only 6.9% of cases had more than 1 hypothesis
§ extend to
  - N hypotheses + 1 (other), where N is a small integer (2 or 3)
  - approach: multinomial generalized linear model
  - use information from multiple recognition hypotheses
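One way to picture the proposed extension is a multinomial logit (softmax) model that outputs a distribution over N candidate values plus an "other" slot. The sketch below is an assumption about what such a model could look like, not the model from the talk: the feature vector, the weights, and the `<other>` label are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def multinomial_belief(values, features, class_weights):
    """Belief over N candidate values plus an 'other' slot.

    values:        the N candidate values (N is small, e.g. 2 or 3)
    features:      one shared feature vector describing the situation
    class_weights: N + 1 weight vectors, the last one for '<other>'
    """
    if len(class_weights) != len(values) + 1:
        raise ValueError("need one weight vector per value plus one for '<other>'")
    scores = [sum(w * x for w, x in zip(wk, features)) for wk in class_weights]
    return dict(zip(list(values) + ["<other>"], softmax(scores)))


# Hypothetical weights and features, for illustration only.
belief = multinomial_belief(
    ["seoul", "berlin"],
    features=[1.0, 0.65],
    class_weights=[[0.0, 2.0], [0.0, 1.0], [0.5, 0.0]],
)
print(belief)
```

The output always sums to 1, so probability mass not assigned to the recognized hypotheses is kept in the "other" slot rather than discarded.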
eliminate simplification 2
§ current restricted version
  - only updates following system confirmation actions
  - users might correct the system at any point
§ extend to
  - updates after all system actions
shameless self promotion
(same grid as before: misunderstandings / non-understandings × detection / strategies / policy)
non-understandings
§ detection [Interspeech-05]
  - rejection threshold adaptation
  - nonu impact on performance
§ strategies [SIGdial-05]
  - comparative analysis of 10 recovery strategies
§ policy [SIGdial-05]
  - wizard experiment
  - towards learning nonu recovery policies
shameless CMU promotion
§ Ananlada (Moss) Chotimongkol
  - automatic concept and task structure acquisition
§ Antoine Raux
  - turn-taking, conversation micro-management
§ Jahanzeb Sherwani
  - multimodal personal information management
§ Satanjeev Banerjee
  - meeting understanding
§ Stefanie Tomko
  - universal speech interface
§ Thomas Harris
  - multi-participant dialog
§ DoD / Young Researchers’ Roundtable
thank you!
a more subtle caveat
§ distribution of training data
  - confidence annotator + heuristic update rules
§ distribution of run-time data
  - confidence annotator + learned model
§ always a problem when interacting with the world
§ hopefully, distribution shift will not cause large degradation in performance
  - remains to validate empirically
  - maybe a bootstrap approach?