Grounding in Conversational Systems
Dan Bohus, January 2003
Dialogs on Dialogs Reading Group, Carnegie Mellon University
Overview
- Early grounding theories
  - Discourse contributions (Clark & Schaefer)
  - Conversational acts (Traum)
- A computational framework (Horvitz, Paek)
  - Principles
  - Systems
- Grounding in RavenClaw
Clark & Schaefer
- In discourse, humans collaborate to establish and maintain mutual ground
- Discourse is structured in contributions
  - Contribution: Presentation + Acceptance
- Grounding criterion: “A and B mutually believe that the partners have understood what A said to a criterion sufficient for the current purposes”
Clark & Schaefer (2)
- Evidence of understanding:
  - Display
  - Demonstration
  - Acknowledgement
  - Initiating the next relevant contribution
  - Continued attention
- Display/Demonstration order challenged…
Clark & Schaefer (3)
- Infinite recursion avoided by the Strength of Evidence Principle
- 4 possible states of (non-)understanding, for a listener L and a speaker S:
  - L did not notice S’s utterance
  - L notices it but does not hear it correctly
  - L hears it correctly but does not understand it
  - L understands it
Traum
- Conversational acts, an extension of speech act theory
  - Turn-taking
  - Grounding: Initiate, Continue, Cancel, ReqAck, ReqRepair, Repair
  - Core speech acts
  - Argumentational acts
- Eliminates infinite recursion: acknowledgements don’t need further acknowledgements
Traum (2)
- In later work, a computational model is introduced [model figure not preserved]
- Finally, Brennan (& Clark):
  - another computational formulation
  - studies the different types of grounding behaviors in different media
Criticisms
- These models are by and large descriptive
  - They can’t be used to determine the next best thing to do to achieve the grounding criterion
- Moreover, they don’t describe quantitatively, or make use of, the uncertainty in contributions
- They are insensitive to differences in channels, content, populations, etc.
- They cannot be used for guidance
- Decision theory to the rescue!
Decision Theory
- Action under uncertainty
- Given a set of states S = {s}, evidence e, and a set of actions A = {a}, where:
  - P(s|e) is a probabilistic model of the state conditioned on the evidence
  - U(a, s) is the utility of taking action a when in state s
- Take the action that maximizes the expected utility:
  - $EU(a \mid e) = \sum_{s \in S} U(a, s)\, P(s \mid e)$
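As a minimal sketch of the expected-utility rule above, assuming a discrete state set and utilities stored in a dictionary. The function names, state and action labels, and all numbers are illustrative assumptions, not taken from the talk.

```python
from typing import Dict, Hashable, Iterable, Tuple

Belief = Dict[Hashable, float]                    # P(s | e) for each state s
Utility = Dict[Tuple[Hashable, Hashable], float]  # U(a, s), keyed by (action, state)


def expected_utility(action: Hashable, belief: Belief, utility: Utility) -> float:
    """EU(a | e) = sum over states s of U(a, s) * P(s | e)."""
    return sum(p * utility[(action, s)] for s, p in belief.items())


def best_action(actions: Iterable[Hashable], belief: Belief, utility: Utility) -> Hashable:
    """Return the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, belief, utility))


# Illustrative numbers only (assumed for this example).
belief = {"s1": 0.5, "s2": 0.5}
utility = {
    ("a1", "s1"): 4.0, ("a1", "s2"): 0.0,
    ("a2", "s1"): 1.0, ("a2", "s2"): 2.0,
}
chosen = best_action(["a1", "a2"], belief, utility)
print(chosen)
```

With these assumed numbers, EU(a1|e) = 2.0 and EU(a2|e) = 1.5, so a1 is selected.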
Conversation under Uncertainty
- Conversation = action under uncertainty
- Example: “I want to fly to Pittsburgh …”
  - States = {grounded, not_grounded}
    - Inaccessible, but describable by a probabilistic model
    - P(g | e) = P(Pittsburgh | e) … confidence annotation
  - Actions = {explicit_confirm, implicit_confirm, continue_dialog}
  - Utilities:
    - U(ec, g) < U(ic, g) < U(cd, g)
    - U(ec, ng) > U(ic, ng) > U(cd, ng)
I want to fly to Pittsburgh (2)
- States:
  - NotGrounded (ng)
  - Grounded (g)
- Actions:
  - ExplicitConfirm (ec)
  - ImplicitConfirm (ic)
  - ContinueDialog (cd)
- Utilities:
  - U(ec, g) < U(ic, g) < U(cd, g)
  - U(ec, ng) > U(ic, ng) > U(cd, ng)
- [Figure: curves labelled ec, ic and cd over an axis running from ng to g, with thresholds t1 and t2]
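The Pittsburgh example can be sketched numerically. The talk gives only the ordering of the utilities, so the values below are hypothetical numbers chosen to respect that ordering. With two states, each action's expected utility is linear in the confidence P(g | e), so the best action changes at crossover points, computed here as t1 and t2 in the spirit of the thresholds in the figure.

```python
# Hypothetical utilities that respect the orderings given in the talk:
#   U(ec, g) < U(ic, g) < U(cd, g)   and   U(ec, ng) > U(ic, ng) > U(cd, ng)
UTILITY = {
    "ec": {"g": 6.0, "ng": 5.0},     # explicit confirmation
    "ic": {"g": 8.0, "ng": 2.0},     # implicit confirmation
    "cd": {"g": 10.0, "ng": -10.0},  # continue dialog
}


def expected_utility(action: str, p_grounded: float) -> float:
    """EU of an action when the concept is grounded with probability p_grounded."""
    u = UTILITY[action]
    return p_grounded * u["g"] + (1.0 - p_grounded) * u["ng"]


def best_action(p_grounded: float) -> str:
    """Action with the highest expected utility at this confidence."""
    return max(UTILITY, key=lambda a: expected_utility(a, p_grounded))


def crossover(a: str, b: str) -> float:
    """Confidence at which actions a and b have equal expected utility."""
    slope_a = UTILITY[a]["g"] - UTILITY[a]["ng"]
    slope_b = UTILITY[b]["g"] - UTILITY[b]["ng"]
    return (UTILITY[b]["ng"] - UTILITY[a]["ng"]) / (slope_a - slope_b)


t1 = crossover("ec", "ic")  # below t1: explicit confirmation
t2 = crossover("ic", "cd")  # above t2: continue the dialog
for p in (0.3, 0.73, 0.95):
    print(p, best_action(p))
```

Under these assumed utilities, low confidence selects explicit confirmation, intermediate confidence selects implicit confirmation, and high confidence continues the dialog. Different utility values move the thresholds, and some choices remove the implicit-confirmation region entirely.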
Overview
- Early grounding theories
  - Discourse contributions (Clark & Schaefer)
  - Conversational acts (Traum)
- A computational framework (Horvitz, Paek)
  - Principles
  - Systems
    - DeepListener
    - Bayesian Receptionist (Quartet architecture)
    - Presenter (Quartet architecture)
- Grounding in RavenClaw
DeepListener - Domain
- Domain
  - Provides spoken command-and-control functionality for LookOut
  - Responds to offers of assistance (Yes/No)
- Small domain, but illustrates the core ideas very well
DeepListener - States
- States: 5 possible “intentions” of the user
  - Acknowledgement
  - Negation
  - Reflection
  - Unrecognized Signal
  - No Signal
- State model P(S|E): temporal Bayesian network
  - E = user’s actions, content, ASR results and reliability, plus the same variables at time t-1
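To make the idea of a temporal state model concrete, here is a minimal sketch of one belief-update step over the five user intentions. It is a plain HMM-style filter, which is a simplification of a temporal Bayesian network, and it is not DeepListener's actual model: the transition table, the likelihood values and the function name are all assumptions made for illustration.

```python
STATES = ["Acknowledgement", "Negation", "Reflection", "UnrecognizedSignal", "NoSignal"]


def belief_update(prev_belief, transition, likelihood):
    """One filtering step: predict with the transition model, then weight by evidence.

    prev_belief[r]    = P(state at t-1 = r | evidence so far)
    transition[r][s]  = P(state at t = s | state at t-1 = r)
    likelihood[s]     = P(current evidence | state at t = s)
    """
    predicted = {
        s: sum(prev_belief[r] * transition[r][s] for r in STATES)
        for s in STATES
    }
    unnormalized = {s: predicted[s] * likelihood[s] for s in STATES}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("evidence has zero probability under the model")
    return {s: v / total for s, v in unnormalized.items()}


# Assumed numbers: uniform prior, "sticky" transitions, and evidence
# (for example an ASR result) that favors Acknowledgement.
uniform = {s: 1.0 / len(STATES) for s in STATES}
sticky = {r: {s: (0.6 if r == s else 0.1) for s in STATES} for r in STATES}
likelihood = {
    "Acknowledgement": 0.7,
    "Negation": 0.1,
    "Reflection": 0.1,
    "UnrecognizedSignal": 0.05,
    "NoSignal": 0.05,
}
belief = belief_update(uniform, sticky, likelihood)
print(max(belief, key=belief.get))
```

The resulting distribution plays the role of P(S|E) and can be fed directly into an expected-utility computation over the system's actions.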
DeepListener - Actions
- Actions:
  - Execute the service
  - Repeat
  - Note a hesitation and try again
  - Was that meant for me?
  - Try to get the user’s attention
  - Apologize for the interruption and forego the service
  - Troubleshoot the overall dialog
DeepListener - Utilities
- Utilities
  - Elicited through psychological experiments
  - Elicited through slidebars
  - This works when you have 2 or 3 grounding actions and a clear, small state-space design, but what about when the problem gets more complex?
- Example (paper)
Bayesian Receptionist, Presenter
- Bayesian Receptionist: performs the tasks of a receptionist at a Microsoft front desk
  - “I’m here to see Rashid”
  - “Bathroom?”
  - “Beam me to 25 please”
  - … 32 goals
- Presenter: command & control interface to PowerPoint presentations
- Both are based on the Quartet architecture
Quartet
- Uses decision theory (DT) and Bayesian networks (BN) to ensure grounding at 4 different levels:
  - Signal
  - Channel
  - Intention
  - Conversation
- The actual dialog management (DM) task is encapsulated in the same framework at the Intention level
  - Different domains = different intention levels
Quartet – Signal & Channel
- At each level, infer a distribution over possible states. Key variables:
  - Signal level: signal identified (low/med/hi)
  - Channel level: user’s focus of attention
- A Maintenance module integrates the Signal & Channel levels into a Maintenance Status:
  - Channel x Signal: NoChannel, NoSignal, ChannelButNoSignal, SignalButNoChannel, Signal
Quartet – Intention Level
- Domain is mostly goal inference
- Hierarchical decomposition into levels, where lower levels refine the goals into more specific needs
- Use a BN to model p(goal | e) at each level
  - Psychological studies to identify key variables and utilities
    - Visual cues
    - Linguistic variables, both syntactic and semantic
Quartet – Intention Level
- To move between levels, compare the probability of the goal to…
  - p-progress (above: do it)
  - p-guess (above: seek confirmation; below: seek more info via VOI)
  - p-backtrack (used on return nodes)
- Use Value-Of-Information (VOI) analysis to infer which variable should be queried next.
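The threshold test and the VOI step can be sketched as follows. This is an illustration under stated assumptions, not Quartet's implementation: it assumes p-guess <= p-progress, leaves out p-backtrack (the slide does not say how it is applied), ignores the cost of asking a question, and uses a one-step (myopic) value of information. All function names and numbers are hypothetical.

```python
def next_move(p_goal, p_progress, p_guess):
    """Pick a move by comparing the goal probability against two thresholds."""
    if not 0.0 <= p_guess <= p_progress <= 1.0:
        raise ValueError("expected 0 <= p_guess <= p_progress <= 1")
    if p_goal >= p_progress:
        return "progress"      # confident enough: act on the goal
    if p_goal >= p_guess:
        return "confirm"       # plausible: ask for confirmation
    return "gather_info"       # unsure: query another variable (chosen via VOI)


def best_eu(belief, utility, actions):
    """Highest expected utility achievable under the given belief over goals."""
    return max(sum(belief[s] * utility[(a, s)] for s in belief) for a in actions)


def value_of_information(prior, likelihood, utility, actions):
    """Expected gain in best EU from observing one variable.

    likelihood[x][s] = P(answer x | goal s) for the queried variable.
    """
    gain = -best_eu(prior, utility, actions)
    for p_given_goal in likelihood.values():
        p_answer = sum(p_given_goal[s] * prior[s] for s in prior)
        if p_answer == 0.0:
            continue  # this answer cannot occur under the prior
        posterior = {s: p_given_goal[s] * prior[s] / p_answer for s in prior}
        gain += p_answer * best_eu(posterior, utility, actions)
    return gain


def variable_to_query(prior, candidates, utility, actions):
    """Name of the candidate variable with the highest value of information."""
    return max(
        candidates,
        key=lambda name: value_of_information(prior, candidates[name], utility, actions),
    )


# Assumed toy problem: two goals, one matching action per goal.
prior = {"g1": 0.5, "g2": 0.5}
actions = ["do1", "do2"]
utility = {("do1", "g1"): 1.0, ("do1", "g2"): 0.0,
           ("do2", "g1"): 0.0, ("do2", "g2"): 1.0}
candidates = {
    "informative": {"x1": {"g1": 1.0, "g2": 0.0}, "x2": {"g1": 0.0, "g2": 1.0}},
    "uninformative": {"x1": {"g1": 0.5, "g2": 0.5}, "x2": {"g1": 0.5, "g2": 0.5}},
}
print(next_move(0.5, p_progress=0.9, p_guess=0.6))
print(variable_to_query(prior, candidates, utility, actions))
```

In this toy problem the variable whose answer fully identifies the goal has a positive value of information, while the variable whose answer is independent of the goal has none, so the former is queried.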
Comments on Intention level
- What is the size of the learning problem? (How many BNs are needed?) How much data is needed for training?
- Not very clear:
  - how to deal with attribute/value pairs with rich ranges (e.g. which bus station?)
  - how to deal with richer dialog mechanisms (beyond C&C applications)
    - focus shifts, mixed initiative
    - providing help
Quartet – Conversation Level
- See image. Use Intention and Maintenance Status to infer:
  - Grounding: diagnoses mutual understanding
    - Okay, ChannelFailure, IntentionFailure, ConversationFailure
  - Activity goal: measures whether or not the user is engaged in an activity with the system
- Compute the expected utility of each action (utilities elicited through psychological studies)
Bayesian Receptionist, Presenter
- Runtime behavior (section 3)
- Presenter
  - The Signal & Channel levels allow continuous listening to be treated uniformly within the same framework
  - Experiments show that it performs better than random, but significantly worse than humans
    - But then again, the experiments were not very fair, being performed only at that level (i.e. no engaging in dialog allowed)
My Research …
- Deal with misunderstandings
- Use probabilistic modeling and decision theory to make grounding decisions (but not task decisions)
- Example: “I want a room tomorrow morning” (0.73)
  - States: time correctly understood / not
  - Grounding actions: no_action, expl_conf, impl_conf, reject
  - Utilities: try to learn them by relating the actions to an overall dialog/grounding metric
RavenClaw: Dialog Task / Grounding
- [Figure: two layers. Dialog Task: RoomLine with Login, GetQuery, ExecuteQuery, DiscussResults, Bye. Grounding Level: Grounding Model (state / how well are things going, optimal action) and Strategies / Grounding Actions]
States and Actions
- Strategies.xls
- States (have to keep it small!)
  - Single “state-space” model
    - What are the variables? Which are observable and which are stochastically modeled?
  - Multiple “state-space” models
    - First 5 strategies: state = amount of grounding on each concept
    - What should the state be for the rest? What are the indicators? Which are fully observable and which are not?
    - How to combine decisions from different spaces?
Utilities
- Learn them! How?
  - Idea 1: POMDPs; maybe at this small size they are tractable
  - Idea 2: Regression to some overall dialog metric
    - What should that be? (hmm) amount of non-null grounding actions taken
  - …