LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing Lecture 3: October 5, 2004 Dan Jurafsky 10/15/2021 LING 138/238 Autumn 2004 1
Week 2: Dialogue and Conversational Agents
• Examples of spoken language systems
• Components of a dialogue system, focusing on these 3:
  – ASR
  – NLU
  – Dialogue management
• VoiceXML
• Grounding and confirmation
Conversational Agents
• AKA:
  – Spoken Language Systems
  – Dialogue Systems
  – Speech Dialogue Systems
• Applications:
  – Travel arrangements (Amtrak, United Airlines)
  – Telephone call routing
  – Tutoring
  – Communicating with robots
  – Anything with a limited screen/keyboard
A travel dialog: Communicator
Call routing: AT&T HMIHY ("How May I Help You?")
A tutorial dialogue: ITSPOKE
Dialogue System Architecture
• Simplest possible architecture: ELIZA
  – Read-search/replace-print loop
• We’ll need something with more sophisticated dialogue control
• And speech
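A minimal sketch of an ELIZA-style read-search/replace-print loop, in Python. The three pattern/response rules are hypothetical examples in ELIZA's spirit, not Weizenbaum's original script. Note that `respond` keeps no state between turns, which is exactly the limitation discussed under Dialogue Manager.

```python
import re

# Hypothetical ELIZA-style rules: (pattern to search for, response template).
RULES = [
    (re.compile(r".*\bI need (.*)", re.IGNORECASE), r"Why do you need \1?"),
    (re.compile(r".*\bI am (.*)", re.IGNORECASE), r"How long have you been \1?"),
    (re.compile(r".*\bmy (mother|father)\b.*", re.IGNORECASE), r"Tell me more about your \1."),
]
DEFAULT = "Please go on."


def respond(sentence: str) -> str:
    """Search for the first matching pattern and substitute its captures into the template."""
    for pattern, template in RULES:
        match = pattern.match(sentence)
        if match:
            return match.expand(template)
    return DEFAULT


def main() -> None:
    # Read-search/replace-print loop: nothing is remembered from one turn to the next.
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        print(respond(line.strip()))
```

Calling `main()` runs the loop interactively until end of input.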
Dialogue System Architecture
ASR engine
• ASR = Automatic Speech Recognition
• The job of the ASR system is to go from speech (telephone or microphone) to words
• We will study this in a few weeks
ASR Overview (figure from Yook 2003)
ASR in Dialogue Systems
• ASR systems work better if they can constrain which words the speaker is likely to say.
• A dialogue system often has these constraints:
  – System: What city are you departing from?
  – Can expect sentences of the form:
    • I want to (leave|depart) from [CITYNAME]
    • From [CITYNAME]
    • etc.
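To make the idea of expected sentence forms concrete, here is a small Python sketch that checks whether a recognized string fits one of the forms listed for the departure question. The city list and the bare-city form are assumptions for illustration. A real recognizer uses such a grammar during decoding to restrict its search, rather than filtering finished text as this sketch does.

```python
import re
from typing import Optional

# Hypothetical city list; a real system would load it from its database.
CITIES = ["Boston", "San Francisco", "Denver", "Washington"]
CITY = "(?P<city>" + "|".join(re.escape(c) for c in CITIES) + ")"

# Sentence forms expected after "What city are you departing from?"
EXPECTED_FORMS = [
    re.compile(rf"I want to (?:leave|depart) from {CITY}", re.IGNORECASE),
    re.compile(rf"From {CITY}", re.IGNORECASE),
    re.compile(CITY, re.IGNORECASE),  # assumed: a bare city name is also acceptable
]


def match_departure(hypothesis: str) -> Optional[str]:
    """Return the city if the hypothesis fits one of the expected forms, else None."""
    for form in EXPECTED_FORMS:
        m = form.fullmatch(hypothesis.strip())
        if m:
            return m.group("city")
    return None
```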
ASR in Dialogue Systems
• Also, the system can adapt to the speaker
• But!! ASR is errorful
• So unlike ELIZA, we can’t count on the words being correct
• As we will see, this fact about error plays a huge role in dialogue system design
Natural Language Understanding
• Also called NLU
• We will discuss this later in the quarter
• There are many ways to represent the meaning of sentences
• For speech dialogue systems, perhaps the most common is a simple one called “frame and slot semantics”
• Semantics = meaning
An example of a frame
• Show me morning flights from Boston to SF on Tuesday.
SHOW:
  FLIGHTS:
    ORIGIN:
      CITY: Boston
      DATE: Tuesday
      TIME: morning
    DEST:
      CITY: San Francisco
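One way to hold this frame in a program is as nested Python dataclasses mirroring the SHOW / FLIGHTS / ORIGIN / DEST nesting. The class and field names are assumptions; unfilled slots default to `None`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Endpoint:
    """One end of the trip; DATE and TIME sit under ORIGIN, as in the frame."""
    city: Optional[str] = None
    date: Optional[str] = None
    time: Optional[str] = None


@dataclass
class FlightsFrame:
    origin: Endpoint = field(default_factory=Endpoint)
    dest: Endpoint = field(default_factory=Endpoint)


@dataclass
class ShowFrame:
    flights: FlightsFrame = field(default_factory=FlightsFrame)


# "Show me morning flights from Boston to SF on Tuesday."
request = ShowFrame(
    flights=FlightsFrame(
        origin=Endpoint(city="Boston", date="Tuesday", time="morning"),
        dest=Endpoint(city="San Francisco"),
    )
)
```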
How to generate this semantics?
• Many methods, as we will see in week 9
• Simplest: semantic grammars
  – LIST -> show me | I want | can I see | …
  – DEPARTTIME -> (after|around|before) HOUR | morning | afternoon | evening
  – HOUR -> one | two | three | … | twelve (am|pm)
  – FLIGHTS -> (a) flight | flights
  – ORIGIN -> from CITY
  – DESTINATION -> to CITY
  – CITY -> Boston | San Francisco | Denver | Washington
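The semantic grammar rules above can be approximated with regular expressions. The sketch below labels a sentence greedily, left to right, trying each rule at each word. It is a simplification: real semantic-grammar parsers handle ambiguity and nesting, and the DEPARTDATE rule is a hypothetical addition (it is not in the rule list above) so that “on Tuesday” gets a label.

```python
import re

CITY = r"(?:Boston|San Francisco|Denver|Washington)"
HOUR = r"(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve) (?:am|pm)"

# The slide's rules as regular expressions, tried in this order at each word.
RULES = [
    ("LIST", r"show me|I want|can I see"),
    ("FLIGHTS", r"(?:a )?flights?"),
    ("ORIGIN", rf"from {CITY}"),
    ("DESTINATION", rf"to {CITY}"),
    # Hypothetical rule, not on the slide:
    ("DEPARTDATE", r"on (?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"),
    ("DEPARTTIME", rf"(?:after|around|before) {HOUR}|morning|afternoon|evening"),
]
# \b stops a rule from matching only part of a word (e.g. "flight" in "flightless").
COMPILED = [(name, re.compile(rf"(?:{pattern})\b", re.IGNORECASE)) for name, pattern in RULES]


def label(sentence: str) -> list[tuple[str, str]]:
    """Return (rule name, matched text) pairs for the spans the rules recognize."""
    labels = []
    pos = 0
    while pos < len(sentence):
        if sentence[pos] == " ":
            pos += 1
            continue
        for name, regex in COMPILED:
            m = regex.match(sentence, pos)
            if m:
                labels.append((name, m.group(0)))
                pos = m.end()
                break
        else:
            # No rule applies at this word: skip it.
            nxt = sentence.find(" ", pos)
            pos = len(sentence) if nxt == -1 else nxt
    return labels
```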
Semantics for a sentence
• Show me flights from Boston to San Francisco on Tuesday morning
  LIST: Show me
  FLIGHTS: flights
  ORIGIN: from Boston
  DESTINATION: to San Francisco
  DEPARTDATE: on Tuesday
  DEPARTTIME: morning
Frame-filling
• We use a parser (week 10) to take these rules and apply them to the sentence,
• resulting in a semantics for the sentence.
• We can then write some simple code
  – that takes the semantically labeled sentence
  – and fills in the frame.
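The “simple code” step can be sketched as a table-driven function that maps each semantic label to a frame slot. A hand-written labeled sentence stands in for the parser's output; the slot names and the convention of dropping the leading “from”/“to”/“on” are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Which frame slot each label fills, and whether to drop its first word
# ("from", "to", "on") before storing the value. Hypothetical mapping.
LABEL_TO_SLOT = {
    "ORIGIN": ("ORIGIN", True),
    "DESTINATION": ("DEST", True),
    "DEPARTDATE": ("DATE", True),
    "DEPARTTIME": ("TIME", False),
}


def fill_frame(labeled: List[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    frame: Dict[str, Optional[str]] = {"ORIGIN": None, "DEST": None, "DATE": None, "TIME": None}
    for label, text in labeled:
        if label not in LABEL_TO_SLOT:
            continue  # LIST, FLIGHTS, etc. carry no slot value
        slot, strip_first_word = LABEL_TO_SLOT[label]
        frame[slot] = text.split(" ", 1)[1] if strip_first_word else text
    return frame


# Stand-in for the parser's output on "Show me flights from Boston to San Francisco on Tuesday morning".
labeled_sentence = [
    ("LIST", "Show me"), ("FLIGHTS", "flights"), ("ORIGIN", "from Boston"),
    ("DESTINATION", "to San Francisco"), ("DEPARTDATE", "on Tuesday"), ("DEPARTTIME", "morning"),
]
frame = fill_frame(labeled_sentence)
```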
Other NLU Approaches
• Cascade of finite-state transducers (FSTs)
  – Instead of a parser, we could use FSTs, which are very fast, to create the semantics.
• Or we could use “syntactic rules with semantic attachments”
  – This latter approach is what VoiceXML uses, so we will see it today.
Generation and TTS
• Won’t say much about this today
• TTS next week!
• Generation: two main approaches
  – Simple templates (prescripted sentences)
  – Unification: use grammar rules similar to those used for parsing, but run them backwards!
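A sketch of the first approach, template-based generation, in Python. The dialogue-act names and template wording are hypothetical; unification-based generation is not shown.

```python
# Hypothetical prescripted templates keyed by what the system wants to say.
TEMPLATES = {
    "ask_origin": "What city are you leaving from?",
    "confirm_route": "OK, flying from {origin} to {dest}.",
}


def generate(act: str, **slots: str) -> str:
    """Pick the template for this act and fill in any slot values it mentions."""
    return TEMPLATES[act].format(**slots)
```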
Dialogue Manager
• ELIZA was the simplest dialogue manager
  – Read-search/replace-print loop
• No state was kept; the system did the same thing on every sentence
• A real dialogue manager needs to keep state
• We can’t keep asking the same question over and over!
Three architectures for dialogue management
• Finite-state
• Frame-based
• Planning agents
Finite State Dialogue Manager
Finite-state dialogue managers
• The system completely controls the conversation with the user.
• It asks the user a series of questions,
• ignoring (or misinterpreting) anything the user says that is not a direct answer to the system’s questions.
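A minimal sketch of a finite-state dialogue manager, assuming a fixed three-state sequence and tiny hypothetical answer sets. The `ask` callback stands in for TTS output plus ASR/NLU input. Anything that is not a direct answer to the current question is ignored and the same question is asked again, even if it contains the information later states need.

```python
from typing import Callable, Dict

# Hypothetical answer sets; a real system would use a grammar per state.
CITIES = {"boston", "san francisco", "denver", "washington"}
DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}

# Each state: (slot to fill, question to ask, acceptable answers). The order is fixed.
STATES = [
    ("origin", "What city are you leaving from?", CITIES),
    ("dest", "Where are you going?", CITIES),
    ("date", "What day would you like to leave?", DAYS),
]


def run_fsa(ask: Callable[[str], str]) -> Dict[str, str]:
    filled: Dict[str, str] = {}
    for slot, question, accepted in STATES:
        while True:
            answer = ask(question).strip().lower()
            if answer in accepted:
                filled[slot] = answer
                break  # move to the next state
            # Not a direct answer: ignore it and stay in this state.
    return filled
```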
Dialogue Initiative
• “Initiative” means who has control of the conversation at any point
  – Single initiative
    • System
    • User
  – Mixed initiative
System Initiative
• Systems which completely control the conversation at all times are called system initiative.
• Advantages:
  – Simple to build
  – User always knows what they can say next
  – System always knows what user can say next
    • Known words: better performance from ASR
    • Known topic: better performance from NLU
• Disadvantage:
  – Too limited
User Initiative
• User directs the system
• Generally, the user asks a single question and the system answers
• System can’t ask questions back or engage in clarification or confirmation dialogue
• Used for simple database queries
  – User asks question, system gives answer
• Web search is user-initiative dialogue.
Problems with System Initiative
• Real dialogue involves give and take!
• In travel planning, users might want to say something that is not the direct answer to the question.
• For example, answering more than one question in a sentence:
  – Hi, I’d like to fly from Seattle Tuesday morning
  – I want a flight from Milwaukee to Orlando one way leaving after 5 p.m. on Wednesday.
Single initiative + universals
• We can give users a little more flexibility by adding universal commands
• Universals: commands you can say anywhere
• As if we augmented every state of the FSA with these:
  – Help
  – Correct
• This describes many implemented systems
• But it still doesn’t handle mixed initiative
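The universals idea can be sketched by checking for the universal commands before any state-specific handling, in every state. The help text, the two-state sequence, and the interpretation of “correct” as “go back one question” are assumptions.

```python
from typing import Callable, Dict, List, Tuple

STATES: List[Tuple[str, str]] = [
    ("origin", "What city are you leaving from?"),
    ("dest", "Where are you going?"),
]
HELP_TEXT = "Please answer the question, or say 'correct' to change your last answer."


def run_with_universals(ask: Callable[[str], str], say: Callable[[str], None]) -> Dict[str, str]:
    filled: Dict[str, str] = {}
    i = 0
    while i < len(STATES):
        slot, question = STATES[i]
        answer = ask(question).strip().lower()
        # Universal commands are checked first, whatever state we are in.
        if answer == "help":
            say(HELP_TEXT)        # then re-ask the same question
        elif answer == "correct":
            i = max(i - 1, 0)     # assumed meaning: go back and re-ask the previous question
        else:
            filled[slot] = answer
            i += 1
    return filled
```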
Mixed Initiative
• Conversational initiative can shift between system and user
• Simplest kind of mixed initiative: use the structure of the frame itself to guide dialogue
  Slot         Question
  ORIGIN       What city are you leaving from?
  DEST         Where are you going?
  DEPT DATE    What day would you like to leave?
  DEPT TIME    What time would you like to leave?
  AIRLINE      What is your preferred airline?
Frames are mixed-initiative
• User can answer multiple questions at once.
• System asks questions of the user, filling any slots that the user specifies
• When the frame is filled, do the database query
• If the user answers 3 questions at once, the system has to fill those slots and not ask these questions again!
• In any case, we avoid the strict constraints on order of the finite-state architecture.
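A minimal frame-based manager in Python, assuming three of the slots from the slot/question table and hypothetical regular-expression extractors. It always asks about the first empty slot, but fills any slot the user mentions, so a slot answered early is never asked about.

```python
import re
from typing import Callable, Dict, Optional

# Slot order and questions follow the slide; the regular expressions are hypothetical.
QUESTIONS = {
    "ORIGIN": "What city are you leaving from?",
    "DEST": "Where are you going?",
    "DEPT_DATE": "What day would you like to leave?",
}
CITY = r"(boston|denver|seattle|san francisco)"
DAY = r"(monday|tuesday|wednesday|thursday|friday|saturday|sunday)"
# Patterns that can fill a slot no matter which question was asked.
ANYWHERE = {
    "ORIGIN": re.compile(rf"\bfrom {CITY}\b"),
    "DEST": re.compile(rf"\bto {CITY}\b"),
    "DEPT_DATE": re.compile(rf"\b{DAY}\b"),
}
# A bare value ("Denver") only fills the slot that was just asked about.
BARE = {"ORIGIN": re.compile(CITY), "DEST": re.compile(CITY), "DEPT_DATE": re.compile(DAY)}


def extract(utterance: str, asked: str) -> Dict[str, str]:
    text = utterance.lower().strip()
    found: Dict[str, str] = {}
    for slot, pattern in ANYWHERE.items():
        m = pattern.search(text)
        if m:
            found[slot] = m.group(1)
    if asked not in found:
        m = BARE[asked].fullmatch(text)
        if m:
            found[asked] = m.group(1)
    return found


def run_frame(ask: Callable[[str], str]) -> Dict[str, Optional[str]]:
    frame: Dict[str, Optional[str]] = {slot: None for slot in QUESTIONS}
    while any(value is None for value in frame.values()):
        # Ask about the first slot that is still empty...
        asked = next(slot for slot, value in frame.items() if value is None)
        # ...but accept values for any slot the user happens to mention.
        frame.update(extract(ask(QUESTIONS[asked]), asked))
    return frame  # frame is full: this is where the database query would go
```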
Multiple frames
• Flights, hotels, rental cars
• Flight legs: each flight can have multiple legs, which might need to be discussed separately
• Presenting the flights (if there are multiple flights meeting the user’s constraints)
  – It has slots like 1ST_FLIGHT or 2ND_FLIGHT so the user can ask “how much is the second one?”
• General route information:
  – Which airlines fly from Boston to San Francisco?
• Airfare practices:
  – Do I have to stay over Saturday to get a decent airfare?
Multiple Frames
• Need to be able to switch from frame to frame,
  – based on what the user says.
• Disambiguate which slot of which frame an input is supposed to fill, then switch dialogue control to that frame.
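One crude way to pick the frame is keyword scoring, sketched below. The cue-word sets are hypothetical, and the sketch covers only the frame-choice half of the problem; deciding which slot within the chosen frame to fill is left to that frame's own extractors.

```python
# Hypothetical cue words per frame; a real system would use its NLU output instead.
FRAME_CUES = {
    "flight": {"fly", "flight", "airline", "depart", "leave"},
    "hotel": {"hotel", "room", "night", "stay"},
    "car": {"car", "rental", "pick", "drive"},
}


def choose_frame(utterance: str, current: str) -> str:
    """Return the frame that should take control of this input."""
    words = set(utterance.lower().split())
    scores = {frame: len(words & cues) for frame, cues in FRAME_CUES.items()}
    best = max(scores, key=scores.get)
    # Stay in the current frame unless another frame clearly scores higher.
    return best if scores[best] > scores[current] else current
```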
VoiceXML
• Voice eXtensible Markup Language
• An XML-based dialogue design language
• Makes use of ASR and TTS
• Deals well with simple, frame-based mixed-initiative dialogue.
• Most common in the commercial world (too limited for research systems)
• But useful to get a handle on the concepts.
VoiceXML
• Each dialogue is a <form>. (Form is the VoiceXML word for frame.)
• Each <form> generally consists of a sequence of <field>s, with other commands
Sample vxml doc
<form>
  <field name="transporttype">
    <prompt>
      Please choose airline, hotel, or rental car.
    </prompt>
    <grammar type="application/x-nuance-gsl">
      [airline hotel "rental car"]
    </grammar>
  </field>
  <block>
    <prompt>
      You have chosen <value expr="transporttype"/>.
    </prompt>
  </block>
</form>
VoiceXML interpreter
• Walks through a VXML form in document order,
• iteratively selecting each item
• If there are multiple fields, visits each one in order.
• Special commands for events
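The interpreter's behavior can be sketched as a loop that repeatedly selects the first form item, in document order, whose variable is still undefined. This is a heavily simplified version of VoiceXML's form interpretation algorithm: grammars are modeled as sets of accepted strings, `listen` returning `None` stands for a noinput event, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Union


@dataclass
class Field:
    name: str
    prompt: str
    grammar: Set[str]  # accepted utterances, lower-cased (stand-in for a real grammar)


@dataclass
class Block:
    name: str
    prompt: str  # may refer to field values, e.g. "{origin}"


def interpret(items: List[Union[Field, Block]],
              listen: Callable[[], Optional[str]],
              say: Callable[[str], None]) -> Dict[str, object]:
    variables: Dict[str, object] = {item.name: None for item in items}
    while True:
        # Select: the first item, in document order, whose variable is still undefined.
        item = next((i for i in items if variables[i.name] is None), None)
        if item is None:
            return variables  # every item has been visited: the form is done
        if isinstance(item, Block):
            say(item.prompt.format(**variables))
            variables[item.name] = True  # mark the block as run so it is not selected again
            continue
        # Collect: prompt for the field, then listen.
        say(item.prompt)
        heard = listen()
        if heard is None:
            say("I'm sorry, I didn't hear you.")        # like <noinput>; the loop reprompts
        elif heard.lower() in item.grammar:
            variables[item.name] = heard.lower()         # like <filled>
        else:
            say("I'm sorry, I didn't understand that.")  # like <nomatch>; the loop reprompts
```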
Another vxml doc (1)
<noinput>
  I'm sorry, I didn't hear you. <reprompt/>
</noinput>
<nomatch>
  I'm sorry, I didn't understand that. <reprompt/>
</nomatch>
Another vxml doc (2)
<form>
  <block> Welcome to the air travel consultant. </block>
  <field name="origin">
    <prompt> Which city do you want to leave from? </prompt>
    <grammar type="application/x-nuance-gsl">
      [(san francisco) denver (new york) barcelona]
    </grammar>
    <filled>
      <prompt> OK, from <value expr="origin"/> </prompt>
    </filled>
  </field>
Another vxml doc (3)
  <field name="destination">
    <prompt> And which city do you want to go to? </prompt>
    <grammar type="application/x-nuance-gsl">
      [(san francisco) denver (new york) barcelona]
    </grammar>
    <filled>
      <prompt> OK, to <value expr="destination"/> </prompt>
    </filled>
  </field>
  <field name="departdate" type="date">
    <prompt> And what date do you want to leave? </prompt>
    <filled>
      <prompt> OK, on <value expr="departdate"/> </prompt>
    </filled>
  </field>
Another vxml doc (4)
  <block>
    <prompt> OK, I have you departing from <value expr="origin"/> to <value expr="destination"/> on <value expr="departdate"/> </prompt>
    <!-- send the info to book a flight... -->
  </block>
</form>
A mixed-initiative VXML doc
• Mixed initiative: the user might answer a different question
• So the VoiceXML interpreter can’t just evaluate each field of the form in order
• The user might answer field 2 when the system asked field 1
• So we need a grammar that can handle all sorts of input:
  – Field 1
  – Field 2
  – Field 1 and field 2
  – etc.
VXML Nuance-style grammars
• Rewrite rules
  – Wantsentence -> I want to (fly|go)
• Nuance VXML format is:
  – () for concatenation, [] for disjunction
  – Each rule has a name:
    Wantsentence (I want to [fly go])
    Airports [(san francisco) denver]
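To see what () and [] mean, here is a small Python sketch that expands a rule body into every sentence it accepts. It handles only words, () and []; other GSL operators such as ? (optional) are deliberately left out, and the function names are hypothetical.

```python
import re
from itertools import product
from typing import List, Tuple

Phrase = Tuple[str, ...]


def tokenize(rule: str) -> List[str]:
    """Split a rule body into brackets and words."""
    return re.findall(r"[()\[\]]|[^\s()\[\]]+", rule)


def parse(tokens: List[str], pos: int = 0) -> Tuple[List[Phrase], int]:
    """Expand the expression starting at tokens[pos]; return its phrases and the next position."""
    tok = tokens[pos]
    if tok in ("(", "["):
        close = ")" if tok == "(" else "]"
        parts = []
        pos += 1
        while tokens[pos] != close:
            expansions, pos = parse(tokens, pos)
            parts.append(expansions)
        pos += 1  # skip the closing bracket
        if tok == "(":
            # Concatenation: one choice from each part, in order.
            return [sum(combo, ()) for combo in product(*parts)], pos
        # Disjunction: any one of the alternatives.
        return [phrase for alt in parts for phrase in alt], pos
    return [(tok,)], pos + 1  # a single word


def expand(rule: str) -> List[str]:
    expansions, _ = parse(tokenize(rule))
    return [" ".join(phrase) for phrase in expansions]
```

For example, `expand("(I want to [fly go])")` yields the two sentences “I want to fly” and “I want to go”.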
Mixed-init VXML example (3)
<noinput> I'm sorry, I didn't hear you. <reprompt/> </noinput>
<nomatch> I'm sorry, I didn't understand that. <reprompt/> </nomatch>
<form>
  <grammar type="application/x-nuance-gsl">
  <![CDATA[
Grammar
Flight ( ?[
            (i [wanna (want to)] [fly go])
            (i'd like to [fly go])
            ([(i wanna) (i'd like a)] flight)
          ]
          [
            ([from leaving departing] City:x) {<origin $x>}
            ([(?going to) (arriving in)] City:x) {<dest $x>}
            ([from leaving departing] City:x
             [(?going to) (arriving in)] City:y) {<origin $x> <dest $y>}
          ]
          ?please
       )
Grammar
City [
       [(san francisco) (s f o)] {return("san francisco, california")}
       [(denver) (d e n)] {return("denver, colorado")}
       [(seattle) (s t x)] {return("seattle, washington")}
     ]
  ]]>
  </grammar>
Grammar
  <initial name="init">
    <prompt> Welcome to the air travel consultant. What are your travel plans? </prompt>
  </initial>
  <field name="origin">
    <prompt> Which city do you want to leave from? </prompt>
    <filled>
      <prompt> OK, from <value expr="origin"/> </prompt>
    </filled>
  </field>
Grammar
  <field name="dest">
    <prompt> And which city do you want to go to? </prompt>
    <filled>
      <prompt> OK, to <value expr="dest"/> </prompt>
    </filled>
  </field>
  <block>
    <prompt> OK, I have you departing from <value expr="origin"/> to <value expr="dest"/>. </prompt>
    <!-- send the info to book a flight... -->
  </block>
</form>
Grounding and Confirmation
• Dialogue is a collective act performed by speaker and hearer
• Common ground: the set of things mutually believed by both speaker and hearer
• To achieve common ground, the hearer must ground or acknowledge the speaker’s utterance.
• Clark (1996):
  – Principle of closure. Agents performing an action require evidence, sufficient for current purposes, that they have succeeded in performing it
Clark and Schaefer: Grounding
• Continued attention: B continues attending to A
• Relevant next contribution: B starts in on the next relevant contribution
• Acknowledgement: B nods or says a continuer like uh-huh or yeah, or an assessment (great!)
• Demonstration: B demonstrates understanding of A by paraphrasing or reformulating A’s contribution, or by collaboratively completing A’s utterance
• Display: B displays verbatim all or part of A’s presentation
Grounding examples
• Display:
  – C: I need to travel in May
  – A: And, what day in May did you want to travel?
• Acknowledgement:
  – C: He wants to fly from Boston
  – A: mm-hmm
  – C: to Baltimore Washington International
Grounding Examples (2)
• Acknowledgement + next relevant contribution
  – And, what day in May did you want to travel?
  – And you’re flying into what city?
  – And what time would you like to leave?
Grounding and Dialogue Systems
• Grounding is not just a tidbit about humans
• It is key to the design of conversational agents
• Why?
  – HCI researchers find that users of speech-based interfaces are confused when the system doesn’t give them an explicit acknowledgement signal
  – Experiment with this
Confirmation
• Another reason for grounding
• Speech is a pretty errorful channel
  – The hearer could misinterpret the speaker
• This is important in conversational agents,
  – since we are using ASR, which is still really buggy.
• So we need to do lots of grounding and confirmation
Explicit confirmation
• S: Which city do you want to leave from?
• U: Baltimore
• S: Do you want to leave from Baltimore?
• U: Yes
Explicit confirmation
• U: I’d like to fly from Denver Colorado to New York City on September 21st in the morning on United Airlines
• S: Let’s see then. I have you going from Denver Colorado to New York on September 21st. Is that correct?
• U: Yes
Implicit confirmation: display
• U: I’d like to travel to Berlin
• S: When do you want to travel to Berlin?
• U: Hi I’d like to fly to Seattle Tuesday morning
• S: Traveling to Seattle on Tuesday, August eleventh in the morning. Your name?
Implicit vs. Explicit
• Complementary strengths
• Explicit: easier for users to correct the system’s mistakes (can just say “no”)
• But explicit is cumbersome and long
• Implicit: much more natural, quicker, simpler (if the system guesses right).
Implicit and Explicit
• Early systems: all-implicit or all-explicit
• Modern systems: adaptive
• How to decide?
  – The ASR system can give a confidence metric.
  – This expresses how convinced the system is of its transcription of the speech
  – If high confidence, use implicit confirmation
  – If low confidence, use explicit confirmation
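The confidence-based decision can be sketched as a single threshold test. The threshold value 0.7 and the prompt wording are assumptions; in practice the threshold would be tuned for the particular recognizer.

```python
# Hypothetical threshold; real systems tune this for their recognizer.
CONFIDENCE_THRESHOLD = 0.7


def choose_confirmation(confidence: float, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """High ASR confidence: implicit confirmation. Low confidence: explicit."""
    return "implicit" if confidence >= threshold else "explicit"


def confirm(city: str, confidence: float) -> str:
    if choose_confirmation(confidence) == "implicit":
        # Display the understood value and move straight on to the next question.
        return f"Traveling to {city}. What day would you like to leave?"
    # Ask a yes/no question so the user can simply say "no".
    return f"Do you want to travel to {city}?"
```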
Next Lecture
• Dialogue acts
• More on VXML
• More on design of dialogue agents
• Evaluation of dialogue agents
• Don’t forget to look at the homework early!!!!