A Framework For Developing Conversational User Interfaces

- Slides: 39
A Framework For Developing Conversational User Interfaces
James Glass, Eugene Weinstein, Scott Cyphers, Joseph Polifroni (MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA)
Grace Chung (Corporation for National Research Initiatives, Reston, VA, USA)
Mikio Nakano (NTT Corporation, Atsugi, Japan)
Conversational User Interfaces
[Diagram: human and computer exchange speech. Input path: speech, recognition, text, understanding, meaning. Output path: meaning, generation, text, synthesis, speech.]
Types of Conversational Interfaces
• Conversational systems differ in the degree to which the human or the computer controls the conversation (initiative)
[Diagram: spectrum of initiative, from computer to human]
– Directed dialogue (computer initiative): computer maintains tight control; human is highly restricted. Example: C: "Please say the departure city."
– Mixed-initiative dialogue
– Free-form dialogue (human initiative): human takes complete control; computer is totally passive. Example: H: "I want to visit my grandmother."
Conversational Interfaces
• Can understand verbal input
– Speech recognition
– Language understanding (in context)
• Can engage in dialogue with a user during the interaction
• Can verbalize response
– Language generation
– Speech synthesis
[Diagram components: Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Back End, Language Generation, Speech Synthesis]
The Problem With Conversational Interfaces
• Advanced conversational systems are out there
– Both user and computer can take initiative
– Goal: conversational skill of system should approach that of human operator
• But…
– These systems are built by experts
– Huge learning curve for novices, and
– Tremendous iterative effort required even from experts
• For this reason
– Most advanced conversational systems remain in research labs
* e.g. Jupiter weather info system (+1-888-573-TALK): Zue et al, IEEE Trans. SAP, 8(1), 2000
– However, we have seen limited commercial deployment
* e.g. AT&T’s “How May I Help You”, Gorin et al, Speech Communication, 23, 1997
Eugene Weinstein, MIT Computer Science and Artificial Intelligence Laboratory. Computer Aided Design on User Interfaces, Jan 16th, 2004
Simplifying Conversational System Creation
• Goal: make it easier for both expert and novice developers to create conversational interfaces
– But still use advanced human language technologies
• Strategy: simplify the configuration process
– Automatically configure technology components based on examples
– Allow specification through a web interface or a unified configuration file
[Diagram: in SpeechBuilder, the web interface or the configuration file feeds a configuration engine, which configures Recognition, Understanding, Context Resolution, Dialogue Management, Generation, and Synthesis]
Configuring a Conversational Interface: Knowledge Representation
• First, define example sentences for in-domain actions
Action: Examples
  identify: "I would like to know today’s weather in Denver"; "What will the temperature be on Tuesday"
  set: "Turn on the radio in the kitchen please"; "Can you turn the dining room lights off"
• Then, define the important concepts present in the actions (attributes):
– Concept values make up recognizer vocabulary!
– Examples of attributes automatically matched to attribute classes
Attribute: Values
  city: Boston, Denver, San Francisco, …
  room: living room, dining room, kitchen, …
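The mapping from a sentence to an action plus attribute-value pairs can be illustrated with a minimal Python sketch. This is not SpeechBuilder's implementation (which uses a speech recognizer and a hierarchical grammar); it is a toy word-overlap matcher, and the names `ATTRIBUTES`, `ACTION_EXAMPLES`, `tag_attributes`, and `classify` are assumptions made up for illustration.

```python
import re

# Attribute classes and values from the slide (hypothetical data structure).
ATTRIBUTES = {
    "city": ["Boston", "Denver", "San Francisco"],
    "room": ["living room", "dining room", "kitchen"],
}

# Example sentences per action, as a developer would supply them.
ACTION_EXAMPLES = {
    "identify": [
        "I would like to know today's weather in Denver",
        "What will the temperature be on Tuesday",
    ],
    "set": [
        "Turn on the radio in the kitchen please",
        "Can you turn the dining room lights off",
    ],
}


def tag_attributes(sentence):
    """Replace known attribute values with <class> tokens; return (template, pairs)."""
    template = sentence.lower()
    pairs = {}
    for name, values in ATTRIBUTES.items():
        # Try longer values first so multi-word values win over substrings.
        for value in sorted(values, key=len, reverse=True):
            pattern = r"\b" + re.escape(value.lower()) + r"\b"
            if re.search(pattern, template):
                pairs[name] = value
                template = re.sub(pattern, "<" + name + ">", template)
    return template, pairs


def words(text):
    """Set of word tokens, keeping <class> tokens intact."""
    return set(re.findall(r"[a-z<>_']+", text.lower()))


def classify(sentence):
    """Pick the action whose example shares the most words with the sentence."""
    template, pairs = tag_attributes(sentence)
    query = words(template)
    best_action, best_score = None, -1
    for action, examples in ACTION_EXAMPLES.items():
        for example in examples:
            example_template, _ = tag_attributes(example)
            score = len(query & words(example_template))
            if score > best_score:
                best_action, best_score = action, score
    return best_action, pairs
```

Because attribute values are replaced by class tokens before matching, an example that mentions "kitchen" also generalizes to "living room", which loosely mirrors the slide's point that attribute examples are matched to attribute classes.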
Starting with a Database Table
• Provide database table to configure speech interface:
Name | Phone | Email | Office
  Jim Glass | x3-1640 | glass@mit.edu | 601
  Scott Cyphers | x3-0248 | cyphers@mit.edu | 604
  Eugene Weinstein | x3-8569 | ecoder@mit.edu | 633
• Only some columns are used to access entries (e.g., Name)
– Values of those columns become values for domain concepts
– Default action sentences are automatically generated
• But, every table cell can potentially be an answer to a question
– All names of columns become one concept, “property”
Attributes:
  name: Jim Glass, Scott Cyphers, …
  property: Name, Phone, Email, Office
Actions:
  request_property: "What is the email for Jim Glass?"
  request_office: "Where can I find Jim Glass?"
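A rough Python sketch of how a table might be turned into attributes and default action sentences follows. The function name `configure_from_table`, the dictionary layout, and the sentence pattern are assumptions for illustration; the slides do not show how SpeechBuilder does this internally.

```python
# Directory table from the slide, as a list of rows (hypothetical representation).
TABLE = [
    {"Name": "Jim Glass", "Phone": "x3-1640", "Email": "glass@mit.edu", "Office": "601"},
    {"Name": "Scott Cyphers", "Phone": "x3-0248", "Email": "cyphers@mit.edu", "Office": "604"},
    {"Name": "Eugene Weinstein", "Phone": "x3-8569", "Email": "ecoder@mit.edu", "Office": "633"},
]


def configure_from_table(rows, access_columns):
    """Derive attribute classes and default example sentences from a table."""
    columns = list(rows[0].keys())

    # Values of the access columns become values of domain concepts.
    attributes = {col.lower(): [row[col] for row in rows] for col in access_columns}

    # All column names together form one concept, "property".
    attributes["property"] = columns

    # Generate one default sentence per (access column, other column) pair,
    # using the first row's value as the example.
    actions = {"request_property": []}
    for col in access_columns:
        for prop in columns:
            if prop == col:
                continue
            sentence = f"What is the {prop.lower()} for {rows[0][col]}?"
            actions["request_property"].append(sentence)
    return attributes, actions


attributes, actions = configure_from_table(TABLE, access_columns=["Name"])
```

With `Name` as the only access column, the sketch produces a `name` concept holding the three people and a `property` concept holding the four column names, matching the attribute listing on the slide.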
Dialogue Management
• Generic Dialogue Manager (Polifroni & Chung, ICSLP 2002)
– Plan system responses
– Regularize common concepts
– Summarize database results
[Diagram: one generic dialogue manager shared by the Hotels, Air Travel, Sports, and Weather domains, shown within the system architecture: Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Back End, Language Generation, Speech Synthesis]
Context Resolution
[Diagram: processing stages, each with an example utterance]
– Input Query: “Show me restaurants in Cambridge.”
– Resolve Deixis: “What does this one serve?”
– Resolve Pronouns: “What is their phone number?”
– Inherit Predicates: “Are there any on Main Street?”
– Incorporate Fragments: “What about Massachusetts Ave?”
– Fill in Default Values: “Give me directions from MIT.”
– Output: Query Interpreted in Context
Human Language Technology Details
• Approach: use the same technologies as deployed in our mainstream, more complex systems
• Speech Recognizer (Glass, Computer Speech and Language, 2003)
– Trained on 100+ hours of mostly telephone speech
– Word pronunciations supplied by large dictionary, generated by rule, or provided by developer
• Natural Language Understanding (Seneff, Computational Linguistics, 1992)
– Hierarchical sentence grammar used to parse sentence hypothesis
– Back off to concept spotting when no full parse is made
• Language Generation (Baptist & Seneff, ICSLP 2000)
– Used in: SQL (DB query) generation, paraphrasing and URL-encoding the meaning representation, responses
Web-based Interface
[Screenshot: defining actions and concepts (attributes)]
Web-based Interface: Viewing Sentences
[Screenshot: examining how sentences are reduced to an action and a set of attribute-value pairs]
Web-based Interface: Response Generation
[Screenshot: customizing system responses, with domain-independent and domain-specific system prompts]
Web-based Interface: Editing Pronunciations
[Screenshot: modifying system-generated pronunciations for the vocabulary]
Web-based Interface: Context Resolution
[Screenshot: context resolution configured through masking and inheritance of concepts]
Voice Configuration File: An Alternative to the Web Interface
• Entire domain can be specified in a single configuration file
– Allows for automated generation of conversational systems
    <actions>
    <request_name> = i would like a restaurant
                   | can you (show|give) me a Chinese restaurant in Arlington;
    </actions>
    <attributes>
    <cuisine> = Chinese | Taiwanese;
    <city> = Washington | Boston | Arlington;
    </attributes>
    <discourse>
    name masks(city cuisine neighborhood);
    </discourse>
    <constraints>
    <request_name> (city|neighborhood) {prompt_for_city};
    </constraints>
Deployment
• SpeechBuilder functional for the past three years
• Some example domains:
– Office appliance control
– Laboratory directory (auto-attendant)
– Restaurant query system
• Has been used by MIT researchers (experts) as well as novice developers at our sponsor companies
– Used in technology transfer workshop for pervasive computing project (Oxygen)
• SpeechBuilder has been used as an educational tool
– Computational linguistics class at Georgetown University
– Summer class at Johns Hopkins University
– Youngest SpeechBuilder developer: 9 years old
Japanese SpeechBuilder
• Created in collaboration with NTT
• Challenge: segmentation (no spaces between words)
Example Domain
• A hotel application using the generic dialogue manager
– Compiled via SpeechBuilder using constraints shown previously
– Other generic functionality is automatically included
• Illustrated technical issues:
– Soliciting necessary information from user
– Interpreting fragments correctly in context
– Canonicalizing relative dates
– Ordering and summarizing results of query to content provider
– Resolving superlatives/updating discourse context
– Interpreting pronouns in context
– Returning and speaking specific properties
– Repeating previous replies
Another Example Domain: Object Manipulation System
• Stock SpeechBuilder domain for spoken dialogue
• Custom back-end connected to stereo camera and person tracking algorithm (Demirdjian, WOMOT 2003)
Ongoing and Future Work
• Incorporate speech synthesis
– Allow use of concatenative speech synthesizer (Yi et al, ICSLP 2000) in SpeechBuilder
• Allow use of multiple modalities
– Provide functionality to incorporate multimodal input into systems
• Improve dialogue management tools and modules
– Improve ability of SpeechBuilder systems to use more sophisticated dialogue strategies
– Provide additional generic semantic concepts for use in domains
• Allow system refinement by unsupervised learning
– Use confidence scores to improve domain language model (Nakano & Hazen, Eurospeech 2003)
• Allow system modification in real time
– Need ability to re-train recognizer during runtime (Schalkwyk et al, Eurospeech 2003)
Thank You!
For more information:
• http://www.sls.csail.mit.edu/
• Email us! ecoder@mit.edu
• Jupiter weather information system:
– +1-617-258-0300 (outside USA)
– 1-888-573-TALK (USA toll-free)
• Mercury flight information system:
– +1-617-258-6040 (outside USA)
– 1-877-MIT-TALK (USA toll-free)
• Pegasus flight status system:
– +1-617-258-0301 (outside USA)
– 1-877-LCS-TALK (USA toll-free)
THE END
[Slide text partly cut off in the source]
• Utility for rapid prototypin[g] speech-based interfaces
– Used to create demonstra[…] for NTT CS Labs open hou[…]
– Prototypes were develope[…] a few days of effort
– Three papers submitted fo[…]
• Only some columns are used to access entries (e.g., Name)
– Values of those columns become values for domain concepts
– Default action sentences are automatically generated
• But, every table cell can potentially be an answer to a question
– Names of non-access columns become a concept
To Configure Response Generation…
• For each concept present in the domain, define how queries about that concept should be answered
    <telephone> = “The telephone for :name is :phone”
• Define some prompts for generic events, e.g. welcome and goodbye
    <welcome> = “Welcome to the auto-attendant”
    <no_data> = “Sorry, there was no data matching your request.”
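As a minimal sketch of how such response templates could be filled in, the Python below substitutes colon-prefixed placeholders with values from a result record. It assumes placeholders are written as `:name` with no space after the colon; the `TEMPLATES` dictionary and the `render` function are hypothetical names, not part of SpeechBuilder.

```python
import re

# Response templates keyed by concept or event (taken from the slide's examples).
TEMPLATES = {
    "telephone": "The telephone for :name is :phone",
    "welcome": "Welcome to the auto-attendant",
    "no_data": "Sorry, there was no data matching your request.",
}


def render(template_name, values=None):
    """Fill every :placeholder in the named template from the values mapping."""
    values = values or {}
    template = TEMPLATES[template_name]

    def substitute(match):
        key = match.group(1)
        # Raises KeyError if the record lacks a field the template needs.
        return str(values[key])

    return re.sub(r":([A-Za-z_]\w*)", substitute, template)


reply = render("telephone", {"name": "Jim Glass", "phone": "3-1640"})
```

Templates without placeholders, such as the welcome prompt, pass through unchanged, so the same function can serve both concept-specific and generic prompts.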
Conversational User Interfaces: Input Side
[Diagram, labeled "Human Language Technologies": Speech (“Find me a flight to Boston on Tuesday”), Recognition, Text, Understanding, Meaning (action=flights to_city=Boston day=Tuesday), then Action and DB as the “back-end” technologies]
Conversational User Interfaces: Output Side
[Diagram, labeled "Human Language Technologies": DB, Action, Meaning (flight_num=55 airline=Delta origin=LGA dest=BOS), Generation, Text, Synthesis, Speech (“Delta flight, number fifty five from La Guardia to Boston…”)]
Conversational User Interfaces: The Whole Picture. Or Is It?
[Diagram: Speech, Recognition, Text, Understanding, Meaning, Action on the input side; Action, Meaning, Generation, Text, Synthesis, Speech on the output side]
The Missing Pieces: Context and Dialogue
• Context Resolution:
    action=flights to_city=Boston day=Tuesday
    + Last time, the user asked for a flight from LGA
    = action=flights origin=LGA dest=BOS day=Tuesday
• Dialogue Management:
    action=flights to_city=Boston day=Tuesday
    + [graphic not recovered]
    = “Which city would you like to fly from?”
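The two steps on this slide can be sketched in Python as operations on a key-value frame. This is an illustrative toy, not the generic dialogue manager: the slot names `origin`, `dest`, and `day`, the `REQUIRED` and `PROMPTS` tables, and both functions are assumptions, and the frame is assumed to have already been normalized so that the destination is stored under `dest`.

```python
# Slots each action needs before it can be sent to the back end (hypothetical).
REQUIRED = {"flights": ["origin", "dest", "day"]}

# Prompt to speak when a required slot is missing (first one is from the slide).
PROMPTS = {
    "origin": "Which city would you like to fly from?",
    "dest": "Which city would you like to fly to?",
    "day": "What day would you like to travel?",
}


def resolve_context(frame, history):
    """Context resolution: fill slots the user left out from the previous turn."""
    resolved = dict(history)
    resolved.update(frame)  # values from the current turn take precedence
    return resolved


def next_prompt(frame):
    """Dialogue management: ask for the first required slot that is missing."""
    for slot in REQUIRED.get(frame.get("action"), []):
        if slot not in frame:
            return PROMPTS[slot]
    return None  # frame is complete; the query can be executed


current = {"action": "flights", "dest": "BOS", "day": "Tuesday"}

# Without any context, the system has to ask for the origin.
prompt_without_context = next_prompt(current)

# With the earlier turn (a flight from LGA), the origin is inherited.
resolved = resolve_context(current, {"action": "flights", "origin": "LGA"})
prompt_with_context = next_prompt(resolved)
```

The same incoming frame therefore leads either to a clarification question or to a complete query, depending only on what the discourse history supplies.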
Conversational User Interfaces: The Whole Picture
[Diagram: Speech, Recognition, Text, Understanding, Meaning on the input side; Meaning, Generation, Text, Synthesis, Speech on the output side; Context Resolution and Dialogue Management sit between Meaning and Action]
The Problem With Conversational Interfaces…
• Complex conversational systems are out there
– Both user and computer can take initiative
– Goal: conversational skill of system should approach that of human operator
• But…
– These systems are built by experts
– Huge learning curve for novices, and
– Tremendous iterative effort required even from experts
• For this reason
– Most advanced conversational systems remain in research labs
* e.g. Jupiter weather info system (+1-888-573-TALK): Zue et al, IEEE Trans. SAP, 8(1), 2000
– However, we have seen limited commercial deployment
* e.g. AT&T’s “How May I Help You”, Gorin et al, Speech Communication, 23, 1997
Configuring Response Generation…
• For each concept present in the domain, define how queries about that concept should be answered
• Configure some generic prompts for summarizing long results
• Define some prompts for generic events, e.g. welcome
Property/Condition: Response
  phone: The phone number for :restaurant_name is :phone
  cuisine: :restaurant_name serves :cuisine
  Welcome: Welcome to the restaurants domain
  No matches: I’m sorry, I couldn’t find any restaurants matching your request
  Many matches: I found five restaurants :items
  item (what to return when summarizing): :restaurant_name
Configuring Context Resolution
• Context Resolution (discourse) is configured through Masking and Inheritance of concepts
• Inheritance configures how actions remember concepts, e.g.:
– User: “What is the phone number for Jim Glass?”
– System: “Jim Glass’ phone number is 3-1640”
– User: “What about his email address?”
– System: “Jim Glass’ email address is glass@mit.edu”
– (Name concept is inherited)
• Masking configures how certain concepts block other concepts, even in the presence of inheritance, e.g.:
– User: “Do you have any restaurants in Boston?”
– System: “In Boston, I have the following…”
– User: “What about in Times Square?”
– System: “In Times Square, New York, I have…”
– (City concept is masked by Neighborhood concept)
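The interaction between inheritance and masking can be sketched in a few lines of Python. The `INHERITED` set, the `MASKS` table, and the `resolve` function are hypothetical names chosen for illustration; they only mimic the behavior described in the two example dialogues and do not reflect SpeechBuilder's actual discourse component.

```python
# Concepts that carry over from the previous turn when not restated (assumed).
INHERITED = {"name", "city", "neighborhood", "cuisine"}

# A concept in the current turn blocks inheritance of the concepts it masks.
MASKS = {"neighborhood": {"city"}}


def resolve(current, history):
    """Merge the previous turn's concepts into the current turn's frame."""
    # Collect everything masked by concepts the user just mentioned.
    masked = set()
    for concept in current:
        masked |= MASKS.get(concept, set())

    resolved = dict(current)
    for concept, value in history.items():
        if concept in INHERITED and concept not in current and concept not in masked:
            resolved[concept] = value
    return resolved


# Inheritance: "What about his email address?" keeps the earlier name.
directory_turn = resolve({"property": "email"},
                         {"name": "Jim Glass", "property": "phone"})

# Masking: "What about in Times Square?" must not inherit city=Boston.
restaurant_turn = resolve({"neighborhood": "Times Square"},
                          {"city": "Boston"})
```

Without the mask, the second query would be interpreted as Times Square in Boston; with it, the stale city is dropped and the back end is free to resolve the neighborhood on its own.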
Voice Configuration File
• Developers can also use the Voice Configuration (VCFG) file format to configure SpeechBuilder domains:
    <actions>
    <request_name> = i would like a restaurant
                   | can you (show|give) me a Chinese restaurant in Arlington;
    </actions>
    <attributes>
    <cuisine> = Chinese | Taiwanese;
    <city> = Washington | Boston | Arlington;
    </attributes>
    <discourse>
    name masks(city cuisine neighborhood);
    </discourse>
    <constraints>
    <request_name> (city|neighborhood) {prompt_for_city};
    </constraints>
Dialogue Management
• Generic Dialogue Manager (Polifroni & Chung, ICSLP 2002)
– Plan system responses
– Regularize common concepts
– Summarize database results
[Diagram: one generic dialogue manager shared by the Hotels, Air Travel, Sports, and Weather domains, shown within the system architecture: Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Database, Language Generation, Speech Synthesis]
Deployment
• SpeechBuilder functional for the past three years
• Some example domains:
– Office appliance control
– Laboratory directory (auto-attendant)
– Restaurant query system
• Has been used by MIT researchers (experts) as well as novice developers at our partner companies
• SpeechBuilder has been used by students in
– Computational linguistics class at Georgetown University
– Summer class at Johns Hopkins University
– Technology transfer workshop for pervasive computing project (Oxygen)
• In collaboration with NTT, we have developed a Japanese version of SpeechBuilder. Japanese domains:
– Bus timetable system
– Weather information system
Configuring a Speech Interface with SpeechBuilder: Knowledge Representation
• First, define some concepts present in the domain (attributes):
– Concept values make up recognizer vocabulary!
Attribute: Values
  city: Boston, Denver, San Francisco, …
  room: living room, dining room, kitchen, …
• Then, define examples of things to do with the concepts (actions)
– Examples of attributes automatically matched to attribute classes
Action: Examples
  identify: "I would like to know today’s weather in Denver"; "What will the temperature be on Tuesday"
  set: "Turn on the radio in the kitchen please"; "Can you turn the dining room lights off"