Development of conversational interfaces at Nokia Research Center

Boda Péter Pál, peter.boda@nokia.com
Language Technology & Applications, Voice Interfaces Group
Speech and Audio Systems Laboratory, Nokia Research Center
14 October, 2002
© NOKIA NRC – kieliteknologia kurssi.PPT / 14.10.2002 / Boda Péter

Contents
  • Background
      • personal
      • Language Technology and Applications group at NRC
  • A commercial implementation: Nokia One Voice Service
  • Overview of CATCH-2004: multilingual conversational interface
  • Demos
  • Summary

Personal background
  • Born in 1965, Miskolc, Hungary
  • M.Sc. in Telecommunications, 1991, Budapest, Tech. Univ. of Budapest
  • Post-graduate studies: TUB 1991-1994, HUT 1992-1994, Nijmegen 1995
  • Lic.Tech., Speech Technology and Neural Networks, 1995, Helsinki, HUT
  • Working on
      • speech analysis 1990-1995
      • speech recognition 1995-1997
      • spoken dialogue systems, language technology 1996-
  • Interest:
      • Natural Language Understanding (semantic decoding)
      • Dialogue Management
      • Processing multimodal and contextual input

Language Technology and Applications
  • Mission: develop language technology for Nokia’s offering
  • Dialogue-based application development for telecommunication (mainly network-based implementations)
  • Seamless integration of Natural Language Understanding technology into user interfaces
  • Covering the entire development process:
      • conceptual design
      • data collection and analysis
      • grammar building and tuning, NLU training & testing
      • Wizard-of-Oz experiments
      • type-in and speech-enabled tests
      • objective and subjective evaluation
      • human factors consideration, usability studies
  • Personnel: a diverse team of linguists, software and telecomm engin…

What will the new generation of speech interfaces bring?
  • Enhanced usability:
      - naturalness in terms of linguistic expressions;
      - ease of use;
      - human-human like dialogues;
      - accelerated system-user interactions;
  • Well-defined framework to port to other languages & tasks:
      - end-to-end solutions (design, data collection, Wizard-of-Oz studies, implementation, test, assessment);
      - shortened development cycle (development tools).

A commercial implementation: Nokia One Voice Service
http://www.nokia.com/nokiaone

Nokia One Voice Service

Speech interface for e-mail reading
  • Features
      • DTMF and speech access (language of the user interface is English)
      • dialogue-based implementation with mid-complex task grammar
      • functionalities:
          • browsing e-mails
          • selecting for reading
          • send in SMS
          • reply with voice clip
      • accurate language identification
      • text-to-speech (TTS) for several languages when reading back e-mails
          • English, Finnish, Italian, French, German, Spanish
      • e-mail preprocessors prior to TTS
  • Usability studies show that the speech version is now more popular than the DTMF version

Some general comments
  • Before implementing any speech interface:
      • think about its role: replacement or addition?
      • if addition, how will it help/complete the current user interface?
      • is there any real added value it can bring? – acceleration, security?
  • think carefully about the effort you need to develop a solution
      • amount and ratio of research and implementation
  • never underestimate the results of user/usability tests – go for real
  • TTS is important: users comment primarily on that and not on the recognition part. TTS can mean language technology, as …

An EU project: CATCH-2004 – Converse in AThens-2004, Cologne, Helsinki
http://www.catch2004.org/

A multi-multi project ….
  • Jan 2000 - June 2002: 30 months
  • 7 partners
  • 5 countries
  • 603 person-months
  • 6.5 M€ (3.25 from EC)
  • 2 demonstrators: Athens, Helsinki
  • 1 tester: Cologne
  • 16 deliverables
  • 11 milestones

Consortium
[Slide of partner logos; surviving labels: Finland; Germany; France, Germany, Greece, Czech Republic; Greece; Gerhard-Mercator Universität Duisburg; NTUA]

Overview
  • The "flag-ship" of the 5th EU-IST programme
  • Objectives:
      • conversational interface to (city) information services: build various applications, possessing high performance accuracy and satisfying requirements set for well-functioning spoken dialogue systems
      • multilingual (Finnish, English, German, Greek)
      • multidevice (kiosk, phone, smart wireless)
      • multimodal (GUI, speech)
      • Internet infrastructure (WAP, VoiceXML, remote databases)
  • Nokia's role:
      • WAP access
      • Multimodal browsing
      • NLU development for Helsinki demonstrator
  • Helsinki demos:
      • 2000: Art-Goes-Kapakka - just to experiment with the NLU toolkit
      • 2001: Program Guide Information Service - has relevance to other projects

Inside the NLU module
[Diagram blocks: Speech recognition; Natural Language Understanding (NLU) incl. Dialogue Manager; Database; Speech synthesis]

What does the NLU module do?
  (1) Interprets the meaning of the user utterance and decides what to do with the utterance.
  (2) Interacts with the backend database.
  (3) Decides what kind of answer will be provided.
  • The NLU toolkit employed in CATCH-2004:
      • IBM ViaVoicePhone Telephony Natural Language Tools
          – Statistical approach
          – The speaker is not restricted to any particular vocabulary or commands but can freely express the request by using natural language expressions.

The components of the NLU module
  • The NLU module contains four main components.
  • Output of the recogniser: sequence of words, as the LM allows.
  • Statistical Classer: extracts the key concepts of the utterance.
  • Canonicalizer: transforms certain concepts to a form which is understood by the backend database.
  • Statistical Parser: determines what to do with the key concepts from the classer.
  • Dialog Manager: directs the interaction between the user and the system.
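
The four components can be pictured as a chain from recognised words to a system reply. The sketch below is only an illustration of that chain: the CATCH-2004 classer and parser are statistical models from the IBM toolkit, whereas here they are replaced by hand-written keyword rules, and every name in it (CONCEPT_PATTERNS, CANONICAL_FORMS, Frame, understand and the concept labels) is hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword tables standing in for the trained statistical models.
CONCEPT_PATTERNS = {
    "PROGRAM_TYPE": r"\b(movies?|news|sports?)\b",
    "DATE": r"\b(tonight|today|tomorrow)\b",
    "CHANNEL": r"\b(bbc world|cnn|eurosport|tcm)\b",
}
# Hypothetical mapping from surface forms to the values a database would store.
CANONICAL_FORMS = {"movies": "movie", "sports": "sport"}


@dataclass
class Frame:
    action: str   # what the system should do, e.g. "LIST"
    slots: dict   # canonical concept values


def classer(words):
    """Extract the key concepts of the utterance (concept -> surface string)."""
    text = " ".join(words).lower()
    concepts = {}
    for name, pattern in CONCEPT_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            concepts[name] = match.group(1)
    return concepts


def canonicalizer(concepts):
    """Rewrite concept values into the form the backend database expects."""
    return {name: CANONICAL_FORMS.get(value, value) for name, value in concepts.items()}


def parser(concepts):
    """Decide what to do with the concepts: list programs if any filter was found."""
    return Frame("LIST" if concepts else "UNKNOWN", concepts)


def dialog_manager(frame):
    """Turn the frame into the next system move."""
    if frame.action == "UNKNOWN":
        return "Sorry, I did not understand. What would you like to know?"
    filters = ", ".join(f"{k}={v}" for k, v in sorted(frame.slots.items()))
    return f"Searching programs with {filters}."


def understand(utterance):
    """Run recogniser output (here: plain text) through all four components."""
    words = utterance.split()
    return dialog_manager(parser(canonicalizer(classer(words))))


print(understand("Could you please tell me about movies tonight?"))
```

Keeping each stage a separate function mirrors the slide's division of labour, so a rule-based stage could in principle be swapped for a trained model without touching the others.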

Multilingual Architecture
[Diagram labels: Speech recognition; NLU; Multilingual classer; Multilingual TASK; Multilingual parser; Multilingual LM/Voc; Canonicalizer (Lang ID); Dialog manager; Multilingual AM; Answer generation (language-dependent TTS)]
Legend:
  • LM: language model
  • Voc: vocabulary
  • AM: acoustic models
  • TTS: text-to-speech
  • Lang ID: language identification
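
One way to read this architecture is that the language identified for the user's utterance selects the language-dependent answer generation. The sketch below illustrates only that routing step and is not taken from the project: the stop-word lists are a crude hypothetical stand-in for a real language identifier, and the answer templates and function names are invented for the example.

```python
# Hypothetical stop-word lists: a toy replacement for real language identification.
STOPWORDS = {
    "en": {"the", "is", "what", "how", "long", "does"},
    "fi": {"ja", "mitä", "kuinka", "kauan", "se", "kestää"},
    "de": {"der", "die", "das", "ist", "was", "wie"},
}

# Hypothetical language-dependent answer templates that would be passed to TTS.
DURATION_TEMPLATES = {
    "en": "It lasts {minutes} minutes.",
    "fi": "Se kestää {minutes} minuuttia.",
    "de": "Es dauert {minutes} Minuten.",
}


def identify_language(text, default="en"):
    """Pick the language whose stop-word list overlaps most with the utterance."""
    tokens = [token.strip("?!.,") for token in text.lower().split()]
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def answer_duration(user_text, minutes):
    """Answer in the language the user spoke; returns (language, text for TTS)."""
    lang = identify_language(user_text)
    return lang, DURATION_TEMPLATES[lang].format(minutes=minutes)


print(answer_duration("Kuinka kauan se kestää?", 85))
print(answer_duration("How long does it last?", 100))
```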

Historically speaking ….
  • Helsinki demos:
      • 2000: Art-Goes-Kapakka - just to experiment with the NLU toolkit
      • 2001: Program Guide Information Service – more realistic
  • AGK
      • developed as the first NLU application at Nokia
      • good exercise to walk through (with sweat) the entire development process
      • close co-operation with IBM, regular consulting
      • results were comparable to others
      • ease: manageable size & complexity, (almost) available database
  • PGIS
      • we wanted a more real-life application
      • Electronic Program Guides are coming into use as digital TV spreads
      • on-going standardisation (MPEG-7 -> program types and subtypes)

Supported functionalities in PGIS
  • A LIST based on the following parameters:
      DATE, TIME, CHANNEL, PROGRAM NAME, LANGUAGE, PRICE, PROGRAM TYPE, PERFORMER, NEW
  • A QUERY about a particular program:
      DATE, TIME, DURATION, CHANNEL, PERFORMERS, LANGUAGE, PROGRAM TYPE, YEAR, PRICE, COUNTRY OF ORIGIN, RESTRICTIONS, EPISODE TITLE, DESCRIPTION, WEB ADDRESS, RE-RUN, PEOPLE BEHIND THE PROGRAM, SUBTITLES
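
The two request types on this slide could be represented as small validated records. This is a minimal sketch, not the PGIS data model: the class names ListRequest and QueryRequest are hypothetical, and the split of the slide's parameter names (for example treating EPISODE TITLE as one attribute) is one reading of the flattened slide text.

```python
from dataclasses import dataclass

# Parameter names as they appear on the slide.
LIST_PARAMETERS = frozenset({
    "DATE", "TIME", "CHANNEL", "PROGRAM NAME", "LANGUAGE",
    "PRICE", "PROGRAM TYPE", "PERFORMER", "NEW",
})
QUERY_ATTRIBUTES = frozenset({
    "DATE", "TIME", "DURATION", "CHANNEL", "PERFORMERS", "LANGUAGE",
    "PROGRAM TYPE", "YEAR", "PRICE", "COUNTRY OF ORIGIN", "RESTRICTIONS",
    "EPISODE TITLE", "DESCRIPTION", "WEB ADDRESS", "RE-RUN",
    "PEOPLE BEHIND THE PROGRAM", "SUBTITLES",
})


@dataclass
class ListRequest:
    """'Which programs match these filters?' (hypothetical structure)."""
    filters: dict

    def __post_init__(self):
        unknown = set(self.filters) - LIST_PARAMETERS
        if unknown:
            raise ValueError(f"unsupported LIST parameters: {sorted(unknown)}")


@dataclass
class QueryRequest:
    """'Tell me this attribute of that program.' (hypothetical structure)."""
    program: str
    attribute: str

    def __post_init__(self):
        if self.attribute not in QUERY_ATTRIBUTES:
            raise ValueError(f"unsupported QUERY attribute: {self.attribute}")


# "Movies tonight?" followed by "How long does it last?"
movies_tonight = ListRequest({"PROGRAM TYPE": "movie", "DATE": "tonight"})
how_long = QueryRequest("It's Always Fair Weather", "DURATION")
print(movies_tonight, how_long)
```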

Gaining better performance via iterative tests
[Chart: error rate, English system; automatic speech recognition (word error rate)]
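
Word error rate, the recognition measure named on this slide, is conventionally computed as WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in the best word-level alignment of the recognised text against a reference transcript of N words. The function below is a generic sketch of that standard definition using edit distance; it is not the evaluation tool used in the project, and the example sentences are only illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One reference word ("the") is dropped by the recogniser: 1 error over 6 words.
print(word_error_rate("what time is the first one", "what time is first one"))
```

Because insertions are counted, the value can exceed 1.0 when the recogniser outputs many more words than were spoken.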

A real (!!!!) dialogue with PGIS
S: Welcome to Program Guide. What would you like to know?
U: Could you please tell me about movies tonight?
S: I found the following movie programs on Monday between 6 pm and 12 pm: It's Always Fair Weather, The Murder Men, Boom Town, and The Hook.
U: What time is the first one?
S: It's Always Fair Weather starts at 5 past 5 pm. How else can I help you?
U: How long does it last?
S: It's Always Fair Weather lasts 1 hours 40 minutes. What information can I offer you next?
U: Who are the actors?
S: It's Always Fair Weather. The actors are Gene Kelly, Dan Dailey, Michael Kidd, Cyd Charisse. How else can I help you?
U: Can I have a description for it please?
S: I can offer you the following type of description about the program called It's Always Fair Weather. The unbeatable musical team of Kelly and Donen is behind this zappy musical about three serviceman buddies reunited after 10 years… What else can I do for you?
U: That's all for now. Good-bye.
S: Thank you for using the Program Guide. Good-bye.
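
In this dialogue the user says "the first one", "it", or nothing at all ("Who are the actors?"), and the system still knows which program is meant. A rough sketch of the bookkeeping that makes this possible is shown below. It is an assumption about how such context could be tracked, not the PGIS dialogue manager; the class DialogueContext and its methods are hypothetical.

```python
import re

ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3}


class DialogueContext:
    """Remembers the last result list and the program currently in focus."""

    def __init__(self):
        self.results = []   # programs from the most recent LIST answer
        self.focus = None   # program that follow-up questions refer to

    def set_results(self, programs):
        """A new LIST answer starts a new context and clears the focus."""
        self.results = list(programs)
        self.focus = None

    def resolve(self, utterance):
        """Return the program the utterance refers to, or None if there is none.

        An ordinal ("the first one") selects from the last result list; any
        other follow-up ("it", or an elliptical question) keeps the focus.
        """
        match = re.search(r"\b(first|second|third|fourth)\b", utterance.lower())
        if match:
            index = ORDINALS[match.group(1)]
            if index < len(self.results):
                self.focus = self.results[index]
        return self.focus


context = DialogueContext()
context.set_results(
    ["It's Always Fair Weather", "The Murder Men", "Boom Town", "The Hook"]
)
print(context.resolve("What time is the first one?"))
print(context.resolve("How long does it last?"))
```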

[Storyboard of a multilingual PGIS dialogue; speech balloons listed in extraction order, which may differ from the spoken order]
Welcome to Program Guide! How may I help you?
Movies tonight?
I found the following programs …
…. starting time of the 1st?
… it starts at 5.15 pm
…. duration? (in Finnish)
… it is 1 h 25 min long (in Finnish)
…. description?
Michael Douglas is in Coma …
NEW CONTEXT!
Movies with Michael Douglas?
NEW CONTEXT!
… sorry, no programs for youngsters (in Finnish)
Programs for youngsters? (in Finnish)
NEW CONTEXT!
What kind of info I can offer next? (in Finnish)
I can offer the following …
no text message, description ….
thanks. (in Finnish)
To text message?
Channels?
BBC World, CNN, Eurosport, TCM
NEW CONTEXT!
What’s on BBC World tonight at 10 pm?
World News at 10 pm
…. duration? (in Finnish)
… it takes 5 minutes (in Finnish)
That’s all for now. Good bye!

What lessons have we learnt?
  • In general:
      • A research project has its own difficulties – risks must be taken, but within limits;
      • Know your partners and their capabilities, and take the initiative in co-operation;
      • Strong dependency on one partner’s technology might be problematic;
  • About technology
      • Good to have linguists around, although many of the development phases require engineering skills;
      • Everything should be planned as precisely as possible, even tests and evaluation methods;
      • The best results are gained with successive test-evaluation-improvement cycles;
      • This kind of technology is quite new and users often don’t know the possibilities of the system, therefore the instructions must be very guiding and clear:
          • difficult if only a demo system is available with a fake database, without a comparable traditional system;
          • test users must be rewarded – very crucial, otherwise no motivation

Finally ….
Gábor Dénes (1969): "If enough people work hard enough on the problem of speech recognition, it will be solved by mid next century."

References
  • http://www.nokia.com/nokiaone
  • Oria, D. & Koskinen, E., “E-Mail Goes Mobile: The design and implementation of a spoken language interface to e-mail” – ICSLP 2002
  • http://www.catch2004.org/
  • Harrikari, H., M. Mast, T. Ross & H. Schulz: 2002, “Different Approaches to Build Multilingual Conversational Systems”. 5th International Conference on Text, Speech and Dialogue, TSD 2002, Brno, Czech Republic.
  • Kleindienst J., L. Seredi, P. Kapanen & J. Bergman: 2002a, “CATCH-2004 Multi-Modal browser: Overview Description with Usability Analysis”. IEEE 4th International Conference on Multi-modal Interfaces, Pittsburgh, PA, U.S.A.
  • Kleindienst J., L. Seredi, P. Kapanen & J. Bergman: 2002b, “Loosely-coupled approach towards multi-modal browsing”. Submitted to Universal Access in Information Society magazine’s special issue on Multi-modal User Interfaces.
  • Boda, P. et al.: “Subjective Evaluation of a Personalised Conversational Interface to a Program Guide Information System” – Submitted to the User Modeling and User-Adapted Interaction journal (UMUAI) Special Issue on User Modeling and Personalization for Television.

Abbreviations
  • AM: acoustic model
  • ASR: automatic speech recognition
  • CTI: computer-telephone integration
  • DM: dialogue manager
  • LM: language model
  • NLU: natural language understanding
  • SUI: speech user interface
  • TTS: text-to-speech synthesis
  • VVT: ViaVoice Telephony (IBM's speech resources)
  • WOZ: wizard of Oz