ETSI STQ Aurora Distributed Speech Recognition (DSR)
Dieter Kopp, Alcatel Research & Innovation
email: Dieter.Kopp@alcatel.de
9.7.2001
DSR system vision
Etsi_P&A-multi-modal.ppt, Dieter Kopp, 9.7.2001
ETSI STQ Aurora Distributed Speech Recognition
• Participants
  - Alcatel, AT&T, British Telecom, Ericsson, France Telecom, Hewlett Packard, Motorola, Nokia, Qualcomm, Siemens, Sony, Texas Instruments, IBM, Conversay, etc.
• Mel-Cepstrum DSR Front-End & Compression
  - Complete: ETSI standard published in February 2000
• Advanced Noise Robust DSR Front-End
  - Current activity: standard expected in 2002
• DSR Application & Protocols
  - Architecture definition, Client/Server protocol specification & contribution to other standardization groups
ETSI STQ Aurora Distributed Speech Recognition: Front-End Standardization
DSR Elements
Performance Enhancement with DSR
Telephone Application & DSR
Benefit of DSR for IP transmission
• Worst performance is obtained using a speech codec
• At 50% packet loss, speech recognition over IP using DSR shows only 3% recognition rate degradation, compared to 63% for coded speech transmission (simulation done by BT)
Advanced Noise Robust DSR Front-End
• Goals:
  - Standardization of a Noise Robust DSR Front-End algorithm under the following conditions:
    - 50% recognition rate improvement compared to the existing DSR Front-End standard
    - Latency below 250 ms
    - Complexity below 17 wMOPS
• Selection process using:
  - Aurora database, SpeechDat-Car (top 2/3 cluster selection)
  - Large vocabulary database (final winner)
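The three goals on this slide can be read as a simple acceptance check for a candidate front-end. The sketch below is illustrative only. It assumes that "50% recognition rate improvement" means a 50% relative reduction in word error rate against the existing front-end, which the slide does not state. The names `Candidate`, `relative_improvement` and `meets_goals` are hypothetical, as are any figures passed to them.

```python
from dataclasses import dataclass

# Thresholds taken from the slide.
MIN_RELATIVE_IMPROVEMENT = 0.50   # 50% improvement over the existing standard
MAX_LATENCY_MS = 250.0            # latency below 250 ms
MAX_COMPLEXITY_WMOPS = 17.0       # complexity below 17 wMOPS


@dataclass
class Candidate:
    """Hypothetical measurements for one proposed front-end."""
    word_error_rate: float    # fraction between 0 and 1
    latency_ms: float
    complexity_wmops: float


def relative_improvement(baseline_wer: float, candidate_wer: float) -> float:
    """Relative error reduction (an assumed reading of 'improvement')."""
    if baseline_wer <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_wer - candidate_wer) / baseline_wer


def meets_goals(baseline_wer: float, cand: Candidate) -> bool:
    """True only if all three conditions from the slide hold."""
    improvement = relative_improvement(baseline_wer, cand.word_error_rate)
    return (
        improvement >= MIN_RELATIVE_IMPROVEMENT
        and cand.latency_ms < MAX_LATENCY_MS
        and cand.complexity_wmops < MAX_COMPLEXITY_WMOPS
    )
```

A candidate that improves accuracy enough but misses the latency or complexity bound is rejected, which matches the slide's framing of the three items as joint conditions.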
ETSI STQ Aurora Distributed Speech Recognition: Application & Protocols
Application & Protocols Subgroup
• Definition of DSR scenarios for applications
  - Information applications
    - Voice portals (flight, weather, news, movies)
    - Location-specific information
    - Voice navigation of maps
  - Transaction-based applications
    - Finance
    - E-commerce (various)
  - Information capture
    - Dictation
    - Form filling
Application & Protocols Subgroup
• Specification of the Client/Server architecture
• Specification of the communications elements (voice transport interface, synchronization between Client/Server, etc.)
• Contribution to other standardization groups
• Participants: Alcatel, British Telecommunications, Ericsson, HP, IBM, ICSI, Intel Labs, Motorola, Nokia, Qualcomm, SpeechWorks, Temic/DaimlerChrysler, TI, Verbaltek, WaveMakers, Philips, etc.
ETSI/STQ-Aurora Protocol & Application
Diagram labels: Voice Recognition, DSR, URL, Voice page, Graphic I/O, Speech output, GUI page, Mobile Network
• Open & establish connection, capability negotiation
• Connection to DSR Back-End Server
• Pre-processing data, speech output, contents exchange
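The exchange named on this slide (open and establish a connection, negotiate capabilities, connect to the DSR back-end server, then exchange data) can be pictured as a small client-side state machine. This is a minimal sketch, not the Aurora protocol itself: the state names, event names, the `DsrSession` class and the rule that negotiation keeps only capabilities common to both sides are all assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECTED = auto()          # connection opened and established
    NEGOTIATED = auto()         # capabilities agreed
    BACKEND_ATTACHED = auto()   # connected to the DSR back-end server
    EXCHANGING = auto()         # pre-processed data, speech output, contents


# Allowed transitions, in the order the slide lists the steps.
TRANSITIONS = {
    (State.IDLE, "open"): State.CONNECTED,
    (State.CONNECTED, "negotiate"): State.NEGOTIATED,
    (State.NEGOTIATED, "attach_backend"): State.BACKEND_ATTACHED,
    (State.BACKEND_ATTACHED, "exchange"): State.EXCHANGING,
    (State.EXCHANGING, "exchange"): State.EXCHANGING,
}


class DsrSession:
    """Hypothetical session object tracking where the client is in the flow."""

    def __init__(self, client_caps, server_caps):
        self.state = State.IDLE
        self.client_caps = set(client_caps)
        self.server_caps = set(server_caps)
        self.agreed = set()

    def step(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise RuntimeError(f"{event!r} not allowed in state {self.state.name}")
        if event == "negotiate":
            # Assumed rule: use only capabilities both sides support.
            agreed = self.client_caps & self.server_caps
            if not agreed:
                raise RuntimeError("no common capability")
            self.agreed = agreed
        self.state = TRANSITIONS[key]
        return self.state
```

Encoding the order as a transition table makes out-of-order events (for example, sending data before negotiation) fail loudly rather than silently.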
Applications for Multi-modal Distributed Speech Recognition
→ Advanced applications towards 3G terminals
Multi-modal User Interaction
Diagram labels: Output: Speech, Display; Input: Speech, Key, Pen, etc.; Capability; Feedback/Interaction; Service Request; Application; Presentation Manager; User Profile
Depending on the environment (background noise) and the user preferences, more or less speech I/O can be used.
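The idea that the amount of speech I/O depends on background noise and user preferences can be sketched as a small decision function of the kind a presentation manager might apply. Everything specific here is an assumption: the function name `choose_output_modes`, the 70 dB threshold and the fallback to speech when no display is available are illustrative choices, not taken from the slides.

```python
def choose_output_modes(noise_db, prefers_speech, has_display=True,
                        noisy_threshold_db=70.0):
    """Return the list of output modes to use for the next response.

    noise_db           -- estimated background noise level (hypothetical input)
    prefers_speech     -- user-profile preference for spoken output
    has_display        -- terminal capability
    noisy_threshold_db -- assumed level above which speech output is dropped
    """
    speech_ok = prefers_speech and noise_db < noisy_threshold_db
    modes = []
    # Speech is used when wanted and audible, or when it is the only option.
    if speech_ok or not has_display:
        modes.append("speech")
    if has_display:
        modes.append("display")
    return modes
```

For example, a user who prefers speech gets both modes in a quiet room and display only in a noisy one.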
Scenario: Personal Information Manager
Spoken dialogue (numbered 1 to 5 on the slide):
• "How may I help you?"
• "Tell me today's schedule!"
• "You have meetings at 9, 11:30 and 1 p.m. You have two meeting requests. Details: 9 until 10 o'clock, phone conference MAP; 10:30, possible meeting with M. Hauser, Marketing, ..."
• "Who will be participating in the 9 o'clock phone call?"
• "11:30 until 12:30 lunch ..."
• "Invite Jim Mason!"
Terminal display (Mobile '02), Tuesday, 26.6.2001:
  8:30
  9:00 MAP TP 4
  9:30 phone conference
  10:00
  10:30 ? M. Hauser
  11:00 ? Marketing
  11:30 Lunch
  12:00
  12:30
  1:00 department conv.
  Softkeys: Menu, WAP, Select
Participant display: 9:00 e-business: O'Neill, Scott; Dumont, Denise
Multi-modal Architecture
Diagram labels: DSR decoder, Audio Codec(s), Communication Manager, DSR encoder, Audio drivers, Voice Transport Interface, GUI Browser, DOM Wrapper, GUI drivers, Data Transport Interface, Network Transport Layer, Synchronization Interface, Audio I/O, Network Transport Layer, Gateway and router with Voice transport and Synchronization Support, Network, Server, Conversational Engines, Voice Browser, DOM Wrapper, Synchronization Protocols, MM Shell, HTTP, GUI I/O, Content Server
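One way to picture the role of the MM Shell and the synchronization interface in this architecture is a component that holds the shared dialogue state and mirrors each input to every registered browser view. The sketch below is a guess at that role under stated assumptions, not the ETSI design: the `View` and `MultiModalShell` classes, their methods and the field-level update model are all hypothetical.

```python
class View:
    """A modality-specific view, e.g. a GUI browser or a voice browser."""

    def __init__(self, name):
        self.name = name
        self.fields = {}

    def apply(self, field, value):
        self.fields[field] = value


class MultiModalShell:
    """Hypothetical shell keeping all registered views in step."""

    def __init__(self):
        self.views = []
        self.state = {}
        self.last_source = None

    def register(self, view):
        self.views.append(view)
        # Bring a late-joining view up to date with the shared state.
        for field, value in self.state.items():
            view.apply(field, value)

    def update(self, source, field, value):
        """Record an input from one view and push it to every view."""
        self.state[field] = value
        self.last_source = source.name
        for view in self.views:
            view.apply(field, value)
```

With this shape, a value spoken through the voice browser shows up in the GUI browser's form without either browser knowing about the other.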
P&A next steps
1. Voice Transport protocol specification and contribution to 3GPP
2. Definition of the Multi-modal Shell function: how the synchronization could be managed
3. Liaison offer to W3C for the standardization of the DOM interface for VoiceXML
4. Contribution to W3C Multi-modality group with ETSI multimodal architecture
5. Common interface to all speech recognizers (IBM activity)
Thank You