Voice Browser
Presented by Sharmin Sirajudeen, S7 CS, Reg. No: 07412017

What is a Voice Browser?
A voice browser is a device that:
• interprets voice input and voice markup languages to generate voice output.
• interprets a script that specifies exactly what to present verbally to the user and when to present each piece of information.

Motivation
• There are 10 times as many telephones as connected PCs.
• Cell phone usage is growing dramatically.
• Speaking and listening are the natural usage modes for telephones.

Overview
• Time frame: 1998 to ??
• Hands-free access to the web.
• Pragmatic interface for functionally blind users.

Key Technologies
• Speech Recognition
• Speech Synthesis

Speech Recognition
[Diagram labels: Voice input, VoiceXML file, Text]

Speech Synthesis
[Diagram labels: Text, VoiceXML file, Output (pre-recorded)]

Standardization
World Wide Web Consortium (W3C)
• Voice Browser Working Group
• Speech Interface Framework

W3C Voice Browser Working Group
• Established on 26 March 1999.
• Re-chartered through 31 January 2009.
• W3C Team Contacts are Kazuyuki Ashimura and Matt Womer.
• Co-chaired by Jim Larson and Scott McGlashan.

Speech Interface Framework
• VoiceXML 1.0
• VoiceXML 2.1
• VoiceXML 3.0
• Speech Recognition Grammar Specification (SRGS) 1.0
• Speech Synthesis Markup Language (SSML) 1.1
• Call Control XML (CCXML)
• State Chart XML (SCXML)
• Semantic Interpretation (SISR) 1.0
• Pronunciation Lexicon Specification (PLS) 1.0

VoiceXML (VoXML)
• Version 1.0 - designed for creating audio dialogs.
• Version 2.0 - uses the form interpretation algorithm (FIA).
• Version 2.1 - adds 8 elements.
• Version 3.0 (31 August 2010) - defines the relationship between semantics and syntax.
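The form interpretation algorithm mentioned above can be sketched roughly as a loop that picks the first unfilled field, prompts for it, and stores the answer. This is a simplified illustration, not the algorithm as specified: the function name `run_form`, the `listen` callback, and the scripted replies are all hypothetical stand-ins for a real recognizer.

```python
from typing import Callable, Dict, List, Optional


def run_form(
    fields: List[str],
    listen: Callable[[str], Optional[str]],
    max_turns: int = 20,
) -> Dict[str, str]:
    """Simplified form loop: visit the first unfilled field until all are filled."""
    filled: Dict[str, str] = {}
    for _ in range(max_turns):
        # Select: pick the first field that has no value yet.
        pending = [name for name in fields if name not in filled]
        if not pending:
            break
        field = pending[0]
        # Collect: play a prompt and wait for the user's answer.
        answer = listen(f"Please say the {field}.")
        # Process: keep a recognized answer; otherwise the field is revisited.
        if answer:
            filled[field] = answer
    return filled


# Scripted replies stand in for speech input (None models a no-match turn).
replies = iter(["Boston", None, "Monday"])
result = run_form(["city", "day"], lambda prompt: next(replies))
print(result)
```

The second turn returns nothing, so the loop asks for the same field again before moving on, which is the behavior the sketch is meant to show.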

What about HTML?
• HTML does not have:
  • Tapered prompts.
  • Grammars specifying alternative words that the user can speak in response to a question.
  • Instructions to the text-to-speech synthesizer about how to say words and phrases.
• Adding these capabilities would complicate HTML, a language developed just for visual UI.
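Tapered prompts change wording as the user revisits the same question. A minimal sketch, assuming each prompt carries a count and the prompt with the highest count not exceeding the current visit number is played; the prompt texts and the name `select_prompt` are illustrative only.

```python
from typing import Dict


def select_prompt(prompts: Dict[int, str], visit_count: int) -> str:
    """Return the prompt with the highest count that is <= visit_count."""
    eligible = [count for count in prompts if count <= visit_count]
    if not eligible:
        raise ValueError("no prompt applies to this visit count")
    return prompts[max(eligible)]


# Hypothetical prompts: longer guidance is offered on later attempts.
PROMPTS = {
    1: "Which city?",
    2: "Please say the name of a city.",
    3: "Say a city name such as Boston, or say help.",
}

for visit in (1, 2, 5):
    print(visit, select_prompt(PROMPTS, visit))
```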

Speech Recognition Grammar Specification (SRGS)
• Version 1.0 - for specifying grammars for each user input to a speech application.
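To make the idea of a grammar concrete, the sketch below embeds a small SRGS-style XML grammar in a Python string and lists the phrases one rule accepts. The grammar content, the rule id `answer`, and the helper `accepted_phrases` are assumptions made for illustration; a real recognizer would compile the grammar rather than list its items.

```python
import xml.etree.ElementTree as ET
from typing import List

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical yes/no grammar in the XML form of SRGS.
GRAMMAR = f"""<grammar xmlns="{SRGS_NS}" version="1.0" xml:lang="en-US"
         root="answer" mode="voice">
  <rule id="answer">
    <one-of>
      <item>yes</item>
      <item>yeah</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>"""


def accepted_phrases(grammar_xml: str, rule_id: str) -> List[str]:
    """Collect the text of every <item> inside the named rule."""
    root = ET.fromstring(grammar_xml)
    for rule in root.iter(f"{{{SRGS_NS}}}rule"):
        if rule.get("id") == rule_id:
            return [item.text.strip() for item in rule.iter(f"{{{SRGS_NS}}}item")]
    return []


phrases = accepted_phrases(GRAMMAR, "answer")
print(phrases)
```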

Speech Synthesis Markup Language (SSML)
• Version 1.0 - for specifying the rendering of synthesized speech to the user.
• Version 1.1 - enhancement of SSML 1.0 for better support of the world's languages, including Asian, Eastern European, and Middle Eastern languages.
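As an illustration of how rendering instructions are attached to text, the following sketch wraps sentences in SSML markup with a pause after each one. The helper name `to_ssml`, the default pause length, and the sample sentences are assumptions; the output is only checked for being well-formed XML, not validated against the SSML schema.

```python
import xml.etree.ElementTree as ET
from typing import List
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def to_ssml(sentences: List[str], lang: str = "en-US", pause_ms: int = 300) -> str:
    """Wrap each sentence in <s> and follow it with a <break> pause."""
    parts = [f'<speak version="1.1" xmlns="{SSML_NS}" xml:lang="{lang}">']
    for sentence in sentences:
        # escape() protects characters such as & and < in the spoken text.
        parts.append(f'<s>{escape(sentence)}</s><break time="{pause_ms}ms"/>')
    parts.append("</speak>")
    return "".join(parts)


ssml = to_ssml(["Weather for Boston.", "Sunny & warm."])
parsed = ET.fromstring(ssml)  # raises ParseError if the markup is malformed
print(ssml)
```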

Call Control XML (CCXML)
• For specifying call control functions.
State Chart XML (SCXML)
• Execution environment based on CCXML and Harel State Tables.
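The state-chart idea behind call control can be sketched as a transition table driven by events. The state and event names below are hypothetical and are not taken from the CCXML or SCXML specifications; the sketch only shows the general shape of an event-driven state machine.

```python
from typing import Dict, Iterable, Tuple

# (current state, event) -> next state; all names are illustrative.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("idle", "incoming"): "alerting",
    ("alerting", "answer"): "connected",
    ("alerting", "hangup"): "idle",
    ("connected", "hangup"): "idle",
}


def run(events: Iterable[str], state: str = "idle") -> str:
    """Feed events through the table; events with no transition are ignored."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


print(run(["incoming", "answer"]))
print(run(["incoming", "answer", "hangup"]))
```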

Semantic Interpretation for Speech Recognition (SISR)
• Version 1.0 - for specifying how the text output of a speech recognizer is translated into its meaning.
Pronunciation Lexicon Specification (PLS)
• Version 1.0 - syntax for specifying pronunciation lexicons to be used by speech recognition and speech synthesis.
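A pronunciation lexicon can be pictured as a lookup from written forms to how they should be spoken. The sketch below parses a tiny PLS-style document that uses an alias entry and expands matching words before synthesis. The lexicon content and the helpers `load_aliases` and `expand` are assumptions for illustration, and real engines apply lexicons internally rather than by text substitution.

```python
import xml.etree.ElementTree as ET
from typing import Dict

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Hypothetical lexicon with a single alias entry.
LEXICON = f"""<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""


def load_aliases(lexicon_xml: str) -> Dict[str, str]:
    """Map each grapheme to its alias text."""
    aliases: Dict[str, str] = {}
    root = ET.fromstring(lexicon_xml)
    for lexeme in root.iter(f"{{{PLS_NS}}}lexeme"):
        grapheme = lexeme.find(f"{{{PLS_NS}}}grapheme")
        alias = lexeme.find(f"{{{PLS_NS}}}alias")
        if grapheme is not None and alias is not None:
            aliases[grapheme.text.strip()] = alias.text.strip()
    return aliases


def expand(text: str, aliases: Dict[str, str]) -> str:
    """Replace whole words that appear in the lexicon with their alias."""
    return " ".join(aliases.get(word, word) for word in text.split())


aliases = load_aliases(LEXICON)
print(expand("Visit W3C today", aliases))
```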

Model Architecture

Applications
Applications can be divided into three categories:
• Web Browsing
• Limited Information Access
• Spoken Dialog Systems

Web Browsing
• Browse any web page using speech input.
• Parsing for the purpose of voice recognition is done when the page is accessed.
• May or may not produce voice feedback.

Limited Information Access
• Provides useful information in limited domains, such as the weather in a city or stock updates.
• Audio feedback.

Spoken Dialog Systems
• A client-server architecture is used.
• A Java applet (the client) connects to a remote server.
• Examples include connecting to email servers.

Benefits
• Voice is a very natural user interface, which speeds up browsing.
• Lower space requirements.
• Portable voice browsers can also be implemented.
• Practical interface for functionally blind users.
• Users can browse the web while keeping their hands and eyes free for other jobs.

Future
• Voice browsing will become visual (multimodal).
• Can be integrated into an OS.
• Can be integrated into every application.

Conclusions
• Browser technology is changing very fast these days, and we are moving from the visual paradigm to the voice paradigm.
• The voice browser is the technology for entering this paradigm.
• A voice browser is a device that interprets voice input and generates voice output.

References
• http://www.w3.org/standards/webofdevices/voice
• http://xml.coverpages.org/ccxml.html
• http://reactos.ccp14.ac.uk/Voice/
• http://www.w3.org/Voice/1998/Workshop/PhilJenkins.html (for IBM)