
DAWN: Dynamic Aural Web Navigation Gopal Gupta, S. Sunder Raman, Mike Nichols, H. Reddy, N. Annamalai Department of Computer Science University of Texas at Dallas

Introduction • The Web is intrinsically visual. • We need computers to access the Web. • People with visual disabilities cannot fully interact with the Web.

Aural Web • What is the Aural Web? § Based on the traditional Web. § Voice/Audio for Input/Output. § Listeners have complete control over navigation.

Aural Web • What is the Aural Web? • Why do we need an Aural Web? § Make the Web accessible to all. § Do away with the need to have a computer to access the Web.

Aural Web • What is the Aural Web? • Why do we need an Aural Web? • How can we obtain an Aural Web? § Translate HTML to VoiceXML (however, translation is not enough). § Need to enhance VoiceXML to make it dynamically navigable.

VoiceXML • W3C standard for marking up voice documents. • VoiceXML documents are 'played' on voice browsers. • A VoiceXML document consists of various forms. Form names are used to control navigation. • Inputs are restricted to a set of pre-defined words specified via a grammar.

Example VoiceXML Document
<vxml version="2.0">
  <form>
    <field name="rich">
      <grammar type="application/x-gsl" mode="voice">
        <![CDATA[[
          [(yes)] {<option "yes">}
          [(no)] {<option "no">}
        ]]]>
      </grammar>
      <prompt>Would you like to get rich quick?</prompt>
      <filled>
        Gotcha.
        <if cond="rich=='yes'">
          You want to be rich!
          <goto next="rich.vxml"/>
        <else/>
          You don't want to be rich.
          <goto next="poor.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
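The control flow of the example document can be traced with a small Python sketch. It is only an illustration of how the grammar restricts input and how the `<if>` picks the next document; the function name `next_document` is hypothetical, and a real voice browser does considerably more (speech recognition, re-prompting, event handling).

```python
# The only words the field's grammar accepts in the example document.
GRAMMAR = {"yes", "no"}


def next_document(utterance):
    """Return the document the example would jump to, or None on no match."""
    word = utterance.strip().lower()
    if word not in GRAMMAR:
        # Out-of-grammar input: a voice browser would typically re-prompt.
        return None
    # Mirrors the <if cond="rich=='yes'"> ... <else/> ... </if> branch.
    return "rich.vxml" if word == "yes" else "poor.vxml"
```

Anything outside the two grammar words yields no transition at all, which is the restriction the later slides set out to relax.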

Translating HTML to VoiceXML • A module to denotationally map HTML constructs to VoiceXML. • It is extendable and flexible.

Translating HTML to VoiceXML: Interface Sheet • Input attributes: input duration in seconds for text box; input duration in seconds for text area. • HTML tags and their output text: <blockquote> is spoken as "Starting of text quoted from elsewhere."; </blockquote> is ignored.

HTML to VoiceXML Translator • A module that denotationally maps HTML constructs to VoiceXML. • It is extendable and flexible. • It can handle forms. It preserves information about the submit type and target URL. • The translator imposes certain reasonable restrictions on the input HTML.
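A table-driven mapping of this kind can be sketched in Python with the standard `html.parser` module. This is a minimal illustration under stated assumptions, not the DAWN translator: the `SPOKEN` table, the `HtmlToVxml` class, and the `translate` function are hypothetical names, only the `<blockquote>` phrase comes from the slides, and forms, links, and the input restrictions mentioned above are not handled.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical mapping table: what to say when a tag opens or closes.
# Only the <blockquote> opening phrase is taken from the slides.
SPOKEN = {
    ("blockquote", "start"): "Starting of text quoted from elsewhere.",
    ("h1", "start"): "Heading.",
}


class HtmlToVxml(HTMLParser):
    """Collects one spoken prompt per mapped tag or run of text."""

    def __init__(self):
        super().__init__()
        self.prompts = []

    def handle_starttag(self, tag, attrs):
        phrase = SPOKEN.get((tag, "start"))
        if phrase:
            self.prompts.append(phrase)

    def handle_endtag(self, tag):
        # Tags without an entry in the table are silently ignored.
        phrase = SPOKEN.get((tag, "end"))
        if phrase:
            self.prompts.append(phrase)

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text:
            self.prompts.append(text)


def translate(html):
    """Return a VoiceXML string that reads the HTML text aloud in order."""
    parser = HtmlToVxml()
    parser.feed(html)
    parser.close()
    body = "\n".join(
        f"    <block><prompt>{escape(p)}</prompt></block>" for p in parser.prompts
    )
    return f'<vxml version="2.0">\n  <form id="page">\n{body}\n  </form>\n</vxml>'
```

Keeping the mapping in a table rather than in code is one way to get the extendability the slide mentions: supporting a new tag means adding an entry, not changing the parser.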

DAWN Architecture • HTML to VoiceXML Translator • VoiceXML Enhancer

Limitations of VoiceXML • Navigation is controlled by the author; the listener has virtually no control. • The author has to hardcode every possible navigation path (obviously not possible). • Poorly authored documents become difficult to browse. • Speech recognition technology allows an arbitrary user to speak only pre-determined phrases.

Limitations of VoiceXML (cont'd) • These limitations make VoiceXML useful only for simple applications. • Simple database lookups (e.g., the American Airlines airline information system) are possible, • but advanced applications that require complex interaction (e.g., making an air reservation) are not. • What is needed is the ability for the listener to move around the VXML document at will.

Solution • We introduce the concept of Voice Anchors, allowing listeners to dynamically tag and recall any dialog. • We modify the VoiceXML documents and generate new VoiceXML documents dynamically at run time. • Support pre-defined keywords (e.g., pause).

Dynamic Voice Anchors • Analogous to bookmarks or HTML anchors. • An anchor is a speech label that can be associated with a specific dialog. • These anchors can then be used to recall the associated dialogs. • A single anchor name can be used to tag multiple dialogs (cumulative anchor). • Any word can be chosen for an anchor name. The user spells it out the first time only.
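The bookkeeping behind anchors can be pictured as a table from spoken names to dialogs. The Python sketch below is an assumption about one possible data structure, not the DAWN implementation; `AnchorTable`, `tag`, `recall`, and the dialog identifiers are hypothetical names, and spelling out a new anchor word is not modeled.

```python
from collections import defaultdict


class AnchorTable:
    """Maps a spoken anchor name to the dialogs tagged with it."""

    def __init__(self):
        self._dialogs = defaultdict(list)

    def tag(self, name, dialog_id):
        """Associate a dialog with an anchor name; repeated names accumulate."""
        key = name.strip().lower()
        if dialog_id not in self._dialogs[key]:
            self._dialogs[key].append(dialog_id)

    def recall(self, name):
        """Return the dialogs tagged with this name, in tagging order."""
        return list(self._dialogs.get(name.strip().lower(), []))


anchors = AnchorTable()
anchors.tag("Flights", "form_departure")
anchors.tag("flights", "form_return")   # same name again: cumulative anchor
print(anchors.recall("flights"))        # ['form_departure', 'form_return']
```

Storing a list per name is what makes the anchor cumulative: recalling one word can replay every dialog the listener tagged with it.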

The Enhancer Module • Enhances the VoiceXML file, readying it to accept Dynamic Voice Anchors. • Modifies the VoiceXML document to add interfaces to server-side CGIs. • Adds functionality for specific keywords which have pre-determined semantics, e.g., skip, repeat, pause, resume, back.
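One way to picture the enhancement step is a pass that adds a document-level `<link>` for each keyword, each pointing at a server-side script. The Python sketch below uses the standard `xml.etree.ElementTree` module; the CGI URL, the `command` query parameter, and the `enhance` function are assumptions made for illustration, the grammar body is a simplified placeholder, and anchor handling is left out entirely.

```python
import xml.etree.ElementTree as ET

KEYWORDS = ["skip", "repeat", "pause", "resume", "back"]  # listed on the slide
CGI_URL = "http://example.org/cgi-bin/dawn"  # hypothetical server-side endpoint


def enhance(vxml_text, cgi_url=CGI_URL):
    """Return a copy of the document with one <link> per keyword."""
    root = ET.fromstring(vxml_text)
    for position, word in enumerate(KEYWORDS):
        # Saying the keyword anywhere in the document jumps to the CGI,
        # which is assumed to generate the next VoiceXML page.
        link = ET.Element("link", {"next": f"{cgi_url}?command={word}"})
        grammar = ET.SubElement(
            link, "grammar", {"type": "application/x-gsl", "mode": "voice"}
        )
        grammar.text = f"[{word}]"  # simplified placeholder grammar
        root.insert(position, link)  # keep links ahead of the original forms
    return ET.tostring(root, encoding="unicode")
```

Because the links sit at document level rather than inside a form, the keywords stay active whichever dialog the listener is in, which matches the idea of keywords with pre-determined semantics.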

The Big Picture

DEMO

Some Applications • EPlan: a Web-based integrated contingency handling system. The Aural Web increases its accessibility. • MathML to VoiceXML: allows complex mathematical formulas to be broken down, tagged, and recalled. • Searching for directions on the Web. • Making online airline reservations using the phone.

Current & Future Work • Design of Voice Scripting Languages (talk by Mike Nichols tomorrow in Tiberius 2 at 10:30 am). • Intelligent navigation strategy for navigating tables. • Finally, incorporate all these techniques into a Voice Browser.

Contributions • An Aural Web based on the traditional Web that allows users to perform complex Web operations using the phone. • Developed techniques to give a listener maximum control (via dynamic voice anchors).