An Introduction to XML, IVR and VoiceXML

30 January 2002
Ian GRAHAM
Emerging Business Strategy, Bank of Montreal
E: <ian.graham@bmo.com> or <ian.graham@utoronto.ca>
T: (416) 513.5656 / F: (416) 513.5590
Web: http://www.utoronto.ca/ian/talks/
Emerging Business Strategy, IBS · ian.graham@bmo.com / 416.513.5656

Overview
· 1. What is XML
  – Gotta know that if VoiceXML is to make any sense
  – XML for universal data
    • Based on the ways people wanted to use HTML and the Web
· 2. A bit about IVR
· 3. VoiceXML
· 4. Future stuff

1. What is XML?
· A syntax for “encoding” text-based data (words, phrases, numbers, ...)
· A text-based syntax. XML is written using printable characters (no explicit binary data)
· Extensible. XML lets you define your own tags (essentially data types), within the constraints of the syntax rules
· Universal format. The strict syntax rules mean that all conforming XML processing software handles a given piece of XML data identically. If you can read and process it, so can anybody else

What is XML: A Simple Example
· The first line is the XML declaration (“this is XML”); its encoding attribute names the character encoding used in the file.
<?xml version="1.0" encoding="iso-8859-1"?>
<partorders xmlns="http://myco.org/Spec/partorders">
  <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
    <desc> Gold sprockel grommets, with matching hamster </desc>
    <part number="23-23221-a12" />
    <quantity units="gross"> 12 </quantity>
    <deliveryDate date="27aug1999-12:00h" />
  </order>
  <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
    ... Order something else ...
  </order>
</partorders>

Example Revisited
· Element tags: <partorders>, <order>, <desc>, <part />, <quantity>, <deliveryDate />
· Attributes: e.g. units="gross" is an attribute of this quantity element
· Nesting gives hierarchical, structured information: partorders contains order elements, and each order contains desc, part, quantity and deliveryDate

XML Data Model - A Tree
<partorders xmlns="...">
  <order date="..." ref="...">
    <desc> ..text.. </desc>
    <part />
    <quantity />
    <delivery-date />
  </order>
  <order ref="..." />
</partorders>
[Figure: the same document drawn as a tree. partorders (attribute xmlns=) has two order children (attributes ref=, date=); the first order has the children desc (containing a text node), part, quantity and delivery-date]
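
A minimal sketch of walking this tree in Python, assuming the standard-library `xml.etree.ElementTree` module (the slides do not name any library) and a cut-down copy of the partorders document. Two differences from the diagram: ElementTree keeps the text of `<desc>` on the element's `.text` attribute rather than as a separate text node, and the `xmlns` declaration becomes part of each element's name instead of an attribute. The helper names `local_name` and `outline` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the partorders example from the slides.
DOC = """<partorders xmlns="http://myco.org/Spec/partorders">
  <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
    <desc>Gold sprockel grommets, with matching hamster</desc>
    <part number="23-23221-a12"/>
    <quantity units="gross">12</quantity>
    <deliveryDate date="27aug1999-12:00h"/>
  </order>
</partorders>"""


def local_name(tag):
    """Drop the '{namespace-uri}' prefix that ElementTree adds to element names."""
    return tag.split("}", 1)[-1]


def outline(elem, depth=0):
    """Return one line per element, indented by its depth in the tree,
    followed by the names of that element's attributes."""
    attrs = " ".join(f"{name}=" for name in elem.attrib)
    lines = [("  " * depth + local_name(elem.tag) + " " + attrs).rstrip()]
    for child in elem:  # iterating an Element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
print("\n".join(outline(root)))
```

The recursion mirrors the tree: each call handles one node and then descends into its children, so the indentation in the printout reflects nesting depth.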

XML: Why it's this way
· Simple (like HTML)
  – But not quite so simple
  – Strict syntax rules, to eliminate syntax errors
  – Syntax defines structure (hierarchically) and names the structural parts (element names): it is self-describing data
· Extensible (unlike HTML; the vocabulary is not fixed)
  – Can create your own language of tags/elements
  – Strict syntax ensures that custom tags can be reliably processed
· Designed for a distributed environment (like HTML)
  – Can have data all over the place, and still retrieve and use it reliably
· Can mix different data types together (unlike HTML)
  – Can mix one set of tags with another set, and the resulting data can still be reliably processed

Mixing language dialects together
<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:mt="http://www.w3.org/1998/Math/MathML">
  <head> <title> Title of XHTML Document </title> </head>
  <body>
    <div class="myDiv">
      <h1> Heading of Page </h1>
      <mt:math> ... MathML markup ... </mt:math>
      <p> more html stuff goes here </p>
    </div>
  </body>
</html>
· The default namespace makes the default 'type' XHTML
· The mt: prefix indicates 'type' MathML (a different language)
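
A short sketch of how a parser keeps the two dialects apart, assuming Python's standard-library ElementTree. It uses the standard XHTML and MathML namespace URIs; the MathML content (x squared) and the helper `split_name` are invented for illustration. The point it shows: elements are sorted by namespace URI, and the `mt:` prefix is only a local shorthand for that URI.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# The MathML content (x squared) is filler invented for this sketch.
DOC = f"""<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="{XHTML}" xmlns:mt="{MATHML}">
  <head><title>Title of XHTML Document</title></head>
  <body>
    <div class="myDiv">
      <h1>Heading of Page</h1>
      <mt:math><mt:msup><mt:mi>x</mt:mi><mt:mn>2</mt:mn></mt:msup></mt:math>
      <p>more html stuff goes here</p>
    </div>
  </body>
</html>"""

# Encode to bytes so the parser reads the text in the declared encoding.
root = ET.fromstring(DOC.encode("iso-8859-1"))


def split_name(elem):
    """ElementTree spells a qualified name as '{uri}local'; return (uri, local)."""
    uri, _, local = elem.tag[1:].partition("}")
    return uri, local


# The namespace URI, not the prefix, decides which vocabulary an element is in.
mathml_elements = [split_name(e)[1] for e in root.iter() if split_name(e)[0] == MATHML]
xhtml_paragraphs = [p.text for p in root.findall(".//h:p", {"h": XHTML})]
print(mathml_elements)
print(xhtml_paragraphs)
```

An application that only understands XHTML could skip every element whose URI is the MathML one and still process the rest of the page reliably.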

Defining Specific Language Dialects
· Two ways of doing so:
  – XML Document Type Definition (DTD) -- part of the core XML spec.
  – XML Schema -- a newer XML specification (2001) that places stronger constraints on XML documents.
· Adding dialect specifications implies two classes of XML data:
  – Well-formed: an XML document that is syntactically correct
  – Valid: an XML document that is both well-formed and consistent with a specific DTD (or Schema)
· Most current dialects are defined using DTDs.
· Schemas are often used for type validation.
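
A small sketch of the well-formed/valid distinction, assuming Python's standard-library ElementTree. That parser checks well-formedness only (real DTD validation needs a validating parser, for example the third-party lxml package), so this sketch hand-codes a single hypothetical DTD content-model rule for `order` to show what "valid" adds. The rule and the function names are illustrative assumptions, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD rule for this sketch (not from the slides):
#   <!ELEMENT order (desc, part, quantity, deliveryDate)>
# The content model is hand-coded here because ElementTree does not validate.
ORDER_CONTENT_MODEL = ["desc", "part", "quantity", "deliveryDate"]


def is_well_formed(text):
    """True if the text obeys XML syntax rules (tags nest and close, etc.)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def is_valid_order(text):
    """Well-formed AND the children of <order> follow the content model exactly."""
    if not is_well_formed(text):
        return False
    order = ET.fromstring(text)
    return order.tag == "order" and [child.tag for child in order] == ORDER_CONTENT_MODEL


good = "<order><desc>d</desc><part/><quantity>12</quantity><deliveryDate/></order>"
out_of_order = "<order><part/><desc>d</desc><quantity>12</quantity><deliveryDate/></order>"
broken = "<order><desc>d</order>"

for name, text in [("good", good), ("out_of_order", out_of_order), ("broken", broken)]:
    print(name, is_well_formed(text), is_valid_order(text))
```

The `out_of_order` document is well-formed (any XML tool can read it) but not valid (it breaks the dialect's rules), which is exactly the gap a DTD or Schema closes.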

Classes of XML Dialects
· Many XML 'dialects', optimised for different roles:
  – presentation: stuff people read or look at
  – metadata: for describing things; for use by other software
  – utilities: generic XML tools -- XSLT, Schemas, ...
  – s/w development: XML as a software development tool
  – distributed apps.: distributed applications, data delivery, Web services
· A) Presentational languages (for people/applications)
  – MathML -- for mathematics
  – SMIL -- for multimedia (RealPlayer)
  – XHTML -- new HTML
  – WML -- wireless WAP phones
  – SVG -- for graphics
  – XUL -- user interface (Netscape 6)
  – VoiceXML -- voice interfaces
· B) Metadata
  – RDF -- Resource Description Framework

Classes of XML Dialects
· C) Utilities
  – XSLT -- transform one XML document into another. E.g.:
[Figure: the XML data and an XSLT style sheet each pass through an XML parser, which builds document “objects” (trees) for the data and the style sheet; the XSLT processor uses them to produce data out (XML) with a new structure, e.g. the partorders/order/desc/part/quantity/delivery-date tree becomes a tree with different elements such as xza, foo and bee]

Classes of XML Dialects
· C) Utilities (cont.)
  – XML Schemas: define validation rules for a specific type of XML document
    • Can define hierarchical nesting rules for elements, allowed attributes and attribute status (like DTDs)
    • Can define stronger typing constraints on element content and attributes (e.g., integers, integer ranges, real ranges, strings, tokens, etc.)
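
To make "stronger typing constraints" concrete, here is a hand-written sketch in Python (standard-library ElementTree) of checks a schema could declare for `<quantity>`. It is not a schema validator, and the allowed units and the 1..1000 range are invented. A DTD can already restrict an attribute to an enumerated list, but it cannot say "this element's content must be an integer in a range"; that part is what Schema typing adds.

```python
import xml.etree.ElementTree as ET

# Invented typing rules, of the kind an XML Schema could declare with an
# xs:enumeration facet (units) and xs:minInclusive/xs:maxInclusive (value).
ALLOWED_UNITS = {"each", "dozen", "gross"}
MIN_QTY, MAX_QTY = 1, 1000


def quantity_type_errors(xml_text):
    """Return a list of typing errors for a <quantity> element; empty means OK."""
    q = ET.fromstring(xml_text)
    errors = []
    units = q.get("units")
    if units not in ALLOWED_UNITS:
        errors.append(f"units={units!r} is not one of {sorted(ALLOWED_UNITS)}")
    text = (q.text or "").strip()
    try:
        value = int(text)
    except ValueError:
        errors.append(f"content {text!r} is not an integer")
    else:
        if not MIN_QTY <= value <= MAX_QTY:
            errors.append(f"content {value} is outside {MIN_QTY}..{MAX_QTY}")
    return errors


print(quantity_type_errors('<quantity units="gross"> 12 </quantity>'))
print(quantity_type_errors('<quantity units="bushel">twelve</quantity>'))
```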

Classes of XML Dialects
· D) S/w development
  – ANT -- XML-based build configuration file (Java applications)
  – beanML -- language specifying the composition and state of a Java-bean-based application. Processing a BML script results in a running application configured as described in the script.
  – Use XML as a tool for managing data inside an application
    [Figure: App ↔ XML “server” ↔ DB1, DB2, DB3]
    • Extract data from databases and store it in an intermediate XML “object”
    • Useful for application development (sometimes!)
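
A minimal sketch of the "intermediate XML object" idea, assuming Python's standard-library `sqlite3` and ElementTree. A single in-memory database stands in for DB1..DB3, and the table, columns and rows are invented sample data. The hypothetical helper copies each row into an XML element so the rest of the application can work with one tree instead of several database APIs.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table, row_tag):
    """Copy every row of `table` into an in-memory XML "object":
    <table><row_tag column="value" .../>...</table>.
    The table name is interpolated directly, so it must come from trusted code."""
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
    columns = [d[0] for d in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        ET.SubElement(root, row_tag, {col: str(val) for col, val in zip(columns, row)})
    return root


# An in-memory database stands in for DB1..DB3 on the slide; the data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (number TEXT, description TEXT, stock INTEGER)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [("23-23221-a12", "Gold sprockel grommet", 144), ("23-23221-b07", "Hamster", 3)],
)

parts = table_to_xml(conn, "parts", "part")
print(ET.tostring(parts, encoding="unicode"))
```

Mapping columns to attributes keeps the sketch short; mapping them to child elements would work equally well and is closer to how many XML vocabularies are designed.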

Classes of use for XML
· E.1) For machine-machine communication
  – Financial information exchange
    • FpML, FinXML, OFX/IFX, FIXML, GOLD, XBRL, SwiftML
  – Directory services metadata
    • DirXML, DSML (Directory Services Markup Language), ...
  – Other business transactions
    • FRML (first retail markup language), ...
    • ebXML (generic business)
  – News, data syndication (exchanging data between machines)
    • XMLnews, ICE, NewsML, RSS, WDDX
· E.2) Manage the connection between machines
  – Control of machine-machine applications
    • XML-RPC, SOAP
  – Brokering of Web “services”
    • BizTalk, UDDI, ebXML

XML Messaging + Processing
[Figure: a Factory places an order (XML/EDI) with a Supplier through the Supplier's SOAP interface, using SOAP; the Supplier's response (XML/EDI) is returned using SOAP. A second Supplier is shown alongside.]
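
A sketch of what the order message in this diagram might look like, assuming Python's standard-library ElementTree and the SOAP 1.1 envelope namespace. The payload elements (`placeOrder`, `part`, `quantity`) and both function names are hypothetical, and the HTTP transport that would carry the message between Factory and Supplier is not shown.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ORDERS = "http://myco.org/Spec/partorders"              # example namespace from the slides
NS = {"soap": SOAP_ENV, "po": ORDERS}

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("po", ORDERS)


def place_order_message(part_number, quantity, units):
    """Factory side: wrap a hypothetical <po:placeOrder> payload in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    order = ET.SubElement(body, f"{{{ORDERS}}}placeOrder")
    ET.SubElement(order, f"{{{ORDERS}}}part", number=part_number)
    qty = ET.SubElement(order, f"{{{ORDERS}}}quantity", units=units)
    qty.text = str(quantity)
    return ET.tostring(envelope, encoding="unicode")


def read_order(message):
    """Supplier side: pull the order details back out of the envelope."""
    envelope = ET.fromstring(message)
    order = envelope.find("soap:Body/po:placeOrder", NS)
    qty = order.find("po:quantity", NS)
    return order.find("po:part", NS).get("number"), int(qty.text), qty.get("units")


msg = place_order_message("23-23221-a12", 12, "gross")
print(msg)
print(read_order(msg))
```

Both sides only need to agree on the envelope and payload vocabularies; neither needs to know how the other is implemented, which is the point of the diagram.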

XML Software
· XML parser -- reads in XML data, checks for syntactic (and possibly DTD/Schema) constraints, and makes the data available to an application. There are three 'generic' parser APIs:
  – SAX: Simple API for XML (event-based)
  – DOM: Document Object Model (object/tree-based)
  – JDOM: Java Document Object Model (object/tree-based)
· If XML is not well-formed, the parser MUST report a fatal error and stop normal processing. A validating parser also reports validity errors (inconsistencies with the DTD), but the XML specification does not make these fatal, so whether to stop is up to the application.
· Lots of XML parsers and interface software available (Unix, Windows, Psion, Palm, OS/390 | z/OS, etc.)
· SAX-based parsers are fast
· DOM is slower and more memory-intensive
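
A minimal sketch of the fatal-error rule, assuming Python's standard-library ElementTree (a non-validating parser, so it only demonstrates well-formedness errors). The helper `parse_or_report` is hypothetical: on malformed input the parser raises `ParseError`, which carries the (line, column) where parsing stopped, and no partial tree comes back.

```python
import xml.etree.ElementTree as ET


def parse_or_report(text):
    """Return (tree, None) for well-formed input, or (None, (line, column))
    when the parser hits a fatal well-formedness error. No partial tree
    is handed back once the error occurs."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        return None, err.position


# Mismatched end tag on line 2: the parser gives up there.
bad_tree, bad_at = parse_or_report("<order>\n  <desc>Gold grommets</order>")
print(bad_tree, bad_at)

tree, error_at = parse_or_report("<order><desc>Gold grommets</desc></order>")
print(tree.tag, error_at)
```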

XML Parser Processing Model
[Figure: XML data → parser → parser interface → XML-based application]

XML Processing: SAX
A) SAX: Simple API for XML
  – http://www.megginson.com/SAX/index.html
  – An event-based interface
  – Parser reports events whenever it sees a tag/attribute/text node/other
  – Programmer attaches “event handlers” to handle the events
· Advantages
  – Simple to use
  – Very fast (not doing very much before you get the tags and data)
  – Low memory use (doesn't read an XML document entirely into memory)
· Disadvantages
  – Not doing very much for you -- you have to do everything yourself
  – Not useful if you have to dynamically modify the document once it's in memory (since you'll have to do all the work to put it in memory yourself!)
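
A small example of the event-handler style, assuming Python's standard-library `xml.sax` (a SAX-style API; the class name `OrderCounter` and the sample data are illustrative). The handler counts orders and totals quantities as the parser streams past, without ever building a tree, which is where SAX gets its speed and low memory use. It also shows the "do everything yourself" cost: the handler has to track for itself whether it is inside a `<quantity>`.

```python
import xml.sax


class OrderCounter(xml.sax.ContentHandler):
    """Event handler: counts <order> elements and sums <quantity> values
    as the parser reports them; no document tree is kept in memory."""

    def __init__(self):
        super().__init__()
        self.orders = 0
        self.total_quantity = 0
        self._in_quantity = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "order":
            self.orders += 1
        elif name == "quantity":
            self._in_quantity = True
            self._buffer = []

    def characters(self, content):
        # Text can arrive in several chunks, so collect it until the end tag.
        if self._in_quantity:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "quantity":
            self.total_quantity += int("".join(self._buffer))
            self._in_quantity = False


DOC = b"""<partorders>
  <order ref="a"><quantity units="gross">12</quantity></order>
  <order ref="b"><quantity units="each">5</quantity></order>
</partorders>"""

handler = OrderCounter()
xml.sax.parseString(DOC, handler)
print(handler.orders, handler.total_quantity)
```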

XML Processing: DOM
B) DOM: Document Object Model
  – http://www.w3.org/DOM/
  – An object-oriented interface
  – Parser generates an in-memory tree corresponding to the document
  – DOM interface defines methods for accessing and modifying the tree
· Advantages
  – Very useful for dynamic modification of, and access to, the tree
  – Useful for querying (i.e. looking for data) that depends on the tree structure [e.g. element.childNodes.item(2).getAttribute("boobie")]
  – Same interface for many programming languages (C++, Java, ...)
· Disadvantages
  – Can be slow (needs to produce the tree), and can take up lots of memory
  – DOM programming interface is a bit awkward, not terribly object-oriented
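
A short sketch of tree-based access and modification, assuming Python's standard-library `xml.dom.minidom`, which implements the W3C DOM interfaces (`getElementsByTagName`, `getAttribute`, `setAttribute`, `createElement`, `appendChild`). The data and the new order's `ref` value are invented. Once the whole tree is in memory, structure-dependent queries and edits are easy, at the cost of building that tree first.

```python
from xml.dom import minidom

DOC = """<partorders>
  <order ref="x23-2112-2342"><quantity units="gross">12</quantity></order>
</partorders>"""

dom = minidom.parseString(DOC)

# Query that depends on tree structure: the first <quantity> inside the first <order>.
order = dom.getElementsByTagName("order")[0]
quantity = order.getElementsByTagName("quantity")[0]
print(order.getAttribute("ref"), quantity.getAttribute("units"), quantity.firstChild.data)

# Modify the in-memory tree (12 gross == 144 dozen), add a new order, re-serialize.
quantity.setAttribute("units", "dozen")
quantity.firstChild.data = "144"
new_order = dom.createElement("order")
new_order.setAttribute("ref", "x23-2112-2343")
dom.documentElement.appendChild(new_order)
print(dom.documentElement.toxml())
```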

DOM Parser Processing Model
[Figure: XML data → DOM parser → document “object” (a tree: partorders → order → desc (text), quantity, delivery-date; a second order) → DOM parser interface → application]

XML Processing: JDOM
C) JDOM: Java Document Object Model
  – http://www.jdom.org
  – A Java-specific object-oriented interface
  – Parser generates an in-memory tree corresponding to the document
  – JDOM interface has methods for accessing and modifying the tree
· Advantages
  – Very useful for dynamic modification of the tree
  – Useful for querying (i.e. looking for data) that depends on the tree structure
  – Much nicer object-oriented programming interface than DOM
· Disadvantages
  – Can be slow (make that tree...), and can take up lots of memory
  – New, and not entirely cooked (but close)
  – Only works with Java, and not (yet) part of the core Java standard

XML Processing: XSLT
D) XSLT: eXtensible Stylesheet Language -- Transformations
  – http://www.w3.org/TR/xslt
  – An XML language for processing XML
  – Does tree transformations: takes XML and an XSLT style sheet as input, and produces a new XML document with a different structure
· Advantages
  – Very useful for tree transformations -- much easier than DOM or SAX for this purpose
  – Can be used to query a document (XSLT pulls out the part you want)
· Disadvantages
  – Can be slow for large documents or style sheets
  – Can be difficult to debug style sheets (poor error detection)
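
Python's standard library has no XSLT processor (the third-party lxml package provides one), so the sketch below performs the same kind of tree-to-tree transformation by hand with ElementTree. The XSLT template in the comment shows roughly how a style sheet would express it declaratively; the `summary`/`item` output vocabulary and the sample data are invented. Writing the imperative version makes the advantage claimed above concrete: the style sheet states *what* the output looks like, while the Python spells out *how* to build it.

```python
import xml.etree.ElementTree as ET

# Roughly what an XSLT template such as this would express declaratively:
#   <xsl:template match="order">
#     <item ref="{@ref}" qty="{quantity}"><xsl:value-of select="desc"/></item>
#   </xsl:template>


def transform(partorders):
    """Build a new <summary> tree from a <partorders> tree; the input is not modified."""
    orders = partorders.findall("order")
    summary = ET.Element("summary", count=str(len(orders)))
    for order in orders:
        item = ET.SubElement(summary, "item", ref=order.get("ref"),
                             qty=order.findtext("quantity", default="0").strip())
        item.text = order.findtext("desc", default="").strip()
    return summary


DOC = """<partorders>
  <order ref="a"><desc> Gold grommets </desc><quantity units="gross">12</quantity></order>
  <order ref="b"><desc>Hamster</desc><quantity units="each">1</quantity></order>
</partorders>"""

result = transform(ET.fromstring(DOC))
print(ET.tostring(result, encoding="unicode"))
```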

Part 2. IVR overview

What is IVR
· IVR -- Interactive Voice Response
· All the horrible stuff you have to deal with
  – Answering/managing voice mail
  – Telephone banking
  – Call handling at customer support lines
  – And sometimes, even, surfing the Web

Roots of IVR
· Interface restrictions
  – Input: voice and the 12-key telephone keypad (DTMF)
  – Output: audio information and audio prompts
· User restrictions
  – The whole session has to be held in the user's memory (nothing to ‘look at’)
  – Limited information ‘framing’
  – Many ‘dialogs’ make it hard to keep track of, refer back to, or know the context
  – The same UI must handle experienced and novice (first-time) users
    • Need to account for learning during a session
· Application context
  – Often used within telephony applications, and so must be able to do call control (re-routing + queuing of calls)
  – Integrate with call centres
  – Auto-forwarding of information (e.g., to a call centre)
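
To show what "DTMF input" means at the signal level, here is a small Python sketch using the standard DTMF frequency grid (ITU-T Q.23): each key press is the sum of one low "row" tone and one high "column" tone. The function names are illustrative, and the synthesized samples are a simple mathematical sketch, not production telephony code. The fourth column (A-D) exists in DTMF but not on ordinary telephone keypads.

```python
import math

# DTMF signals each key as the sum of one low "row" tone and one high "column" tone.
ROW_HZ = [697, 770, 852, 941]
COL_HZ = [1209, 1336, 1477, 1633]
KEYPAD = ["123A",
          "456B",
          "789C",
          "*0#D"]  # the A-D column exists in DTMF but not on ordinary telephone keypads


def dtmf_pair(key):
    """Return the (row_hz, column_hz) tone pair for one keypad symbol."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c != -1:
            return ROW_HZ[r], COL_HZ[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, rate_hz=8000):
    """Synthesize one key's tone as floats in [-1, 1], sampled at 8 kHz
    (the usual digital telephony rate)."""
    low, high = dtmf_pair(key)
    n = int(duration_s * rate_hz)
    return [0.5 * (math.sin(2 * math.pi * low * t / rate_hz) +
                   math.sin(2 * math.pi * high * t / rate_hz))
            for t in range(n)]


print(dtmf_pair("5"), dtmf_pair("#"))
print(len(dtmf_samples("1")))
```

Because every key maps to a unique pair of frequencies that rarely co-occur in speech, an IVR platform can recognize key presses reliably even while the caller is talking.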

Provisioning of IVR
· Many IVR technology providers
· Same basic approaches and ideas, but
  – Different software
  – Different APIs, development tools, architectures
· No common framework for ‘scripting’ the dialog sessions that control the user interface
  – But always the same basic ideas and approaches (next slide)
· Known problems with the historic approach
  – Portability (can't move an application elsewhere)
  – Integrability (can't easily extend to new non-pure-IVR apps; not easily integrated with the Web)

IVR - Common Issues
· Dual-mode interfaces
  – First time through, the user gets detailed instructions or prompts
  – Later on, the instructions are sparser, more succinct
· Barging in
  – Sometimes the user should be able to barge in on a dialog, and interrupt it
  – And sometimes not!
· Hierarchy of control layers
  – ‘Escape’ mechanisms to jump from the current dialog to the “root” of the session
· Need to manage the session, store session state, and pass data to other applications
  – So an IVR session has some kind of logic/rules, variables for storing data, etc.
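
A hypothetical sketch of the session state these issues imply, in Python. Every name here (`Dialog`, `Session`, the prompts) is invented for illustration and does not come from any IVR product: visit counts drive dual-mode (long, then short) prompts, a per-dialog flag says whether barge-in is allowed, a stack of entered dialogs supports escaping to the root, and a `data` dictionary holds values to pass on to other applications.

```python
from dataclasses import dataclass, field


@dataclass
class Dialog:
    name: str
    long_prompt: str       # detailed instructions for the first visit
    short_prompt: str      # succinct version for later visits
    barge_in: bool = True  # may the caller interrupt this prompt?


@dataclass
class Session:
    """Hypothetical IVR session state covering the issues listed above."""
    root: Dialog
    stack: list = field(default_factory=list)   # dialogs entered, root first
    visits: dict = field(default_factory=dict)  # dialog name -> times prompted
    data: dict = field(default_factory=dict)    # values to pass to other applications

    def __post_init__(self):
        self.stack.append(self.root)

    def enter(self, dialog):
        self.stack.append(dialog)

    def current(self):
        return self.stack[-1]

    def prompt(self):
        """Dual-mode prompting: long text the first time, short text afterwards."""
        dialog = self.current()
        seen = self.visits.get(dialog.name, 0)
        self.visits[dialog.name] = seen + 1
        return dialog.long_prompt if seen == 0 else dialog.short_prompt

    def can_barge_in(self):
        return self.current().barge_in

    def escape_to_root(self):
        """'Escape' mechanism: abandon nested dialogs and return to the root."""
        del self.stack[1:]
        return self.current()


main = Dialog("main", "Welcome. Say balance, transfer, or operator.", "Main menu.")
pin = Dialog("pin", "Please key in your four-digit PIN, then press pound.", "PIN, please.",
             barge_in=False)
session = Session(root=main)
print(session.prompt())  # first visit: long prompt
print(session.prompt())  # later visit: short prompt
session.enter(pin)
print(session.prompt(), session.can_barge_in())
print(session.escape_to_root().name)
```

Keeping all of this in one explicit state object is what a dialog markup language can standardize, instead of each IVR vendor scripting it differently.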

3. VoiceXML

What is VoiceXML
· “... designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations.”*
· “Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications”*
· DTMF component designed for integration with existing telephone systems
· XML abstraction layer on top of existing IVR systems
· In principle, not restricted to phones
  – But in practice -- what else would you use??
* VoiceXML 1.0 Specification -- http://www.w3.org/TR/voicexml

History of VoiceXML
[Timeline from 1998 to 2002]
· 1998 precursors:
  – VoxML (Motorola)
  – PML (Bell/Lucent + others)
  – SpeechML (IBM)
  – TalkML (H.P.) -- ? (defunct)
· VoiceXML Forum (2000): VoiceXML 1.0
· W3C (2002, in draft): VoiceXML 2.0

Scope for VoiceXML 1.0
· Output of synthesized speech (text-to-speech)
· Playback of audio files
· Creation of audio files (recording and file creation)
· Recognition of spoken input (voice recognition)
· Recognition of DTMF (touch-tone dialing) input
· Telephony features such as call transfer and disconnect
· Logical mechanisms for driving and controlling a dialog session
Support and implementation of features is often contingent on third-party hardware and software.

Model for VoiceXML Applications
· Base-level management of interaction between a person and a backend system of some sort. Here is the operational model:
[Figure: User ↔ implementation (phone hardware) ↔ VoiceXML server (UI control) ↔ documents/data resources (CGIs, servlets, application code, resource files); incoming requests flow from the user towards the back end and outgoing responses flow back]

Practical VoiceXML Implementations
· Commercial systems usually bundle the first two parts (phone hardware and VoiceXML server) together:
[Figure: User ↔ VoiceXML “solution” (a computer (Sun, Intel) running UNIX/Linux/NT/2000, equipped with phone cards (Dialogic, etc.)) ↔ application resources; incoming request, outgoing response]

VoiceXML System: Component Technologies
[Figure: the VoiceXML server at the centre, connected to a PBX, telecom boards, call centre CT integration, and software utilities: speech synthesis (TTS), speech recognition (STT), speech grammars, voice biometrics]
· VoiceXML servers serve as integrators of various hardware and software

Technical Integration Issues
· Standardized APIs for driving hardware, accessing software:
· Telephony APIs (telecom boards + drivers)
  – TAPI (Microsoft)
  – JTAPI (Java)
  – TSAPI (Novell)
· Speech APIs (to various pieces of speech software)
  – SAPI (Microsoft) [ASAPI (AT&T) extensions]
  – Java Speech API (Java)
  – Lots of proprietary stuff for specific speech components
· Computer-Telephony Integration (CTI) APIs
  – No common API standard (usually has an RPC component)

VoiceXML Integrators
· Many, many companies
  – Some sell integrated solutions kits
  – Some are voice ASPs (provide leased hosting solutions)
· I am sure you don't care about this.
· However, several provide developer sites where you can
  – Develop and test VoiceXML applications
  – But you need your own Web site for hosting the VoiceXML (and other) files
· Two companies that provide developer sites and phone lines:
  – Tellme.com: http://studio.tellme.com
  – VoiceGenie (Toronto): http://developer.voicegenie.com

VoiceXML 1.0 Components:
· Covers various UI components, such as:
  – Forms (fields the user fills in) and associated variables; also menus (lists the user can select from)
  – Links (to other VoiceXML documents), activated when user input matches the link's grammar
  – Playing audio prompts
  – Producing text-to-speech prompts/dialogs (if supported)
· Covers control and logic mechanisms, including
  – Event handlers (including catching and throwing exceptions)
  – Variables
  – Resource fetching (e.g., access an audio file to play as a prompt)
  – Control grammars, including
    • DTMF (touch-tone driven, built into VoiceXML)
    • Voice driven (using external standards for grammars)

Location of Resource Files
· Files are not typically on the VoiceXML server, but are located remotely and referenced via URLs:
[Figure: User ↔ VoiceXML server (VoiceXML processor) ↔ HTTP ↔ Web server holding the documents/resources: CGIs and servlets; application code/resource files (e.g. session.vxml, gram.jsgf); data resource files (e.g. voice.au, sound.aiff)]

Management of Resource Files
· Since VoiceXML files are referenced by URI, and often located remotely, you need optimization at the VoiceXML server:
  – Resource caching (store local copies)
  – Model for cache coherency (how often to check for changes)
  – Pre-fetch hinting (when can this be done)
  – Timeout handling (if you can't retrieve the resource)
· When VoiceXML references a URI, the language provides attributes that define the cache model to use:
  – caching="safe" or "fast" ("safe" always fetches a new version); dropped in VoiceXML 2.0
  – fetchtimeout="secs" (how long to wait before throwing an error)
  – fetchhint="prefetch" or "safe" ("safe" only fetches a resource when it is needed)
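
A sketch of the caching behaviour these attributes describe, in Python. It is a simplified model under stated assumptions, not how any VoiceXML platform is built: `fetch(uri, timeout_s)` is a hypothetical loader assumed to honour the timeout itself, and `BadFetch` stands in for VoiceXML's `error.badfetch` event. `caching="fast"` reuses a stored copy, while `caching="safe"` always goes back to the server.

```python
class BadFetch(Exception):
    """Stands in for the VoiceXML error.badfetch event."""


class ResourceCache:
    """Sketch of VoiceXML 1.0-style per-URI caching behaviour:
    caching="fast" may reuse a stored copy, caching="safe" always refetches,
    and a fetch that fails or exceeds fetchtimeout becomes BadFetch.
    `fetch(uri, timeout_s)` is a hypothetical loader supplied by the platform."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}

    def get(self, uri, caching="fast", fetchtimeout_s=5.0):
        if caching == "fast" and uri in self._store:
            return self._store[uri]           # cache hit: no network round trip
        try:
            content = self._fetch(uri, fetchtimeout_s)
        except OSError as err:                # includes TimeoutError
            raise BadFetch(uri) from err
        self._store[uri] = content            # refresh the local copy
        return content


# Usage with a fake loader that records each network fetch.
fetched = []


def fake_fetch(uri, timeout_s):
    fetched.append(uri)
    if "missing" in uri:
        raise TimeoutError(f"no answer from {uri} within {timeout_s}s")
    return f"<vxml><!-- contents of {uri} --></vxml>"


cache = ResourceCache(fake_fetch)
cache.get("http://example.com/main.vxml")                  # fetched
cache.get("http://example.com/main.vxml")                  # served from cache
cache.get("http://example.com/main.vxml", caching="safe")  # fetched again
print(len(fetched))
```

The trade-off is the one listed on the slide: "fast" saves round trips during a live call, while "safe" guarantees the caller hears the latest version of a document.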

VoiceXML Example
main.vxml:
<?xml version="1.0"?>
<vxml version="1.0" application="app-root.vxml">
  <form id="say_goodbye">
    <field name="answer" type="boolean">
      <prompt>Shall we say <value expr="application.bye"/>?</prompt>
      <filled>
        <if cond="answer">
          <exit/>
        </if>
        <clear namelist="answer"/>
      </filled>
    </field>
  </form>
</vxml>
app-root.vxml:
<?xml version="1.0"?>
<vxml version="1.0">
  <var name="bye" expr="'Ciao'"/>
  <link next="operator_xfer.vxml">
    <grammar> operator </grammar>
  </link>
</vxml>
From: VoiceXML 1.0 Specification -- http://www.w3.org/TR/voicexml

Processing Model: The Example
[Figure: (1) load main.vxml; (2) bootstrap: main.vxml names app-root.vxml as its application root, so the root document is loaded; (3) run the dialog in main.vxml using the root's grammar; (4) when the “operator” link matches, load the new resource operator_xfer.vxml and transfer context to the new dialog]

VoiceXML Examples
Running main.vxml, with app-root.vxml as its application root (the two documents from the previous example):
· S: “Shall we say Ciao?” (the prompt in main.vxml, completed from application.bye)
· U: “Boobie”
· S: “I did not understand that” (default system prompt)
· U: “Bleeble”
· S: “I did not understand that”
· U: “Operator”
· S: [Transfer to operator_xfer.vxml] (matches the <link> grammar defined in app-root.vxml)
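
A greatly simplified, hypothetical Python model of why this conversation goes the way it does. It is not how a VoiceXML interpreter works internally and does not model re-prompting or other platform behaviour; the yes/no vocabulary for the built-in `type="boolean"` grammar is an assumption. The key point it reproduces: the `<link>` grammar from the application root document is active while the leaf document's field is waiting, so "operator" wins even though it is not a boolean answer.

```python
# Values taken from app-root.vxml: the application variable and the link grammar.
APP_ROOT = {"bye": "Ciao", "links": {"operator": "operator_xfer.vxml"}}
# Assumed vocabulary for the built-in type="boolean" grammar.
BOOLEAN = {"yes": True, "no": False}


def run_say_goodbye(utterances):
    """Greatly simplified, hypothetical model of the say_goodbye form in
    main.vxml; it does not model re-prompting or any other platform behaviour."""
    prompt = f"S: Shall we say {APP_ROOT['bye']}?"
    transcript = [prompt]
    for heard in utterances:
        transcript.append(f'U: "{heard}"')
        word = heard.strip().lower()
        if word in APP_ROOT["links"]:          # root <link> grammar is active too
            transcript.append(f"S: [Transfer to {APP_ROOT['links'][word]}]")
            return transcript
        if word not in BOOLEAN:                # nomatch: default system response
            transcript.append("S: I did not understand that")
        elif BOOLEAN[word]:                    # <if cond="answer"> <exit/>
            transcript.append("S: [exit]")
            return transcript
        else:                                  # "no": <clear namelist="answer"/> asks again
            transcript.append(prompt)
    return transcript


for line in run_say_goodbye(["Boobie", "Bleeble", "Operator"]):
    print(line)
```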

Example: Including recorded audio
· Can do so whenever you have a prompt. Some examples:
<prompt>
  Welcome to Birdland.
  <audio src="http://www.xxx.org/birdland.wav"/>
  We have every record by Charlie Parker …
</prompt>
  – Synthesized voice with an audio file in the middle
<prompt> <audio src="http://audio.files.com/yeah.au"/> </prompt>
  – No text, just an audio file
<prompt> <audio src="…">Text alternative to audio</audio> </prompt>
  – Synthesized text used if the audio file is unavailable

Recording Audio
· Can be done, but the recording is stored locally in a variable.
· Can't permanently store data on the VoiceXML server -- it must be moved over to the “application/resources” server.
  – Typically use HTTP methods to do this.
· The <record> element lets you specify this, and also indicate
  – Prompt and error response
  – Variable name for storing it
  – ‘Dead air’ time to indicate end of recording
  – DTMF indicator for end of message
  – How long the message can be
  – Audio format for the data

Recording Audio
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <record name="greeting" beep="true" maxtime="10s"
            finalsilence="4000ms" dtmfterm="true" type="audio/wav">
      <prompt> At the tone, please say your greeting. </prompt>
      <noinput> I heard nothing... </noinput>
    </record>
    <field name="confirm" type="boolean">
      <prompt> Your greeting is <value expr="greeting"/>. </prompt>
      <prompt> To keep it, say yes. To discard it, say no. </prompt>
      <filled>
        <if cond="confirm">
          <submit next="save_greeting.pl" method="post"
                  enctype="multipart/form-data" namelist="greeting"/>
        </if>
        <clear/>
      </filled>
    </field>
  </form>
</vxml>

VoiceXML 2.0
· Is basically a ‘cleanup’ of VoiceXML 1.0
  – Better consistency with XML architecture, including definition of a VoiceXML namespace URI (http://www.w3.org/2001/vxml)
  – Support for error logging
  – Some changes to syntax, and some deprecated elements and attributes
  – Mandated support for a W3C-sponsored speech grammar format
  – Clarification of meaning, cleanup of caching control models
· Basically the same language, with a few minor tweaks and improvements

JSML - JSpeech Markup Language
· Developed by Sun and SpeechWorks, as a markup language for text-to-speech dialogs.
· Based on the Java Speech API Markup Language: http://java.sun.com/products/java-media/speech/
· Text annotation to provide hints to speech synthesizers
  – Aimed at making TTS speech more natural, more understandable
· Feature set:
  – Hints to word pronunciation
  – Hints to phrasing, emphasis, pitch and speaking rate
  – “Marker” elements: notifications from the speech synthesizer to applications when a marker is reached

JSGF - JSpeech Grammar Format
· Developed by Sun and SpeechWorks, as a syntax for expressing speech grammars
· Based on the Java Speech API Grammar Format: http://java.sun.com/products/java-media/speech/

Future of the “Voice” Web and VoiceXML
· Sun/SpeechWorks (1999): JSML, JSGF and other grammar languages
· VoiceXML Forum (2000): VoiceXML 1.0
· W3C (2002, in draft): VoiceXML 2.0
· W3C follow-on work, with its status at the time of this talk (WD = W3C Working Draft):
  – VoiceXML 3? [??]
  – Speech synthesis (SSML) [WD]
  – Speech recognition grammar [WD]
  – Speech semantics [prelim]
  – NLP [prelim]
  – Pronunciation lexicon [early]
  – Call control [early]
  – Voice browser interoperation [early]
· Microsoft-led (2002): SALT (Speech Application Language Tags)

Multiple Future XML Standards
· Future standards will be targeted at the various ‘different’ issues associated with voice interfaces (of which IVR is a subset)
  – Speech synthesis: hints about how text should be spoken. Overlaps, to some degree, with aural Cascading Style Sheets
  – Call control: management of call transfers, holding, etc.; an abstraction on top of PBX controllers
  – Voice browser interoperation: passing of data and ‘application context’ between different voice (or other) applications
  – VoiceXML 3?: a specific voice-driven session management language, which will embrace these other languages as components

Microsoft’s SALT
· Speech Application Language Tags
  – Microsoft, Cisco, Intel, Comverse, SpeechWorks, Philips
· A “lightweight” set of tags designed to be used with HTML and XHTML to enable lightweight telephony applications driven from regular Web documents
· Targeted at supporting multimodal access
· No version to look at yet ...
· Draft not due until later this year.

Short Glossary
· Call control – managing connectivity and routing of phone calls
· CTI – Computer Telephony Integration
· DTMF – Dual-Tone Multi-Frequency
· IVR – Interactive Voice Response
· JSGF – Java Speech Grammar Format -- a proposed standard for representing speech grammars
· JSML – Java Speech Markup Language
· NLU – Natural Language Understanding
· PBX – Private Branch Exchange
· PSTN – Public Switched Telephone Network
· SLU – Spoken Language Understanding
· STT – Speech to Text (simple recognition)
· SV – Speaker Verification (biometric identification)
· TTS – Text to Speech (synthesis)
