VoiceXML and Internet Telephony
Kundan Singh and Henning Schulzrinne
Columbia University
{kns10, hgs}@cs.columbia.edu
Joint work (in progress) with Daniel, Naho, Visda and Sean.

Overview
A language for specifying voice dialogs in interactive voice response systems, used for:
• Information retrieval – news, sports, traffic, stock quotes
• e-business – customer service, banking, stock trading
• Notification services
18 April, 2001 · VoiceXML/Kundan Singh/Columbia University

PSTN-based IVR Platform
[Figure: an end user calls the IVR[1] platform at 1-212-854-5224 over the PSTN[5] and hears "Welcome to voice mail. Press 3 to listen to new messages..."]
The IVR platform:
• Receives the incoming PSTN call
• Responds with prompts
• Accepts user input (DTMF or speech)
• Takes action based on the user input (the service logic is usually programmed for the specific application, say a weather report)
It combines:
• Voice and telephony functions (ASR[2], TTS[3], DTMF[4])
• Service logic (application specific)
[1] Interactive voice response
[2] Automatic speech recognition
[3] Text to speech
[4] Dual-tone multi-frequency (touch tone)
[5] Public switched telephone network
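
To make "service logic programmed for the specific application" concrete, here is a minimal sketch of the kind of hard-wired key handling a classic IVR carries for the voice-mail menu in the figure. The function name, the message format and the behaviour for other keys are hypothetical; only the prompt text comes from the slide.

```python
from typing import List

# Hypothetical hard-coded service logic for the voice-mail menu in the figure.
MENU_PROMPT = "Welcome to voice mail. Press 3 to listen to new messages."


def handle_dtmf(digit: str, new_messages: List[str]) -> List[str]:
    """Return the prompts the IVR plays after the caller presses `digit`."""
    if digit == "3":
        if not new_messages:
            return ["You have no new messages."]
        # Read the new messages back one by one.
        return [f"Message {i}: {text}" for i, text in enumerate(new_messages, start=1)]
    # Any other key (hypothetical behaviour): apologize and replay the menu.
    return ["Sorry, that is not a valid choice.", MENU_PROMPT]
```

Because this mapping is compiled into the platform, changing the menu means changing platform code, which is the coupling the following slides separate out.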

Decomposition
[Figure: the IVR platform (voice and telephony functions: ASR, TTS, DTMF; plus application-specific service logic) is split in two. A voice gateway on the PSTN side keeps the voice and telephony functions; a web server on the Internet holds the service logic. End users appear on both the PSTN side and the Internet side.]

VoiceXML
[Figure: a PSTN end user calls a voice gateway (voice and telephony functions plus a VoiceXML browser), which fetches VoiceXML (VXML) pages and audio/grammar files over the Internet from a multimedia web server. An Internet end user fetches HTML from the same web server. The web server's service logic (CGI, servlet, JSP) uses scripts and a database.]
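
In this split, the web server's service logic simply returns VoiceXML instead of HTML. The sketch below uses Python's standard `http.server` as a stand-in for the CGI/servlet/JSP layer; the greeting, the `serve` helper and port 8080 are hypothetical, and the `Content-Type` value is an assumption (the `application/voicexml+xml` media type was registered later, in RFC 4267).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.sax.saxutils import escape


def render_vxml(message: str) -> str:
    """Dialog generation: wrap `message` in a minimal VoiceXML 1.0 document."""
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="1.0">\n'
        f"  <form><block><prompt>{escape(message)}</prompt></block></form>\n"
        "</vxml>\n"
    )


class VoiceXMLHandler(BaseHTTPRequestHandler):
    """Answers every GET with a generated VoiceXML page."""

    def do_GET(self):
        body = render_vxml("Hello from the web server.").encode("utf-8")
        self.send_response(200)
        # Assumed media type; application/voicexml+xml was registered later (RFC 4267).
        self.send_header("Content-Type", "application/voicexml+xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the service until interrupted (port 8080 is an arbitrary choice)."""
    HTTPServer(("", port), VoiceXMLHandler).serve_forever()
```

Calling `serve()` and pointing a VoiceXML browser at `http://<host>:8080/` would make it speak the greeting. The same server could keep answering web browsers with HTML, which is the point of reusing existing web infrastructure.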

Why VoiceXML
• Programming voice services for telephony platforms
  – Alternative: write a C/C++ application for each telephony platform?
• Separates application-specific service logic (HTML, VoiceXML) from user interaction (browser, I/O device)
• Can use existing web development tools
• A single application can serve both web and voice users
• Can use existing infrastructure: HTTP, web servers, etc.

VoiceXML vs HTML
• Phone vs PC: the I/O device is a phone
• Transport: HTTP in both cases
• Voice browser vs web browser
• VoiceXML form vs HTML form

HTML:
    <form action="url">
      Enter your Id: <input name="id">
      <input type="submit">
    </form>

VoiceXML:
    <form>
      <field name="id">
        <prompt>Your ID, please.</prompt>
      </field>
      <block>
        <submit next="url"/>
      </block>
    </form>
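
Both forms end in the same kind of HTTP request, so the server-side script cannot tell whether the ID was typed or spoken. This sketch assumes the default GET method for both and uses the hypothetical URL `http://example.com/login` in place of "url"; `submit_url` and `read_fields` are illustrative helpers built on the standard `urllib.parse` module.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit


def submit_url(action: str, fields: Dict[str, str]) -> str:
    """GET request a browser issues when a form is submitted.

    An HTML browser sends the named <input> values; a VoiceXML browser sends
    the variables in <submit namelist="..."> (all field variables if namelist
    is omitted). GET is the default method for both.
    """
    separator = "&" if "?" in action else "?"
    return action + separator + urlencode(fields)


def read_fields(url: str) -> Dict[str, str]:
    """What the server-side script sees: the same name/value pairs either way."""
    return {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}
```

For example, `submit_url("http://example.com/login", {"id": "1234"})` yields `http://example.com/login?id=1234` whichever browser produced the value.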

VoiceXML examples [1]
    <?xml version="1.0"?>
    <vxml version="1.0">
      <form>
        <block>
          <prompt><emp>Hello</emp>, World!</prompt>
        </block>
      </form>
    </vxml>
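
A VoiceXML browser's first steps on this document are to parse the XML and hand each prompt's text to the TTS engine. A minimal sketch with the standard `xml.etree.ElementTree` module follows; `prompt_texts` is a hypothetical helper, and it flattens `<emp>` to plain text, whereas a real browser would render the emphasis.

```python
import xml.etree.ElementTree as ET

HELLO = """<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt><emp>Hello</emp>, World!</prompt>
    </block>
  </form>
</vxml>"""


def prompt_texts(document: str):
    """Collect the text of every <prompt>, ignoring markup such as <emp>."""
    root = ET.fromstring(document)
    return ["".join(p.itertext()).strip() for p in root.iter("prompt")]
```

Here `prompt_texts(HELLO)` returns `["Hello, World!"]`.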

VoiceXML examples [2]
    <form id="weather_info">
      <block>Welcome to the weather information service.</block>
      <field name="state">
        <prompt>What state?</prompt>
        <grammar src="state.gram" type="application/x-jsgf"/>
        <catch event="help">
          Please speak the state for which you want the weather.
        </catch>
      </field>

VoiceXML examples [2]
      <field name="city">
        <prompt>What city?</prompt>
        <grammar src="city.gram" type="application/x-jsgf"/>
        <help>
          Please speak the city for which you want the weather.
        </help>
      </field>
      <block>
        <submit next="/servlet/weather" namelist="city state"/>
      </block>
    </form>
Grammar (state.gram): California | Illinois | New Jersey | New York
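
On the server, the script behind `/servlet/weather` receives `city` and `state` as ordinary request parameters and generates the next VoiceXML page. The sketch assumes a GET query string; `weather_page` is a hypothetical name and `lookup` is a placeholder for whatever weather database the real service would query (no data is implied here).

```python
from typing import Callable
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def weather_page(query: str, lookup: Callable[[str, str], str]) -> str:
    """Hypothetical service logic behind /servlet/weather.

    `query` is the submitted query string, e.g. "city=Albany&state=New+York";
    `lookup` stands in for the service's weather database.
    """
    params = parse_qs(query)
    city = params.get("city", [""])[0]
    state = params.get("state", [""])[0]
    if not city or not state:
        text = "Sorry, I need both a city and a state."
    else:
        text = f"The weather in {city}, {state}: {lookup(city, state)}"
    # Dialog generation: the reply is itself a VoiceXML document.
    return (
        '<?xml version="1.0"?>\n<vxml version="1.0"><form><block>'
        f"<prompt>{escape(text)}</prompt>"
        "</block></form></vxml>"
    )
```

Keeping the lookup outside the function mirrors the slide's split: the VoiceXML page only collects input, and the web tier owns the data.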

VoiceXML examples [3]
    <field name="card_type">
      …
      <grammar>
        visa {visa} | master [card] {mastercard} | amex {amex} | american [express] {amex}
      </grammar>
      <help>Please say Visa, Mastercard, or American Express.</help>
      …
    </field>
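
In this grammar, `|` separates alternatives, `[...]` marks an optional word, and `{...}` is the tag that becomes the field's value, so "american" and "american express" both fill `card_type` with `amex`. The sketch below hand-encodes the four rules rather than parsing grammar syntax; it illustrates that matching behaviour only and is not how a real recognizer compiles JSGF.

```python
import itertools
from typing import Optional

# The slide's rules, hand-encoded as ([(word, optional?), ...], tag).
CARD_RULES = [
    ([("visa", False)], "visa"),
    ([("master", False), ("card", True)], "mastercard"),
    ([("amex", False)], "amex"),
    ([("american", False), ("express", True)], "amex"),
]


def expansions(tokens):
    """Yield every word sequence a rule accepts: optional words present or absent."""
    choices = [[(word,), ()] if optional else [(word,)] for word, optional in tokens]
    for combo in itertools.product(*choices):
        yield tuple(word for part in combo for word in part)


def interpret(utterance: str) -> Optional[str]:
    """Return the tag of the first rule matching the whole utterance, or None (nomatch)."""
    words = tuple(utterance.lower().split())
    for tokens, tag in CARD_RULES:
        if words in set(expansions(tokens)):
            return tag
    return None
```

An utterance outside the grammar, such as "visa card", returns `None`, which corresponds to a nomatch event in the browser.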

VoiceXML examples [4]
    <form>
      <field name="drink">
        <prompt>Would you like coffee, tea, milk or nothing?</prompt>
        <option value="coffee">coffee</option>
        <option value="tea">tea</option>
        <option value="milk">milk</option>
        <option value="nothing">nothing</option>
      </field>
      <block>
        <submit next="http://…/bartender.cgi" namelist="drink"/>
      </block>
    </form>

VoiceXML examples [5]
    <menu>
      <prompt>Would you like coffee, tea, milk or nothing?</prompt>
      <choice next="http://…coffee.vxml">coffee</choice>
      <choice next="http://…tea.vxml">tea</choice>
      <choice next="http://…milk.vxml">milk</choice>
      <choice next="http://…blank.vxml">nothing</choice>
      <nomatch count="1">I did not understand what you said.</nomatch>
      <nomatch count="2">Please say one of coffee, tea, milk or nothing.</nomatch>
      <noinput>You must say something.</noinput>
    </menu>
Alternatively, the prompt can list the choices itself: <prompt>Would you like <enumerate/>?</prompt>
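
The `count` attributes work through a per-event counter: each nomatch increments it, and the browser runs the handler with the largest `count` not exceeding the counter (a handler without `count` counts as 1). So the first miss gets the short apology and every later miss gets the longer hint. The model below is a hypothetical sketch of that selection rule only; the `Menu` class and the relative URLs standing in for the elided `http://…` ones are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Menu:
    """Hypothetical model of the <menu>: spoken choice -> next document,
    plus event handlers keyed by their `count` attribute (default 1)."""
    choices: Dict[str, str]
    handlers: Dict[str, Dict[int, str]]
    counters: Dict[str, int] = field(default_factory=dict)

    def select_handler(self, event: str) -> Optional[str]:
        """Increment the event counter, then pick the handler with the largest count <= counter."""
        count = self.counters.get(event, 0) + 1
        self.counters[event] = count
        candidates = self.handlers.get(event, {})
        eligible = [c for c in candidates if c <= count]
        return candidates[max(eligible)] if eligible else None

    def hear(self, utterance: Optional[str]) -> Tuple[str, Optional[str]]:
        """Return ("goto", url) on a match, or ("say", text) for noinput/nomatch."""
        if utterance is None:  # the caller stayed silent
            return ("say", self.select_handler("noinput"))
        key = utterance.strip().lower()
        if key in self.choices:
            return ("goto", self.choices[key])
        return ("say", self.select_handler("nomatch"))
```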

Form Interpretation Algorithm
• Initialize variables and counters
• Main loop:
  – Select phase: select the next form item to visit
  – Collect phase: play prompts and collect input
  – Process phase: process the input or event
• Document: a collection of dialogs (forms and menus)
• An application can span multiple documents
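
The main loop can be sketched for a form made only of plain fields, such as the weather form. This is a simplification under stated assumptions, not the full algorithm: `collect` is a hypothetical stand-in for prompting plus recognition, returning `None` for a nomatch/noinput event. `max_attempts` is also hypothetical and exists only so the sketch cannot loop forever; VoiceXML's default handlers simply reprompt.

```python
from typing import Callable, Dict, List, Optional


def run_form(items: List[str],
             collect: Callable[[str], Optional[str]],
             max_attempts: int = 3) -> Dict[str, str]:
    """Simplified Form Interpretation Algorithm for a form of plain fields."""
    # Initialization: every form item variable undefined, every counter at zero.
    values: Dict[str, Optional[str]] = {name: None for name in items}
    attempts: Dict[str, int] = {name: 0 for name in items}
    while True:
        # Select phase: the first form item (document order) still undefined.
        pending = [name for name in items if values[name] is None]
        if not pending:
            return {name: value for name, value in values.items() if value is not None}
        name = pending[0]
        if attempts[name] >= max_attempts:
            raise RuntimeError(f"no valid input for {name!r}")  # hypothetical give-up rule
        # Collect phase: play the prompt and wait for input or an event.
        attempts[name] += 1
        result = collect(name)
        # Process phase: fill the variable on input; on an event it stays
        # undefined, so the next select phase revisits the same item.
        if result is not None:
            values[name] = result
```

With answers "New York", then silence, then "Albany", the loop fills `state`, revisits `city` once, and returns both values.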

VoiceXML scope
• Human-machine interaction
  – Audio output (TTS, pre-recorded files)
  – Audio input (speech recognition, audio recording)
  – Character input (DTMF)
  – Presentation logic (scripting)
• Basic connection control
  – Disconnect
  – Transfer

Application scope
• General service logic
• State management
• Dialog generation
• Dialog sequencing
• Database operations

VoiceXML features
• Menus, forms, sub-dialogs
• Inputs (grammar, record, DTMF)
• Outputs (audio, text-to-speech)
• Events (error handling: nomatch, noinput, catch/throw)
• Variables and scripting (var, assign, if)
• Transitions or links (goto, submit)
• Transfer to a third party (also adding a third party)
• Disconnecting the call
• Platform-specific objects and properties
• Pre-fetching

VoiceXML 1.0 tags
assign, audio, block, break, catch, choice, clear, disconnect, div, dtmf, else, elseif, emp, enumerate, error, exit, field, filled, form, goto, grammar, help, if, initial, link, menu, meta, noinput, nomatch, object, option, param, prompt, property, pros, record, reprompt, return, sayas, script, subdialog, submit, throw, transfer, value, var, vxml
Categories: telephony; speech synthesis or audio output; user input and grammar; program flow; variables and properties; error handling; miscellaneous.

Internet Telephony
[Figure: a PSTN end user reaches a voice gateway (voice and telephony functions, VoiceXML browser), which talks over the Internet to a web server holding the service logic (CGI, servlet, JSP).]

Internet Telephony
[Figure: the new module is a VoiceXML browser with SIP, acting as a PSTN/SIP user agent. PSTN end users reach it through a voice gateway, SIP phones connect to it, and it fetches pages from a web server running CGI, servlets or JSP.]

Internet Telephony
[Figure: SIP phones and a gateway connect to the VoiceXML browser with SIP, which fetches pages from a web server (CGI, servlet, JSP). SIP is used for signaling and RTP for audio; DTMF is carried either as in-band audio tones or as RFC 2833 events.]
Examples: email by phone, voicemail by phone, directory services for a department, web browsing by phone (not WAP), …
The VoiceXML browser with SIP must:
• Accept SIP connections
• Fetch XML pages over HTTP
• Parse the XML
• Interpret VoiceXML tags
• Perform text-to-speech
• Receive and detect user input (DTMF, or speech in the future)
• Parse the input according to the grammar
• Fetch audio files from the web and play them to the user
• …
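
When DTMF arrives as RFC 2833 events rather than in-band tones, each key press is a 4-byte RTP payload: an 8-bit event code (0-9, then `*`, `#`, A-D for 10-15), an end bit, a reserved bit, a 6-bit volume, and a 16-bit duration in RTP timestamp units. The decoder below is a sketch covering only the DTMF event codes. It assumes the RTP header has already been stripped; `parse_telephone_event` and `TelephoneEvent` are illustrative names. (RFC 2833 was later superseded by RFC 4733, which keeps this payload layout.)

```python
import struct
from typing import NamedTuple

# RFC 2833 event codes 0-15 are the DTMF keys, in this order.
DTMF_EVENTS = "0123456789*#ABCD"


class TelephoneEvent(NamedTuple):
    key: str       # DTMF key, e.g. "5" or "#"
    end: bool      # E bit: set in the packet(s) that end the event
    volume: int    # power level as -dBm0 (0..63)
    duration: int  # in RTP timestamp units (8000 per second at 8 kHz)


def parse_telephone_event(payload: bytes) -> TelephoneEvent:
    """Decode the 4-byte RFC 2833 telephone-event payload (RTP header removed)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_EVENTS):
        raise ValueError(f"not a DTMF event: {event}")
    return TelephoneEvent(
        key=DTMF_EVENTS[event],
        end=bool(flags & 0x80),   # top bit of the second byte
        volume=flags & 0x3F,      # low six bits; bit 0x40 is reserved
        duration=duration,
    )
```

A browser would collect keys from packets with `end` set, since a sender repeats the event while the key is held, and then match the digit string against the field's DTMF grammar.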

Status
• Email by phone (using the Tellme voice browser)
• VoiceXML browser: ongoing