Dialog Design: Speech/Natural Language, Pen & Gesture

  • Slides: 58
Dialog Design Speech/Natural Language Pen & Gesture

Dialog Styles
  1. Command languages
  2. WIMP - Window, Icon, Menu, Pointer
  3. Direct manipulation
  4. Speech/Natural language
  5. Gesture, pen

Natural input
  • Universal design
  • Take advantage of familiarity, existing knowledge
  • Alternative input & output
  • Multi-modal interfaces
  • Getting “off the desktop”

Agenda
  • Speech
  • Natural Language
  • Other audio
  • PDAs & Pen input styles
  • Gesture

When to Use Speech
  • Hands busy
  • Mobility required
  • Eyes occupied
  • Conditions preclude use of keyboard
  • Visual impairment
  • Physical limitation

Waveform & Spectrogram
  • Speech does not equal written language

Parsing Sentences
  "I told him to go back where he came from, but he wouldn't listen."

Speech Input
  • Speaker recognition
  • Speech recognition
  • Natural language understanding

Speaker Recognition
  • Tell which person it is (voice print)
  • Could also be important for monitoring meetings, determining speaker

Speech Recognition
  • Primarily identifying words
  • Improving all the time
  • Commercial systems:
      - IBM ViaVoice, Dragon Dictate, ...

Recognition Dimensions
  • Speaker dependent/independent
      - Parametric patterns are sensitive to speaker
      - With training (dependent) can get better
  • Vocabulary
      - Some have 50,000+ words
  • Isolated word vs. continuous speech
      - Continuous: where words stop & begin ("Did you" vs. "Didja")
      - Typically a pattern match, no context used

Recognition Systems
  • Typical system has 5 components:
      - Speech capture device: has analog -> digital converter
      - Digital Signal Processor: gets word boundaries, scales, filters, cuts out extra stuff
      - Preprocessed signal storage: processed speech buffered for recognition algorithm
      - Reference speech patterns: stored templates or generative speech models for comparisons
      - Pattern matching algorithm: goodness of fit from templates/model to user’s speech
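The last two components (reference patterns and the pattern matching algorithm) can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not name an algorithm, so dynamic time warping (DTW) is used here as one classic template-matching technique, the features are one-dimensional toy numbers rather than real speech features, and the names `dtw_distance`, `recognize` and `TEMPLATES` are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the lowest (best) fit score."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Toy reference patterns (assumed values, not real speech features).
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 1.0, 2.0],
}

# A slower "yes": same shape, stretched in time.
spoken = [1.0, 1.0, 3.0, 3.0, 4.0, 4.0, 3.0, 1.0]
print(recognize(spoken, TEMPLATES))
```

Because DTW lets one sample align with several, the stretched utterance still fits its template well, which is why speaking rate alone need not break a template matcher.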

Errors
  • Systems make four types of errors:
      - Substitution - one for another
      - Rejection - detected, but not recognized
      - Insertion - added
      - Deletion - not detected
  • Problems with recovery
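Three of these error types (substitution, insertion, deletion) can be counted by aligning what was said against what the recognizer produced. The sketch below assumes a standard word-level edit-distance alignment; the function name `count_errors` is hypothetical, and rejection is not covered because it depends on the recognizer reporting that it heard something it could not identify.

```python
def count_errors(reference, hypothesis):
    """Align two word strings and count substitutions, insertions, deletions."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    # Walk back through the table to label each edit.
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                counts["substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletion"] += 1  # word spoken but missing from output
            i -= 1
        else:
            counts["insertion"] += 1  # word in output that was never spoken
            j -= 1
    return counts


print(count_errors("on the SAAB 340 turboprop", "on the sound 340 turboprop"))
```

Dividing the total number of edits by the number of reference words gives the usual word error rate.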

Recognition Example
  • Spoken: My flight experience uh, I started off as a private pilot and then I spent 20 years flying as a navigator on C-130s in the air force and I currently fly as a first officer on the SAAB 340 turboprop.
  • Recognized: my flight experience a and I started off as a private pilot and I spent 20 years flying is a navigator on C-130s in the Air Force and I currently fly as a first officer on the sound 340 turboprop.

And another example
  • Spoken: Here here's one of the thoughts I'd throw in on that. I'd agree with Howard that the combination of checklists and flows is the way to manage things best.
  • Recognized: hair and here's one of the outside troodon and an eye to agree with power that the domination checklists and flows is the way that managed things best

Natural Language Understanding
  • Putting meaning to the words
  • Input might be spoken or typed
  • Holy grail of Artificial Intelligence problems

Eliza - a “Faked” Understanding
  Person: Men are all alike.
  ELIZA: In what way?
  Person: They're always bugging us about something or other.
  ELIZA: Can you think of a specific example?
  Person: Well, my boyfriend made me come here.
  ELIZA: Your boyfriend made you come here?
  Person: He says I'm depressed much of the time.
  ELIZA: I am sorry to hear that you are depressed.
  Person: It's true. I am unhappy.
  ELIZA: Do you think coming here will help you not to be unhappy?
  Person: I need some help, that much seems certain.
  ELIZA: What would it mean to you if you got some help?
  Person: Perhaps I could learn to get along with my mother.
  ELIZA: Tell me more about your family.
  http://www-ai.ijs.si/eliza.html
  Weizenbaum, J., "ELIZA -- A computer program for the study of natural language communication between man and machine", Communications of the ACM 9(1): 36-45, 1966
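The "faked" part is easier to see in code. A minimal sketch, assuming a handful of keyword rules: the rules below are hypothetical and written only to reproduce a few turns of the dialogue above; Weizenbaum's actual script is much larger and also rewrites pronouns.

```python
import re

# Hypothetical rules for illustration only.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bmy (mother|father|family)\b", re.IGNORECASE), "Tell me more about your family."),
    (re.compile(r"\balways\b", re.IGNORECASE), "Can you think of a specific example?"),
]
DEFAULT_REPLY = "In what way?"


def respond(utterance):
    """Return a canned reply from the first keyword rule that matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo the captured words back; no meaning is extracted.
            return template.format(*match.groups())
    return DEFAULT_REPLY


for line in ["Men are all alike.",
             "They're always bugging us about something or other.",
             "Perhaps I could learn to get along with my mother."]:
    print("Person:", line)
    print("ELIZA: ", respond(line))
```

Nothing here models what the person means; the program only spots keywords and fills templates, which is the point of the slide.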

Natural Language Domains
  • Conceptual: total set of objects/actions
      - Knows about managers and salaries
  • Functional: what can be expressed
      - What is the salary of Joe’s manager? Who is Mary’s manager?
  • Syntactic: variety of forms
      - What is the salary of the manager of Joe?
  • Lexical: word meanings, synonyms, vocabulary
      - What were the earnings of Joe’s boss?

NL Factors/Terms
  • Syntactic
      - Grammar or structure
  • Prosodic
      - Inflection, stress, pitch, timing
  • Pragmatic
      - Situated context of utterance, location, time
  • Semantic
      - Meaning of words

SR/NLU Advantages
  • Easy to learn and remember
  • Powerful
  • Fast, efficient (not always)
  • Little screen real estate

SR/NLU Disadvantages
  • Assumes domain knowledge
  • Doesn’t work well enough yet
      - Requires confirmation
      - And recognition will always be error-prone
  • Expensive to implement
  • Unrealistic expectations
  • Generate mistrust/anger

Speech Output
  • Tradeoffs in speed, naturalness and understandability
  • Male or female voice?
      - Technical issues (freq. response of phone)
      - User preference (depends on the application)
  • Rate of speech
      - Technically up to 550 wpm!
      - Depends on listener
  • Synthesized or Pre-recorded?
      - Synthesized: Better coverage, flexibility
      - Recorded: Better quality, acceptance

Speech Output
  • Synthesis
      - Quality depends on software ($$)
      - Influence of vocabulary and phrase choices
      - http://www.research.att.com/projects/tts/demo.html
  • Recorded segments
      - Store tones, then put them together
      - The transitions are difficult (e.g., numbers)

Designing the Interaction
  • Constrain vocabulary
      - Limit valid commands
      - Structure questions wisely (Yes/No)
      - Manage the interaction
      - Examples?
  • Slow speech rate, but concise phrases
  • Design for failsafe error recovery
  • Visual record of input/output
  • Design for the user - Wizard of Oz

Speech Tools/Toolkits
  • Java Speech SDK
      - FreeTTS 1.1.1 http://freetts.sourceforge.net/docs/index.php
  • IBM JavaBeans for speech
  • Microsoft speech SDK (Visual Basic, etc.)
  • OS capabilities (speech recognition and synthesis built in to OS) (TextEdit)
  • VoiceXML

General Issues in Choosing Dialogue Style
  • Who is in control - user or computer
  • Initial training required
  • Learning time to become proficient
  • Speed of use
  • Generality/flexibility/power
  • Special skills - typing
  • Gulf of evaluation / gulf of execution
  • Screen space required
  • Computational resources required

Non-speech audio
  • Traditionally used for warnings, alarms or status information
  • Sounds provide information that helps reduce error, e.g. typing, video games
  • Multi-modal interfaces

Additional benefits of non-speech audio
  • Good for indicating changes, since we ignore continuous sounds
  • Provides secondary representation
      - Supports visual interface
  • Tradeoff in using natural (real) sounds vs. synthesized noises

Non-speech audio examples
  • Error ding
  • Info beep
  • Email arriving ding
  • Recycle
  • Battery critical
  • Logoff
  • Logon
  • Others?

Pen and Gesture

PDAs
  • Becoming more common and widely used
  • Smaller display (160 x 160), (320 x 240)
  • Few buttons, interact through pen
  • Estimate: 14 million shipped by 2004
  • Improvements
      - Wireless, color, more memory, better CPU, better OS
  • Palmtop versus Handheld

No Shredder…

http://www.vaio.sony.co.jp/Products/VGN-U50/
http://sonyelectronics.sonystyle.com/micros/clie/
http://www.oqo.com/
http://www.blackberry.com/

Input
  • Pen is dominant form for PDA and Tablet
  • Main techniques
      - Free-form ink
      - Soft keyboard
      - Numeric keyboard => text
      - Stroke recognition - strokes not in the shape of characters
      - Hand printing / writing recognition
  • Sometimes can connect keyboard or has modified keyboard (e.g. cell phones)

Soft Keyboards
  • Common on PDAs and mobile devices
      - Tap on buttons on screen

Soft Keyboard
  • Presents a small diagram of keyboard
  • You click on buttons/keys with pen
  • QWERTY vs. alphabetical
      - Tradeoffs?
      - Alternatives?

Numeric Keypad - T9
  • Developed by Tegic Communications
  • You press out letters of your word, it matches the most likely word, then gives optional choices
  • Faster than multiple presses per key
  • Used in mobile phones
  • http://www.t9.com/
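A small sketch of the lookup idea, not Tegic's implementation: each word maps to one key per letter, and all words sharing a key sequence are offered most-frequent first. The word list and its counts are made up for illustration, and `candidates` and `multitap_presses` are hypothetical names.

```python
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}

# Hypothetical word frequencies; a real product would ship a large lexicon.
WORD_COUNTS = {"good": 50, "home": 40, "gone": 20, "hood": 5, "hello": 30}


def to_digits(word):
    """Key sequence for a word, one key press per letter."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def candidates(digits, word_counts=WORD_COUNTS):
    """Words whose key sequence equals `digits`, most frequent first."""
    matches = [w for w in word_counts if to_digits(w) == digits]
    return sorted(matches, key=lambda w: -word_counts[w])


def multitap_presses(word):
    """Key presses needed with multi-tap entry (press a key k times for its k-th letter)."""
    return sum(KEYPAD[LETTER_TO_DIGIT[ch]].index(ch) + 1 for ch in word.lower())


print(candidates("4663"))
print(len("good"), "presses with one key per letter vs.", multitap_presses("good"), "with multi-tap")
```

The key sequence 4663 is ambiguous among several words in this toy list, which is why the technique needs the "optional choices" step mentioned above.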

Cirrin - Stroke Recognition
  • Developed by Jen Mankoff (GT -> Berkeley CS Faculty -> CMU CS Faculty)
  • Word-level unistroke technique
  • UIST ’98 paper
  • Use stylus to go from one letter to the next

Quikwriting - Stroke Recognition
  • Developed by Ken Perlin

Quikwriting Example
  • Said to be as fast as Graffiti, but have to learn more
  • http://mrl.nyu.edu/~perlin/demos/Quikwrite2_0.html

Hand Printing / Writing Recognition
  • Recognizing letters and numbers and special symbols
  • Lots of systems (commercial too)
  • English, kanji, etc.
  • Not perfect, but people aren’t either!
      - People - 96% handprinted single characters
      - Computer - >97% is really good
  • OCR (Optical Character Recognition)

Recognition Issues
  • Off-line vs. On-line
      - Off-line: After all writing is done, speed not an issue, only quality.
          • Work with either a bit map or vector sequence
      - On-line: Must respond in real-time but have richer set of features: acceleration, velocity, pressure
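To make the on-line feature set concrete, here is a minimal sketch that derives pen speed from timestamped samples. It assumes the digitizer delivers `(x, y, t)` tuples with `t` in seconds; that sample format and the name `stroke_speeds` are assumptions for illustration, not something the slides specify.

```python
import math


def stroke_speeds(samples):
    """Speed between consecutive (x, y, t) pen samples, in distance units per second."""
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds


# A short stroke followed by a pause with the pen held still.
stroke = [(0, 0, 0.0), (3, 4, 0.5), (3, 4, 1.0)]
print(stroke_speeds(stroke))
```

An off-line recognizer sees only the finished bitmap or vector sequence, so timing-based features like this are unavailable to it.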

More Issues
  • Boxed vs. Free-Form input
      - Sometimes encounter boxes on forms
  • Printed vs. Cursive
      - Cursive is much more difficult, sometimes impossible
  • Letters vs. Words
      - Cursive is easier to do in words vs. individual letters, as words create more context

More Issues
  • Using context & words can help
      - Usually requires existence of a dictionary
      - Check to see if word exists
      - Consider 1 vs. I vs. l
  • Training - Many systems improve a lot with training data
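The "1 vs. I vs. l" case can be sketched as a dictionary check: try every reading of the confusable characters and keep those that are real words. The confusion sets and the tiny dictionary below are assumptions for illustration, and `dictionary_readings` is a hypothetical name.

```python
from itertools import product

# Characters a recognizer might confuse (assumed confusion sets).
CONFUSABLE = {"1": "1Il", "I": "1Il", "l": "1Il", "0": "0O", "O": "0O"}

# Hypothetical dictionary.
DICTIONARY = {"Ill", "hello", "Oil", "loll"}


def dictionary_readings(raw, dictionary=DICTIONARY):
    """All dictionary words reachable by swapping confusable characters in `raw`."""
    options = [CONFUSABLE.get(ch, ch) for ch in raw]
    readings = {"".join(chars) for chars in product(*options)}
    return sorted(readings & dictionary)


print(dictionary_readings("he11o"))
print(dictionary_readings("1ll"))
```

Enumerating every combination grows exponentially with the number of confusable characters, so this brute-force form only suits short words; it is meant to show the idea, not a production approach.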

Special Alphabets
  • Graffiti - Unistroke alphabet on Palm PDA
      - What are your experiences with Graffiti?
  • Other alphabets or purposes
      - Gestures for commands

Pen Gesture Commands
  • Define a series of (hopefully) simple drawing gestures that mean different commands in a system
  • Example gestures (figure): might mean delete, insert, paragraph

Pen Use Modes
  • Often, want a mix of free-form drawing and special commands
  • How does user switch modes?
      - Mode icon on screen
      - Button on pen
      - Button on device

Error Correction
  • Having to correct errors can slow input tremendously
  • Strategies
      - Erase and try again (repetition)
      - When uncertain, system shows list of best guesses (n-best list)
      - Others?
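The n-best strategy can be sketched as follows, assuming the recognizer returns a score per candidate. The score format, the 0.8 confidence cutoff and the name `n_best` are assumptions chosen for illustration.

```python
def n_best(scores, n=3, confident=0.8):
    """Return one answer when confident, otherwise the top-n guesses to show the user."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    best_label, best_score = ranked[0]
    if best_score >= confident:
        return [best_label]  # commit without interrupting the user
    return [label for label, _ in ranked[:n]]  # let the user pick


print(n_best({"a": 0.9, "o": 0.05, "e": 0.05}))
print(n_best({"1": 0.4, "l": 0.35, "I": 0.2, "7": 0.05}))
```

Showing the list only when the recognizer is uncertain keeps the common case fast while still avoiding the erase-and-repeat cycle for ambiguous input.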

More ink applications
  • Signature verification
  • Notetaking
  • Electronic whiteboards and large-scale displays
  • Sketching

Free-form Ink
  • Ink is the data, take as is
  • Human is responsible for understanding and interpretation
  • Like a sketch pad
  • Often time-stamped

Audio Notebook
  • Stifelman, MIT
  • Affordances of paper
  • Notetaking activity
  • Indexing
  • Scanning

Meeting Support
  • Natural Input
  • Indexing
  • Tivoli
  • Domain objects

eClass
  • Ubiquitous Access

Flatland
  • Support individual whiteboard use
  • Use study
      - Persistence
      - Segments
      - Informal
  Mynatt, E. D., Igarashi, T., Edwards, W. K., and LaMarca, A. (1999). "Flatland: New dimensions in office whiteboards." In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 1999; Pittsburgh, Pennsylvania). New York: ACM Press, pp. 346-353.

Example
  • DENIM - Landay, Berkeley

General Issues in Choosing Dialogue Style
  • Who is in control - user or computer
  • Initial training required
  • Learning time to become proficient
  • Speed of use
  • Generality/flexibility/power
  • Special skills - typing
  • Gulf of evaluation / gulf of execution
  • Screen space required
  • Computational resources required

Gesture Recognition
  • Tracking 3D hand-arm gestures
      - Fiber optic, e.g. dataglove
      - Magnetic tracker, e.g. Polhemus
  • Perceptual user interfaces
      - Emerging area
      - Mainly computer vision researchers

Other interesting interactions
  • 3D interaction
      - Stereoscopic displays
  • Virtual reality
      - Immersive displays such as glasses, caves
  • Augmented reality
      - Head trackers and vision-based tracking