Computer Speech/Voice Recognition - IBM ViaVoice, April 2005
Computer Speech/Voice Recognition - IBM ViaVoice®
April 2005
IBM PC Club
Bernhard Krevet, IBC, Napa
Overview
• Definitions
• Categories of speech recognition software
• Products: Dragon NaturallySpeaking, ViaVoice® by IBM for Windows & Mac
  – System Requirements 2003
  – System Requirements 1994 & 1997
  – Installation experience
• Resources (Web) & Comments
• Using Speech Recognition
• Demo
Speech Recognition (1/3)
• ... refers to the process by which a person dictates a phrase that the computer translates into typed text. The dictated words can be interpreted as a command or stored as the words in a document.
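The split between "command" and "words in a document" can be pictured with a small sketch. This is only an illustration: the command table and the action names below are hypothetical and are not ViaVoice's actual command set or API.

```python
# Hypothetical command table: spoken phrase -> action name.
# These entries are assumptions for illustration, not ViaVoice's real commands.
COMMANDS = {
    "new paragraph": "insert_paragraph_break",
    "open-quote": "insert_open_quote",
    "close-quote": "insert_close_quote",
}


def route_utterance(utterance, document):
    """Treat a recognized phrase as a command if it is known, else store it as text."""
    phrase = utterance.strip()
    key = phrase.lower()
    if key in COMMANDS:
        # Known command: report the action and leave the document untouched.
        return ("command", COMMANDS[key])
    # Anything else is dictation and is appended to the document.
    document.append(phrase)
    return ("text", phrase)
```

Calling `route_utterance("hello world", doc)` appends the phrase to `doc`, while `route_utterance("Open-Quote", doc)` returns the action name and leaves `doc` unchanged.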
Speech Recognition (2/3)
• What It Does
  – Transform spoken words into written text or commands
  – Recognize context (e.g. differentiate homonyms)
  – Learn from you
  – Use personal voice model
  – Extensible vocabularies
  – Support many languages
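One simple way to picture "recognize context" is to choose between sound-alike words by looking at the word spoken just before. The sketch below uses word-pair counts from a tiny made-up training text; it is an assumption for illustration and not how ViaVoice or Dragon actually implement their language models.

```python
from collections import Counter

# Tiny made-up training text; a real product would use a far larger language model.
TRAINING_TEXT = (
    "i want to go there . give it to me . i have two dogs . two cats too . me too"
)


def build_bigram_counts(text):
    """Count how often each pair of adjacent words occurs in the text."""
    words = text.split()
    return Counter(zip(words, words[1:]))


def pick_homophone(previous_word, candidates, bigram_counts):
    """Return the candidate seen most often after previous_word (ties keep list order)."""
    return max(candidates, key=lambda word: bigram_counts[(previous_word, word)])


BIGRAMS = build_bigram_counts(TRAINING_TEXT)
```

With this training text, `pick_homophone("have", ["to", "two", "too"], BIGRAMS)` selects `"two"`, because "have two" is the only one of the three pairs that occurs.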
Speech Recognition (3/3)
• What It Does Not
  – Accept more than one person talking at the same time
  – "Understand"
  – Think / Create ideas
  – Organize
  – Replace a secretary
Categories of speech recognition software (1/2):
• Continuous speech, which means speaking words without pauses in between. It's not quite "natural," but it's close.
  – Natural user interface, with things like natural language commands. Instead of using a set of specified commands, you would say what you want, and the computer would take the appropriate action. The programs available today aren't fully "natural" yet, especially since usually they let you be "natural" only in certain applications.
  – Short training period. This means the software makers are looking for what is known as "speaker independence." The hope is that someday you'll be able to sit down at a strange computer and tell it what to do, or to record somebody and then have the computer do the transcribing.
• Discrete speech dictation (pause between words)
Categories of speech recognition software (2/2):
• Programs geared toward specific tasks
  – Speech-enabled PC apps recognize commands
  – VoicePilot
  – EasyVoice
  – ASR (Automatic Speech Recognition)
Platforms:
• Windows
• Macintosh
• Unix
• OS/2 (comes with IBM's discrete speech engine)
Popular Speech Recognition Products
Talk To Me!
“Dragon NaturallySpeaking 7 is the most accurate and full-featured Dragon NaturallySpeaking ever released! Accuracy up to 99%! 15% Accuracy Improvement. Breakthroughs in speech engine technology deliver the largest single accuracy improvement ever for a Dragon NaturallySpeaking release.”
PC Magazine - May 2003: "ScanSoft's Dragon NaturallySpeaking Preferred 7 makes dictation, correction, and voice control of your PC faster and easier than any voice recognition software yet." "... the new autopunctuation option worked admirably at adding commas and period to our dictations; it should be ideal for casual dictation such as e-mail or online chat."
ViaVoice Characteristics
IBM ViaVoice® technology, available on the Windows, Macintosh and handheld computer platforms, can afford a 'multi-modal' environment, freeing users from dependence on the mouse, keyboard and stylus for many applications. ViaVoice® personal computer software leverages generations of IBM voice recognition research and accomplishment. The ViaVoice for Windows Release 10 product family offers a complete portfolio appealing to every level of user expertise, and our ViaVoice for Mac offerings were the first continuous speech products on the Apple Macintosh platforms in the consumer marketplace.
• Windows products
  – Pro USB Edition: Flagship edition, featuring a digitally-enhanced stereo headset microphone.
  – Advanced Edition: Productivity tool with new command control features.
  – Standard Edition: Great dictation accuracy for the home/home office.
  – Personal Edition: Introduction to natural, continuous speech recognition on the PC.
• Macintosh products
  – ViaVoice for Mac OS X Edition with the sleek "Aqua" look and feel
  – Simply Dictation for Mac OS X: Introduction to dictation on the Mac
ViaVoice® Pro USB Windows
• System Requirements (2003)
  – > 300 MHz processor, > 128 MB RAM
  – 500 MB available hard drive space
  – Sound card with microphone jack, USB
  – CD ROM drive (for installation)
  – Windows 98 SE, Me or XP
  – MS Office for direct input, or any word processor with access to the clipboard (copy/paste)
MSRP: $200 incl. headset ($100 upgrade)
Components / Prerequisites - 1996
Hardware:
– Pentium/100 MHz processor, 24 MB RAM
– Any sound card
Software:
– IBM's OS/2 WARP 4.0 ($189) which included:
  • OS/2 Speech Recognition SW
  • Headset microphone with ANC
– IBM's "Simply Speaking" for Windows 95 ($600)
– Any word processor or editor with access to the clipboard (copy/paste)
Components / Prerequisites - 1994
Hardware:
– 486/33 MHz processor, 65 MB disk space
– IBM VoiceType Dictation adapter (ISA, MC, PCMCIA)
– Unidirectional microphone
– Powered speakers or headphones
Software:
– IBM VoiceType Dictation Program Product
– Any word processor or editor with access to the clipboard (copy/paste)
Price: ~ $1000.00 for VTD HW and SW
ViaVoice® Setup
• Installation
  – SW installation
  – Registration of each user / language
• Training
  – Human: About 90 min reading predefined texts
  – Computer: About 30 min processing of personal language voice model
ViaVoice® Installation Experience January 2005
• Installation of two languages, must be same version (USB Pro)
• 560 MB hard drive space
• Headset on phone jacks or USB
• Check audio levels and record sample texts to build voice models
• First dictation – many errors, need to improve voice model
• Check your voice, drink water… initially tedious correction process
• Web advice: use special SpeakPad (not Word) with open correction window
• Learn how to use the Correction Window
• File (save) sessions to give program a chance to improve the model
• Some idiosyncrasies, e.g. “OPEN-QUOTE” “CLOSE-QUOTE”
• Analyze existing documents to add specific words to vocabulary (only supports .doc & .txt, not even IBM Lotus' own WordPro .lwp)
• Manage vocabulary - OK
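The "analyze existing documents" step can be pictured as a scan for words that the vocabulary does not yet contain. The sketch below is a rough stand-in written for plain text only. The function name and the word pattern are assumptions for illustration, and it does not reproduce ViaVoice's own analysis tool.

```python
import re


def find_new_words(document_text, known_vocabulary):
    """Return the words in the text that are missing from the vocabulary, sorted."""
    # A word here is a letter followed by letters, apostrophes or hyphens.
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", document_text)
    known = {word.lower() for word in known_vocabulary}
    # Compare case-insensitively and report each missing word once.
    return sorted({word.lower() for word in words} - known)
```

For example, scanning "The ViaVoice headset uses USB." against a vocabulary of "the" and "uses" would report "headset", "usb" and "viavoice" as candidates to add.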
Voice Recognition Sites – Most Popular (Yahoo)
• Lernout & Hauspie - provider of speech and language products, technologies, and services, including speech recognition, text to speech, compression, and translation.
• Dragon NaturallySpeaking - family of software products that turn speech into text.
• Nuance Communications - provides enterprise-level speech recognition and speaker verification software to automate v-commerce and communications transactions.
• General Magic - voice infrastructure software company that provides enterprise-class software and supporting voice dialog design and hosting services.
• SpeechWorks International - provider of speech recognition, text-to-speech (TTS), and speaker verification for network and embedded environments.
• Philips Speech Processing - large vocabulary continuous speech recognition products for PCs. Also Digital Dictation devices and solutions for the medical and legal area.
• Sensory, Inc. - low-cost integrated circuit providing speech recognition, speech synthesis, music synthesis and 8-bit microcontroller.
• IBM Voice Systems - offering the ViaVoice line of speech recognition software.
• Fonix Corporation - voice recognition technology featuring automatic speech recognition (ASR) using neural network (artificial intelligence) techniques.
• Conversá - develops speech-enabled software and hardware that allows users a conversational way of interacting with their computers.
http://www.out-loud.com/
This site is intended to help people using speech recognition software, whatever the variety, and to do so without the filters of vendors. We have our own filters, of course, so please read critically.
By Susan Fulton, longtime user of speech recognition and assorted gadgets for easier, less painful computing.
http://www.voicerecognition.net/
List established in January 1996 for discussing all aspects of using voice recognition input systems. The focus is on effective use of voice recognition. Sample topics:
• Using such systems safely, without muscle or voice strain;
• Techniques for improving recognition accuracy;
• How to set up the physical voice workstation optimally;
• General tips for effective use of voice interfaces;
• Configuration of specific systems, troubleshooting, etc.
Voice Recognition Dead End?
You've heard of killer apps? How about an app killer? This is what voice recognition has become over the years, because for the most part, it doesn't work. I see no evidence that it ever will, at least not in the sense that we can achieve true voice dictation capability. [...]
John C. Dvorak, January 25, 2002
Using Speech Recognition
• The Verdict Is in: Speech Recognition Software Can Increase Productivity in the Legal Environment
• The Use of Speech Technologies in Mobile Computing
• Creating Access for People with Disabilities through Speech and Language Technologies
• Voice Recognition Software Versus a Traditional Transcription Service for Physician Charting in the ED
• From RSI to ROI: Speech Recognition Can Increase Productivity and Reduce Repetitive Stress Injuries in Your Workplace
• Automating the Electronic Forms Process
• Leading Electronic Medical Records Program Incorporates Dragon NaturallySpeaking® Speech Recognition
• Dragon NaturallySpeaking® Speech Recognition Benefits Not Only Doctors Who Use it, but Patients Who Rely on Them
CONVERSATION
Q. What is one divided by zero.
A. Error in line above: malformed?
Q. What is one divided by zero?
A. Error in line above: attempting division by zero.
Q. What is one divided by one.
A. Error in line above: malformed?
Q. What is one divided by one?
A. One. Problem trivial for my capacity. Please consult supervisor before typing in further instructions.
Q. Why?
A. Question not quantified or answerable by "yes" or "no." This is fourth malformed or erroneous question typed in during your time interval. Consult supervisor about formation of instructions. Cost effectiveness ratio for this machine prohibits further use by this operator until recycling code is typed in. Sorry.
Q. Why?
A. KEYBOARD NOW BLOCKED.
ViaVoice® Demonstration
Talk To Me!