ASRA: Automatic Speech Recognition & Assessment

ASRA: Automatic Speech Recognition & Assessment
J.-S. Roger Jang (張智星)
jang@mirlab.org
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University

Introduction to ASRA: Automatic speech recognition & assessment
Functionality
  • Speech assessment or speech scoring
  • Voice-command-based speech recognition
Languages
  • Mandarin, English, Taiwanese, Japanese
Required toolboxes
  • Utility toolbox
  • SAP toolbox
  • ASR toolbox

Examples of Speech Assessment
Test examples
  • saEnglish01.m
  • saChinese01.m
  • saTaiwanese01.m
  • goSaDemo.m
Applications
  • 背書機 (Recital machine)
  • Read & Say
  • 唸唸不忘
[Figure: demo screenshot labeled with the word score, phone score, and pitch curve; click to play each phone.]

Approach to Speech Assessment
  • Text to phonetic alphabets
  • Forced alignment
  • Phone-based scoring
  • Pitch tracking

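Texts to Phonetic Alphabets (1/2)
Chinese
  • Exhaustive method: enumerate every combination of heteronym readings. Here 朝 and 白 each have two readings, giving four candidates:
      朝(ㄓㄠ)辭白(ㄅㄞˊ)帝彩雲間
      朝(ㄓㄠ)辭白(ㄅㄛˊ)帝彩雲間
      朝(ㄔㄠˊ)辭白(ㄅㄞˊ)帝彩雲間
      朝(ㄔㄠˊ)辭白(ㄅㄛˊ)帝彩雲間
  • Word segmentation: some character sequences can be split in more than one way, for example:
      基隆廟口吃小吃 ("eating snacks at Keelung Miaokou"; 口吃 on its own is the word "stutter")
      三人參加會議 ("three people attend the meeting"; 人參 on its own is the word "ginseng")
Taiwanese
  • No text, no pronunciation dictionary, no word corpus
  • Everything is much harder!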
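
The exhaustive method for the poem line can be sketched in a few lines of Python. This is only an illustration, not the toolbox's code: the `READINGS` table and the function name `enumerate_pronunciations` are hypothetical, and the table holds just the two heteronyms from the slide. Characters without an entry stand in for themselves.

```python
from itertools import product

# Hypothetical mini-dictionary (an assumption for illustration): each character
# maps to its possible Zhuyin readings. Only the slide's two heteronyms are listed.
READINGS = {
    "朝": ["ㄓㄠ", "ㄔㄠˊ"],
    "白": ["ㄅㄞˊ", "ㄅㄛˊ"],
}


def enumerate_pronunciations(text, readings):
    """Return every sequence obtained by picking one reading per character."""
    # Characters missing from the table are kept as single-option placeholders.
    options = [readings.get(ch, [ch]) for ch in text]
    return [list(choice) for choice in product(*options)]


candidates = enumerate_pronunciations("朝辭白帝", READINGS)
for seq in candidates:
    print(" ".join(seq))
```

The number of candidates is the product of the number of readings per character, so it grows quickly for long sentences with many heteronyms. Each candidate becomes one path that the recognizer can choose among.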

Texts to Phonetic Alphabets (2/2)
English
  • Exhaustive method based on the CMU Pronouncing Dictionary
      Example: Multimedia
  • Grapheme-to-phoneme conversion: the process of using machine learning or statistical approaches to generate the most probable phone list for a word not in the pronunciation dictionary
      Examples: Arnold Schwarzenegger, Genre classification
Japanese
  • Exhaustive search
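
A minimal sketch of the dictionary step, assuming a lexicon that maps each word to a list of pronunciation variants. The `TOY_LEXICON` entries and the function name `lookup_pronunciations` are illustrative assumptions; the entries are not copied from the CMU Pronouncing Dictionary. Words that are not found are collected so that a grapheme-to-phoneme model (not shown) could handle them.

```python
# Toy lexicon (an assumption for illustration): word -> list of phone sequences.
TOY_LEXICON = {
    "what": [["w", "ah", "t"], ["hh", "w", "ah", "t"]],
    "are": [["aa", "r"], ["er"]],
    "you": [["y", "uw"]],
}


def lookup_pronunciations(words, lexicon):
    """Split words into dictionary hits (all variants kept) and out-of-vocabulary words."""
    found, oov = {}, []
    for word in words:
        key = word.lower()
        if key in lexicon:
            found[key] = lexicon[key]
        else:
            # A grapheme-to-phoneme model would be asked for these words.
            oov.append(word)
    return found, oov


found, oov = lookup_pronunciations(
    ["What", "are", "you", "Schwarzenegger"], TOY_LEXICON
)
print(found)
print(oov)
```

Keeping every variant, rather than picking one, matches the exhaustive approach: the choice among variants is deferred to the alignment stage.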

Forced Alignment
Align a given utterance to a sequence of phones represented as a lexicon net.
[Figure: lexicon net for “What are you allergic to”, annotated with heteronym (破音字) branches and optional silence.]
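
To make the idea concrete, here is a simplified sketch of forced alignment by dynamic programming. It assumes a single linear phone sequence (no heteronym branches, no optional silence) and a precomputed table `loglik[t][j]` giving the log-likelihood of frame `t` under the `j`-th phone. Both the function name `forced_align` and the input format are assumptions for illustration; a real system would obtain these scores from acoustic models and search the full lexicon net.

```python
def forced_align(loglik):
    """Assign each frame to a phone index so that the indices are non-decreasing,
    start at 0, end at the last phone, and the total log-likelihood is maximal."""
    n_frames, n_phones = len(loglik), len(loglik[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")
    neg_inf = float("-inf")
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    score[0][0] = loglik[0][0]
    for t in range(1, n_frames):
        for j in range(min(t, n_phones - 1) + 1):
            stay = score[t - 1][j]                          # remain in the same phone
            move = score[t - 1][j - 1] if j > 0 else neg_inf  # advance to next phone
            if move > stay:
                score[t][j], back[t][j] = move + loglik[t][j], j - 1
            else:
                score[t][j], back[t][j] = stay + loglik[t][j], j
    # Backtrack from the last phone at the last frame.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Five frames, three phones; each frame clearly prefers one phone.
toy = [
    [0, -5, -5],
    [0, -5, -5],
    [-5, 0, -5],
    [-5, -5, 0],
    [-5, -5, 0],
]
alignment = forced_align(toy)
print(alignment)
```

Consecutive frames with the same index form the interval of that phone, which is what the later scoring stage needs.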

Error Pattern Detection
To detect utterances which start/stop anywhere:

Scoring Computation
Phone-based scoring
  • Identify the interval of each phone by forced alignment
  • Compare the phone utterance to its competing phone models to get a ranking, and the ranking is converted to a score
  • Example: The 38 competing phone models of “w+uh” are k+uh g+uh l+uh b+uh p+uh t+uh w+uh d+uh jh+uh f+uh sh+uh hh+uh y+uh ch+uh r+uh zh+uh th+uh n+uh z+uh er+uh ey+uh m+uh ih+uh ae+uh aw+uh iy+uh eh+uh ao+uh uw+uh ay+uh ah+uh oy+uh aa+uh v+uh ow+uh s+uh ng+uh. The 0-based ranking of “w+uh” is converted to a score between 0 and 100.
Higher-level scoring
  • Word score: Time-weighted average of phone scores
  • Sentence score: Time-weighted average of word scores, with discount factors derived from unusually short/long phones
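
The scoring steps might be sketched as follows. The slide does not state how a ranking is turned into a score, so the linear mapping in `rank_to_score` is an assumption, as are all function names and the example numbers. The discount factors for unusually short or long phones are left out because the slide gives no formula for them.

```python
def zero_based_rank(target, loglik_by_model):
    """Count how many competing models score strictly higher than the target model."""
    target_ll = loglik_by_model[target]
    return sum(1 for ll in loglik_by_model.values() if ll > target_ll)


def rank_to_score(rank, n_models):
    """Map a 0-based rank to [0, 100]. The linear mapping is an assumption."""
    if n_models < 2:
        return 100.0
    return 100.0 * (1.0 - rank / (n_models - 1))


def time_weighted_average(scores, durations):
    """Average the scores, weighting each by the duration of its segment."""
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return sum(s * d for s, d in zip(scores, durations)) / total


# Made-up log-likelihoods for three of the competing models of "w+uh".
logliks = {"w+uh": -10.0, "k+uh": -12.0, "y+uh": -9.0}
rank = zero_based_rank("w+uh", logliks)
phone_score = rank_to_score(rank, len(logliks))

# Word score from two phone scores lasting 1 and 3 time units.
word_score = time_weighted_average([100.0, 50.0], [1.0, 3.0])
print(rank, phone_score, word_score)
```

The same `time_weighted_average` helper would serve for the sentence score, with word scores and word durations as inputs, before any discount is applied.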

Examples of Voice Command Recognition
Examples
  • goVcDemo.m
Applications
  • 一語中的 (Speech-enabled Chinese idiom riddle)
  • 成語接龍 (Speech-enabled Chinese idiom relay)
No optional silence between words