Confidence Measures for Speech Recognition in Eurospeech 2005

Reporter: Chen Tzan Hwei, NTNU Speech Lab

Reference
[1] B. Dong, Q. Zhao and Y. Yan, "A fast confidence measure algorithm for continuous speech recognition."
[2] P. Liu, Y. Tian, J.-L. Zhou and F. K. Soong, "Background model based posterior probability for measuring confidence."
[3] I. R. Lane and T. Kawahara, "Utterance verification incorporating in-domain confidence and discourse measures."

Introduction (1/4)
• How to maintain and/or improve ASR performance in real-field conditions has been extensively studied in the speech community.
• It is extremely important to be able to make an appropriate and reliable judgement based on the error-prone ASR result.

Introduction (2/4)
• In this area, researchers have proposed to compute a score (preferably between 0 and 1), called a confidence measure (CM), to indicate the reliability of any recognition decision made by an ASR system.
• Early research on CMs can be traced back to rejection in word-spotting systems.

Introduction (3/4)
• Young (1994) first elucidated how to use the posterior probability as a confidence measure for speech recognition.
• Under a different name, namely utterance verification, Rose et al. (1995) first formally cast the CM problem in speech recognition as a statistical hypothesis testing problem.

Introduction (4/4)
• Generally speaking, all methods proposed for computing CMs can be roughly classified into three major categories:
  – Predictor features
  – Posterior probability
  – Utterance verification (UV)

[1] method (1/4)
• In this paper, the posterior probability of each state is used as the feature for the confidence measure.
• During decoding, the time information at each state can be retrieved, and the posterior probability is normalized (see the sketch below).
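The slide's equations did not survive extraction. A minimal reconstruction, assuming the standard frame-level state posterior averaged over the state's time segment obtained during decoding; the exact normalization used in [1] may differ:

```latex
% Frame-level posterior of state s, normalized over all active states s':
P(s \mid o_t) = \frac{p(o_t \mid s)\,P(s)}{\sum_{s'} p(o_t \mid s')\,P(s')}

% Duration-normalized state confidence over its segment [t_s, t_e]:
CM(s) = \frac{1}{t_e - t_s + 1} \sum_{t=t_s}^{t_e} \log P(s \mid o_t)
```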

[1] method (2/4)
• The posterior probability of each phoneme (a hedged reconstruction follows).
• The traditional two-pass method for calculating confidence.
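The phoneme-level equation is likewise lost. A hedged reconstruction, assuming the phoneme confidence averages the confidences of its N constituent states weighted by their durations d_i; the weighting is an assumption, not confirmed by [1]:

```latex
CM(ph) = \frac{\sum_{i=1}^{N} d_i \, CM(s_i)}{\sum_{i=1}^{N} d_i},
\qquad d_i = t_e^{(i)} - t_s^{(i)} + 1
```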

[1] method (3/4)
• One-pass synchronous calculating algorithm for the CM, in contrast to the traditional two-pass method above (a sketch follows).
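The algorithm itself is only named on the slide. A minimal sketch of the idea, assuming the decoder exposes per-frame acoustic log-likelihoods for the active states so the posterior sums can be accumulated synchronously with decoding; all names and the input format here are hypothetical, not from [1]:

```python
import math

def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-scores."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def one_pass_state_confidence(frames):
    """Accumulate duration-normalized state confidences in a single,
    frame-synchronous pass (no second rescoring pass over the utterance).

    `frames` is an iterable of dicts mapping each active state to its
    frame log-likelihood log p(o_t | s) -- a hypothetical input format;
    uniform state priors are assumed in the normalization.
    """
    log_post_sum = {}  # state -> running sum of log P(s | o_t)
    duration = {}      # state -> number of frames the state was active

    for scores in frames:
        denom = _logsumexp(list(scores.values()))
        for state, ll in scores.items():
            log_post_sum[state] = log_post_sum.get(state, 0.0) + (ll - denom)
            duration[state] = duration.get(state, 0) + 1

    # Confidences are available as soon as the last frame is decoded.
    return {s: log_post_sum[s] / duration[s] for s in log_post_sum}

# Toy usage: two frames, two active states.
frames = [{"s1": -2.0, "s2": -3.0}, {"s1": -1.5, "s2": -4.0}]
print(one_pass_state_confidence(frames))
```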

[1] method (4/4)

[1] experiments (1/2)
• The task is to evaluate name recognition accuracy with a test set of 1278 names.
• In the test set, 180 names are out-of-domain utterances.

[1] experiments (2/2)

[2] method (1/5)
• Word graph sparseness in CFG-based ASR.

[2] method (2/5)
• However, in some CFG-constrained ASR applications, the lexical and language model constraints can limit the number of hypotheses.

[2] method (3/5)
• To alleviate this graph sparseness problem, the Model based Posterior Probability (MPP) is calculated on a background model graph (a hedged sketch follows).
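The defining equation is lost with the slide graphics. A hedged reconstruction, assuming MPP is the usual word posterior except that the competing hypotheses in the denominator come from the background model graph rather than the sparse CFG recognition graph; the notation is mine, not from [2]:

```latex
% MPP of hypothesized word w over frames [t_s, t_e]: the normalizer
% sums acoustic likelihoods over all units u of the background model
% graph B (e.g. a phoneme or syllable loop), supplying the competitors
% that the sparse CFG word graph lacks.
MPP(w) = \frac{p(o_{t_s}^{t_e} \mid w)}
              {\sum_{u \in B} p(o_{t_s}^{t_e} \mid u)}
```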

[2] method (4/5)
• For each arc in the background model graph …
• Finally, the MPP is normalized by the total number of …
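The per-arc equation is also lost. A hedged sketch of the standard arc posterior obtained from a forward-backward pass over the graph, which is the usual way such per-arc quantities are computed; whether [2] uses exactly this form is an assumption:

```latex
% Posterior of arc a spanning [s(a), e(a)], with forward score alpha,
% backward score beta, and total likelihood p(o_1^T) over the graph:
P(a \mid o_1^T) = \frac{\alpha(s(a))\; p(o_{s(a)}^{e(a)} \mid a)\; \beta(e(a))}{p(o_1^T)}
```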

[2] method (5/5)
• Background model selection:
  – Phoneme based background models
    · 40 phonemes in English
    · There are around 70 toneless initials and finals in Chinese
  – Syllable based background models
    · In Chinese, there are only about 400 syllables
    · For English, the number of syllables exceeds 15,000

[2] experiment (1/4)
• The CFGs are built with all legal phrases arranged in parallel.

[2] experiment (2/4)
• The rejection performance.

[2] experiment (3/4)
• Recognition tests
• Syllable set selection in English

[2] experiment (4/4)
• Rejection tests: in the MPP tests, the background model graphs were generated by a decoder with 4 tokens.

[3] Method (1/4)
• Typical spoken language systems consist of:
  – An ASR front-end
  – An NLP back-end

[3] Method (2/4)
• In-domain confidence (a hedged sketch follows).
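The slide's definition is graphical and lost. A hedged sketch of the idea named in the title, assuming in-domain confidence is formulated as a language-model comparison between an in-domain LM and a general background LM over the recognized word sequence W; the exact form in [3] may differ:

```latex
% In-domain confidence of recognized word sequence W = w_1 ... w_n:
% positive when the in-domain LM explains W better than a general LM.
C_{indomain}(W) = \frac{1}{n}\left[\log P_{in}(W) - \log P_{gen}(W)\right]
```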

[3] Method (3/4)
• Discourse coherence: based on topic consistency across consecutive utterances.
• We adopt an inter-utterance distance based on the topic consistency between two utterances (a sketch of one such distance follows).
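The distance itself is not spelled out in the surviving text. A minimal sketch under the assumption that each utterance is mapped to a topic vector and the inter-utterance distance is one minus their cosine similarity; the vectorization and the specific distance are illustrative, not taken from [3]:

```python
import math

def topic_vector(words, vocab):
    """Bag-of-words topic vector over a fixed topic vocabulary -- a toy
    stand-in for whatever topic representation [3] actually uses."""
    return [sum(1 for w in words if w == v) for v in vocab]

def inter_utterance_distance(u1, u2, vocab):
    """1 - cosine similarity between the topic vectors of two utterances.
    A small distance means a consistent topic, i.e. coherent discourse."""
    v1, v2 = topic_vector(u1, vocab), topic_vector(u2, vocab)
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0 or n2 == 0:
        return 1.0  # no topic evidence: treat as maximally distant
    return 1.0 - dot / (n1 * n2)

# Toy usage: two consecutive utterances about the same travel topic.
vocab = ["flight", "hotel", "ticket", "weather"]
prev = "i want to book a flight ticket".split()
curr = "which flight is cheaper".split()
print(inter_utterance_distance(prev, curr, vocab))  # small -> coherent
```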

[3] Method (4/4)
• Joint confidence by combining multiple measures (a hedged sketch follows).
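How the measures are combined is lost with the slide. A hedged sketch, assuming a simple weighted linear combination of the three scores introduced above, with weights tuned on held-out data; the combination scheme in [3] may be more elaborate:

```latex
% Joint confidence from the ASR confidence, in-domain confidence and
% discourse coherence; the lambda weights are tuned on development data.
C_{joint} = \lambda_1\, C_{ASR} + \lambda_2\, C_{indomain} + \lambda_3\, C_{discourse},
\qquad \sum_i \lambda_i = 1
```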

[3] Experiments (1/2)
• The performance was evaluated on spontaneous dialogue via the ATR speech-to-speech translation system.

[3] Experiments (2/2)