Confidence Measures for Speech Recognition in Eurospeech 2005

Reporter: Chen Tzan Hwei, NTNU Speech Lab

Reference
[1] B. Dong, Q. Zhao and Y. Yan, "A fast confidence measure algorithm for continuous speech recognition."
[2] P. Liu, Y. Tian, J.-L. Zhou and F. K. Soong, "Background model based posterior probability for measuring confidence."
[3] I. R. Lane and T. Kawahara, "Utterance verification incorporating in-domain confidence and discourse measures."

Introduction (1/4)
• How to maintain and/or improve ASR performance in real-field conditions has been extensively studied in the speech community.
• It is extremely important to be able to make an appropriate and reliable judgement based on the error-prone ASR result.

Introduction (2/4)
• In this area, researchers have proposed to compute a score (preferably between 0 and 1), called a confidence measure (CM), to indicate the reliability of any recognition decision made by an ASR system.
• Early research on CMs can be traced back to rejection in word-spotting systems.

Introduction (3/4)
• Young (1994) first elucidated how to use the posterior probability as a confidence measure for speech recognition.
• Under a different name, namely utterance verification, Rose et al. (1995) first formally cast the CM problem in speech recognition as a statistical hypothesis testing problem.

Introduction (4/4)
• Generally speaking, all methods proposed for computing CMs can be roughly classified into three major categories:
  – Predictor features
  – Posterior probability
  – Utterance verification (UV)

[1] method (1/4)
• In this paper, the posterior probability of each state is used as the feature for the confidence measure.
• During decoding, the time information at each state can be retrieved, and the posterior probability is normalized (see the sketch below).
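The slide's equations did not survive extraction. A minimal reconstruction, assuming the standard frame-level state posterior averaged over the state's time segment obtained during decoding; the exact normalization used in [1] may differ:

```latex
% Frame-level posterior of state s, normalized over all active states s':
P(s \mid o_t) = \frac{p(o_t \mid s)\,P(s)}{\sum_{s'} p(o_t \mid s')\,P(s')}

% Duration-normalized state confidence over its segment [t_s, t_e]:
CM(s) = \frac{1}{t_e - t_s + 1} \sum_{t=t_s}^{t_e} \log P(s \mid o_t)
```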

[1] method (2/4)
• The posterior probability of each phoneme (a hedged reconstruction follows).
• The traditional two-pass method for calculating confidence.
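The phoneme-level equation is likewise lost. A hedged reconstruction, assuming the phoneme confidence averages the confidences of its N constituent states weighted by their durations d_i; the weighting is an assumption, not confirmed by [1]:

```latex
CM(ph) = \frac{\sum_{i=1}^{N} d_i \, CM(s_i)}{\sum_{i=1}^{N} d_i},
\qquad d_i = t_e^{(i)} - t_s^{(i)} + 1
```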

[1] method (3/4)
• One-pass synchronous calculating algorithm for the CM, in contrast to the traditional two-pass method above (a sketch follows).
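The algorithm itself is only named on the slide. A minimal sketch of the idea, assuming the decoder exposes per-frame acoustic log-likelihoods for the active states so the posterior sums can be accumulated synchronously with decoding; all names and the input format here are hypothetical, not from [1]:

```python
import math

def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-scores."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def one_pass_state_confidence(frames):
    """Accumulate duration-normalized state confidences in a single,
    frame-synchronous pass (no second rescoring pass over the utterance).

    `frames` is an iterable of dicts mapping each active state to its
    frame log-likelihood log p(o_t | s) -- a hypothetical input format;
    uniform state priors are assumed in the normalization.
    """
    log_post_sum = {}  # state -> running sum of log P(s | o_t)
    duration = {}      # state -> number of frames the state was active

    for scores in frames:
        denom = _logsumexp(list(scores.values()))
        for state, ll in scores.items():
            log_post_sum[state] = log_post_sum.get(state, 0.0) + (ll - denom)
            duration[state] = duration.get(state, 0) + 1

    # Confidences are available as soon as the last frame is decoded.
    return {s: log_post_sum[s] / duration[s] for s in log_post_sum}

# Toy usage: two frames, two active states.
frames = [{"s1": -2.0, "s2": -3.0}, {"s1": -1.5, "s2": -4.0}]
print(one_pass_state_confidence(frames))
```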

[1] method (4/4)

[1] experiments (1/2)
• The task is to evaluate name recognition accuracy with a test set of 1278 names.
• In the test set, 180 names are out-of-domain utterances.

[1] experiments (2/2)

[2] method (1/5)
• Word graph sparseness in CFG-based ASR.

[2] method (2/5)
• However, in some CFG-constrained ASR applications, the lexical and language model constraints can limit the number of hypotheses.

[2] method (3/5)
• To alleviate this graph sparseness problem, the Model based Posterior Probability (MPP) is calculated on a background model graph (a hedged sketch follows).
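The defining equation is lost with the slide graphics. A hedged reconstruction, assuming MPP is the usual word posterior except that the competing hypotheses in the denominator come from the background model graph rather than the sparse CFG recognition graph; the notation is mine, not from [2]:

```latex
% MPP of hypothesized word w over frames [t_s, t_e]: the normalizer
% sums acoustic likelihoods over all units u of the background model
% graph B (e.g. a phoneme or syllable loop), supplying the competitors
% that the sparse CFG word graph lacks.
MPP(w) = \frac{p(o_{t_s}^{t_e} \mid w)}
              {\sum_{u \in B} p(o_{t_s}^{t_e} \mid u)}
```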

[2] method (4/5)
• For each arc in the background model graph …
• Finally, the MPP is normalized by the total number of …
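The per-arc equation is also lost. A hedged sketch of the standard arc posterior obtained from a forward-backward pass over the graph, which is the usual way such per-arc quantities are computed; whether [2] uses exactly this form is an assumption:

```latex
% Posterior of arc a spanning [s(a), e(a)], with forward score alpha,
% backward score beta, and total likelihood p(o_1^T) over the graph:
P(a \mid o_1^T) = \frac{\alpha(s(a))\; p(o_{s(a)}^{e(a)} \mid a)\; \beta(e(a))}{p(o_1^T)}
```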

[2] method (5/5)
• Background model selection:
  – Phoneme based background models
    · 40 phonemes in English
    · There are around 70 toneless initials and finals in Chinese
  – Syllable based background models
    · In Chinese, there are only about 400 syllables
    · For English, the number of syllables exceeds 15,000

[2] experiment (1/4)
• The CFGs are built with all legal phrases arranged in parallel.

[2] experiment (2/4)
• The rejection performance.

[2] experiment (3/4)
• Recognition tests
• Syllable set selection in English

[2] experiment (4/4)
• Rejection tests: in the MPP tests, the background model graphs were generated by a decoder with 4 tokens.

[3] Method (1/4)
• Typical spoken language systems consist of:
  – An ASR front-end
  – An NLP back-end

[3] Method (2/4)
• In-domain confidence (a hedged sketch follows).
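The slide's definition is graphical and lost. A hedged sketch of the idea named in the title, assuming in-domain confidence is formulated as a language-model comparison between an in-domain LM and a general background LM over the recognized word sequence W; the exact form in [3] may differ:

```latex
% In-domain confidence of recognized word sequence W = w_1 ... w_n:
% positive when the in-domain LM explains W better than a general LM.
C_{indomain}(W) = \frac{1}{n}\left[\log P_{in}(W) - \log P_{gen}(W)\right]
```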

[3] Method (3/4)
• Discourse coherence: based on topic consistency across consecutive utterances.
• We adopt an inter-utterance distance based on the topic consistency between two utterances (a sketch of one such distance follows).
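The distance itself is not spelled out in the surviving text. A minimal sketch under the assumption that each utterance is mapped to a topic vector and the inter-utterance distance is one minus their cosine similarity; the vectorization and the specific distance are illustrative, not taken from [3]:

```python
import math

def topic_vector(words, vocab):
    """Bag-of-words topic vector over a fixed topic vocabulary -- a toy
    stand-in for whatever topic representation [3] actually uses."""
    return [sum(1 for w in words if w == v) for v in vocab]

def inter_utterance_distance(u1, u2, vocab):
    """1 - cosine similarity between the topic vectors of two utterances.
    A small distance means a consistent topic, i.e. coherent discourse."""
    v1, v2 = topic_vector(u1, vocab), topic_vector(u2, vocab)
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0 or n2 == 0:
        return 1.0  # no topic evidence: treat as maximally distant
    return 1.0 - dot / (n1 * n2)

# Toy usage: two consecutive utterances about the same travel topic.
vocab = ["flight", "hotel", "ticket", "weather"]
prev = "i want to book a flight ticket".split()
curr = "which flight is cheaper".split()
print(inter_utterance_distance(prev, curr, vocab))  # small -> coherent
```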

[3] Method (4/4)
• Joint confidence by combining multiple measures (a hedged sketch follows).
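How the measures are combined is lost with the slide. A hedged sketch, assuming a simple weighted linear combination of the three scores introduced above, with weights tuned on held-out data; the combination scheme in [3] may be more elaborate:

```latex
% Joint confidence from the ASR confidence, in-domain confidence and
% discourse coherence; the lambda weights are tuned on development data.
C_{joint} = \lambda_1\, C_{ASR} + \lambda_2\, C_{indomain} + \lambda_3\, C_{discourse},
\qquad \sum_i \lambda_i = 1
```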

[3] Experiments (1/2)
• The performance was evaluated on spontaneous dialogue via the ATR speech-to-speech translation system.

[3] Experiments (2/2)