Use of durations to enhance vowels discrimination AbdelRahman

Use of durations to enhance vowels discrimination
Abdel-Rahman Samir
10 December 2007
Katholieke Universiteit Leuven - ESAT, BELGIUM

Overview:
- Summary of work presented before.
- Combining MFCC and Duration.
- Phone lattice rescoring experiments.
- Modifying the 2nd pass FST.
- Future work.

Summary of work presented before:
- Analysis of the recognizer's performance showed that switching between short and long vowels (type 5 error) causes 16.3% of the false acceptance rate (undetected errors).
- Context Dependent (CD) duration models were built to discriminate between short and long vowel counterparts (e.g. between 'A' and 'a').
- Duration as a feature (for vowel classification) was tested against the MFCC, as shown in the table.

Combining MFCC and Duration:
- Feature fusion is expected to improve on the individual classifiers.
- So, consider the MFCC score (the log-likelihood difference between competing vowels) and the duration score as a combined feature vector in a 2-D plane.

Combining MFCC and Duration (cont.):
- Assume a separation boundary between the two classes, so that $Ya = b$, where $Y$ is an $n \times 3$ matrix containing the data points, $a$ is a $3 \times 1$ vector containing the classifier coefficients, and $b$ is an $n \times 1$ constant bias vector.
- Since $Y$ is not square, the coefficient vector is computed with the Minimum Squared Error (MSE) technique, using the pseudo-inverse of $Y$: $a = Y^{\dagger} b$.
- With a proper choice of the vector $b$, the MSE discriminant function relates directly to Fisher's linear discriminant. The elements of $b$ that belong to class $i$ are weighted by $n/n_i$, where $n_i$ is the number of training points belonging to class $i$.
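The MSE solution described on this slide can be sketched with NumPy as follows. This is a minimal illustration, not the code used in the experiments. It assumes that each row of Y is a data point [1, Adiff, DS] (a constant 1 followed by the MFCC and duration scores) and that the rows of the second class are negated, as in the usual textbook formulation of the MSE procedure; the slides do not state either detail. The function names are hypothetical.

```python
import numpy as np

def train_mse_discriminant(class1, class2):
    """Solve Y a = b in the least-squares sense with the pseudo-inverse.

    class1, class2: arrays of shape (n_i, 2) holding [Adiff, DS] per point.
    Returns the 3 x 1 coefficient vector a (as a flat array of length 3).
    """
    class1 = np.asarray(class1, dtype=float)
    class2 = np.asarray(class2, dtype=float)
    n1, n2 = len(class1), len(class2)
    n = n1 + n2

    # Augment every point with a leading 1 so that a[0] acts as the offset.
    y1 = np.hstack([np.ones((n1, 1)), class1])
    # Assumption: class-2 rows are negated, so a correct classifier gives Y a > 0.
    y2 = -np.hstack([np.ones((n2, 1)), class2])
    Y = np.vstack([y1, y2])

    # Elements of b are n/n_i for the rows of class i (link to Fisher's discriminant).
    b = np.concatenate([np.full(n1, n / n1), np.full(n2, n / n2)])

    # Y is n x 3 (not square), so use the Moore-Penrose pseudo-inverse.
    return np.linalg.pinv(Y) @ b

def classify(a, adiff, ds):
    """Return 1 or 2 depending on the side of the separation line."""
    p = np.array([1.0, adiff, ds])
    return 1 if float(a @ p) > 0.0 else 2

# Toy, made-up data points purely for illustration.
toy_class1 = [[2, 1], [3, 2], [2, 3], [4, 2]]
toy_class2 = [[-2, -1], [-3, -2], [-1, -3]]
a = train_mse_discriminant(toy_class1, toy_class2)
```

With the toy points above, `classify(a, 3.0, 2.0)` falls on the class-1 side and `classify(a, -2.0, -2.0)` on the class-2 side.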

Combining MFCC and Duration (cont.)

Phone lattice rescoring using the combined system:
- The phone lattices have an average size of 10.9 (number of competing phones at the same frame).
- The phone lattices were expanded to tri-phones so that the arcs are context dependent.
- To remove arcs with bad scores, and to get a graph size that is comparable with the baseline one, forward-backward pruning is applied, based on the modified acoustic scores and bi-gram phone language model scores.
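One common way to realise forward-backward pruning on a lattice is sketched below. It is an assumption-laden illustration rather than the pruning actually used here: it works on best-path (max) scores instead of summed posteriors, it assumes each arc already carries one combined log score (acoustic plus language model, higher is better), and it assumes node ids are integers in topological order (source id smaller than destination id). The function name and the `beam` parameter are hypothetical.

```python
import math

def forward_backward_prune(arcs, start, end, beam):
    """Keep arcs whose best complete path is within `beam` of the overall best.

    arcs: list of (src, dst, score) tuples with src < dst.
    """
    nodes = {n for src, dst, _ in arcs for n in (src, dst)}
    fwd = {n: -math.inf for n in nodes}   # best score from start to node
    bwd = {n: -math.inf for n in nodes}   # best score from node to end
    fwd[start] = 0.0
    bwd[end] = 0.0

    # Forward pass: arcs sorted by source, so fwd[src] is final when used.
    for src, dst, score in sorted(arcs, key=lambda arc: arc[0]):
        fwd[dst] = max(fwd[dst], fwd[src] + score)

    # Backward pass: arcs sorted by destination, descending.
    for src, dst, score in sorted(arcs, key=lambda arc: arc[1], reverse=True):
        bwd[src] = max(bwd[src], score + bwd[dst])

    best = fwd[end]
    return [
        (src, dst, score)
        for src, dst, score in arcs
        if fwd[src] + score + bwd[dst] >= best - beam
    ]

# Toy lattice with made-up scores: two competing arcs 0->1 and a direct arc 0->2.
toy_arcs = [(0, 1, -1.0), (0, 1, -5.0), (1, 2, -1.0), (0, 2, -10.0)]
pruned = forward_backward_prune(toy_arcs, start=0, end=2, beam=3.0)
```

A wider beam keeps more competing arcs, so the beam is the knob that would bring the lattice size in line with a target such as the baseline graph size.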

Lattice rescoring (cont.):
- The duration score ($DS$) and the acoustic log-likelihood difference between the real and hypothetical arcs ($A_{diff}$) are calculated. The acoustic score of the hypothetical tri-phone is calculated by aligning the arc frames to its states.
- The rescoring equation is $A_{new} = A + w\, a^{T} p$, where $A$ is the original arc acoustic score, $A_{new}$ is the modified arc score, $w$ is a scalar weight (the same for all vowels), $a$ is the vector of classification line coefficients, and $p = [1 \;\; A_{diff} \;\; DS]^{T}$.
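As a small worked illustration of the rescoring equation, the sketch below computes the modified arc score from the original score, the two features, the coefficient vector and the weight. The function name and all numeric values are made up for the example and are not taken from the experiments.

```python
import numpy as np

def rescore_arc(A, adiff, ds, a, w):
    """Return Anew = A + w * a^T p with p = [1, Adiff, DS]."""
    p = np.array([1.0, adiff, ds])
    return A + w * float(np.dot(np.asarray(a, dtype=float), p))

# Hypothetical values: a^T p = 0.5 + 1.0 * 3.0 - 2.0 * 1.0 = 1.5
example_a = [0.5, 1.0, -2.0]
example_new_score = rescore_arc(A=-100.0, adiff=3.0, ds=1.0, a=example_a, w=2.0)
```

Setting `w` to 0 leaves the original acoustic score unchanged, so the weight controls how strongly the combined MFCC and duration classifier influences the lattice.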

Lattice rescoring (cont.)

Modifying the 2nd pass FST:
- Motivation:
  - Overcome garbage cost.
  - Better description of children's errors.
- Parallel alternative arcs have been added for each vowel arc.
- Different arc penalty for every vowel pair.
Figure labels (FST fragment): i O p o (Alternative) n w I (Alternative) Garbage skip
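The idea of parallel alternative arcs with a pair-specific penalty could be sketched as below. The arc representation (plain tuples), the function name, the vowel pairs and the penalty values are all hypothetical and chosen only for illustration; the actual FST toolkit and penalties are not given on the slide.

```python
def add_alternative_arcs(arcs, pair_penalty):
    """Add a parallel arc for every vowel arc that has a listed counterpart.

    arcs: list of (src, dst, label, cost) tuples.
    pair_penalty: dict mapping (expected_vowel, alternative_vowel) -> penalty.
    """
    result = list(arcs)
    for src, dst, label, cost in arcs:
        for (expected, alternative), penalty in pair_penalty.items():
            if label == expected:
                # Same source and destination state, different label, extra cost.
                result.append((src, dst, alternative, cost + penalty))
    return result

# Hypothetical penalties: a different value for every vowel pair.
toy_penalties = {("A", "a"): 2.0, ("a", "A"): 3.5}
toy_word = [(0, 1, "p", 0.0), (1, 2, "A", 0.0)]
expanded = add_alternative_arcs(toy_word, toy_penalties)
```

In this toy example only the vowel arc gains a parallel alternative, so a reading with the short counterpart is accepted at the cost of the pair penalty instead of falling through to the garbage path.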

Modifying the 2nd pass FST (cont.):
- Testing was done on our development set (2-syllable tasks, one school).
- Number of words = 4560; annotators marked 17.7% of them as errors.

Task        Orig. FST              Mod. FST
            FA       Detection     FA       Detection
2LG         13.7%    64.5%         16.5%    66.3%
2LGPseudo   26.7%    75.5%         30.1%    82.2%
ALL         19.0%    73.3%         22.0%    78.9%

Future work:
- Investigate the FA and detection rates of the type 5 error with different penalties for different vowels.
- Investigate adaptation of the acoustic models towards the CHOERC data.
- Check the performance of the Pseudo tasks using CI acoustic models and bi-gram phone language models.

Thank You