CALO Decoder Progress Report for June
Arthur (Decoder, Trainer, ICSI Training)
Yitao (Live-mode Decoder)
Ziad (ICSI Training)
Carnegie Mellon University
July 6, 2004

This Presentation
• Progress report for June
  - Review and highlights
  - ICSI AM training
  - Infrastructure
  - Decoder
  - Summary and outlook

Review of Q2 2004
• Live-mode APIs not completed
• Sphinx not yet tested on tasks with vocabulary > 2k
• ICSI training just started

June Highlights
• They are completed!
  - The live-mode API prototype is completed.
    - A demo is built.
  - Sphinx 3.4 went through the WSJ 5k task successfully (to some extent).
    - Without pruning
  - The first two phases of ICSI training are completed.

ICSI Training - Grand Plan
• By Ziad and Arthur C.
• Transcript conversion is completed.
• 4 phases
  - Phase I: Replication of Rita's training
  - Phase II: Fixing resources
    - Use corrected train/test/dev sets
    - Fixed transcriptions and dictionary
  - Phase III: Tuning
    - Training: topology, number of senones, number of mixtures
    - Recognition: parameter tuning
  - Phase IV: Further improvement
    - Use SCHMM to generate trees?
    - Automatic question generation?
    - Others?

ICSI Training - Current Status
• Phase I completed
  - Within 0.5% of Rita's results
  - Tested on the transcriber's meeting
    - 47.3% WER (45.2% WER when equivalence pairs were considered)
• Phase II completed
  - On the development and test sets
    - Results varied from 47% to 29%
  - Clipped-speech deletion was found to be ineffective.
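The WER figures above are quoted with and without "equivalence pairs". As a minimal sketch, assuming equivalence pairs are applied as a map from alternate spellings to one canonical form before alignment (the report does not say how they were applied), a word-level Levenshtein WER could look like this. The function names, the `equivalences` map, and the example sentences are hypothetical.

```python
from typing import Dict, List, Optional


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # At the start of row i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # reference word missing from hyp
            insertion = cur[j - 1] + 1    # extra word in hyp
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1]


def wer(ref: List[str], hyp: List[str],
        equivalences: Optional[Dict[str, str]] = None) -> float:
    """Word error rate after mapping equivalent spellings to one form."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    eq = equivalences or {}
    # Hypothetical normalization step standing in for "equivalence pairs".
    ref_n = [eq.get(w, w) for w in ref]
    hyp_n = [eq.get(w, w) for w in hyp]
    return word_errors(ref_n, hyp_n) / len(ref_n)


ref = "i am okay with that".split()
hyp = "i am ok with that".split()
print(wer(ref, hyp))                  # 0.2
print(wer(ref, hyp, {"ok": "okay"}))  # 0.0
```

Normalizing before alignment keeps the alignment itself standard; an equivalence pair simply turns what would have been a substitution into a match, which is why the second figure can only be lower than or equal to the first.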

ICSI Training - Before We Go to Phase III
• From the last two phases:
  - We have some results that look good.
  - BUT results vary with meeting conditions:
    - Number of speakers?
    - Speaker speaking-rate entropy?
    - Cross talk?
• Understanding is more important than typing!
• Plan for next month
  - Understand why recognition results vary.
  - Complete Phases III and IV with the current test sets.
  - Obtain a standard test set from NIST.

Infrastructure - Workshops and Presentations
• 2 CVS workshops
  - Had great discussions in the workshops
  - Slides can be found on Arthur C.'s web page
  - Will re-do the workshop in the new semester
• 2 Speech Developer's meetings
  - Next meeting this Thursday: "From main() to GMM computation"

Infrastructure - CVS
• What's in CVS?
  - MRCP source code (v1 and v2)
  - Standard training scripts:
    - ICSI conversion scripts
    - Communicator training scripts
      - Guaranteed to give you 100% satisfaction and 12% WER.
    - WSJ 5k training scripts
      - Guaranteed to give you 100% satisfaction and 8% WER.
• Outlook
  - Need to migrate to other machines.
  - Next: ICSI training scripts (Phases I to IV)
  - Communicator/WSJ testing scripts

Decoder Work - Interface
• By Yitao (he didn't even get hurt!)
• The Sphinx 2-like API prototype is completed.
  - Functions completed, including initialization
  - A demo is also built.
• Will be officially included in Sphinx 3.5.
  - The latest code is already available in CVS.
• Plan for July
  - Let the APIs go through their ultimate challenge: being used in an application.
  - Enable logging in the recognizer.

Decoder Work - Speed
• With big help from Evandro
• WSJ 5k task evaluation completed
  - NVP, perplexity ≈ 90
  - Tested on a 2G machine
  - None of the results are tuned (very wide beam width, no fast GMM computation)
  - S3 (s3 flat): WER 6.5%, speed 2.7 xRT
  - S3.4 (s3 fast): WER 6.65%, speed 0.94 xRT
• Conclusion: the WSJ 5k task is not our challenge.
• Plan for July: it is time to try a 20k task (ICSI or WSJ 20k).
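xRT is the ratio of decoding time to the duration of the audio decoded, so values below 1.0 mean faster than real time. The small sketch below makes that arithmetic explicit; the function name and the 100-second timings are hypothetical, chosen only so that they reproduce the two xRT figures above.

```python
def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """xRT: processing time divided by the duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


# Hypothetical timings: 100 s of audio decoded in 270 s and in 94 s.
flat = real_time_factor(270.0, 100.0)  # 2.7 xRT, slower than real time
fast = real_time_factor(94.0, 100.0)   # 0.94 xRT, faster than real time
print(f"speed-up of s3 fast over s3 flat: {flat / fast:.2f}x")
```

On the slide's own numbers, s3 fast is about 2.7 / 0.94 ≈ 2.87 times faster than s3 flat, at a WER cost of 0.15% absolute.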

SphinxTrain Work
• In the current Baum-Welch trainer of SphinxTrain (v0.92):
  - Silence is not optionally deleted in Baum-Welch.
  - Multiple pronunciations are not allowed in Baum-Welch.
  - We rely on forced alignment to get the correct alignment.

SphinxTrain 0.93 Progress
• Silence modeling
  - Optional silence deletion is now allowed.
  - Progress: completed
• Multiple pronunciations
  - To be allowed in Baum-Welch
  - Progress: nearly completed (needs 2-3 days)
• Correct triphone expansion
  - May not have time to finish it in Q3.
• Plan for July
  - Enable multiple pronunciations in Baum-Welch.
  - Legacy is a problem! (We could fix the Sphinx 4 trainer instead.)
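To illustrate what "optional silence deletion" and "multiple pronunciations in Baum-Welch" mean for one training utterance, the toy sketch below enumerates every phone string a transcript could align to. It is not SphinxTrain's code: the `SIL` unit name, the mandatory silence at utterance start and end, and the lexicon entries are all assumptions made for the example.

```python
from itertools import product
from typing import Dict, List

SIL = "SIL"  # assumed name of the silence unit


def training_alternatives(words: List[str],
                          lexicon: Dict[str, List[List[str]]]) -> List[List[str]]:
    """Enumerate every phone string a transcript may align to when
    (a) any word may use any of its pronunciations and
    (b) silence between two adjacent words is optional.
    Silence at the utterance start and end is kept mandatory here."""
    prons = [lexicon[w] for w in words]
    n_gaps = max(len(words) - 1, 0)
    paths = []
    for choice in product(*prons):  # one pronunciation per word
        for sil_flags in product([False, True], repeat=n_gaps):  # silence per gap
            phones = [SIL]
            for i, pron in enumerate(choice):
                phones.extend(pron)
                if i < n_gaps and sil_flags[i]:
                    phones.append(SIL)
            phones.append(SIL)
            paths.append(phones)
    return paths


# Hypothetical lexicon with two pronunciations per word.
lexicon = {
    "THE": [["DH", "AH"], ["DH", "IY"]],
    "DATA": [["D", "EY", "T", "AH"], ["D", "AE", "T", "AH"]],
}
for path in training_alternatives(["THE", "DATA"], lexicon):
    print(" ".join(path))
```

With v0.92's restrictions, only one of these eight strings (the one chosen by forced alignment) would be trained on. A real trainer would encode the alternatives as a graph, with skip arcs around the silence models and parallel branches for pronunciations, rather than enumerate them, because the number of paths grows exponentially with sentence length.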

Decoder Work - Adaptation
• Mainly code tracing in this part
• Situation:
  - Two versions of MLLR adaptation (Sam Joo's and SphinxTrain's)
  - Some code needs to be refined before we expose it.
  - S3 flat has MLLR but S3 fast does not.
• Plan for this month
  - After finishing the trainer job, we will tackle it.
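For context on what bringing MLLR into S3 fast involves at decode time, the sketch below applies an already-estimated mean-only MLLR transform, mu_hat = A mu + b, to a set of Gaussian means. It assumes a single global regression class and a square transform. Estimating (A, b) from adaptation data is not shown, and neither Sam Joo's nor SphinxTrain's implementation is reproduced. The function name and numbers are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mllr_adapt_means(means: List[Vector], A: Matrix, b: Vector) -> List[Vector]:
    """Apply one shared MLLR mean transform, mu_hat = A mu + b, to every mean."""
    dim = len(b)
    adapted = []
    for mu in means:
        if len(mu) != dim:
            raise ValueError("mean dimension does not match the transform")
        # Row i of A dotted with mu, plus the bias term b[i].
        adapted.append([sum(A[i][j] * mu[j] for j in range(dim)) + b[i]
                        for i in range(dim)])
    return adapted


# Hypothetical 2-D transform and means.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [1.0, -1.0]
print(mllr_adapt_means([[1.0, 2.0], [0.0, 0.0]], A, b))  # [[3.0, 1.0], [1.0, -1.0]]
```

Because one transform is shared by all Gaussians in a regression class, a small amount of adaptation data can move every mean in that class, which is the main appeal of MLLR for speaker adaptation.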

Decoder Work - Packaging and Distribution
• Official web page:
  - cmusphinx.sourceforge.net/
• Release process
  1. Set n = 1.
  2. Loop:
     - Distribute Release Candidate n.
     - See if anyone yells within one week (calm-down period).
     - If yes, n = n + 1 and loop again. If no, break.
  3. Copy the RC to SourceForge's standard distribution web site.
• Current status
  - People yelled during the RC II calm-down period (Yitao fixed the issues).
  - Create RC III this week.
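The release process above is a simple loop. The sketch below writes it out with hypothetical callbacks: `distribute`, `anyone_yelled` (which stands in for waiting out the one-week calm-down period), and `publish` are placeholders, not real tooling. The `max_candidates` cap is an addition for the sketch and is not part of the process described on the slide.

```python
from typing import Callable


def release(distribute: Callable[[int], None],
            anyone_yelled: Callable[[int], bool],
            publish: Callable[[int], None],
            max_candidates: int = 10) -> int:
    """Run the release-candidate loop; return the number of the RC published."""
    n = 1                             # step 1: set n = 1
    while True:                       # step 2: loop
        distribute(n)                 # distribute release candidate n
        if not anyone_yelled(n):      # calm-down period passed with no complaints
            break
        if n >= max_candidates:       # safety cap added for this sketch only
            raise RuntimeError("too many release candidates")
        n += 1                        # someone yelled: fix, then try RC n + 1
    publish(n)                        # step 3: copy the RC to SourceForge
    return n


# Simulated run: complaints arrive for RC I and RC II, none for RC III.
published = []
final = release(lambda n: print(f"distributing RC {n}"),
                lambda n: n < 3,
                published.append)
print(final, published)  # 3 [3]
```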

Decoder Work - Miscellaneous
• Continuous HMM for the Communicator model is also completed.
  - Ready for combination (do we want to?)
  - Possibly we want to combine the ICSI model and the CMU model.
• The training script is still a big headache for us.
  - Still no time to fix it.

Decoder Work - Documentation (aka sphinxDoc)
• Only makes progress when
  - Arthur C. procrastinates and doesn't want to read or play video games
• Draft I of Chapters I and II is completed.
  - Chapter I: License agreement and user responsibility
  - Chapter II:
    - What is speech recognition? (for dummies)
    - History of speech recognition
    - History of Sphinx
    - Versions of Sphinx (when to use what)

Summary and Outlook
• We have done something in June.
• We had better do more in the next 3 months.
• Priorities: we have to deal with the "CALO Grand Challenge"
  - Recorder/Classifier/Recognizer integration
  - Improvement of acoustic/language modeling
  - Speaker adaptation
• Non-completed tasks are always on the list and will pop up at the right time.