Brief Overview of Different Versions of Sphinx
Arthur Chan

Introduction
- The software side of the recognizer is very important.
- Research always requires correct use of the software.
- Sphinx II + IV + SphinxTrain
  - ~= 100k lines of code
  - Each of them is fairly complex

This presentation
- Introduction
- History of Sphinx
  - Sphinx I
  - Sphinx III
  - SphinxTrain
  - Sphinx IV
- How do I get the source code?
  - Versioning
  - Three rules for not getting lost among the different recognizers
- Where can I get "official" information?
- Outlook for each recognizer
- Conclusion

Brief history of Sphinx
- Largely adapted from
  - Rita's "The Sphinx Speech Recognition Systems"
    - www.cs.cmu.edu/~rsingh/
  - Kevin et al.'s "Speech Recognition: Past, Present and Future"
    - www.cs.cmu.edu/~msiegler/ASR/futureofcmufinal.html

Before Sphinx
- Dragon
  - One of the first uses of HMMs in speech recognition
  - One of the first uses of a "purely statistical model" in speech
  - Expresses the knowledge using an HMM network
- Harpy
  - One of the first uses of beam search
  - Uses phonemes to represent words

Sphinx I
- Before Sphinx ...
  - According to AT&T's literature, the concept of speaker independence was proposed in 1979
  - From 1979 to 1987, most systems were either
    - Speaker dependent, or
    - Speaker independent but in a very small domain (<100 words)
- Sphinx I was therefore outstanding
  - Accuracy is 90% on Resource Management

Sphinx I (1987)
- By Kai-Fu Lee and Roberto Bisiani
- Key developers included Hsiao-Wuen Hon and Fil Alleva
- Written in C
- Continuous speech recognizer using discrete HMMs with 3 codebooks of size 256
- Uses a simple word-pair grammar
- Generalized triphones
- Real-time on a Sun 3 or DEC 3000
- Where is the source code? A good antique!
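
A discrete HMM scores symbols rather than raw feature vectors, so each frame first has to be mapped to one codeword index per codebook. The following is a minimal sketch of that vector quantization step, not Sphinx I's actual code: the function names, the toy 4-entry codebooks (Sphinx I is described above as using 256 entries each), the 2-D features, and the squared Euclidean distance are all assumptions made for illustration.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to vector (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def quantize_frame(streams, codebooks):
    """Map one frame (one vector per feature stream) to a tuple of discrete symbols."""
    return tuple(nearest_codeword(v, cb) for v, cb in zip(streams, codebooks))


# Hypothetical toy data: 3 feature streams, each with its own small codebook.
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [-3.0, 2.0]]
codebooks = [toy_codebook, toy_codebook, toy_codebook]

# One frame: one 2-D feature vector per stream.
frame = [[0.9, 1.2], [4.0, 6.0], [-2.5, 1.5]]
symbols = quantize_frame(frame, codebooks)
print(symbols)
```

With three codebooks, each frame becomes a triple of symbols, and the discrete HMM only needs a probability table per state and codebook instead of a continuous density.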

Sphinx II (1992)
- By Xuedong Huang
- Hardwired to a 5-state Bakis topology
- 3-gram language models
- Decision-tree tying of HMMs (by Mei-Yuh Hwang)
- 90% on the WSJ task (0 or 1?)

Fast Beam Search v.X
- FBS-6: flat lexicon decoder
- FBS-7: lexicon tree-based
- FBS-8 decoder (written by Ravi Mosur, see the thesis from '96)
  - Supports multiple types of beam pruning
  - Lexical tree
  - Tricks in GMM computation
    - Machine optimization: loop unrolling
    - Predictive codebook computation
  - Phoneme lookahead
  - Best-path search
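
Beam pruning, mentioned above, can be illustrated in a few lines. This is a generic sketch of score-based beam pruning under the assumption that hypotheses carry log-domain scores. It is not the FBS-8 implementation, and the function name and the state labels are hypothetical.

```python
def beam_prune(hypotheses, beam_width):
    """Keep only hypotheses whose log score is within beam_width of the best one.

    hypotheses maps a state label to its log score (higher is better).
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    threshold = best - beam_width
    return {state: score for state, score in hypotheses.items() if score >= threshold}


# Hypothetical active states after one frame of search.
active = {"s1": -10.0, "s2": -12.5, "s3": -40.0}
survivors = beam_prune(active, beam_width=20.0)
print(sorted(survivors))
```

A wider beam keeps more hypotheses alive (slower, fewer search errors), while a narrower beam is faster but may discard the path that would eventually have won. A decoder with "multiple types of beam pruning" applies thresholds like this at more than one level of the search.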

Other facts about Sphinx II
- We licensed it at the beginning (this seems to date back to around '95)
- In 2000, it was open-sourced on SourceForge under a Berkeley-style license
  - You may incorporate Sphinx's source code
  - You don't need to open your own source code (no recursive legal binding)
  - Similar to LGPL
- In 2001, a major alpha release by Kevin ensured portability across several platforms

Sphinx III flat lexicon decoder ("s3", "s3 flat", "s3 slow")
- Sphinx III (by Ravi Mosur)
  - Flat lexicon
  - Supports both CHMM and SCHMM
  - "Poor-man's" trigram
    - Uses only the most likely first word; this avoids the D^2 expansion of the word lattice
  - Arbitrary topology
  - Very accurate; used in evaluations on BN and others
- Derivatives of the search include
  - N-best generator
  - Aligner
  - Phone recognizer

Sphinx III tree lexicon decoder ("s3.x", "s3 fast", "s3 inaccurate")
- What is s3.x actually?
  - A "spin-off" of the Sphinx III flat lexicon decoder's source code
  - First used in the BN 10xRT evaluation in 1999
- From s3.0 -> s3.2
  - Uses a tree lexicon with unigram lookahead
  - Lexical tree with approximation to avoid memory problems
  - One of the first in the world to use sub-vector quantization to speed up GMM computation
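
The difference between a flat lexicon and a tree lexicon can be shown with a small prefix tree over pronunciations. This is only a sketch of the general idea, not the s3.x data structure: the dictionary-based node layout, the function names, and the three-word lexicon with its phone strings are made up for illustration.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree over phone sequences; words sharing a prefix share nodes."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node["children"].setdefault(phone, {"children": {}, "words": []})
        node["words"].append(word)  # the word identity is known only at this node
    return root


def count_nodes(node):
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node["children"].values())


# Hypothetical toy lexicon: word -> phone sequence.
lexicon = {
    "start": ["S", "T", "AA", "R", "T"],
    "star": ["S", "T", "AA", "R"],
    "stop": ["S", "T", "AA", "P"],
}

tree = build_lexical_tree(lexicon)
flat_units = sum(len(phones) for phones in lexicon.values())  # one phone model per word position
tree_units = count_nodes(tree) - 1                            # exclude the empty root
print(flat_units, tree_units)
```

The tree evaluates shared prefixes once, which is where the speed comes from. The cost is that the word identity is unknown until late in the tree, so the language model score cannot be applied at word entry. That is why the slide above pairs the tree lexicon with unigram lookahead.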

(cont.)
- From s3.2 -> s3.3 (Rita, Ricky)
  - Live-mode recognizer (livedecode) and simulator (livepretend)
- From s3.3 -> s3.4 (Evandro, Arthur C, Jahanzeb)
  - 4 levels of speed-up of GMM computation, phoneme lookahead
  - Bug fixes in live mode
- From s3.4 -> s3.5 (Evandro, Arthur C, Yitao)
  - (Tentative) Speaker adaptation + documentation

Facts about S3
- A Java version exists -> sphinx3j
- Open-sourced around 2002
- Maintained by Evandro from 2001 until now
- s3.5 is the current active branch of S3 development

SphinxTrain
- Equally important and very complex
- But not well understood
- What is SphinxTrain?
  - A collection of ~40 tools for Sphinx 2, 3 and 4 acoustic model training
  - A set of Perl scripts to do training
  - Sphinx 2 and 3 have slightly different model formats

Mini-history
- The Baum-Welch trainer and the Viterbi trainer have existed for a very long time.
- From the chaos, Eric Thayer first pulled everything together to create the SphinxTrain package
- Rita made numerous bug fixes and modifications to the current trainer
  - Pioneered the use of automatic question generation (make_quest)
  - Built a set of training scripts for RM (the 0*/ scripts)
  - Wrote the first systematic tutorial on training
- Ricky refined the code and wrote the first set of Perl scripts for training
  - Training tools in general were not systematic and not structured
  - He made a PHD out of it too (PHD = Push Here Dummy!)
- Alan and Kevin
  - Put the code on SourceForge
  - Alan built a set of training scripts that can "run through"

Sphinx IV
- Why Sphinx IV?
- Too many limitations in SphinxTrain and Sphinx III
  - Only N-grams
  - Approximation of triphones
  - Fast GMM computation can be very troublesome to understand
  - bw doesn't skip silence
  - We rely heavily on forced alignment in training

Sphinx IV (cont.)
- (By no means complete ...)
- Lead designer: Bhiksha (MERL)
- Lead team developer: Willie Walker (Sun)
- Key developers: Evandro, Rita, Phillip Kwok and Paul Lamere
- Many heavyweight speech advisors: Evandro, Rita, Ravi, Bhiksha, Pedro Moreno ...

Is Sphinx IV good?
- A very accurate, very fast, very versatile and very nicely packaged Java-based speech recognizer
- Some internal benchmarks on RM and WSJ 5k show it to be faster and more accurate than s3.3 (under 1xRT and 10% better)
- Supports N-gram, FSM and FSG
- Will provide facilities like confidence scoring
- Still under development (it has just had its first alpha release)
- Trainer is not stable

Summary of the recognizers and trainers
- Sphinx I -> obsolete
- Sphinx II -> we are using the fast recognizer now
- Sphinx III: the following coexist
  - S3 flat
  - S3 fast (s3.4 stable, s3.5 devel)
- SphinxTrain (0.92 in CVS)
- Sphinx IV
  - Recognizer is alpha released
  - Trainer not yet stable

How can I get version X of Sphinx?
- Official web page of Sphinx
  - http://cmusphinx.sourceforge.net
  - Gives announcements and development news
  - Some documentation is there
- For the tarballs
  - http://sourceforge.net/projects/cmusphinx
  - Releases:
    - sphinx2-0.4.tgz (s2)
    - sphinx3-0.1.tgz (s3.3)
    - sphinx3-0.4-rc2.tgz (s3.4 release candidate II)
    - sphinx4-0.1alpha-src.zip (s4)

Rule 1: If there are no tarballs, it is in CVS
- ANYONE can get the following modules through CVS using the following command:
  - cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx co modulename
- modulename =
  - SphinxTrain -> SphinxTrain
  - archive_s3 -> s3 + s3.0 + s3.2 + s3.3
  - sphinx2 -> development version of sphinx2
  - sphinx3 =~ s3.4 -> we will base the development of s3.5 on this
  - share =~ cepview + lm3g2dmp
  - sphinx3j = the Java version of sphinx3
  - sphinx4 = development version of sphinx4

Rule 2: If it doesn't exist in CVS, officially it doesn't exist
- Simply put, no one actually supports and maintains it.
- Software that falls into this category:
  - CMU LM Toolkit (we haven't touched it for a while)
    - We may do it in the future.
  - Phoenix (distributed somewhere else)
  - Training scripts in csh
    - Rita always actively supports them.

Rule 3: You may need other modules to complete your task
- SphinxTrain relies heavily on forced alignment, so you also need s3-align
- Using any s3 recognizer requires the LM in DMP format, so you need the tool lm3g2dmp, which can be found in sphinx2 or share

Where can I get more information for the recognizer?
- People to ask
  - s2: Evandro, Ravi
  - S3 flat: Evandro, Ravi, Arthur C
  - S3 tree: Evandro, Ravi, Arthur C
  - SphinxTrain: Rita, Evandro, Ravi, Arthur C, Rong, Ziad, Murali
  - S4: S4's developers on SourceForge
    - Willie, Paul, Phillip, Bhiksha, Rita, Evandro

Web pages to look up
- Rita's web page
  - www.cs.cmu.edu/~rsingh
  - Contains the training manual
- Twiki web page for the Sphinx 4 design
  - www.speech.cs.cmu.edu/cgi-bin/cmusphinx/twiki/view/Sphinx4/WebHome/
- Arthur C's web page
  - Risked his life to write a manual for Sphinx 3.4
  - Also collects some information on each Sphinx

Outlook of all recognizers
- Sphinx II
  - Sorry, we won't support it much.
  - Reason: s3.4 and s4 have proven to have very good speed and accuracy
- Sphinx III
  - The only active branch is s3.5
  - Moderate changes in s3 flat
  - Motivated by project CALO
  - This quarter: make adaptation work
- SphinxTrain
  - Write a set of scripts for continuous HMM training
  - The silence deletion problem will be fixed

(cont.)
- sphinxDoc
  - Chapters 1 and 2 completed (*sigh*, still 7 left)
  - Only being written when Arthur C is procrastinating and doesn't want to read or play video games
- Sphinx IV
  - Will be there around Sep or Oct
  - Alpha release
  - Trainer will be fixed
- Argus
  - Incorporate the advantages of many speech recognizers
  - Not yet started

Conclusion
- This presentation
  - Summarizes the current code status of Sphinx and SphinxTrain
  - We still have a lot of work to do ...
- Next presentation
  - s3 or s3.4, from main to the search