FSG Implementation in Sphinx 2
Mosur Ravishankar (rkm@cs.cmu.edu)
Jul 15, 2004

Outline
· Input specification
· FSG related API
· Application examples
· Implementation issues

FSG Specification
· "Assembly language" for specifying FSGs
  · Low-level
  · Most standards should compile down to this level
· Set of N states, numbered 0..N-1
· Transitions:
  · Emitting or non-emitting (aka null or epsilon)
  · Each emitting transition emits one word
  · Fixed probability 0 < p <= 1
· One start state, and one final state
  · Null transitions can effectively give you as many as needed
· Goal: Find the highest likelihood path from the start state to the final state, given some input speech
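The constraints above can be written down as a small consistency check. This is a minimal sketch only: the `toy_` types and function are hypothetical names made up for illustration and are not the Sphinx 2 data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the Sphinx 2 data structures. */
typedef struct {
    int from;          /* source state, 0..n_states-1 */
    int to;            /* destination state, 0..n_states-1 */
    double prob;       /* fixed transition probability, 0 < p <= 1 */
    const char *word;  /* emitted word, or NULL for a null (epsilon) transition */
} toy_trans_t;

typedef struct {
    int n_states;              /* N; states are numbered 0..N-1 */
    int start_state;           /* exactly one start state */
    int final_state;           /* exactly one final state */
    const toy_trans_t *trans;  /* array of transitions */
    size_t n_trans;            /* number of entries in trans */
} toy_fsg_t;

/* Return 1 if s is a legal state number for g, else 0. */
static int toy_state_ok(const toy_fsg_t *g, int s)
{
    return s >= 0 && s < g->n_states;
}

/* Return 1 if g satisfies the constraints listed on the slide, else 0. */
int toy_fsg_is_valid(const toy_fsg_t *g)
{
    size_t i;

    assert(g != NULL);
    if (g->n_states <= 0)
        return 0;
    if (!toy_state_ok(g, g->start_state) || !toy_state_ok(g, g->final_state))
        return 0;
    for (i = 0; i < g->n_trans; i++) {
        const toy_trans_t *t = &g->trans[i];
        if (!toy_state_ok(g, t->from) || !toy_state_ok(g, t->to))
            return 0;
        if (!(t->prob > 0.0 && t->prob <= 1.0))
            return 0;
    }
    return 1;
}
```

A null transition is simply an entry whose `word` is NULL, so several logical final states can be funneled into the single `final_state` with null transitions.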

An FSG Example

    FSG_BEGIN
    NUM_STATES 10
    START_STATE 0
    FINAL_STATE 9

    # Transitions
    T 0 1 0.5 to
    T 1 2 0.1 city1
    …
    T 1 2 0.1 cityN
    T 2 3 1.0 from
    T 3 4 0.1 city1
    …
    T 3 4 0.1 cityN
    T 4 9 1.0
    T 0 5 0.5 from
    T 5 6 0.1 city1
    …
    T 5 6 0.1 cityN
    T 6 7 1.0 to
    T 7 8 0.1 city1
    …
    T 7 8 0.1 cityN
    T 8 9 1.0
    FSG_END

[Diagram: the same FSG drawn as a 10-state graph. From state 0, one branch goes "to" (0-1), city1..cityN (1-2), "from" (2-3), city1..cityN (3-4); the other goes "from" (0-5), city1..cityN (5-6), "to" (6-7), city1..cityN (7-8). States 4 and 8 each reach final state 9 by a null (e) transition.]
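Each transition line in the example has the form `T <from> <to> <prob> [word]`, where a missing word marks a null transition. A minimal sketch of reading one such line follows; the function and type names are hypothetical, and this is not the Sphinx 2 parser, which may accept more than this sketch does.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_MAX_WORD 64

/* Hypothetical record for one parsed transition line. */
typedef struct {
    int from;
    int to;
    double prob;
    char word[TOY_MAX_WORD];  /* empty string means a null transition */
} toy_trans_line_t;

/*
 * Parse a line such as "T 0 1 0.5 to" or "T 4 9 1.0".
 * Returns 1 for an emitting transition, 0 for a null transition,
 * and -1 if the line is not a well-formed transition.
 */
int toy_parse_transition(const char *line, toy_trans_line_t *out)
{
    char tag[2];
    int n;

    assert(line != NULL && out != NULL);
    out->word[0] = '\0';

    /* %1s reads the one-character "T" tag; %63s leaves room for the NUL. */
    n = sscanf(line, " %1s %d %d %lf %63s",
               tag, &out->from, &out->to, &out->prob, out->word);
    if (n < 4 || tag[0] != 'T')
        return -1;
    if (out->from < 0 || out->to < 0)
        return -1;
    if (!(out->prob > 0.0 && out->prob <= 1.0))
        return -1;
    return (n == 5) ? 1 : 0;
}
```

Checking state numbers against NUM_STATES is left to the caller, since a single line does not carry that information.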

A Better Representation
· Composition of FSGs

[Diagram: the top-level FSG from the previous example, with each group of city1..cityN transitions replaced by a single [city] arc. [city] refers to a separate two-state FSG (states 0 and 1) with one arc per city name: boston, chicago, pittsburgh, buffalo, seattle.]

Multiple Pronunciations and Filler Words
· Alternative pronunciations added automatically
· Filler word transitions (silence and noise) added automatically
  · A filler self-transition at every state
  · Noise words added only if noise penalty (probability) > 0

[Diagram: the composed city FSG with a [filler] self-transition drawn at every state.]

FSG Related API
· Loading during initialization (i.e., fbs_init()):
  · -fsgfn flag specifying an FSG file to load (similar to -lmfn flag)
  · Difference: FSG name is contained in the file
· Dynamic loading:
  · char *uttproc_load_fsgfile(char *fsgfile);
    returns the FSG string name contained in the file
· Switching to an FSG:
  · uttproc_set_fsg (char *fsgname);
· Deleting a previously loaded FSG:
  · uttproc_del_fsg (char *fsgname);
· Old demos could be run with FSGs, simply by recompiling with new libraries
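The dynamic-loading calls above might be combined as follows. This is a sketch under stated assumptions: the slide gives no return type for uttproc_set_fsg and uttproc_del_fsg, so `int` with 0 meaning success is assumed here; `run_with_fsg` is a hypothetical helper; and the stub definitions exist only so the sketch builds without Sphinx 2 (define `USE_SPHINX2` and link the real library instead).

```c
#include <assert.h>
#include <stdio.h>

#ifdef USE_SPHINX2
/* First signature is as given on the slide. The int return type of the
 * other two is an ASSUMPTION; the slide does not show it. */
char *uttproc_load_fsgfile(char *fsgfile);
int uttproc_set_fsg(char *fsgname);
int uttproc_del_fsg(char *fsgname);
#else
/* Stand-in stubs, NOT the library's behavior: they only let the sketch
 * compile and run on its own. */
static char stub_name[] = "stub_fsg";
char *uttproc_load_fsgfile(char *fsgfile) { return fsgfile ? stub_name : NULL; }
int uttproc_set_fsg(char *fsgname) { return fsgname ? 0 : -1; }
int uttproc_del_fsg(char *fsgname) { return fsgname ? 0 : -1; }
#endif

/*
 * Hypothetical helper: load an FSG file, make it the active grammar,
 * decode, then delete it. Returns 0 on success, -1 on failure.
 */
int run_with_fsg(char *fsgfile)
{
    char *name;

    assert(fsgfile != NULL);

    /* The FSG name comes from inside the file, not from the caller. */
    name = uttproc_load_fsgfile(fsgfile);
    if (name == NULL) {
        fprintf(stderr, "could not load FSG file %s\n", fsgfile);
        return -1;
    }
    if (uttproc_set_fsg(name) != 0)   /* assumed: 0 means success */
        return -1;

    /* ... decode one or more utterances with the FSG active ... */

    /* Whether the currently active FSG may be deleted directly is not
     * stated on the slide; a real application should check. */
    return uttproc_del_fsg(name) == 0 ? 0 : -1;
}
```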

Mixed LM/FSG Decoding Example
· (See lm_fsg_test.c)

Another Example: Garbage Models
· Extraneous speech could be absorbed using an allphone "garbage model"

[Diagram: the composed city FSG with [allphone] garbage-model transitions added.]

B/W Training and Forced Alignment
· Consolidate code for FSGs, Baum-Welch training, and forced alignment?
  · Sentence HMMs for training and alignment are essentially linear FSGs
  · Alternative pronunciations and filler words handled automatically
· Differences:
  · B/W uses forward (and backward) algorithm instead of Viterbi
  · Alignment has to produce phone and state segmentation as well
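The first difference comes down to one operation in the trellis update: the forward algorithm sums over predecessor states where Viterbi takes the maximum. A toy sketch in the probability domain follows (real decoders work with log scores); the sizes and the function name are made up for illustration, and the path is assumed to start in state 0 and end in the last state.

```c
#include <assert.h>

#define TOY_N_STATES 3
#define TOY_N_FRAMES 4

/*
 * Score a small HMM that must start in state 0 and end in the last state.
 * trans[i][j] is the transition probability i -> j, obs[t][j] the
 * observation probability of frame t in state j.
 * use_sum != 0: forward algorithm (sum over predecessors).
 * use_sum == 0: Viterbi (max over predecessors).
 */
double toy_trellis_score(double trans[TOY_N_STATES][TOY_N_STATES],
                         double obs[TOY_N_FRAMES][TOY_N_STATES],
                         int use_sum)
{
    double prev[TOY_N_STATES], cur[TOY_N_STATES];
    int t, i, j;

    assert(trans != 0 && obs != 0);

    /* Frame 0: all probability mass sits in the start state. */
    for (j = 0; j < TOY_N_STATES; j++)
        prev[j] = (j == 0) ? obs[0][0] : 0.0;

    for (t = 1; t < TOY_N_FRAMES; t++) {
        for (j = 0; j < TOY_N_STATES; j++) {
            double acc = 0.0;
            for (i = 0; i < TOY_N_STATES; i++) {
                double p = prev[i] * trans[i][j];
                if (use_sum)
                    acc += p;          /* forward: add all incoming paths */
                else if (p > acc)
                    acc = p;           /* Viterbi: keep the best incoming path */
            }
            cur[j] = acc * obs[t][j];
        }
        for (j = 0; j < TOY_N_STATES; j++)
            prev[j] = cur[j];
    }
    return prev[TOY_N_STATES - 1];
}
```

With a left-to-right 3-state model whose observation probabilities are all 1, three state sequences reach the final state in four frames; the forward score is their total probability and the Viterbi score is the probability of the single best one.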

Implementation
· Straightforward expansion of word-level FSG into a triphone HMM network
· Viterbi beam search over this HMM network
· No major optimizations attempted (so far)
  · No lextree implementation (What?)
  · Static allocation of all HMMs; not allocated "on demand" (Oh, no!)
  · FSG transitions represented by NxN matrix (You can't be serious!!)
  · Speed/Memory usage profile needs to be evaluated
· Mostly new set of data structures, separate from existing ones
  · Should be easily ported to Sphinx 3
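The pruning step of a Viterbi beam search can be sketched as below: after each frame, any active HMM whose score falls more than a beam width below the best active score is deactivated. This is a generic illustration and not the Sphinx 2 code; the function name, the integer log-score convention, and the flag array are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Deactivate every active entry whose log score is worse than
 * (best active score + beam). beam is a log-domain width and must be <= 0.
 * Scores are assumed far enough from INT_MIN that best + beam cannot
 * overflow. Returns the number of entries still active.
 */
size_t toy_beam_prune(const int *score, unsigned char *active,
                      size_t n, int beam)
{
    size_t i, kept = 0;
    int best = 0, have_best = 0;

    assert(score != NULL && active != NULL && beam <= 0);

    /* Pass 1: best score among the currently active entries. */
    for (i = 0; i < n; i++) {
        if (active[i] && (!have_best || score[i] > best)) {
            best = score[i];
            have_best = 1;
        }
    }
    if (!have_best)
        return 0;

    /* Pass 2: drop everything that falls outside the beam. */
    for (i = 0; i < n; i++) {
        if (!active[i])
            continue;
        if (score[i] < best + beam)
            active[i] = 0;
        else
            kept++;
    }
    return kept;
}
```

Because the threshold moves with the best score in each frame, this is relative pruning; it does not put a fixed cap on the number of active HMMs.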

Implementation: FSG Expansion to HMMs

[Diagram: a 3-state FSG (states 0, 1, 2) with transition word1 from state 0 to state 1 and word2 from state 0 to state 2, shown next to its expansion, in which word1 is replaced by a chain of phone HMMs (p1 p2 p3 p4) and word2 by the chain q1 q2 q3.]

Implementation: Triphone HMMs
· Multiple root HMMs for different left contexts
· Multiple leaf HMMs for different right contexts
· Special case for 2-phone words
· 1-phone words use SIL as right context

[Diagram: word1 (state 0 to state 1) expanded into phones p1 p2 p3 p4, with root variants p1, p1', p1'' and leaf variants p4, p4', p4''; a 2-phone word is shown with variants p1, p1', p1'' and p2, p2', p2''.]

Possible Optimization: Lextrees

[Diagram: the transitions word1 … wordN leaving one source state, with their phone sequences (p2 p3 p4 and q1 q2 q3 shown) arranged as a lextree associated with that source state.]

Possible Optimization: Path Pruning
· If there are two transitions with the same label into the same state, the one starting out with a worse score can be pruned
· But reconciling with lextrees is tricky, since labels are now blurred

[Diagram: two transitions, both labeled w, entering the same state.]
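A sketch of the pruning rule above, assuming the candidate transitions are all entered in the same frame and that a larger score is better. The record layout and names are hypothetical; this optimization is only proposed on the slide, so nothing here reflects existing Sphinx 2 code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for a transition about to be entered. */
typedef struct {
    int word_id;      /* label of the transition */
    int to_state;     /* destination FSG state */
    int entry_score;  /* path score at entry; larger is better */
    int pruned;       /* set to 1 when the transition is dropped */
} toy_arc_t;

/*
 * Among transitions with the same label and the same destination state,
 * keep only the one with the best entry score (the earliest wins ties).
 * Returns the number of transitions pruned.
 */
size_t toy_prune_duplicate_arcs(toy_arc_t *arc, size_t n)
{
    size_t i, j, n_pruned = 0;

    assert(arc != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (arc[i].pruned)
            continue;
        for (j = i + 1; j < n; j++) {
            if (arc[j].pruned)
                continue;
            if (arc[j].word_id != arc[i].word_id ||
                arc[j].to_state != arc[i].to_state)
                continue;
            if (arc[j].entry_score > arc[i].entry_score) {
                arc[i].pruned = 1;   /* arc j beats arc i */
                n_pruned++;
                break;               /* arc j is compared further as i later */
            }
            arc[j].pruned = 1;       /* arc i is at least as good */
            n_pruned++;
        }
    }
    return n_pruned;
}
```

The rule is safe because both transitions expand to the same HMM sequence and end in the same state, so the one that starts behind can never overtake the other. With lextrees the label is no longer known at entry, which is the difficulty the second bullet points out.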

Other Issues Pending
· Dynamic allocation and management of HMMs
· Implementation of absolute pruning
· Lattice generation
· N-best list generation
· …

Where Is It?
· My copy of open source version of Sphinx 2
  · Someone needs to update the SourceForge copy
· HTML documentation has been updated