Large Vocabulary Continuous Speech Recognition: Subword Speech Units
Large Vocabulary Continuous Speech Recognition
Subword Speech Units
HMM-Based Subword Speech Units
Training of Subword Units
Training Procedure
Errors and performance evaluation in PLU recognition
- Substitution errors (s)
- Deletion errors (d)
- Insertion errors (i)
Performance evaluation: if the total number of PLUs is N, we define:
- Correctness rate: (N - s - d) / N
- Accuracy rate: (N - s - d - i) / N
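To make these definitions concrete, here is a minimal sketch (my own, not from the slides) that counts substitutions, deletions, and insertions with a standard Levenshtein alignment and then scores correctness and accuracy; the PLU strings are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) from a Levenshtein
    alignment of two label sequences."""
    # dp[i][j] = (cost, subs, dels, ins) for ref[:i] vs hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        c = dp[i - 1][0]
        dp[i][0] = (c[0] + 1, c[1], c[2] + 1, c[3])        # all deletions
    for j in range(1, len(hyp) + 1):
        c = dp[0][j - 1]
        dp[0][j] = (c[0] + 1, c[1], c[2], c[3] + 1)        # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1     # 0 = correct match
            cands = [
                (dp[i - 1][j - 1][0] + sub, dp[i - 1][j - 1], (sub, 0, 0)),
                (dp[i - 1][j][0] + 1, dp[i - 1][j], (0, 1, 0)),  # deletion
                (dp[i][j - 1][0] + 1, dp[i][j - 1], (0, 0, 1)),  # insertion
            ]
            cost, prev, (s, d, ins) = min(cands, key=lambda t: t[0])
            dp[i][j] = (cost, prev[1] + s, prev[2] + d, prev[3] + ins)
    _, s, d, ins = dp[len(ref)][len(hyp)]
    return s, d, ins

ref = "sh ow ao l sh ih p s".split()   # hypothetical reference PLU string
hyp = "sh ow ao sh ih ih p s".split()  # hypothetical recognizer output
s, d, i = align_counts(ref, hyp)
N = len(ref)
print(f"correctness = {(N - s - d) / N:.3f}, accuracy = {(N - s - d - i) / N:.3f}")
```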
Language Models for LVCSR
Word pair model: specifies which word pairs are valid, i.e., which words may follow a given word.
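As a toy illustration (vocabulary and pairs invented for this sketch, not the actual task's word-pair list), the model can be stored as a map from each word to its set of valid successors:

```python
# Hypothetical word-pair table; <s> and </s> mark sentence start and end.
WORD_PAIRS = {
    "<s>":   {"show", "list"},
    "show":  {"all", "the"},
    "list":  {"the"},
    "all":   {"ships"},
    "the":   {"ships"},
    "ships": {"</s>"},
}

def sentence_is_valid(words):
    """Accept a word string only if every adjacent pair is a valid pair."""
    seq = ["<s>"] + list(words) + ["</s>"]
    return all(nxt in WORD_PAIRS.get(cur, set()) for cur, nxt in zip(seq, seq[1:]))

print(sentence_is_valid("show all ships".split()))   # True
print(sentence_is_valid("ships show all".split()))   # False
```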
Statistical Language Modeling
Perplexity of the Language Model
Entropy of the source:
$H = \lim_{Q \to \infty} -\frac{1}{Q} \sum_{w_1, \ldots, w_Q} P(w_1, \ldots, w_Q) \log_2 P(w_1, \ldots, w_Q)$
First-order entropy of the source:
$H_1 = -\sum_{w} P(w) \log_2 P(w)$
If the source is ergodic, meaning its statistical properties can be completely characterized by a sufficiently long sequence that the source puts out, H can be computed from a single long sequence:
$H = \lim_{Q \to \infty} -\frac{1}{Q} \log_2 P(w_1, w_2, \ldots, w_Q)$
We often compute H based on a finite but sufficiently large Q:
$\hat{H} = -\frac{1}{Q} \log_2 P(w_1, w_2, \ldots, w_Q)$
H is the degree of difficulty that the recognizer encounters, on average, when it has to determine a word from the same source. If the N-gram language model $P_N(W)$ is used, an estimate of H is:
$\hat{H} = -\frac{1}{Q} \log_2 P_N(w_1, w_2, \ldots, w_Q)$
In general, perplexity is defined as:
$PP = 2^{H}$
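A minimal sketch of these estimates, assuming a toy bigram table (in a real system the probabilities are estimated from a training corpus):

```python
import math

# Hypothetical bigram probabilities P(w_q | w_{q-1}), invented for this sketch.
BIGRAM = {
    ("<s>", "show"): 0.5, ("show", "all"): 0.4,
    ("all", "ships"): 0.6, ("ships", "</s>"): 0.9,
}

def log2_prob(words):
    """log2 P_N(w_1, ..., w_Q) under the bigram model."""
    seq = ["<s>"] + words + ["</s>"]
    return sum(math.log2(BIGRAM[(a, b)]) for a, b in zip(seq, seq[1:]))

words = "show all ships".split()
Q = len(words) + 1                  # count the final </s> prediction as well
H_hat = -log2_prob(words) / Q       # H_hat = -(1/Q) log2 P_N(w_1..w_Q)
print(f"H_hat = {H_hat:.3f} bits, perplexity = {2 ** H_hat:.2f}")
```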
Overall recognition system based on subword units
Naval Resource (Battleship) Management Task:
- 991-word vocabulary
- NG (no grammar): perplexity = 991
Word pair grammar
We can partition the vocabulary into four nonoverlapping sets of words. The overall FSN allows recognition of sentences formed from these sets.
- WP (word pair) grammar: perplexity = 60
- FSN based on the partitioning scheme: 995 real arcs and 18 null arcs
- WB (word bigram) grammar: perplexity = 20
Control of word insertion/word deletion rate
- In the discussed structure, there is no control over the sentence length.
- We introduce a word insertion penalty into the Viterbi decoding.
- For this, a fixed negative quantity is added to the likelihood score at the end of each word arc.
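A small numeric sketch of the effect (penalty value, words, and scores all invented for illustration): without the penalty the longer word string wins on acoustic score alone; with it, each extra word must pay a fixed cost.

```python
WORD_INSERTION_PENALTY = -15.0   # in practice, tuned on development data

def extend_path(path_score, word_log_likelihood):
    """Score of a path after appending one more word arc: the penalty is
    added to the likelihood score at the end of the word arc."""
    return path_score + word_log_likelihood + WORD_INSERTION_PENALTY

def total_score(word_scores):
    score = 0.0
    for s in word_scores:
        score = extend_path(score, s)
    return score

short_hyp = ["show", "ships"]          # acoustic log-likelihoods: -100 each (sum -200)
long_hyp  = ["show", "all", "ships"]   # acoustic log-likelihoods: -65 each (sum -195)

print(" ".join(short_hyp), total_score([-100.0, -100.0]))       # -230.0
print(" ".join(long_hyp),  total_score([-65.0, -65.0, -65.0]))  # -240.0: the
# shorter string now wins, although the raw acoustics favored the longer one.
```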
Context-dependent subword units
Creation of context-dependent diphones and triphones
If c(·) is the occurrence count for a given unit, we can use a unit reduction rule such as: keep a context-dependent unit only if its training count reaches a minimum threshold, and otherwise back off to a less specific unit (a diphone, or ultimately the context-independent PLU).
CD units using only intraword units for "show all ships":
CD units using both intraword and interword units:
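A minimal sketch of such a count-threshold backoff; the threshold value, the unit naming (L-p+R for a triphone), and the helper names are assumptions of this sketch:

```python
COUNT_THRESHOLD = 50   # hypothetical minimum training count

def choose_unit(left, phone, right, c):
    """Pick the most specific context-dependent unit whose training count
    c(unit) meets the threshold, backing off to diphones, then to the
    context-independent phone."""
    triphone = f"{left}-{phone}+{right}"
    if c(triphone) >= COUNT_THRESHOLD:
        return triphone
    left_diphone, right_diphone = f"{left}-{phone}", f"{phone}+{right}"
    best = max((left_diphone, right_diphone), key=c)
    if c(best) >= COUNT_THRESHOLD:
        return best
    return phone   # fall back to the context-independent PLU

# Usage with a toy count table:
counts = {"sh-ow+ao": 12, "sh-ow": 80, "ow+ao": 40}
print(choose_unit("sh", "ow", "ao", lambda u: counts.get(u, 0)))  # -> "sh-ow"
```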
Smoothing and interpolation of CD PLU models
Implementation issues using CD units
Word junction effects
To handle known phonological changes, a set of phonological rules is superimposed on both the training and recognition networks. One typical rule is sketched below.
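A minimal sketch, assuming geminate reduction (a phone doubled across a word boundary is realized once) as the rule; the phone strings are toy transcriptions, not the slides' lexicon:

```python
def apply_geminate_reduction(word1_phones, word2_phones):
    """If word1 ends and word2 begins with the same phone, produce a junction
    pronunciation in which the doubled phone is realized only once."""
    if word1_phones and word2_phones and word1_phones[-1] == word2_phones[0]:
        return word1_phones + word2_phones[1:]
    return word1_phones + word2_phones

# "gave back": /g ey v/ + /b ae k/ -> unchanged (no shared junction phone)
print(apply_geminate_reduction(["g", "ey", "v"], ["b", "ae", "k"]))
# "grab back": /g r ae b/ + /b ae k/ -> the two /b/'s merge at the junction
print(apply_geminate_reduction(["g", "r", "ae", "b"], ["b", "ae", "k"]))
```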
Recognition results using CD units
Position-dependent units
Unit splitting and clustering
A key source of difficulty in continuous speech recognition is the so-called function words, which include words like a, and, for, in, is. The function words have the following properties:
Creation of vocabulary-independent units
Semantic Postprocessor for Recognition