Implementing a Speech Recognition Algorithm with VSIPL Don

  • Slides: 30
Download presentation
Implementing a Speech Recognition Algorithm with VSIPL++ Don Mc. Coy, Brooks Moses, Stefan Seefeld,

Implementing a Speech Recognition Algorithm with VSIPL++ Don Mc. Coy, Brooks Moses, Stefan Seefeld, Justin Voo Software Engineers Embedded Systems Division / HPC Group September 2011

Objective VSIPL++ Standard: n Standard originally developed under Air Force and MIT/LL leadership. n

Objective VSIPL++ Standard: n Standard originally developed under Air Force and MIT/LL leadership. n Code. Sourcery (now Mentor Graphics) has a commercial implementation: Sourcery VSIPL++. n Intention is to be a general library for signal and image applications. n Originating community largely focused on radar/sonar. Question: Is VSIPL++ indeed useful outside of radar and radar-like fields, as intended? n Sample application: Automatic Speech Recognition © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Automatic Speech Recognition This presentation focuses on two aspects: n Feature Extraction n “Decoding”

Automatic Speech Recognition This presentation focuses on two aspects: n Feature Extraction n “Decoding” (a. k. a. Recognition) We will compare to existing Matlab and C code: n PMTK 3 (Matlab modeling toolkit) — written by Matt Dunham, Kevin Murphy and others n MFCC code (Matlab implementation) — written by Dan Ellis n HTK (C-based research implementation) — from Cambridge University Engineering Department (CUED) © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Implementing a Speech Recognition Algorithm with VSIPL++ “DECODING” © 2011 Mentor Graphics Corp. www.

Implementing a Speech Recognition Algorithm with VSIPL++ “DECODING” © 2011 Mentor Graphics Corp. www. mentor. com

Decoding n Hidden Markov Models (HMM) a 22 a 12 s a 33 a

Decoding n Hidden Markov Models (HMM) a 22 a 12 s a 33 a 23 ih "six" a 44 a 34 a 24 k a 55 a 34 s a 35 Observation sequence o 1 o 2 o 3 o 4 o 5 o 6 © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com o 7

Decoding n Gaussian Mixture Models — Trained on acoustic data to account for variation

Decoding n Gaussian Mixture Models — Trained on acoustic data to account for variation — Diagram shows representation of PDF for a two-variable MFCC © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Decoding n We wish to find the highest probability word yn from a sequence

Decoding n We wish to find the highest probability word yn from a sequence of feature vectors O = { o 1, o 2, … on }, so we maximize (over all words) the a posteriori probability P ( yn | O ) n Using Bayes rule, we find it is sufficient to calculate the log-likelihood log p (O | yn) + log P (yn ) Likelihood of a word with a given set of model parameters © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Decoding Implementation n Matlab and VSIPL++ code – PMTK 3 hmm. Filter() function Matrix<T>

Decoding Implementation n Matlab and VSIPL++ code – PMTK 3 hmm. Filter() function Matrix<T> AT = transmat. transpose(); alpha = T(); normalize(init. Dist * softev. col(0), alpha. col(0), scale(0)); for (length_type t = 1; t < tmax; ++t) normalize(prod(AT, alpha. col(t-1)) * softev. col(t), alpha. col(t), scale(t)); T eps = std: : numeric_limits<T>: : epsilon(); loglik = sumval(log(scale + eps)); scale = zeros(T, 1); AT = transmat'; alpha = zeros(K, T); [alpha(: , 1), scale(1)] = normalize(init. Dist(: ). * softev(: , 1)); for t=2: T [alpha(: , t), scale(t)] = normalize((AT * alpha(: , t-1)). * softev(: , t)); end loglik = sum(log(scale+eps)); © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Decoding Implementation n Matlab and VSIPL++ code – PMTK 3 hmm. Filter() function Matrix<T>

Decoding Implementation n Matlab and VSIPL++ code – PMTK 3 hmm. Filter() function Matrix<T> AT = transmat. transpose(); alpha = T(); normalize(init. Dist * softev. col(0), alpha. col(0), scale(0)); for (length_type t = 1; t < tmax; ++t) normalize(prod(AT, alpha. col(t-1)) * softev. col(t), alpha. col(t), scale(t)); T eps = std: : numeric_limits<T>: : epsilon(); loglik = sumval(log(scale + eps)); scale = zeros(T, 1); AT = transmat'; alpha = zeros(K, T); [alpha(: , 1), scale(1)] = normalize(init. Dist(: ). * softev(: , 1)); for t=2: T [alpha(: , t), scale(t)] = normalize((AT * alpha(: , t-1)). * softev(: , t)); end loglik = sum(log(scale+eps)); © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Decoding Implementation n Similar line counts for both Matlab and VSIPL++ — VSIPL++: 30

Decoding Implementation n Similar line counts for both Matlab and VSIPL++ — VSIPL++: 30 — Matlab: 21 30 25 20 15 10 5 0 SV++ Matlab * generated using David A. Wheeler's 'SLOCCount'. © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Decoding Implementation n Sourcery VSIPL++ Performance 1, 2 1 Millions of Log-Likelihoods (Models) 5

Decoding Implementation n Sourcery VSIPL++ Performance 1, 2 1 Millions of Log-Likelihoods (Models) 5 -State HMM 0, 8 Intel CUDA 0, 6 Cell 0, 4 0, 2 0 3 4 5 6 7 8 9 10 11 12 13 14 Number of Frames (2^N) © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Decoding Implementation Additional Parallelization Strategies n Custom kernels (GPU, Cell) — Include user-written low-level

Decoding Implementation Additional Parallelization Strategies n Custom kernels (GPU, Cell) — Include user-written low-level kernels for key operations — Pack more operations into each invocation — Take advantage of overlapped computations and data transfers n Maps — Distribute VSIPL++ computations across multiple processes — Explicit management of multicore/multiprocessor assignment © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Implementing a Speech Recognition Algorithm with VSIPL++ FEATURE EXTRACTION © 2011 Mentor Graphics Corp.

Implementing a Speech Recognition Algorithm with VSIPL++ FEATURE EXTRACTION © 2011 Mentor Graphics Corp. www. mentor. com

Feature Extraction n Speech signals – the word “four” and its spectrogram . FFT

Feature Extraction n Speech signals – the word “four” and its spectrogram . FFT FFT . . © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Feature Extraction n Mel-frequency Cepstral Coefficients — 13 coefficients, one of which is overall

Feature Extraction n Mel-frequency Cepstral Coefficients — 13 coefficients, one of which is overall power across all bands — 13 delta coefficients — 13 delta-delta, or ‘acceleration’ coefficients n Result: 39 -element “feature” vector for each timeslice — A 16 k. Hz signal chopped into 1024 samples with 50% overlap yields a feature vector every 32 ms. — Each second of speech gives about 31 feature vectors. n Steps: — — Pre-emphasis FFT Filter bank (spectral warping) DCT © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Feature Extraction n Pre-emphasis — High-pass FIR filter: H(z) = 1 – 09. 5

Feature Extraction n Pre-emphasis — High-pass FIR filter: H(z) = 1 – 09. 5 z-1 — Increases recognition accuracy by leveling the energy present in across frequency bands n Framing / Windowing / FFT — Choose offset between frames (30 -50%) — Choose frame length (300 – 1024 samples) n Mel-scale spectral warping — Provides compensation for how the auditory system perceives relative differences in pitch n Discrete Cosine Transform — Final product are the cepstral coefficients © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Feature Extraction Implementation n VSIPL++ code // Apply a pre-emphasis filter to the input

Feature Extraction Implementation n VSIPL++ code // Apply a pre-emphasis filter to the input Vector<scalar_f> h(2); h(0) = scalar_f(1); h(1) = scalar_f(-0. 97); Fir<scalar_f> preemp(h, signal_length); preemp(x, y); // Compute the spectrogram of the filtered input Matrix<std: : complex<T> > S = specgram(y, hanning(frame_length), frame_offset); // Integrate into mel bins, in the real domain Matrix<T> A = prod(wts, magsq(S)); // Convert to cepstra via DCT Matrix<T> cepstra = spec 2 cep(A, numceps); © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Feature Extraction Implementation n VSIPL++ code // Apply a pre-emphasis filter to the input

Feature Extraction Implementation n VSIPL++ code // Apply a pre-emphasis filter to the input Vector<scalar_f> h(2); h(0) = scalar_f(1); h(1) = scalar_f(-0. 97); Fir<scalar_f> preemp(h, signal_length); preemp(x, y); // Compute the spectrogram of the filtered input Matrix<std: : complex<T> > S = specgram(y, hanning(frame_length), frame_offset); // Integrate into mel bins, in the real domain Matrix<T> A = prod(wts, magsq(S)); // Convert to cepstra via DCT Matrix<T> cepstra = spec 2 cep(A, numceps); © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Feature Extraction Implementation n A closer look at the spectrogram function Fft<const_Vector, T, std:

Feature Extraction Implementation n A closer look at the spectrogram function Fft<const_Vector, T, std: : complex<T>, 0, by_reference, 1> fft(Domain<1>(N), 1. 0); for (length_type m = 0; m < M; ++m) fft(in(Domain<1>(m * I, 1, N)) * window, S. col(m)); or Fftm<T, complex<T>, col, fft_fwd, by_reference, 1> fftm(Domain<2>(N, M), 1. 0); Matrix<T, Dense<2, T, col 2_type> > tmp(N, M); for (length_type m = 0; m < M; ++m) tmp. col(m) = in(Domain<1>(m * I, 1, N)) * window; fftm(tmp, S); © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Feature Extraction Implementation n A closer look at the spec 2 cep function (aka

Feature Extraction Implementation n A closer look at the spec 2 cep function (aka the DCT) template <typename T, typename Block> vsip: : Matrix<T> spec 2 cep( vsip: : const_Matrix<T, Block> spec, vsip: : length_type const ncep) { using namespace vsip; Length_type nrow = spec. size(0); Matrix<T> dctm(ncep, nrow, T()); for (length_type i = 0; i < ncep; ++i) dctm. row(i) = cos(i * ramp<T>(1, 2, nrow) / (2 * nrow) * M_PI) * sqrt(T(2) / nrow); return (prod(dctm, log(spec))); } © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Feature Extraction Implementation n A closer look at the spec 2 cep function (aka

Feature Extraction Implementation n A closer look at the spec 2 cep function (aka the DCT) template <typename T> class Dct { public: Dct(vsip: : length_type const ncep, vsip: : length_type const nfilts) : ncep_(ncep), nfilts_(nfilts), dctm_(ncep, nfilts) { for (vsip: : length_type i = 0; i < ncep; ++i) dctm_. row(i) = cos(i * vsip: : ramp<T>(1, 2, nfilts) / (2 * nfilts) * M_PI) * sqrt(T(2) / nfilts); } template <typename Block> vsip: : Matrix<T> operator()(vsip: : const_Matrix<T, Block> input) { return (vsip: : prod(dctm_, input)); }. . . }; © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Feature Extraction Implementation n VSIPL++ using the modified DCT implementation Dct<T> dct(numceps, numfilters); .

Feature Extraction Implementation n VSIPL++ using the modified DCT implementation Dct<T> dct(numceps, numfilters); . . . // Convert to cepstra via DCT Matrix<T> cepstra = dct(log(A)); © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Feature Extraction Implementation n Line count comparison* — C++: 63 — Matlab: 63 —

Feature Extraction Implementation n Line count comparison* — C++: 63 — Matlab: 63 — VSIPL++: 50 70 60 50 n Performance vs. optimized C (time to process 1000 MFCCs) — C : 1. 1 S — Sourcery VSIPL++: 0. 8 s 40 30 20 10 0 SV++ C * generated using David A. Wheeler's 'SLOCCount'. © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com Matlab

Feature Extraction Implementation n Sourcery VSIPL++ performance (FFT version) Millions of Samples/second processed 70

Feature Extraction Implementation n Sourcery VSIPL++ performance (FFT version) Millions of Samples/second processed 70 60 50 Intel 40 CUDA 30 Cell 20 10 0 9 11 13 15 17 19 21 Signal length (2^N) © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Feature Extraction Implementation n Sourcery VSIPL++ performance (multiple-FFT version) Millions of Samples/second processed 70

Feature Extraction Implementation n Sourcery VSIPL++ performance (multiple-FFT version) Millions of Samples/second processed 70 60 50 Intel 40 CUDA 30 Cell 20 10 0 9 11 13 15 17 19 21 Signal length (2^N) © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

Implementing a Speech Recognition Algorithm with VSIPL++ CONCLUSIONS © 2011 Mentor Graphics Corp. www.

Implementing a Speech Recognition Algorithm with VSIPL++ CONCLUSIONS © 2011 Mentor Graphics Corp. www. mentor. com

VSIPL++ Assessment & Conclusions Benefits of the VSIPL++ Standard n Does not tie the

VSIPL++ Assessment & Conclusions Benefits of the VSIPL++ Standard n Does not tie the user’s hands with regard to algorithmic choices — Prototyping algorithms is fast and efficient for users n C++ code is easier to read and more compact — Fosters rapid development n Implementers of the standard have the flexibility required to get good performance. — Allows best performance on a range of hardware n Prototype code is benchmark-ready… © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

VSIPL++ Assessment & Conclusions, cont… Potential Extensions to the VSIPL++ Standard n Direct Data

VSIPL++ Assessment & Conclusions, cont… Potential Extensions to the VSIPL++ Standard n Direct Data Access (DDA) — Already proposed to standards body — Proven useful in the field n Sliding-window FFT / FFTM n DCT and other transforms © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com

© 2011 Mentor Graphics Corp. www. mentor. com

© 2011 Mentor Graphics Corp. www. mentor. com

References n PMTK 3 — probabilistic modeling toolkit for Matlab/Octave, version 3 — by

References n PMTK 3 — probabilistic modeling toolkit for Matlab/Octave, version 3 — by Matt Dunham, Kevin Murphy, et. al. — http: //code. google. com/p/pmtk 3/ n PLP and RASTA (and MFCC, and inversion) in Matlab — by Daniel P. W. Ellis, 2005 — http: //www. ee. columbia. edu/~dpwe/resources/matlab/rastamat/ n HTK — Hidden Markov Model Toolkit (HTK) — by the Machine Intelligence Laboratory at Cambridge University — http: //htk. eng. cam. ac. uk/ n Sourcery VSIPL++ — Optimized implementation of the VSIPL++ standard — http: //www. mentor. com/embedded-software/codesourcery © 2011 Mentor Graphics Corp. D. M. , Implementing a Speech Recognition Algorithm with VSIPL++, September 2011 www. mentor. com