A 12-WEEK PROJECT IN Speech Coding and Recognition
by Fu-Tien Hsiao and Vedrana Andersen
Overview
- An Introduction to Speech Signals (Vedrana)
- Linear Prediction Analysis (Fu)
- Speech Coding and Synthesis (Fu)
- Speech Recognition (Vedrana)
AN INTRODUCTION TO SPEECH SIGNALS
Speech Production
- Flow of air from the lungs
- Vibrating vocal cords
- Speech production cavities
- Lips
- Sound wave
- Vowels (a, e, i), fricatives (f, s, z), and plosives (p, t, k)
Speech Signals
- Sampling frequency: 8-16 kHz
- Short-time stationarity assumption (frames of 20-40 ms)
Model for Speech Production
- Excitation (periodic or noisy)
- Vocal tract filter (nasal cavity, oral cavity, pharynx)
Voiced and Unvoiced Sounds
- Voiced sounds: periodic excitation, pitch period
- Unvoiced sounds: noise-like excitation
- Short-time measures: power and zero-crossing rate
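The two short-time measures above can be illustrated with a minimal NumPy sketch; the frame length, hop size, and the sinusoid/noise test signals are illustrative choices, not from the slides:

```python
import numpy as np

def short_time_measures(x, frame_len=240, hop=120):
    """Per-frame power and zero-crossing count of a signal x.

    Voiced frames tend to show high power and few zero crossings;
    unvoiced (noise-like) frames show the opposite pattern.
    """
    powers, zcrs = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        powers.append(np.mean(frame ** 2))
        # Count sign changes between adjacent samples.
        zcrs.append(np.sum(np.abs(np.diff(np.sign(frame)))) / 2)
    return np.array(powers), np.array(zcrs)

# Toy example: a sinusoid (voiced-like) vs. white noise (unvoiced-like)
fs = 8000
t = np.arange(fs) / fs
voiced = np.sin(2 * np.pi * 120 * t)          # 120 Hz "pitch"
rng = np.random.default_rng(0)
unvoiced = 0.3 * rng.standard_normal(fs)
p_v, z_v = short_time_measures(voiced)
p_u, z_u = short_time_measures(unvoiced)
# Voiced frames: higher power, fewer zero crossings than noise frames.
```

A simple voiced/unvoiced decision can then threshold these two measures per frame.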
Frequency Domain
- Pitch and harmonics (excitation)
- Formants and spectral envelope (vocal tract filter)
- Harmonic product spectrum
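As a sketch of the harmonic product spectrum, the magnitude spectrum is multiplied with downsampled copies of itself so that the harmonics pile up at the fundamental. The window, number of harmonics, and the 150 Hz test signal are illustrative assumptions:

```python
import numpy as np

def hps_pitch(x, fs, n_harmonics=4):
    """Pitch estimate via the harmonic product spectrum (HPS)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    hps = spec.copy()
    for h in range(2, n_harmonics + 1):
        dec = spec[::h]                 # spectrum downsampled by h
        hps[:len(dec)] *= dec           # harmonics align at the fundamental
    lo = int(50 * len(x) / fs)          # ignore the region below 50 Hz
    peak = lo + np.argmax(hps[lo:len(spec) // n_harmonics])
    return peak * fs / len(x)

fs = 8000
t = np.arange(4096) / fs
# Harmonic-rich "voiced" signal with a 150 Hz fundamental
x = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 5))
f0 = hps_pitch(x, fs)
```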
Speech Spectrograms
- Time-varying formant structure
- Narrowband / wideband
LINEAR PREDICTION ANALYSIS
Categories
- Vocal Tract Filter
- Linear Prediction Analysis
- Error Minimization
- Levinson-Durbin Recursion
- Residual Sequence u(n)
Vocal Tract Filter (1)
- Input: periodic impulse train
- Vocal tract filter: what if we assume an all-pole filter?
- Output: speech
Vocal Tract Filter (2)
- Autoregressive model (all-pole filter): s(n) = a_1 s(n-1) + a_2 s(n-2) + ... + a_p s(n-p) + A·u_g(n), where p is called the model order
- Speech is a linear combination of past samples plus an excitation term, A·u_g(n)
Linear Prediction Analysis (1)
- Goal: how do we find the coefficients a_k of this all-pole model?
- Physical model vs. analysis system: in the physical model, the impulse train A·u_g(n) drives the all-pole filter (a_k fixed but unknown!) to produce the speech s(n); in the analysis system, s(n) enters an unknown "?" box that outputs an error e(n)
- We try to find α_k that estimate a_k
Linear Prediction Analysis (2)
- What is really inside the "?" box? A predictor P(z), an FIR filter
- Prediction: ŝ(n) = α_1 s(n-1) + α_2 s(n-2) + ... + α_p s(n-p)
- Prediction error: e(n) = s(n) - ŝ(n), so A(z) = 1 - P(z)
- If α_k ≈ a_k, then e(n) ≈ A·u_g(n)
Linear Prediction Analysis (3)
- If we can find the predictor that generates the smallest error e(n), close to A·u_g(n), we can use A(z) to estimate the filter coefficients
- Feeding e(n) ≈ A·u_g(n) through 1/A(z) reconstructs ŝ(n): a system very similar to the vocal tract model
Error Minimization (1)
- Problem: how do we find the minimum error?
- Energy of the error: E = Σ_n e(n)², where e(n) = s(n) - ŝ(n), so E is a function of the α_i
- Since E is a quadratic function of the α_i, the minimum is found by setting ∂E/∂α_i = 0 for each i
Error Minimization (2)
- Differentiation yields a set of linear equations: Σ_{k=1..p} α_k R(|i-k|) = R(i), for i = 1..p
- where we define R(k) = Σ_n s(n) s(n-k), which is actually the autocorrelation of s(n)
Error Minimization (3)
- In matrix form the linear equations read R α = r, where R is the p×p autocorrelation matrix and α is the vector of linear prediction coefficients
- The linear prediction coefficients are our goal; how can we solve the system efficiently?
Levinson-Durbin Recursion (1)
- The L-D recursion method exploits two characteristics of the matrix R: it is symmetric and Toeplitz
- Hence we can solve the system in O(p²) instead of O(p³)
- Don't forget the objective: find the α_k that simulate the vocal tract filter
Levinson-Durbin Recursion (2)
- In the exercise, we solve the system both by brute force and by L-D recursion; the resulting parameters show no difference
- Plot: error energy vs. predictor order
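The recursion above can be sketched compactly in NumPy. This is a minimal version under the slides' conventions (predictor ŝ(n) = Σ α_k s(n-k)); the AR(2) demo coefficients are illustrative, not from the exercise:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the symmetric Toeplitz normal equations in O(p^2),
    given r = autocorrelation values R(0)..R(p).
    Returns (alphas, prediction error energy), where the predictor
    is s_hat(n) = sum_k alphas[k-1] * s(n-k)."""
    a = np.zeros(p + 1)
    a[0] = 1.0                       # convention: A(z) = 1 + a_1 z^-1 + ...
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient for order i
        k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k           # error energy shrinks at each order
    return -a[1:], err               # alphas = -a_k

# Demo (illustrative numbers): recover a known AR(2) model from data.
rng = np.random.default_rng(1)
e = rng.standard_normal(20000)
s = np.zeros_like(e)
for n in range(2, len(s)):
    s[n] = 1.5 * s[n - 1] - 0.7 * s[n - 2] + e[n]   # true alphas: 1.5, -0.7
r = np.array([s[:len(s) - k] @ s[k:] for k in range(3)]) / len(s)
alphas, err = levinson_durbin(r, 2)
```

A brute-force solve of the same Toeplitz system (e.g. `np.linalg.solve`) should give the same alphas, matching the "no difference" observation in the exercise.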
Residual Sequence u(n)
- Once the filter coefficients are known, the residual sequence u(n) is found by inverse filtering s(n) through A(z)
- Compare the original s(n) with the residual u(n)
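The inverse filtering step is just the FIR filter A(z) applied to s(n). A minimal sketch (the filter coefficients and impulse-train demo are illustrative):

```python
import numpy as np

def residual(s, alphas):
    """Inverse-filter s(n) through A(z) = 1 - sum alpha_k z^-k:
    u(n) = s(n) - sum_k alpha_k s(n-k).  For a good LP fit,
    u(n) approximates the excitation."""
    u = s.astype(float).copy()
    for k, a in enumerate(alphas, start=1):
        u[k:] -= a * s[:-k]
    return u

# Sanity check: a signal synthesised by 1/A(z) from an impulse train
# inverse-filters back to exactly that impulse train.
alphas = np.array([1.2, -0.8])
exc = np.zeros(200); exc[::40] = 1.0          # periodic impulses
s = np.zeros(200)
for n in range(200):
    s[n] = exc[n] + sum(alphas[k] * s[n - 1 - k]
                        for k in range(2) if n - 1 - k >= 0)
u = residual(s, alphas)
```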
SPEECH CODING AND SYNTHESIS
Categories
- Analysis-by-Synthesis
- Perceptual Weighting Filter
- Linear Predictive Coding
  - Multi-Pulse Linear Prediction
  - Code-Excited Linear Prediction (CELP)
- CELP Experiment
- Quantization
Analysis-by-Synthesis (1)
- Analyze the speech by estimating an LP synthesis filter
- Compute a residual sequence to serve as the excitation signal for reconstructing the signal
- Encoder/decoder: parameters such as the LP synthesis filter, gain, and pitch are coded, transmitted, and decoded
Analysis-by-Synthesis (2)
- Processing is frame by frame: s(n) passes through LP analysis; an excitation generator drives the LP synthesis filter to produce ŝ(n); the error e(n) = s(n) - ŝ(n) feeds an error-minimization loop
- The LP parameters and excitation parameters go to the encoder and onto the channel
- Compare reconstruction without and with error minimization
Perceptual Weighting Filter (1)
- Perceptual masking effect: within the formant regions, the ear is less sensitive to noise
- Idea: design a filter that de-emphasizes the error in the formant regions
- Result: synthetic speech with more error near the formant peaks but less error elsewhere
Perceptual Weighting Filter (2)
- In the frequency domain: compare the LP synthesis filter with the PW filter
- Perceptual weighting coefficient α: α = 1 gives no filtering; as α decreases, the filtering becomes stronger
- The optimal α depends on perception
Perceptual Weighting Filter (3)
- In the z-domain, compare the LP filter with the PW filter W(z) = A(z) / A(z/α)
- Numerator: generates zeros at the original poles of the LP synthesis filter
- Denominator: places the poles closer to the origin; α determines the distance
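Assuming the standard CELP weighting filter W(z) = A(z)/A(z/α), the pole-shrinking behaviour described above follows from a simple identity: evaluating A at z/α just scales the k-th coefficient by α^k. A minimal sketch with illustrative coefficients:

```python
import numpy as np

def pw_filter_coeffs(alphas, alpha):
    """Numerator/denominator of W(z) = A(z) / A(z/alpha),
    where A(z) = 1 - sum alphas[k-1] z^-k.  Scaling the k-th
    coefficient by alpha^k pulls every pole of 1/A(z/alpha)
    toward the origin by the factor alpha."""
    num = np.concatenate(([1.0], -np.asarray(alphas, dtype=float)))
    den = num * alpha ** np.arange(len(num))
    return num, den

num, den = pw_filter_coeffs([1.2, -0.8], alpha=0.8)
poles_lp = np.roots(num)        # poles of the LP synthesis filter
poles_pw = np.roots(den)        # same angles, radii shrunk by alpha
```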
Linear Predictive Coding (1)
- Builds on the methods above: the PW filter and analysis-by-synthesis
- If the excitation signal is close to an impulse train during voicing, the reconstructed signal is very close to the original
- More often, however, the residual is far from an impulse train
Linear Predictive Coding (2)
- Many coding schemes therefore try to improve on this; they differ primarily in the type of excitation signal
- Two kinds:
  - Multi-Pulse Linear Prediction
  - Code-Excited Linear Prediction (CELP)
Multi-Pulse Linear Prediction (1)
- Concept: represent the residual sequence by placing impulses so that ŝ(n) gets closer to s(n)
- Block diagram: LP analysis, excitation generator (multi-pulse u(n)), LP synthesis filter producing ŝ(n), PW filter, and error minimization
Multi-Pulse Linear Prediction (2)
- Step 1: estimate the LPC filter without excitation
- Step 2: place one impulse (choosing its position and amplitude)
- Step 3: determine the new error
- Step 4: repeat steps 2 and 3 until the error reaches the desired minimum
- (Plots: original, multi-pulse excitation, synthetic, and error signals, annotated with steps 1 to 4)
Code-Excited Linear Prediction (1)
- The differences from multi-pulse LP:
  - Represent the residual v(n) by codewords, found by exhaustive search in a codebook of zero-mean Gaussian sequences
  - Handle the primary pitch pulses, which are predictable over consecutive periods, separately
Code-Excited Linear Prediction (2)
- Block diagram: a codeword v(n) from the Gaussian codebook passes through the pitch synthesis filter (driven by the pitch estimate) to give the excitation u(n), which drives the LP synthesis filter to produce ŝ(n); LP analysis of s(n) supplies the LP parameters, and the error between s(n) and ŝ(n) is shaped by the PW filter and minimized
CELP Experiment (1)
- A CELP experiment:
  - Original signal (blue)
  - Excitation signal (below)
  - Reconstructed signal (green)
CELP Experiment (2)
- Test the quality for different settings:
  1. LPC model order M: initial M = 10, test M = 2
  2. PW coefficient
CELP Experiment (3)
- 3. Codebook parameters (L, K): initial (40, 1024), test (40, 16)
  - K is the codebook size; it strongly influences computation time (reducing K from 1024 to 256 cuts the time from 13 s to 6 s)
  - L is the length of each random codeword; it determines the number of subblocks in the frame
Quantization
- With quantization: 16000 bps CELP vs. 9600 bps CELP
- Trade-off: bandwidth efficiency vs. speech quality
SPEECH RECOGNITION
Dimensions of Difficulty
- Speaker-dependent / speaker-independent
- Vocabulary size (small, medium, large)
- Discrete words / continuous utterance
- Quiet / noisy environment
Feature Extraction
- Overlapping frames
- A feature vector for each frame
- Mel-cepstrum, difference cepstrum, energy, difference energy
Vector Quantization
- Vector quantization of the feature vectors
- K-means algorithm
- Observation sequence for the whole word
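The K-means step above can be sketched as follows: train a codebook, then replace every feature vector by the index of its nearest codeword, yielding the discrete observation sequence. The farthest-point initialisation and the 2-D toy clusters are illustrative choices, not from the slides:

```python
import numpy as np

def kmeans_vq(vectors, K, iters=10):
    """Train a VQ codebook with K-means and map every feature vector
    to the index of its nearest codeword (its observation symbol)."""
    # Farthest-point initialisation spreads the initial codewords out.
    codebook = [vectors[0]]
    for _ in range(K - 1):
        d = np.min([np.linalg.norm(vectors - c, axis=1) for c in codebook],
                   axis=0)
        codebook.append(vectors[d.argmax()])
    codebook = np.array(codebook, dtype=float)
    for _ in range(iters):
        # Assignment step: nearest codeword for every vector.
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each codeword to the mean of its cell.
        for k in range(K):
            if np.any(labels == k):
                codebook[k] = vectors[labels == k].mean(axis=0)
    return codebook, labels

# Toy "feature vectors": two well-separated clusters in 2-D
rng = np.random.default_rng(3)
a = rng.standard_normal((50, 2)) + [5, 5]
b = rng.standard_normal((50, 2)) - [5, 5]
feats = np.vstack([a, b])
codebook, obs = kmeans_vq(feats, K=2)   # obs is the observation sequence
```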
Hidden Markov Model (1)
- Changing states, emitting symbols
- Model parameters: initial state probabilities, A, B
- (State diagram: states 1 to 5)
Hidden Markov Model (2)
- Probability of transition between states
- State transition matrix A, with a_ij = P(state j at time t+1 | state i at time t)
- State probability vector
- State equation
Hidden Markov Model (3)
- Probability of observing a symbol
- Observation probability matrix B, with b_j(k) = P(symbol k | state j)
- Observation probability vector
- Observation equation
Hidden Markov Model (4)
- Discrete-observation hidden Markov model
- Two HMM problems:
  - Training problem
  - Recognition problem
Recognition using HMM (1)
- Determine the probability that a given HMM produced the observation sequence (trellis of states over time)
- Straightforward computation enumerates all possible state paths: S^T of them
Recognition using HMM (2)
- Forward-backward algorithm; for recognition, only the forward part is needed
- Forward probability α_t(i): the probability of observing the partial sequence O_1 ... O_t and being in state i at time t
Recognition using HMM (3)
- Initialization: α_1(i) = π_i b_i(O_1)
- Recursion: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(O_{t+1})
- Termination: P(O | λ) = Σ_i α_T(i)
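The three steps above translate directly into a few lines of NumPy; the cost is O(S²T) rather than the S^T of path enumeration. The tiny 2-state model is an illustrative assumption:

```python
import numpy as np

def forward_probability(A, B, pi, obs):
    """P(O | model) via the forward part of the forward-backward algorithm.

    A[i, j] = P(state j at t+1 | state i at t)
    B[i, k] = P(symbol k | state i)
    pi[i]   = P(starting in state i)
    """
    alpha = pi * B[:, obs[0]]              # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # recursion over time
    return alpha.sum()                     # termination

# Tiny 2-state, 2-symbol example
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
p = forward_probability(A, B, pi, [0, 1, 0])
```

In practice the α values shrink geometrically with T, which is why the slides' training discussion mentions the need for scaling.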
Training HMM
- No known analytical solution
- Forward-backward (Baum-Welch) re-estimation, a hill-climbing algorithm that re-estimates the HMM parameters so that the probability of the training observations increases
- Method:
  - Uses the forward and backward probabilities α and β to calculate state-transition and observation probabilities
  - Re-estimates the model to improve the probability
  - Scaling is needed to avoid numerical underflow
Experiments
- Matrices A and B
- Observation sequences for the words 'one' and 'two'
Thank you!