Evaluation of Speaker Recognition Algorithms

Speaker Recognition • Speech recognition vs. speaker recognition • Speaker recognition performance depends on the channel and the noise conditions. • Two sets of data are used: one to enroll a speaker and the other to verify a claimed identity.

Data Collection and Processing • MFCC extraction • Test algorithms include AHS (Arithmetic Harmonic Sphericity), Gaussian Divergence, Radial Basis Functions, Linear Discriminant Analysis, etc.

Cepstrum • The cepstrum is a common transform used to extract information from a speech signal; its x-axis is quefrency. • It is used to separate the transfer function from the excitation signal:
X(ω) = G(ω)H(ω)
log|X(ω)| = log|G(ω)| + log|H(ω)|
F⁻¹{log|X(ω)|} = F⁻¹{log|G(ω)|} + F⁻¹{log|H(ω)|}
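A minimal NumPy sketch of this separation, computing the real cepstrum as the inverse FFT of the log magnitude spectrum (the function name and the small epsilon are illustrative choices, not from the slides):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse Fourier transform of the log magnitude spectrum.

    Low quefrencies mostly capture the vocal-tract transfer function H(w);
    high quefrencies mostly capture the excitation G(w).
    """
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # epsilon avoids log(0)
    return np.fft.ifft(log_mag).real
```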

Cepstrum

MFCC Extraction • Short-time FFT • Frame blocking and windowing, e.g.: the first frame is N samples long; the second frame begins M samples after the first (M < N), so consecutive frames overlap by N − M samples, and so on. • Window function: y(n) = x(n)w(n), e.g. the Hamming window w(n) = 0.54 − 0.46 cos(2πn/(N−1)), 0 ≤ n ≤ N−1
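A sketch of frame blocking and windowing under these definitions, with N = frame_len and hop M (names are illustrative):

```python
import numpy as np

def frame_and_window(x, frame_len, hop):
    """Split x into overlapping frames (overlap = frame_len - hop), window each."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    window = np.hamming(frame_len)  # 0.54 - 0.46*cos(2*pi*n/(N-1))
    return frames * window          # y(n) = x(n) * w(n)
```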

• Mel-frequency warping: the mel frequency scale is linear up to 1000 Hz and logarithmic above 1000 Hz: mel(f) = 2595·log₁₀(1 + f/700)
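The warping formula translates directly to code; the inverse is included because it is what positions the mel-spaced filter-bank centers (a sketch, not from the slides):

```python
import numpy as np

def hz_to_mel(f):
    """mel(f) = 2595 * log10(1 + f/700): linear below ~1 kHz, log above."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used to place mel-spaced filter-bank centers in Hz."""
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)
```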

Mel-Spaced Filter Bank

MFCC • Transforming the log mel spectrum back to time gives the MFCCs Cn: Cn = Σk=1..K (log Sk) · cos(n(k − 1/2)π/K), n = 1, 2, …, where Sk are the mel power spectrum coefficients.
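A direct transcription of that cosine transform (the equation above is the standard reconstruction of the slide's missing formula; `n_coeffs` and the epsilon are illustrative):

```python
import numpy as np

def mfcc_from_mel_power(S, n_coeffs=12):
    """C_n = sum_k log(S_k) * cos(n*(k - 0.5)*pi/K) for k = 1..K."""
    K = len(S)
    log_S = np.log(np.asarray(S) + 1e-12)  # epsilon avoids log(0)
    k = np.arange(1, K + 1)
    return np.array([np.sum(log_S * np.cos(n * (k - 0.5) * np.pi / K))
                     for n in range(1, n_coeffs + 1)])
```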

Arithmetic Harmonic Sphericity • A function of the eigenvalues of a test covariance matrix relative to a reference covariance matrix for speakers x and y, defined by μ(x, y) = log[tr(Cx·Cy⁻¹) · tr(Cy·Cx⁻¹)] − 2·log D, where D is the dimensionality of the covariance matrix.
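A sketch of that measure, assuming the log arithmetic-over-harmonic eigenvalue-mean form given above (the slide's own equation did not survive extraction):

```python
import numpy as np

def ahs(cov_x, cov_y):
    """Arithmetic harmonic sphericity between two speakers' covariances.

    Log of the ratio of the arithmetic to the harmonic mean of the
    eigenvalues of cov_x @ inv(cov_y); zero iff the two are proportional.
    """
    D = cov_x.shape[0]
    return float(np.log(np.trace(cov_x @ np.linalg.inv(cov_y))
                        * np.trace(cov_y @ np.linalg.inv(cov_x)) / D**2))
```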

Gaussian Divergence • A mixture of Gaussian densities is used to model the distribution of the features of each speaker; speakers are then compared by the divergence between their models.
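The slides do not show the divergence formula. As an illustration, here is the symmetric Kullback-Leibler divergence between two single-Gaussian models (the mixture case has no closed form and is typically approximated):

```python
import numpy as np

def gaussian_divergence(mu1, cov1, mu2, cov2):
    """Symmetric KL divergence between N(mu1, cov1) and N(mu2, cov2)."""
    inv1, inv2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    dmu = mu1 - mu2
    cov_term = 0.5 * np.trace((cov1 - cov2) @ (inv2 - inv1))
    mean_term = 0.5 * dmu @ (inv1 + inv2) @ dmu
    return float(cov_term + mean_term)
```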

YOHO Dataset • Sampling frequency: 8 kHz

Performance – AHS with 138 subjects and 24 MFCCs

Performance – Gaussian Divergence with 138 subjects and 24 MFCCs

Performance – AHS with 138 subjects and 12 MFCCs

Performance – Gaussian Divergence with 138 subjects and 12 MFCCs

Review of Probability and Statistics • Probability Density Functions o Example 2: with a = 0.25 and b = 0.75, the probability that x lies between 0.25 and 0.75 is P(a ≤ X ≤ b) = ∫ f(x) dx, integrating from a to b (the area under f(x) between the two points).

Review of Probability and Statistics • Cumulative Distribution Functions o The cumulative distribution function (c.d.f.) F(x) for a c.r.v. X is F(x) = P(X ≤ x) = ∫ f(t) dt, integrating t from −∞ to x. o Example: for b = 0.75, F(b) is the area under f(x) up to x = 0.75.

Review of Probability and Statistics • Expected Values and Variance o The expected (mean) value of a c.r.v. X with p.d.f. f(x) is E(X) = ∫ x f(x) dx o Example 1 (discrete, outcomes 2 through 9): E(X) = 2·0.05 + 3·0.10 + … + 9·0.05 = 5.35 o Example 2 (continuous): apply the integral definition above.
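The discrete example can be checked directly. The probabilities below are one assignment consistent with the slide's histogram endpoints and its stated result of 5.35; the exact middle values are an assumption:

```python
import numpy as np

values = np.arange(2, 10)  # outcomes 2..9
probs = np.array([0.05, 0.10, 0.15, 0.25, 0.20, 0.15, 0.05, 0.05])  # assumed
assert abs(probs.sum() - 1.0) < 1e-9  # a valid p.m.f. must sum to 1
print((values * probs).sum())  # E(X) = 5.35
```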

Review of Probability and Statistics • The Normal (Gaussian) Distribution o The p.d.f. of a normal distribution is f(x) = (1/(σ√(2π))) · exp(−(x − μ)²/(2σ²)), where μ is the mean and σ is the standard deviation.

Review of Probability and Statistics • The Normal Distribution o Any arbitrary p.d.f. can be approximated by summing N weighted Gaussians (a mixture of Gaussians): f(x) = w1·N(x; μ1, σ1) + … + wN·N(x; μN, σN), with the weights summing to 1.
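A small sketch of evaluating such a mixture (the weights, means, and deviations passed in are placeholders):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Gaussian p.d.f. with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Weighted sum of Gaussians; the weights w_i must sum to 1."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
```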

Review of Markov Models A Markov Model (Markov chain) is: • similar to a finite-state automaton, with probabilities of transitioning from one state to another (the slide shows a five-state diagram, S1–S5, labeled with its transition probabilities) • transitions between states occur at discrete time intervals • the model can be in only one state at any given time

Review of Markov Models Transition Probabilities: • no assumptions (full probabilistic description of the system): P[qt = j | qt−1 = i, qt−2 = k, …, q1 = m] • usually a first-order Markov Model is used: P[qt = j | qt−1 = i] = aij • first-order assumption: transition probabilities depend only on the previous state • aij obeys the usual rules: aij ≥ 0, and the probabilities leaving a state sum to 1 (a state must be left)

Review of Markov Models Transition Probabilities: • example (three states, with an exit state reached from S3):
a11 = 0.0  a12 = 0.5  a13 = 0.5  a1,Exit = 0.0  (sum = 1.0)
a21 = 0.0  a22 = 0.7  a23 = 0.3  a2,Exit = 0.0  (sum = 1.0)
a31 = 0.0  a32 = 0.0  a33 = 0.0  a3,Exit = 1.0  (sum = 1.0)

Review of Markov Models Transition Probabilities: • probability distribution of state duration (S2 with self-loop probability 0.4 and exit probability 0.6):
p(remain in state S2 exactly 1 time) = 0.4¹ · 0.6 = 0.240
p(remain in state S2 exactly 2 times) = 0.4² · 0.6 = 0.096
p(remain in state S2 exactly 3 times) = 0.4³ · 0.6 = 0.038
= exponential decay (characteristic of Markov Models)
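These duration probabilities follow from the self-loop probability 0.4 and exit probability 0.6, and can be reproduced in one line per duration:

```python
# p(remain in S2 exactly n times) = 0.4**n * 0.6 -- geometric (exponential) decay
for n in (1, 2, 3):
    print(n, round(0.4 ** n * 0.6, 3))  # 0.24, 0.096, 0.038
```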

Review of Markov Models • Example 1: Single Fair Coin (two states, every transition probability 0.5) S1 corresponds to e1 = Heads, S2 corresponds to e2 = Tails a11 = 0.5  a12 = 0.5  a21 = 0.5  a22 = 0.5 • Generated events: HTHHTHTTTHH corresponds to the state sequence S1 S2 S1 S1 S2 S1 S2 S2 S2 S1 S1

Review of Markov Models • Example 2: Weather — a three-state diagram (S1, S2, S3) whose transition probabilities form the matrix A given on the next slide

Review of Markov Models • Example 2: Weather (con't) • S1 = event 1 = rain, S2 = event 2 = clouds, S3 = event 3 = sun; π1 = 0.5, π2 = 0.4, π3 = 0.1
A = {aij} =
0.70  0.25  0.05
0.40  0.50  0.10
0.20  0.70  0.10
(rows: from S1, S2, S3; values reconstructed from the two worked probabilities)
• What is the probability of {rain, rain, rain, clouds, sun, clouds, rain}? Obs. = {r, r, r, c, s, c, r}, S = {S1, S1, S1, S2, S3, S2, S1}, time = {1, 2, 3, 4, 5, 6, 7} (days)
P = P[S1] · P[S1|S1] · P[S1|S1] · P[S2|S1] · P[S3|S2] · P[S2|S3] · P[S1|S2] = 0.5 · 0.7 · 0.7 · 0.25 · 0.1 · 0.7 · 0.4 = 0.001715
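A sketch that reproduces this number; the matrix A is the reconstruction given above, and state indices 0, 1, 2 stand for S1, S2, S3:

```python
import numpy as np

A = np.array([[0.70, 0.25, 0.05],   # from S1 = rain
              [0.40, 0.50, 0.10],   # from S2 = clouds
              [0.20, 0.70, 0.10]])  # from S3 = sun
pi = np.array([0.5, 0.4, 0.1])

def sequence_prob(states):
    """P(S) = pi[s_1] * prod_t A[s_{t-1}, s_t] for a first-order Markov chain."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

print(sequence_prob([0, 0, 0, 1, 2, 1, 0]))  # 0.001715
```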

Review of Markov Models • Example 2: Weather (con't) • S1 = rain, S2 = clouds, S3 = sun; π1 = 0.5, π2 = 0.4, π3 = 0.1; A = {aij} as above
• What is the probability of {sun, sun, sun, rain, clouds, sun, sun}? Obs. = {s, s, s, r, c, s, s}, S = {S3, S3, S3, S1, S2, S3, S3}, time = {1, 2, 3, 4, 5, 6, 7} (days)
P = P[S3] · P[S3|S3] · P[S3|S3] · P[S1|S3] · P[S2|S1] · P[S3|S2] · P[S3|S3] = 0.1 · 0.1 · 0.1 · 0.2 · 0.25 · 0.1 · 0.1 = 5.0 × 10⁻⁷

Simultaneous Speech and Speaker Recognition Using a Hybrid Architecture – Dominique Genoud, Dan Ellis, Nelson Morgan • Automatic recognition of the human voice is often divided into two parts: – speech recognition – speaker recognition

Traditional System • The traditional state-of-the-art speaker recognition task can be divided into two parts: – Feature Extraction – Model Creation

Feature Extraction

Model Creation • Once the features are extracted, a model can be created using various techniques, e.g. a Gaussian Mixture Model. • Once the model is created, we can compute the distance from one model to another. • Based on that distance, a decision can be inferred, as in the sketch below.
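A minimal sketch of this enroll-then-verify flow using a GMM; the library choice, feature matrices, model size, and threshold are illustrative, not from the slides:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical MFCC matrices: one row per frame, one column per coefficient.
enroll_feats = np.random.randn(500, 12)  # enrollment session
test_feats = np.random.randn(200, 12)    # verification session

# Enroll: fit a GMM to the claimed speaker's features.
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(enroll_feats)

# Verify: score is the average log-likelihood of the test frames under the model.
score = gmm.score(test_feats)
accept = score > -17.0  # threshold would be tuned on held-out data
```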

Simultaneous Speaker and Speech Recognition • A system that models the phones of a speaker as well as the speaker's features, and combines the two into one model, could perform very well.

Simultaneous Speaker and Speech Recognition • Maximum a posteriori (MAP) estimation is used to generate speaker-specific models from a set of speaker-independent (SI) seed models. • Assuming no prior knowledge about the speaker distribution, the a posteriori probability is approximated by a score computed from the speaker-specific models and a world model.

Simultaneous Speaker and Speech Recognition • The constant in the previous equation was determined empirically to be 0.02. • Using the Viterbi algorithm, the N most probable speakers can be found. • Results: the authors report 0.7% EER, compared to 5.6% EER for a GMM-based system on the same 100-person dataset.

Speech and Speaker Combination • "Posteriori Probabilities and Likelihoods Combination for Speech and Speaker Recognition", Mohamed Faouzi BenZeghiba, Eurospeech 2003. • The author used a hybrid HMM/ANN (MLP) system for this work. • As speech features, 12 MFCC coefficients with energy and their first derivatives were computed every 10 ms over a 30 ms window.

System Description • The system is defined in terms of: a word from a finite vocabulary {W}; a speaker from the finite set of registered speakers {S}; and the ANN parameters.

System Description • The probability that a speaker is accepted is based on LLR(X), the likelihood ratio between the speaker's GMM and a background model whose parameters are derived from the world data set using MAP adaptation (see the sketch below).
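A sketch of that decision rule, assuming both models expose a per-frame average log-likelihood (as fitted GMMs do); the default threshold is illustrative, and the MAP adaptation of the background model is not shown:

```python
def llr_decision(X, speaker_model, background_model, threshold=0.0):
    """Accept iff LLR(X) = log p(X|speaker) - log p(X|background) > threshold.

    speaker_model / background_model: objects with a .score(X) method
    returning the average log-likelihood of the frames in X (e.g. fitted GMMs).
    """
    llr = speaker_model.score(X) - background_model.score(X)
    return llr > threshold, llr
```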

Combination • Use of MLP adaptation: – shifts the boundaries between the phone classes of the target speaker without strongly affecting the posterior probabilities of the speech sounds of other speakers. • The author proposes a formula to combine the two systems.

Combination • Using the a posteriori probabilities on the test set, it can be shown that: • the probability that a speaker is accepted is determined from the a posteriori probabilities of the test set.

HMM Parameter Estimation • Given an observation sequence O, determine the model parameters λ = (A, B, π) that maximize P(O|λ). • γt(i) is the probability of being in state i at time t, given O and λ.

HMM Parameter Estimation • πi = expected frequency in state i at time t = 1 = γ1(i)
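A compact forward-backward sketch for computing γt(i) from λ = (A, B, π), with the re-estimate πi = γ1(i) noted at the end; the array shapes are assumptions about a discrete-observation HMM:

```python
import numpy as np

def gamma_probs(A, B, pi, obs):
    """gamma[t, i] = P(q_t = i | O, lambda), via forward-backward.

    A: (N, N) transition matrix, B: (N, M) emission matrix,
    pi: (N,) initial distribution, obs: sequence of observation indices.
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                  # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):         # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma  # re-estimated initial distribution: pi_i = gamma[0, i]
```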

Thank You