Ala’a Spaih, Abeer Abu-Hantash. Directed by Dr. Allam Mousa
Outline for Today
1. Speaker Recognition Field
2. System Overview
3. MFCC & VQ
4. Experimental Results
5. Live Demo
Speaker Recognition Field
- Speaker recognition divides into speaker verification and speaker identification.
- Each can be text-dependent or text-independent.
System Overview
- Training mode: speech input -> feature extraction -> speaker modeling -> speaker model database.
- Testing mode: speech input -> feature extraction -> feature matching (against the speaker model database) -> decision logic -> speaker ID.
Feature Extraction
- Feature extraction is a special form of dimensionality reduction.
- The aim is to extract the formants.
Feature Extraction
- The extracted features must have specific characteristics:
  - Easily measurable, occurring naturally and frequently in speech.
  - Stable over time.
  - Varying as much as possible among speakers, yet consistent for each speaker.
  - Unaffected by speaker health or background noise.
- Many algorithms can extract them: LPC, LPCC, HFCC, MFCC.
- We used the Mel Frequency Cepstral Coefficients (MFCC) algorithm.
Feature Extraction Using MFCC
Input speech -> framing and windowing -> fast Fourier transform -> absolute value -> mel-scaled filter bank -> log -> discrete cosine transform -> feature vectors
Framing and Windowing
[Figure: FFT spectrum of a windowed frame, with the vocal tract and glottal pulse contributions labeled]
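The framing and windowing step can be sketched in plain Python as follows. This is a minimal sketch, not the authors' implementation: the Hamming window, the 16 kHz sampling rate, the 20 ms frame, and the 10 ms hop are assumptions chosen for illustration, and the names `frame_signal`, `hamming`, and `apply_window` are hypothetical.

```python
import math

SAMPLE_RATE = 16000                     # assumed sampling rate in Hz
FRAME_LEN = SAMPLE_RATE * 20 // 1000    # assumed 20 ms frame, in samples
HOP_LEN = SAMPLE_RATE * 10 // 1000      # assumed 10 ms hop, in samples


def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D sequence into overlapping frames; a trailing partial frame is dropped."""
    return [list(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop_len)]


def hamming(n):
    """Hamming window of length n (the window type is an assumption)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(frames):
    """Multiply every frame sample-by-sample with a Hamming window."""
    window = hamming(len(frames[0]))
    return [[s * w for s, w in zip(frame, window)] for frame in frames]
```

Each windowed frame would then go to the FFT stage; tapering the frame edges reduces the spectral leakage that a hard rectangular cut would cause.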
Mel-Scaled Filter Bank
Spectrum -> mel spectrum
mel(f) = 2595 * log10(1 + f/700), with f in Hz
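The mel mapping above, and its use for placing filters, can be sketched like this. `hz_to_mel` implements the slide's formula directly; `mel_to_hz` is its algebraic inverse. The helper `mel_center_frequencies`, the 0-8000 Hz range, and the even spacing on the mel axis are assumptions for illustration; only the count of 29 filters comes from the Results slide.

```python
import math


def hz_to_mel(f_hz):
    """mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_low_hz, f_high_hz):
    """Centre frequencies (Hz) of n_filters filters spaced evenly on the mel axis."""
    mel_low, mel_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (mel_high - mel_low) / (n_filters + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(n_filters)]


# 29 filters as in the Results slide; the frequency range is an assumption.
centers = mel_center_frequencies(29, 0.0, 8000.0)
```

Because the mapping is logarithmic, the centre frequencies are packed closely at low frequencies and spread out at high frequencies.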
Cepstrum
- Mel spectrum -> MFCC coefficients.
- By taking the DCT of the logarithm of the magnitude spectrum, the glottal pulse and the vocal tract impulse response can be separated.
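The log and DCT stages can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: it uses an unnormalized DCT-II, floors the energies before the logarithm, and discards the zeroth coefficient; the function names are hypothetical. The default of 12 coefficients follows the Results slide.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n)) for j in range(n))
            for k in range(n)]


def mfcc_from_mel_energies(mel_energies, n_coeffs=12, floor=1e-10):
    """Log-compress the mel filter-bank energies, then keep DCT coefficients 1..n_coeffs."""
    log_energies = [math.log(max(e, floor)) for e in mel_energies]
    cepstrum = dct_ii(log_energies)
    # Dropping coefficient 0 (overall log energy) is an assumption, not stated in the slides.
    return cepstrum[1:n_coeffs + 1]
```

The logarithm turns the product of source and filter spectra into a sum, and the DCT then concentrates the slowly varying vocal tract envelope in the low-order coefficients.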
Classification
- Classification means building a unique model for each speaker in the database.
- There are two major types of models for classification:
  - Stochastic models: GMM, HMM, ANN.
  - Template models: VQ, DTW.
- We used the VQ algorithm.
VQ Algorithm
- The VQ technique consists of extracting a small number of representative feature vectors.
- The first step is to build a speaker database consisting of N codebooks, one for each speaker in the database.
- Speaker feature vectors -> clustered into codewords -> speaker model (codebook). The clustering is done by the K-means clustering algorithm.
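K-means Clustering
1. Start: choose the number of clusters k.
2. Set the centroids.
3. Compute the distance from each object to the centroids.
4. Group the objects based on minimum distance.
5. No change in the grouping? If yes, end; if no, return to step 2.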
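The loop in the flowchart can be sketched in plain Python as follows. This is a minimal sketch under assumptions: the caller supplies the initial centroids, squared Euclidean distance is used for grouping, a cluster that loses all its points keeps its previous centroid, and `max_iter` is a safety cap that is not part of the flowchart.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Return (centroids, assignment) once the grouping stops changing."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Group each point with its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:   # "no change" -> end
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:                    # an empty cluster keeps its old centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment
```

For speaker modeling, `points` would be one speaker's MFCC vectors and the returned centroids would form that speaker's codebook.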
VQ Example
- Given data points, split them into 4 codebook vectors with initial values at (2, 2), (4, 6), (6, 5), (8, 8).
VQ Example
- Once there is no more change, the feature space is partitioned into 4 regions. Any input feature can be classified as belonging to one of the 4 regions, and the entire codebook can be specified by the 4 centroid points.
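Classifying an input feature into one of the 4 regions amounts to a nearest-centroid lookup, sketched below. The slides do not give the data points or the final centroids, so this sketch reuses the four initial values as a stand-in codebook purely for illustration; `nearest_codeword` is a hypothetical helper name.

```python
def nearest_codeword(vector, codebook):
    """Index of the codeword closest to `vector` in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


# Stand-in codebook: the initial values from the example, not the converged centroids.
codebook = [(2, 2), (4, 6), (6, 5), (8, 8)]

region = nearest_codeword((7, 7), codebook)
```

Storing only the centroids is enough because the region boundaries are implied by them: every point belongs to the region of its closest centroid.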
K-means Clustering
- If we set the codebook size to 8, the output of the clustering will be:
  MFCCs of a speaker (1000 x 12) -> VQ -> speaker codebook (8 x 12)
Feature Matching
- For each codebook, a distortion measure is computed.
- The distortion measure is defined as the Euclidean distance.
- The speaker with the lowest distortion is chosen.
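Feature matching can be sketched as follows. The slides define the distortion only as a Euclidean distance; averaging the nearest-codeword distance over all test vectors is an assumption about how the per-codebook score is aggregated, and the names `vq_distortion` and `identify` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vq_distortion(features, codebook):
    """Average distance from each feature vector to its nearest codeword (assumed aggregation)."""
    return sum(min(euclidean(v, c) for c in codebook) for v in features) / len(features)


def identify(features, codebooks):
    """Return (best_speaker, scores), where best_speaker has the lowest distortion."""
    scores = {name: vq_distortion(features, cb) for name, cb in codebooks.items()}
    return min(scores, key=scores.get), scores
```

Here `codebooks` would map each enrolled speaker's name to the codebook built during training, and `features` would be the MFCC vectors of the test utterance.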
System Operates in Two Modes
- Offline.
- Online: monitoring the microphone -> MFCC feature extraction -> calculate VQ distortion -> make decision & display.
Applications
- Speaker recognition for authentication.
  - Banking applications.
- Forensic speaker recognition.
  - Proving the identity of a recorded voice can help to convict a criminal or clear an innocent person in court.
- Speaker recognition for surveillance.
  - Electronic eavesdropping on telephone and radio conversations.
Results
- Settings: 12 MFCCs, 29 filter banks, codebook size 64 … ELSDSR database.
- The table shows how the system identifies the speaker from the Euclidean distance calculation; the lowest distortion in each row lies on the diagonal.

        Sp 1      Sp 2      Sp 3      Sp 4      Sp 5
Sp 1    10.7492   13.2712   17.8646   14.7885   13.2859
Sp 2    13.2364   10.2740   13.2884   11.7941   14.0461
Sp 3    17.5438   16.1177   11.9029   16.2916   17.7199
Sp 4    16.1360   13.7095   15.5633   11.7528   16.7327
Sp 5    14.9324   15.7028   17.2842   17.8917   12.3504
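The decision rule can be checked against the reported distortion values with a few lines of Python. The numbers are copied from the table; which axis holds the test utterances and which holds the codebooks is not stated in the slides, so the sketch simply takes the minimum along each row.

```python
speakers = ["Sp 1", "Sp 2", "Sp 3", "Sp 4", "Sp 5"]

# Distortion values as reported in the Results slide.
distortion = [
    [10.7492, 13.2712, 17.8646, 14.7885, 13.2859],
    [13.2364, 10.2740, 13.2884, 11.7941, 14.0461],
    [17.5438, 16.1177, 11.9029, 16.2916, 17.7199],
    [16.1360, 13.7095, 15.5633, 11.7528, 16.7327],
    [14.9324, 15.7028, 17.2842, 17.8917, 12.3504],
]

# Decision logic: for each row, pick the column with the lowest distortion.
decisions = [row.index(min(row)) for row in distortion]

for name, choice in zip(speakers, decisions):
    print(name, "->", speakers[choice])
```

The closest call is Sp 2 against Sp 4 (11.7941 versus 11.7528 in the Sp 4 column), which suggests those two speakers are the hardest pair to separate in this set.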
Results
- Number of MFCCs vs. ID rate:

  No. of MFCCs   ID rate
  5              76 %
  12             91 %
  20             91 %

- Frame size vs. ID rate:
  - Frame size 10-30 ms: good.
  - Above 30 ms: bad.
Results
- The effect of the codebook size on the ID rate and VQ distortion.
Results
- Number of filter banks vs. ID rate and VQ distortion.
Results
- The performance of the system for different test speech lengths:

  Test speech length   ID rate
  0.2 sec              60 %
  2 sec                85 %
  6 sec                90 %
  10 sec               95 %
Summary
- We examined the effect of changing some parameters of:
  - The MFCC algorithm.
  - The VQ algorithm.
- Our system identifies the speaker regardless of the language and the text.
- Satisfactory results require:
  - The same training and testing environment.
  - Sufficiently long test speech (10 sec gave a 95 % ID rate).