Specom’2004, Sep 20, 2004
Automatic Speaker Recognition for Series 60 Mobile Devices
Juhani Saastamoinen, Evgeny Karpov, Ville Hautamäki, and Pasi Fränti
University of Joensuu, Department of Computer Science
University of Joensuu, Dept. of Computer Science, P.O. Box 111, FIN-80101 Joensuu, Tel. +358 13 251 7959, fax +358 13 251 7955, www.cs.joensuu.fi

Background
• Project in National FENIX programme
  – New Methods and Applications in Speech Technology
• 7 research institutes
• Project partners: NRC, Lingsoft, National Bureau of Investigation, etc.
• Joensuu: Speaker Recognition
• http://cs.joensuu.fi/pages/pums

Research Group
PUMS project
• Pasi Fränti, Professor
• Ismo Kärkkäinen, Clustering algorithms
• Juhani Saastamoinen, Project manager
• Evgeny Karpov, Project researcher
• Tomi Kinnunen, Researcher
• Ville Hautamäki, Project researcher

Application Scenarios
Speaker Recognition = Speaker Verification + Speaker Identification
• Speaker Verification: Is this Bob’s voice? (Claim)
• Speaker Identification: Whose voice is this?
[Figure: illustrations labelled Identification and Verification, the latter with an "Imposter!" label]

Project Goal
Port speaker recognition to a Series 60 mobile phone

Symbian Phones
• Series 60 phone features:
  – 16 MB ROM
  – 8 MB RAM
  – 176 x 208 display
  – ARM processor
  – No floating-point unit!
[Figure labels: Series 60, UIQ, Series 80]

Symbian OS
• Defined by Symbian consortium
• Based on EPOC
• Operating system for mobile phones
  – Real-time system
  – Long uptime required
• Multitasking, multithreading

Problems of Porting
• Usual considerations when porting to a phone
  – GUI event-driven programming
  – Platform-specific programming model
  – Real-time system, exceptions
• Application-specific porting problems
  – Number crunching without a floating-point unit
  – Signal processing is numerically challenging

Identification System
[Block diagram] Speech audio → Signal Processing (feature extraction) → Feature vectors, which feed one of two paths:
• Speaker Modelling: create a speaker profile; speaker profiles are added to the Speaker Profile Database during training
• Speaker Recognition: classify input speech based on existing profiles; all profiles are read and used during recognition, and the output is the Decision
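The two paths of the block diagram can be sketched with the VQ/GLA approach that later slides mention. This is a minimal illustration, not the authors' code: the function names, the codebook size of 4, the iteration count, and the two-dimensional toy "feature vectors" are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def train_codebook(vectors, size, iterations=20, seed=0):
    """Speaker modelling: cluster training vectors with a basic GLA (k-means)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        # Move each code vector to its cell centroid; keep it if the cell is empty.
        codebook = [
            [sum(col) / len(cell) for col in zip(*cell)] if cell else code
            for cell, code in zip(cells, codebook)
        ]
    return codebook

def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest code vector."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(vectors, profiles):
    """Speaker recognition: pick the profile with the smallest distortion."""
    return min(profiles, key=lambda name: distortion(vectors, profiles[name]))

def toy_features(center, count, seed):
    """Hypothetical stand-in for real MFCC vectors: Gaussian points around a centre."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0, 1) for c in center] for _ in range(count)]

# Training: one codebook per speaker goes into the profile database.
profiles = {
    "alice": train_codebook(toy_features([0.0, 0.0], 200, seed=1), size=4),
    "bob": train_codebook(toy_features([10.0, 10.0], 200, seed=2), size=4),
}
# Recognition: all profiles are read and compared against the unknown speech.
unknown = toy_features([10.0, 10.0], 50, seed=3)
print(identify(unknown, profiles))
```

The profile database here is just a dictionary; on a phone it would be persistent storage, which the slides do not describe.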

MFCC Signal Processing
Digital speech signal frame → Pre-emphasis → Time windowing → DFT → Abs → Filter bank → Log → DCT → Feature vector
• Pre-emphasis coefficient 0.97, Hamming window, 30 triangular mel filters, base-2 logarithm, output 12 MFCCs
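A floating-point reference for the chain above may help when checking a fixed-point port. The sketch below follows the listed steps and parameters (0.97, Hamming, 30 filters, base-2 log, 12 coefficients); everything else is an assumption not stated on the slide: the 8 kHz sample rate, the common mel formula 2595·log10(1 + f/700), a DCT-II that drops the zeroth coefficient, and a naive DFT used in place of an FFT for brevity.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = (sample_rate / 2.0) / (n_bins - 1)
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate=8000, n_filters=30, n_ceps=12, alpha=0.97):
    n = len(frame)
    # Pre-emphasis: y[i] = x[i] - alpha * x[i-1]
    x = [frame[0]] + [frame[i] - alpha * frame[i - 1] for i in range(1, n)]
    # Hamming window
    x = [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, v in enumerate(x)]
    # DFT magnitude for the non-negative frequencies (naive O(n^2) DFT)
    n_bins = n // 2 + 1
    mag = [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
           for k in range(n_bins)]
    # Mel filter bank followed by base-2 logarithm (floored to avoid log of 0)
    bank = mel_filterbank(n_filters, n_bins, sample_rate)
    logs = [math.log2(max(sum(w * m for w, m in zip(row, mag)), 1e-10))
            for row in bank]
    # DCT-II; coefficients 1..n_ceps are kept
    return [sum(logs[j] * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j in range(n_filters))
            for c in range(1, n_ceps + 1)]

# Example: one 256-sample frame of a 440 Hz tone
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
features = mfcc_frame(frame)
print(len(features))
```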

Fixed-Point Implementation
• Numerical analysis is needed for a fixed-point arithmetic implementation
• Truncation and re-scaling to avoid overflows in the converted algorithm
• Minimize information loss caused by computation in fixed-point arithmetic
  – Minimize relative error
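As a small illustration of truncation, re-scaling, and relative error, the sketch below multiplies two fractions in a generic fixed-point format. The helper names and the choice of 8 versus 15 fractional bits are assumptions for the example, not values from the slides.

```python
def to_fixed(x, frac_bits):
    """Represent x as an integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))

def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point numbers and re-scale by truncating the low bits."""
    return (a * b) >> frac_bits

def to_float(a, frac_bits):
    return a / (1 << frac_bits)

def relative_error(x, y, frac_bits):
    """Relative error of the fixed-point product x*y against the exact product."""
    approx = to_float(fixed_mul(to_fixed(x, frac_bits), to_fixed(y, frac_bits),
                                frac_bits), frac_bits)
    return abs(approx - x * y) / abs(x * y)

for bits in (8, 15):
    print(bits, relative_error(0.3, 0.7, bits))
```

Fewer fractional bits leave more headroom against overflow but raise the relative error, which is the trade-off the numerical analysis has to settle for each processing step.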

FFT, Fixed-Point
• Frequency spectrum of speech
  – Biggest source of numerical error
  – Butterflies have multiplications
  – Layers repeat truncation errors
• Fixed number of bits per element
  – 32, native integer size in many systems
• Reference implementation: FFTGEN
  – http://www.jjj.de/fftgen.tgz
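To see where the butterfly multiplications and per-layer truncations occur, here is a minimal integer-only radix-2 FFT. It is neither FFTGEN nor the authors' implementation: the decimation-in-time structure, the halving after every layer (so the output is the DFT divided by N), and the 10-bit twiddle scale are assumptions chosen for the illustration. Twiddle factors are computed with floating point here for brevity; a phone port would read them from a precomputed table.

```python
import math

def fft_fixed(re, im, twiddle_bits=10):
    """In-place integer radix-2 DIT FFT; returns DFT/N because each layer halves."""
    n = len(re)
    # Bit-reversal permutation
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            re[i], re[j] = re[j], re[i]
            im[i], im[j] = im[j], im[i]
    scale = 1 << twiddle_bits
    size = 2
    while size <= n:                      # one pass per FFT layer
        half = size // 2
        for k in range(half):
            angle = -2.0 * math.pi * k / size
            wr = round(math.cos(angle) * scale)   # integer twiddle factor
            wi = round(math.sin(angle) * scale)
            for start in range(0, n, size):
                a, b = start + k, start + k + half
                # Butterfly multiplication, truncated back to the signal scale
                tr = (re[b] * wr - im[b] * wi) >> twiddle_bits
                ti = (re[b] * wi + im[b] * wr) >> twiddle_bits
                # Halve both outputs so values cannot grow from layer to layer
                re[b], im[b] = (re[a] - tr) >> 1, (im[a] - ti) >> 1
                re[a], im[a] = (re[a] + tr) >> 1, (im[a] + ti) >> 1
        size *= 2
    return re, im

# Example: a sine with 5 cycles in 64 samples, amplitude 10000
n = 64
re = [round(10000 * math.sin(2 * math.pi * 5 * t / n)) for t in range(n)]
im = [0] * n
fft_fixed(re, im)
peak = math.hypot(re[5], im[5])
print(peak)   # close to amplitude / 2, up to truncation error
```

Every `>>` in the butterfly discards low-order bits, and the discarded amounts accumulate across the log2(N) layers, which is the error source the slide points at.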

FFTGEN (16/16)
• Multiplication: the result of a 32 x 32-bit multiplication must fit in 32 bits, so the inputs are truncated
• FFTGEN: truncate inputs to 16/16 bits
[Diagram] FFT layer input (16-bit integer) x FFT twiddle factor (16-bit integer) → 32-bit multiplication result (16 used bits, 16 crop-off bits) → 16-bit integer as (part of) the FFT layer output
• Crop-off for next layer: 16 bits!

Info Preserving FFT (22/10)
• Approximate DFT operator F with G
• Accept a larger operator error ||F-G|| in order to preserve more signal information
  – Minimize maximum relative error in scaled sine values with respect to scale; 980 is good for FFT sizes up to 1024
  – Truncate multiplication inputs to 22/10 bits (signal/operator)
[Diagram] FFT layer input (32-bit integer, 22 used bits, 10 crop-off bits) x FFT twiddle factor (16-bit integer, 10 bits used) → 32-bit multiplication result as (part of) the FFT layer output
• Crop-off for next layer: 10 bits
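The effect of moving bits from the twiddle factor to the signal can be illustrated with a single multiplication. The model below is a simplification under stated assumptions: the signal lives in a signed 32-bit word, its low-order bits are cropped before multiplying, the 16/16 twiddle uses a scale of 2^15 (an assumption), and the 22/10 twiddle uses the scale 980 quoted on the slide. The function names and the sample value are hypothetical.

```python
import math

def truncated_product(signal, twiddle, crop_bits, twiddle_scale):
    """Model one fixed-point twiddle multiplication.

    signal        value held in a signed 32-bit word
    twiddle       exact twiddle component in [-1, 1]
    crop_bits     low-order signal bits dropped before multiplying
    twiddle_scale integer scale applied to the twiddle factor
    Returns the product converted back to the scale of `signal`.
    """
    s = signal >> crop_bits                 # truncate the signal operand
    w = round(twiddle * twiddle_scale)      # integer twiddle factor
    product = s * w
    assert -2**31 <= product < 2**31        # must fit in 32 bits
    return product * (1 << crop_bits) / twiddle_scale

def relative_error(signal, twiddle, crop_bits, twiddle_scale):
    exact = signal * twiddle
    approx = truncated_product(signal, twiddle, crop_bits, twiddle_scale)
    return abs(approx - exact) / abs(exact)

# A small-amplitude spectrum element and one twiddle component
signal = 100_000
twiddle = math.cos(2 * math.pi * 3 / 64)

err_16_16 = relative_error(signal, twiddle, crop_bits=16, twiddle_scale=1 << 15)
err_22_10 = relative_error(signal, twiddle, crop_bits=10, twiddle_scale=980)
print(err_16_16, err_22_10)
```

For elements that are small relative to the 32-bit range, cropping 16 bits removes most of the signal, while cropping 10 bits keeps it; the coarser twiddle factor costs comparatively little. For elements near full scale the picture reverses, so this sketch only illustrates the small-amplitude case.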

FFT Spectrum, Fixed-Point
• x-axis: fixed-point FFT element abs. values
• y-axis: correct FFT element abs. values
[Figure panels: 16/16 abs values; 22/10 abs values; original TIMIT signal x 4]

Scale of Error in Proposed FFT
Log10 of relative error in FFT elements

                      16/16     22/10
average              -0.775    -2.118
standard deviation    0.797     0.590

[Figure panels: 16/16, 22/10]

Magnitude Spectrum, Fixed-Point
• Compute complex absolute values using the maximum coordinate and the coordinate ratio
• Suppose |x| > |y| for z = x + iy; then $|z| = |x| \sqrt{1 + (y/x)^2}$
• Denote the squared ratio $(y/x)^2$ by t
• Approximate the square root $\sqrt{1 + t}$ by a polynomial P(t)
• Constant-time algorithm (vs. Newton)
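A minimal integer-only sketch of this magnitude computation follows. The slides do not give the polynomial, so the quadratic used here (interpolating sqrt(1 + t) at t = 0, 0.5 and 1) and the 15 fractional bits are illustrative assumptions; the coefficients are derived with floating point once, as a table generator would do offline.

```python
import math

Q = 15                       # fractional bits (assumed format)
ONE = 1 << Q

# Quadratic P(t) = 1 + a*t + b*t^2 through sqrt(1+t) at t = 0, 0.5, 1.
_a = 4.0 * (math.sqrt(1.5) - 1.0) - (math.sqrt(2.0) - 1.0)
_b = (math.sqrt(2.0) - 1.0) - _a
A = round(_a * ONE)          # positive coefficient in Q15
B = round(_b * ONE)          # negative coefficient in Q15

def magnitude(x, y):
    """Approximate |x + iy| for integers using |x| * P((y/x)^2), |x| >= |y|."""
    big, small = abs(x), abs(y)
    if big < small:
        big, small = small, big
    if big == 0:
        return 0
    ratio = (small << Q) // big              # y/x in Q15, between 0 and ONE
    t = (ratio * ratio) >> Q                 # squared ratio in Q15
    p = ONE + ((A * t) >> Q) + ((B * ((t * t) >> Q)) >> Q)
    return (big * p) >> Q

print(magnitude(3000, 4000), math.hypot(3000, 4000))
```

The work per element is fixed (one division, a few multiplications and shifts), whereas a Newton iteration for the square root needs a data-dependent number of steps. For very small inputs the result is dominated by integer rounding, so the sketch is only meaningful for values well above 1.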

Logarithm, Fixed-Point
• Use base 2 instead of base 10
  – Changing the base only multiplies the output by a constant
• Standard technique:
  – Reduce the problem to the interval [1, 2)
  – Use linear interpolation from values stored in a look-up table
  – 8 bits used for indexing the look-up table values
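The standard technique can be sketched as follows. The 8-bit table index comes from the slide; the 16 fractional bits of the result, the function name, and building the table with floating point at start-up (rather than storing precomputed constants, as a phone port would) are assumptions for the example.

```python
import math

FRAC = 16                    # fractional bits of the result (assumed)
TABLE_BITS = 8               # bits used to index the look-up table
STEP_BITS = FRAC - TABLE_BITS

# log2(1 + i/256) in Q16 for i = 0..256; the extra entry serves interpolation.
LOG_TABLE = [round(math.log2(1.0 + i / (1 << TABLE_BITS)) * (1 << FRAC))
             for i in range((1 << TABLE_BITS) + 1)]

def log2_fixed(n):
    """Base-2 logarithm of a positive integer, returned in Q16."""
    if n <= 0:
        raise ValueError("log2 is defined for positive values only")
    exponent = n.bit_length() - 1            # integer part: n = m * 2^exponent
    # Mantissa m in [1, 2) as Q16, with the leading 1 removed
    if exponent >= FRAC:
        frac = (n >> (exponent - FRAC)) - (1 << FRAC)
    else:
        frac = (n << (FRAC - exponent)) - (1 << FRAC)
    index = frac >> STEP_BITS                # top 8 bits select the table entry
    rest = frac & ((1 << STEP_BITS) - 1)     # low bits drive the interpolation
    low, high = LOG_TABLE[index], LOG_TABLE[index + 1]
    return (exponent << FRAC) + low + (((high - low) * rest) >> STEP_BITS)

print(log2_fixed(1000) / (1 << FRAC), math.log2(1000))
```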

Rest of System, Fixed-Point
• No improvement needed in VQ/GLA
• Should apply a similar technique as with FFT to other signal processing
  – Pre-emphasis: utilize full 32 bits
  – Time windowing: use fewer bits in the windowing function
  – Filter bank (FB): use fewer bits in frequency responses
  – DCT: use fewer bits for the cosines

Effect of Signal Processing
• TIMIT data sets, varying number of speakers (N)
• For each N, repeat train/recognize cycles (6x, 5x, 2x) to eliminate the randomness of the GLA initial solution
• FFTGEN: FFT with 16/16 multiplication
• Fixed-point: use proposed 22/10 FFT
• Mixed: floating-point DSP, fixed-point GLA/VQ

Effect of Signal Quality
• GSM/PC data: 16 aligned dual recordings
• All computations in floating-point arithmetic
• Signal recorded with laptop and PC microphone gives an average recognition rate of 100%
• Signal recorded with Nokia 3660 results in an average recognition rate of 84.9%

Conclusion
• Speaker identification was ported to a Symbian Series 60 mobile phone
• 22/10 bit usage in multiplication proposed instead of the “standard” 16/16
• Experiments indicate that recognition accuracy improves from 68% to 95%