Listening to Normalized Speech Mimicking the Normalization Processes of Automatic Speech Recognition

Listening to Normalized Speech Mimicking the Normalization Processes of Automatic Speech Recognition
Dirk Van Compernolle, Kris Demuynck, Oscar Garcia
compi@esat.kuleuven.be
Katholieke Universiteit Leuven – Dept. ESAT
Kasteelpark Arenberg 10, 3001 Heverlee, Belgium
www.esat.kuleuven.be/~spch

ASR Preprocessing
[Diagram: signal → Fourier Transform → Magnitude (Spectrogram) and Phase Spectrum; Magnitude → Envelope (cepstra) and Excitation (pitch); further labels: speaker normalization, normalized cepstra, pitch removal, normalized pitch, to ASR]
Normalized Speech 2
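The diagram splits the magnitude spectrum into a smooth envelope (represented by cepstra) and a fine-structure excitation that carries the pitch. That split can be sketched with plain real-cepstrum liftering, assuming a hypothetical low-quefrency cutoff `n_lifter`. The slide does not specify the analysis, and ASR front ends usually compute mel-frequency cepstra instead; the function name and parameter values here are illustrative only.

```python
import numpy as np

def split_envelope_excitation(frame, n_lifter=30):
    """Split one frame's log-magnitude spectrum into envelope + excitation.

    Uses real-cepstrum liftering; n_lifter is a hypothetical cutoff that
    should stay well below the pitch period in samples.
    """
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n)         # real, even-symmetric cepstrum
    lifter = np.zeros(n)
    lifter[:n_lifter] = 1.0                     # low quefrencies 0 .. n_lifter-1
    lifter[n - n_lifter + 1:] = 1.0             # their mirror images
    envelope_log = np.fft.rfft(cepstrum * lifter).real
    excitation_log = log_mag - envelope_log     # fine structure: pitch harmonics
    return envelope_log, excitation_log
```

Keeping only the lowest cepstral coefficients smooths away the harmonic ripple, so `envelope_log` approximates the vocal-tract envelope while `excitation_log` retains the pitch harmonics. At 16 kHz, a 30-sample cutoff is about 1.9 ms, shorter than the pitch period of typical voices.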

Speech Normalization
[Diagram: original signal → Magnitude Spectrum and Phase Spectrum; Magnitude Spectrum → Envelope (spectrum) and Excitation (pitch); resynthesis path labels: normalized spectrum, normalized excitation, enhanced spectrum, Magnitude Spectrum, Phase Spectrum (Griffin & Lim, 1984) → normalized signal]
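Once the envelope and excitation have been normalized, the modified magnitude spectrogram generally has no matching phase, which is why the resynthesis path relies on the iterative algorithm of Griffin and Lim (1984). The sketch below is plain textbook Griffin-Lim with a random initial phase, a Hann window and a least-squares (window-squared) overlap-add. The function names and the values of `n_fft`, `hop` and `n_iter` are hypothetical; the slides do not give them.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Least-squares inverse STFT: window each inverse FFT frame,
    overlap-add, then divide by the summed squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-10)

def griffin_lim(magnitude, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random start
    for _ in range(n_iter):
        estimate = istft(magnitude * phase, n_fft, hop)          # impose target magnitude
        phase = np.exp(1j * np.angle(stft(estimate, n_fft, hop)))  # keep only the phase
    return istft(magnitude * phase, n_fft, hop)

def spectral_convergence(x, magnitude, n_fft=512, hop=128):
    """Relative error between |STFT(x)| and the target magnitude."""
    diff = np.abs(stft(x, n_fft, hop)) - magnitude
    return np.linalg.norm(diff) / np.linalg.norm(magnitude)
```

In a normalization pipeline, `magnitude` would be the normalized magnitude spectrogram. Each iteration keeps the target magnitude and only updates the phase, which is what lets the method exploit the redundancy between overlapping frames.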

Speech Normalization - Ingredients
• Spectral normalization
  – concept: remove the effect of vocal tract length
  – method: utterance-based VTLN (vocal tract length normalization) by linear frequency warping
• Pitch normalization
  – concept: remove the effect of pitch
  – method: map the utterance-level pitch mean and variance onto global cross-speaker averages
• Phase resynthesis
  – concept: exploit the redundancy in the over-sampled spectral envelope
  – method: iterative algorithm (Griffin & Lim, 1984)
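The spectral and pitch normalization ingredients can be sketched as follows, under loudly stated assumptions. Warping acts on linear-frequency magnitude bins with one factor `alpha` per utterance; the slides do not say how the authors estimate `alpha`, so here it is simply passed in. Pitch statistics are computed over voiced frames in the log-F0 domain, a common choice that the slides do not confirm. `warp_spectrum`, `normalize_pitch`, `global_mean` and `global_std` are hypothetical names.

```python
import numpy as np

def warp_spectrum(magnitude, alpha):
    """Linearly warp the frequency axis of one magnitude-spectrum frame.

    A component at bin k moves to bin alpha * k. Bins that would read past
    the last input bin repeat the edge value (np.interp clamps).
    """
    bins = np.arange(len(magnitude), dtype=float)
    return np.interp(bins / alpha, bins, magnitude)

def normalize_pitch(f0, global_mean, global_std):
    """Map one utterance's voiced F0 values onto global statistics.

    Assumptions: f0 == 0 marks unvoiced frames (left untouched), and
    global_mean / global_std are cross-speaker log-F0 statistics.
    """
    f0 = np.asarray(f0, dtype=float)
    out = f0.copy()
    voiced = f0 > 0
    if not voiced.any():
        return out
    log_f0 = np.log(f0[voiced])
    utt_mean, utt_std = log_f0.mean(), log_f0.std()
    # Standardize per utterance; a constant contour only gets shifted.
    z = (log_f0 - utt_mean) / utt_std if utt_std > 0 else np.zeros_like(log_f0)
    out[voiced] = np.exp(global_mean + global_std * z)
    return out
```

Because VTLN is utterance-based, the same `alpha` would be applied to every frame, e.g. `np.stack([warp_spectrum(frame, alpha) for frame in spectrogram])`.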

[Figure: original vs. normalized]

[Figure: original vs. normalized]

[Figure: original vs. inverted]