Listening to Normalized Speech
Mimicking the Normalization Processes of Automatic Speech Recognition

Dirk Van Compernolle, Kris Demuynck, Oscar Garcia
compi@esat.kuleuven.be
Katholieke Universiteit Leuven – Dept. ESAT
Kasteelpark Arenberg 10, 3001 Heverlee, Belgium
www.esat.kuleuven.be/~spch

ASR Preprocessing
[Diagram labels: signal; Fourier Transform; Magnitude (Spectrogram); Phase Spectrum; Envelope (cepstra); Excitation (pitch); normalized cepstra; normalized pitch removal; speaker normalization; to ASR]

Speech Normalization
[Diagram labels: original signal; Magnitude Spectrum; Phase Spectrum; Envelope (spectrum); Excitation (pitch); enhanced spectrum; normalized spectrum; normalized excitation; Magnitude Spectrum; Phase Spectrum (Griffin & Lim, 1984); normalized signal]

Speech Normalization - Ingredients
• Spectral normalization
  – concept: remove vocal tract length effect
  – method: utterance-based VTLN (vocal tract length normalization) by linear frequency warping
• Pitch normalization
  – concept: remove pitch effect
  – method: scale utterance-based average and variance to global cross-speaker averages
• Phase resynthesis
  – concept: exploit redundancy in over-sampled spectral envelope
  – method: iterative algorithm (Griffin & Lim, 1984)

[Figure: original vs. normalized]

[Figure: original vs. normalized]

[Figure: original vs. inverted]
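
The pitch normalization method listed under the ingredients (scale the utterance-based average and variance to global cross-speaker averages) could look roughly like the sketch below. It is a minimal illustration, not the authors' implementation. Everything specific in it is an assumption: the function name `normalize_pitch`, the convention that unvoiced frames are marked with 0.0, the choice to work in Hz rather than a log-frequency scale, and the placeholder target values of 150 Hz and 30 Hz, which are not taken from the slides.

```python
import math

def normalize_pitch(f0, target_mean=150.0, target_std=30.0):
    """Rescale the voiced pitch values of one utterance to target statistics.

    f0          -- per-frame pitch in Hz; 0.0 marks an unvoiced frame (assumed convention)
    target_mean -- placeholder for the global cross-speaker mean pitch (hypothetical value)
    target_std  -- placeholder for the global cross-speaker pitch std (hypothetical value)
    """
    voiced = [v for v in f0 if v > 0.0]
    if not voiced:
        # Nothing to normalize in a fully unvoiced utterance.
        return list(f0)

    # Utterance-level statistics, computed over voiced frames only.
    mean = sum(voiced) / len(voiced)
    var = sum((v - mean) ** 2 for v in voiced) / len(voiced)
    std = math.sqrt(var)

    out = []
    for v in f0:
        if v <= 0.0:
            out.append(v)            # leave unvoiced frames untouched
        elif std == 0.0:
            out.append(target_mean)  # flat pitch contour: only the mean can be matched
        else:
            # Standardize with the utterance statistics, then rescale to the targets.
            out.append((v - mean) / std * target_std + target_mean)
    return out
```

For example, `normalize_pitch([0.0, 200.0, 220.0, 240.0, 0.0])` keeps the two unvoiced frames at 0.0 and maps the three voiced frames to values whose mean and standard deviation equal the targets. The sketch does not guard against very low outliers being mapped to non-positive values; a real system would need to decide how to handle that case.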