Two-Stage Mel-Warped Wiener Filter
SNR-Dependent Waveform Processing

Anshu Agarwal and Yan Ming Cheng, ASRU 1999
Human Interface Lab, Motorola Labs, USA

Dusan Macho and Yan Ming Cheng, ICASSP 2001
Human Interface Lab, Motorola Labs, USA

2004/08/17
Presented by Chen-Wei Liu

Outline
• Introduction
• Two-stage Wiener Filter
  – Formula
  – Algorithm
• SNR Waveform Processing
  – Idea
  – Algorithm
• Experiments

Introduction
• The problem investigated here is speech recognition in an automobile noise environment
  – The main characteristic is colored noise with intensity as high as, or even higher than, the input speech
• The performance of conventional speech recognizers
  – Degrades by more than 50% in typical automobile noise conditions
• The automobile noise can be considered additive
  – Because it originates from the car’s engine, an open window, etc.
  – Many techniques have been proposed to subtract the noise from a noisy speech signal

Introduction
• It is believed that
  – There is a direct correlation between speech signal strength and speech recognition accuracy
  – The cleaner the signal, the better the performance
• This paper proposes a new approach
  – Based on the Mel-warped Wiener filter concept
  – Step 1: coarsely reduce the noise and whiten the residual noise
  – Step 2: wipe out the residual noise
    • By exploiting the correlation characteristics between the speech signal and the white noise

Formulation of Mel-Warped Wiener Filter
• Under the additive-noise assumption, the noisy signal can be expressed as follows [equation not preserved]
• A Wiener filter is constructed as [equation not preserved]
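The equations on this slide are missing from the text. As a sketch in standard textbook notation (the symbols are assumptions, not necessarily the authors' own), the additive model and a frequency-domain Wiener gain are commonly written as:

```latex
x(n) = s(n) + d(n)

H(\omega) = \frac{P_s(\omega)}{P_s(\omega) + P_d(\omega)}
          = \frac{P_x(\omega) - P_d(\omega)}{P_x(\omega)}
```

Here x(n) is the noisy signal, s(n) the clean speech, d(n) the noise, and P_x, P_s, P_d their power spectra. The second equality holds only if speech and noise are uncorrelated, so that P_x = P_s + P_d.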

Formulation of Mel-Warped Wiener Filter
• The mel-warped spectral transfer function of the Wiener filter is expressed as [equation not preserved]
  – Where m stands for mel-frequency and the warping function is [equation not preserved]
• We refer to the process of computing the mel-warped power spectrum from an auto-correlation series as Mel-DCT
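One way to picture Mel-DCT is as a cosine transform of the auto-correlation series evaluated at frequencies spaced uniformly on the mel scale. The sketch below assumes the common warping mel(f) = 2595 log10(1 + f/700); that formula, the function names, and the bin count are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel warping formula (assumed; the slide's own formula is missing).
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_grid(num_bins, sample_rate):
    """Angular frequencies in [0, pi], uniformly spaced on the mel scale."""
    mels = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), num_bins)
    return 2.0 * np.pi * mel_to_hz(mels) / sample_rate

def mel_dct(autocorr, omega):
    """P(w) = r(0) + 2 * sum_{k>=1} r(k) cos(w k), evaluated at the warped w."""
    autocorr = np.asarray(autocorr, dtype=float)
    lags = np.arange(1, len(autocorr))
    return autocorr[0] + 2.0 * np.cos(np.outer(omega, lags)) @ autocorr[1:]

# Usage: white noise has r(k) = 0 for k > 0, so its warped spectrum is flat.
omega = mel_grid(23, 8000)
flat = mel_dct([1.0, 0.0, 0.0, 0.0], omega)
```

Because the grid is denser at low frequencies, the resulting spectrum has finer resolution where the mel scale does.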

Formulation of Mel-Warped Wiener Filter
• Wiener filtering is performed in the time domain, where the noisy signal is convolved with the impulse response of the Wiener filter
• We refer to the process of converting a mel-warped transfer function to a time-domain impulse response as Mel-IDCT
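The conversion and the filtering step can be sketched as follows. This is a minimal illustration, assuming a zero-phase filter whose impulse response is the inverse cosine transform of the transfer function, integrated numerically over the non-uniform mel grid; the function names, tap count, and grid size are hypothetical.

```python
import numpy as np

def mel_idct(transfer, omega, num_taps):
    """Approximate h(n) = (1/pi) * integral_0^pi H(w) cos(w n) dw with the
    trapezoidal rule on a non-uniform grid, then mirror it into a symmetric
    (zero-phase) impulse response of length 2 * num_taps - 1."""
    n = np.arange(num_taps)
    integrand = np.asarray(transfer, dtype=float)[:, None] * np.cos(np.outer(omega, n))
    widths = np.diff(omega)[:, None]
    half = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * widths, axis=0) / np.pi
    return np.concatenate([half[:0:-1], half])

def wiener_filter_time_domain(noisy, impulse_response):
    """Convolve the noisy signal with the impulse response, keeping alignment."""
    return np.convolve(noisy, impulse_response, mode="same")

# Usage: a mel-spaced grid (same assumed warping formula as is common for MFCCs).
sample_rate = 8000
mels = np.linspace(0.0, 2595.0 * np.log10(1.0 + (sample_rate / 2) / 700.0), 256)
omega = 2.0 * np.pi * 700.0 * (10.0 ** (mels / 2595.0) - 1.0) / sample_rate

# An all-pass transfer function should give approximately a unit impulse.
h = mel_idct(np.ones_like(omega), omega, num_taps=8)
```

Filtering in the time domain with a short impulse response keeps the filter smooth in frequency, which is one plausible reason for going back from the warped spectrum to a few taps.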

Two-Stage Filtering
• The approach is to adapt the noise estimate over time
  – Based on a silence-speech detector to capture the evolution of the noise spectrum
• First stage
  – Whitens the noise while leaving the speech spectrum unharmed
• Second stage
  – Wipes out the residual white noise by exploiting the auto-correlation characteristics of white noise
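The slides do not say how the noise estimate is updated. A minimal sketch, assuming recursive averaging over frames that a silence-speech detector flags as non-speech, could look like this; the smoothing constant, the initialization from the first frame, and the function name are all assumptions.

```python
import numpy as np

def track_noise_spectrum(frame_spectra, is_speech, smoothing=0.9):
    """Return one noise power spectrum estimate per frame.

    frame_spectra: array of shape (num_frames, num_bins)
    is_speech:     one boolean per frame from a silence-speech detector
    The estimate is updated only on non-speech frames and held during speech.
    """
    frame_spectra = np.asarray(frame_spectra, dtype=float)
    noise = frame_spectra[0].copy()  # assumption: the first frame is noise only
    history = []
    for spectrum, speech in zip(frame_spectra, is_speech):
        if not speech:
            noise = smoothing * noise + (1.0 - smoothing) * spectrum
        history.append(noise.copy())
    return np.array(history)

# Usage: the loud third frame is flagged as speech, so the estimate ignores it.
spectra = np.array([[2.0, 2.0], [2.0, 2.0], [100.0, 100.0], [2.0, 2.0]])
estimate = track_noise_spectrum(spectra, [False, False, True, False])
```

Holding the estimate during speech keeps speech energy from leaking into the noise spectrum used to build the filter.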

Two-Stage Filtering
[figure not preserved]

System Overview
[figure not preserved]

Basic Idea of SNR-Dependent Waveform Processing (SWP)
• The interference noise energy generated by outside sources is relatively constant within the speech period
  – Therefore, the SNR varies as the speech energy varies
• If we can locate the high-SNR portions and increase their energy, or conversely decrease the energy of the low-SNR portions
  – The overall SNR of a given voiced speech segment is enhanced
  – A front-end based on the SNR-enhanced signal is expected to be more robust

Algorithm Description
• In SWP, for each frame
  – A smoothed instantaneous energy contour is first computed
    • The Teager energy operator is used to obtain the instantaneous energy value at each sample
    • The contour of voiced sounds is quasi-periodic
    • For unvoiced sounds, a flatter contour is observed
    • Peaks (maxima) of the smoothed energy contour are located by a simple peak-picking strategy
  – A window function w(n) is applied to each frame
  – A rectangular unit window of width w is placed between each two adjacent maxima within the frame
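The contour and peak-picking steps can be sketched as below. The Teager energy operator is the standard discrete form x(n)^2 - x(n-1) x(n+1); the moving-average smoother, its width, and the minimum peak distance are illustrative assumptions, since the slides only mention "a simple peak-picking strategy".

```python
import numpy as np

def teager_energy(x):
    """psi[x(n)] = x(n)^2 - x(n-1) * x(n+1); edge samples copy their neighbor."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi

def smooth(contour, width=9):
    """Moving average of the absolute contour (assumed smoother)."""
    kernel = np.ones(width) / width
    return np.convolve(np.abs(contour), kernel, mode="same")

def pick_peaks(contour, min_distance=20):
    """Local maxima, accepted strongest first, at least min_distance apart."""
    candidates = [i for i in range(1, len(contour) - 1)
                  if contour[i - 1] < contour[i] >= contour[i + 1]]
    peaks = []
    for i in sorted(candidates, key=lambda i: contour[i], reverse=True):
        if all(abs(i - p) >= min_distance for p in peaks):
            peaks.append(i)
    return sorted(peaks)

# Usage: an amplitude-modulated tone stands in for a quasi-periodic voiced frame.
n = np.arange(400)
frame = np.cos(0.3 * n) * (1.0 + 0.8 * np.cos(2 * np.pi * n / 80))
maxima = pick_peaks(smooth(teager_energy(frame)))
```

For a pure tone A cos(Ωn) the operator returns the constant A² sin²(Ω), which is why it tracks the envelope of voiced speech sample by sample.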

Waveform within clean speech
[figure not preserved]

Frame with SNR equal to 0 dB
[figure not preserved]

Algorithm Description
• Next, the portions selected by the window function are weighted more heavily than the unselected (low-SNR) portions
• The original waveform within each frame is modified by the following [equation not preserved]
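The modification formula is missing from the text. One plausible form, shown here only as a sketch, boosts the windowed samples and attenuates the rest: y(n) = a w(n) x(n) + b (1 - w(n)) x(n) with a > b. The gains 1.2 and 0.8, the window starting at each maximum, and the 0.8 width ratio are illustrative assumptions, not values from the slides.

```python
import numpy as np

def swp_window(frame_length, maxima, width_ratio=0.8):
    """Rectangular unit windows between adjacent maxima.

    Each window starts at a maximum and covers width_ratio of the distance
    to the next maximum (placement and ratio are assumptions)."""
    w = np.zeros(frame_length)
    for start, nxt in zip(maxima[:-1], maxima[1:]):
        w[start:start + int(round(width_ratio * (nxt - start)))] = 1.0
    return w

def swp_weight(frame, w, high_gain=1.2, low_gain=0.8):
    """y(n) = high_gain * w(n) * x(n) + low_gain * (1 - w(n)) * x(n)."""
    frame = np.asarray(frame, dtype=float)
    return high_gain * w * frame + low_gain * (1.0 - w) * frame

# Usage: maxima at samples 0 and 5 of a 10-sample frame.
w = swp_window(10, [0, 5])
y = swp_weight(np.ones(10), w)
```

With these gains the selected high-SNR samples gain energy relative to the rest of the frame, which is the effect the slide describes.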

Relationship between 2MWF and SWP
• The fundamental weakness of SWP is that
  – The interference noise must be sufficiently low to ensure that the maxima are located correctly
• SWP should therefore be applied after the two-stage Mel-warped Wiener filter (2MWF), which will have already enhanced the SNR to an adequate level

Database
• There are two training scenarios in AURORA 2
  – MCT: multi-condition training
    • Uses both multiple noise types and multiple SNR levels
  – CST: clean speech training
    • Only clean speech is involved in training
• Within each training scenario, 3 kinds of testing are performed
  – A: data are matched in channel effect and noise type
  – B: data are matched only in channel effect
  – C: channel mismatch is introduced

Experiment One on SWP
[results not preserved]

Experiment Two on SWP
[results not preserved]

Experiment Three on SWP
[results not preserved]