Digital Audio Signal Processing Lecture-4: Acoustic Echo Cancellation

Digital Audio Signal Processing
Lecture-4: Acoustic Echo Cancellation
Marc Moonen
Dept. E.E./ESAT-STADIUS, KU Leuven
marc.moonen@esat.kuleuven.be
homes.esat.kuleuven.be/~moonen/

Outline
• Introduction
  – Acoustic echo cancellation (AEC) problem & applications
  – Acoustic channels
• Adaptive filtering algorithms for AEC
  – NLMS
  – Frequency domain adaptive filters
  – Affine projection algorithm (APA)
• Control algorithm
• Post-processing
• Loudspeaker non-linearity
• Stereo AEC
Digital Audio Signal Processing, Version 2013-2014, Lecture-4: Acoustic Echo Cancellation

Introduction: AEC problem/applications
Suppress echo
– to guarantee normal conversation conditions
– to prevent the closed-loop system from becoming unstable
Applications
– Teleconferencing
– Hands-free telephony
– Handsets
– …

Introduction: AEC standardization
ITU-T recommendations (G.167) on acoustic echo controllers state that
– Input/output delay of the AEC should be smaller than 16 ms
– Far-end signal suppression should reach 40 to 45 dB (depending on application), if no near-end signal is present
– In presence of near-end signals the suppression should be at least 25 dB
– Many other requirements …

Introduction: Room Acoustics (I)
• Propagation of sound waves in an acoustic environment results in
  – signal attenuation
  – spectral distortion
• Propagation can be modeled quite well as a linear filtering operation
• Non-linear distortion mainly stems from the loudspeakers. This is often a second-order effect and is mostly not taken into account explicitly

Introduction: Room Acoustics (II)
The linear filter model of the acoustic path between loudspeaker and microphone is represented by the acoustic impulse response. Observe that:
– First there is a dead time
– Then come the direct path impulse and some early reflections, which depend on the geometry of the room
– Finally there is an exponentially decaying tail called reverberation, coming from multiple reflections on walls, objects, ...
Reverberation mainly depends on the 'reflectivity' (rather than the geometry) of the room.

Introduction: Room Acoustics (III)
To characterize the 'reflectivity' of a recording room, the reverberation time 'RT60' is defined
– RT60 = time the sound pressure level or intensity needs to decay to 60 dB below its original value
– For a typical office room RT60 is between 100 and 400 ms, for a church RT60 can be several seconds
– ESAT speech laboratory: RT60 120 ms
– Begijnhofkerk Leuven: RT60 3730 ms
[Audio examples in the original slides: original speech signal and the two rooms]
Acoustic room impulse responses are highly time-varying!

Introduction: Acoustic Impulse Response: FIR or IIR?
• If the acoustic impulse response is modeled as
  – an FIR filter: many hundreds to several thousands of filter taps are needed
  – an IIR filter: the order can be reduced, but hundreds of filter coefficients (numerator + denominator) may still be needed (sigh!)
• Hence FIR models are typically used in practice because
  – these are guaranteed to be stable
  – in a speech communications set-up the acoustics are highly time-varying, hence adaptive filtering techniques are called for (see DSP-CIS):
    • FIR adaptive filters: simple adaptation rules, no local minima, ...
    • IIR adaptive filters: more complex adaptation, local minima
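The tap counts quoted above can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that the FIR model should span about one RT60 of the room; the function name and this rule of thumb are assumptions, and shorter filters may well be adequate in practice.

```python
def fir_taps_for_rt60(rt60_s: float, fs_hz: float) -> int:
    """Rough FIR length (in taps) needed to span rt60_s seconds at fs_hz.

    Assumption for illustration only: the model covers one full RT60.
    """
    return round(rt60_s * fs_hz)


# Office-like rooms (RT60 between 100 and 400 ms) at fs = 8000 Hz
office_short = fir_taps_for_rt60(0.120, 8000)
office_long = fir_taps_for_rt60(0.400, 8000)
```

Under this assumption the two cases give roughly one thousand and a few thousand taps, in line with the "many hundreds to several thousands" stated on the slide.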

Introduction: 'Conventional' AEC Techniques
• Directional loudspeakers and microphones
• Voice controlled switching, loss control
• Howling control: stability margin improvement of the closed loop by
  – frequency shifting
  – using comb filters
  – removing resonant peaks
• Non-linear post-processing, e.g. center clipping


Adaptive filtering algorithms for AEC
Basic set-up:
• The adaptive filter produces a model of the acoustic room impulse response and, from it, an estimate of the echo contribution in the microphone signal, which is then subtracted from the microphone signal
• Thanks to adaptivity
  – time-varying acoustics can be tracked
  – performance is superior to that of 'conventional' techniques

Adaptive filtering algorithms for AEC
• Algorithms to be discussed
  – Normalized LMS (NLMS)
  – Frequency-domain adaptive filter (FDAF) & partitioned block frequency-domain adaptive filter (PB-FDAF)
  – Affine projection algorithm (APA) & fast affine projection algorithm

Adaptive Filtering Algorithms: NLMS
• NLMS update equations [equations not reproduced], in which N is the adaptive filter length, μ is the adaptation stepsize, δ is a regularization parameter and k is the discrete-time index
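As the slide's own equations are not available here, the following is a minimal sketch of the textbook NLMS recursion, using the symbols named above (N, μ, δ, k). The function name `nlms`, the argument names and the default values are assumptions made for illustration, not the lecturer's notation.

```python
import numpy as np


def nlms(x, d, num_taps, mu=0.5, delta=1e-6):
    """Textbook NLMS sketch. x: far-end signal, d: microphone signal.

    Returns the final weight vector and the error (echo-cancelled) signal.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(num_taps)       # adaptive filter weights (length N)
    xk = np.zeros(num_taps)      # regressor [x[k], x[k-1], ..., x[k-N+1]]
    e = np.zeros(len(x))
    for k in range(len(x)):
        xk = np.roll(xk, 1)      # shift the delay line by one sample
        xk[0] = x[k]
        e[k] = d[k] - w @ xk     # a-priori error
        # Stepsize mu normalized by the regularized input energy
        w = w + (mu / (delta + xk @ xk)) * xk * e[k]
    return w, e
```

With a white-noise far-end signal and a short, noise-free echo path, the weights returned by this sketch approach the true path coefficients within a few hundred samples.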

Adaptive Filtering Algorithms: NLMS
• Pros and cons of NLMS
  + cheap algorithm: O(N)
  + small input/output delay (= 1 sample)
  – for colored far-end signals (such as speech) convergence of the NLMS algorithm is slow (cf. lambda_max versus lambda_min, etc., see DSP-CIS)
  – large N then means even slower convergence
¤ NLMS is thus often used for the cancellation of short echo paths

Adaptive Filtering Algorithms
• As some input/output delay is acceptable in AEC (cf. the ITU-T requirement), algorithms can be derived that are even cheaper than NLMS, by exchanging implementation cost for extra processing delay, sometimes even with improved performance:
  • Frequency-domain adaptive filtering (FDAF)
  • Partitioned Block FDAF (PB-FDAF)
  + cost reduction
  + optimal (stepsize) tuning for each subband/frequency bin separately results in improved performance

Adaptive Filtering Algorithms: Block-LMS
• To derive the frequency-domain adaptive filter, the BLMS algorithm is considered first [equations not reproduced], in which N is the number of filter taps, L is the block length and n is the block time index

Adaptive Filtering Algorithms: Block-LMS
• Both the BLMS convolution and correlation operations are computationally demanding. They can be implemented more efficiently in the frequency domain using fast convolution techniques, i.e. overlap-save/overlap-add
[Equations: convolution and correlation in overlap-save form, written with the DFT matrix]

Adaptive Filtering Algorithms: FDAF
Overlap-save FDAF
[Block diagram not reproduced]
Will only work if [condition not reproduced] (M is the FFT size)
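The slide's block diagram is not available here, so the following is a hedged sketch of a constrained overlap-save FDAF in its common textbook form. It assumes block length L = N and FFT size M = 2N, which satisfies the general overlap-save requirement M ≥ N + L − 1, and it uses a plain (unnormalized) stepsize. The function and variable names are illustrative assumptions.

```python
import numpy as np


def fdaf_overlap_save(x, d, num_taps, mu):
    """Constrained overlap-save FDAF sketch with L = N and M = 2N (assumed)."""
    N = num_taps
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    num_blocks = len(x) // N
    W = np.zeros(2 * N, dtype=complex)   # frequency-domain weights
    x_old = np.zeros(N)                  # previous input block
    e_out = np.zeros(num_blocks * N)
    for n in range(num_blocks):
        x_new = x[n * N:(n + 1) * N]
        X = np.fft.fft(np.concatenate([x_old, x_new]))
        # Overlap-save: only the last N outputs are a linear convolution
        y = np.fft.ifft(X * W).real[N:]
        e = d[n * N:(n + 1) * N] - y
        e_out[n * N:(n + 1) * N] = e
        E = np.fft.fft(np.concatenate([np.zeros(N), e]))
        # Correlation: only the first N lags are valid (gradient constraint)
        grad = np.fft.ifft(np.conj(X) * E).real[:N]
        W += mu * np.fft.fft(np.concatenate([grad, np.zeros(N)]))
        x_old = x_new
    w = np.fft.ifft(W).real[:N]          # time-domain weights
    return w, e_out
```

Each block costs a handful of length-2N FFTs instead of N multiply-accumulates per sample, which is where the saving over (B)LMS comes from; the price is that output samples only become available once a full block has been collected.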

Adaptive Filtering Algorithms: FDAF
¤ Typical parameter setting for the FDAF: [formula not reproduced]
¤ FDAF is functionally equivalent to BLMS
+ FDAF is significantly cheaper than (B)LMS for a typical parameter setting. If N=1024: [complexity ratio not reproduced] (= estimate only, in practice <20)
– Input/output delay is equal to 2L-1 = 2N-1 samples, which may be unacceptably large for realistic parameter settings: e.g. if N=1024 and fs=8000 Hz, the delay is about 256 ms!

Adaptive Filtering Algorithms: PB-FDAF
• Overlap-save PB-FDAF: the N-tap full-band filter is split into (N/P) filter sections of P taps each, then overlap-save is applied to each section, etc. ('P takes the place of N').
Will only work if [condition not reproduced]

Adaptive Filtering Algorithms: PB-FDAF
¤ Typical parameter setting: [formula not reproduced]
¤ PB-FDAF is intermediate between LMS and FDAF (P/N=1)
+ PB-FDAF is functionally equivalent to BLMS
+ PB-FDAF is cheaper than LMS. If N=1024, P=L=128, M=256: [complexity ratio not reproduced] (estimate)
+ Input/output delay is 2L-1 samples, which can be chosen small; in the example above the delay is about 32 ms if fs=8000 Hz
¤ Used in commercial AECs
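The delay figures quoted for FDAF and PB-FDAF follow directly from the 2L-1 sample delay. A small worked check, with a hypothetical helper name:

```python
def block_delay_ms(block_len: int, fs_hz: float) -> float:
    """Input/output delay of 2L-1 samples, expressed in milliseconds."""
    return 1000.0 * (2 * block_len - 1) / fs_hz


# FDAF with L = N = 1024 versus PB-FDAF with L = P = 128, both at fs = 8000 Hz
fdaf_delay = block_delay_ms(1024, 8000)     # 2047 samples
pbfdaf_delay = block_delay_ms(128, 8000)    # 255 samples
```

The first value is just under 256 ms and the second just under 32 ms, so only the partitioned variant comes close to a delay budget of a few tens of milliseconds.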

Adaptive Filtering Algorithms: PB-FDAF
• PS: Instead of a single stepsize μ, subband-dependent stepsizes can be applied
  – stepsizes dependent on the subband energy ('subband normalization')
  – convergence speed increased at only a small extra cost
• PS: The PB-FDAF algorithm can be simplified by leaving the gradient constraint out of the weight updating equation (= 'unconstrained updating')

Adaptive Filtering Algorithms: APA
Affine Projection Algorithm = intermediate between RLS and NLMS, complexity-wise as well as performance-wise
– NLMS (δ=0) [update formula not reproduced]: if μ=1, the a-posteriori error is 0
– APA [update formula not reproduced]: if μ=1, the P last a-posteriori errors are 0
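Since the update formulas are not available here, the following is a hedged sketch of the standard regularized APA of order P, written so that P = 1 reduces to the NLMS recursion. The function name, argument names and the zero initial history are assumptions for illustration.

```python
import numpy as np


def apa(x, d, num_taps, order, mu=1.0, delta=1e-6):
    """Regularized affine projection sketch (projection order P = order)."""
    N, P = num_taps, order
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    # Zero-pad so that regressors and desired samples exist for k < 0
    xpad = np.concatenate([np.zeros(N + P - 2), x])
    dpad = np.concatenate([np.zeros(P - 1), d])
    w = np.zeros(N)
    e = np.zeros(len(x))
    for k in range(len(x)):
        # Row p is the regressor at time k-p: [x[k-p], ..., x[k-p-N+1]]
        X = np.array([xpad[k - p + P - 1:k - p + P - 1 + N][::-1]
                      for p in range(P)])
        dvec = dpad[k:k + P][::-1]          # [d[k], d[k-1], ..., d[k-P+1]]
        err = dvec - X @ w                  # P a-priori errors
        e[k] = err[0]
        # Solve the small P x P regularized system, then update the weights
        w = w + mu * X.T @ np.linalg.solve(X @ X.T + delta * np.eye(P), err)
    return w, e
```

The P x P solve is what makes APA more expensive than NLMS but much cheaper than RLS; the regularization term δ keeps that system solvable when consecutive regressors are nearly collinear.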

Adaptive Filtering Algorithms: APA
Problem with APA: near-end noise amplification
[Equations not reproduced. Recoverable annotations:]
– one term is the echo signal, the other is the near-end noise
– in the decomposition used, the outer factors are orthogonal and the middle factor contains the sorted singular values on its diagonal
– the near-end noise, multiplied by [factor not reproduced], appears as 'noise in the filter weights'
Solution: replace [term not reproduced] by [regularized term not reproduced] in the update formula (= 'regularization', similar to δ in the NLMS formula)

Adaptive Filtering Algorithms: APA
– Effect on near-end noise amplification: smaller if more regularization
– Effect on adaptation speed: slower if more regularization

Adaptive Filtering Algorithms: Fast-APA
APA complexity, i.e. O(P·N), may be reduced to (roughly) LMS complexity, i.e. O(N):
1. 'Recursive' error vector calculation (δ=0). Example: if μ=1, the lower components were already nulled at time k-1
2. Delayed filter vector update: accumulate filter adaptations based on vector x_k, apply only when x_k 'leaves' the X_k matrix (at time k+P-1)
3. Recursive updating scheme for the inverse in [formula not reproduced]
(Ignore steps 2 & 3)

Control Algorithm
• Adaptation speed (μ) should be adjusted
  – to the far-end signal power, in order to avoid instability of the adaptive filter → stepsize normalization (e.g. NLMS)
  – to the amount of near-end activity, in order to prevent the filter from moving away from the optimal solution (see DSP-II) → double-talk detection
Double talk refers to the situation where both the far-end and the near-end speaker are active.

Control Algorithm
3 modes of operation:
1. Near-end activity (single or double talk) (Ed large) → FILT
2. No near-end activity, only far-end activity (Ex large, Ed small) → FILT+ADAPT
3. No near-end activity, no far-end activity (Ex small, Ed small) → NOP
• Ex is the short-time energy of the far-end signal
• Ed is the short-time energy of the desired signal
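The three modes can be read as a simple decision rule on the two short-time energies. The sketch below is one possible reading; the threshold arguments and the function name are hypothetical, since the slide only speaks of 'large' and 'small' energies.

```python
def control_mode(ex: float, ed: float, ex_thresh: float, ed_thresh: float) -> str:
    """Pick the operating mode from short-time energies Ex and Ed.

    ex_thresh and ed_thresh are hypothetical tuning parameters that decide
    what counts as 'large'; they are not given in the lecture.
    """
    if ed > ed_thresh:
        return "FILT"          # near-end activity: filter, freeze adaptation
    if ex > ex_thresh:
        return "FILT+ADAPT"    # far-end only: filter and adapt
    return "NOP"               # silence on both sides: do nothing


mode = control_mode(ex=2.0, ed=0.01, ex_thresh=0.1, ed_thresh=0.5)
```

Freezing the adaptation in the first mode is what keeps near-end speech from pulling the filter away from the echo path estimate.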

Control Algorithm: Double-talk Detection (DTD)
• Difficult problem: detection of speech during speech
• Desired properties
  – Limited number of false alarms
  – Small delay
  – Low complexity
• Different approaches exist in the literature, based on
  – Energy
  – Correlation
  – Spectral contents
  – …

Control Algorithm: Energy-based DTD
Compare the short-time energies Ex and Ed of the far-end and near-end channels:
– Method 1: if Ed > (threshold) × Ex, declare double talk, where the threshold is well chosen
– Method 2: if [ratio not reproduced] > 1, declare double talk
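Method 1 can be sketched on a per-frame basis as follows. The frame-based energy estimate, the function names and the example threshold are assumptions for illustration; the lecture does not specify how the short-time energies are computed or how the threshold is chosen.

```python
def short_time_energy(frame) -> float:
    """Sum of squared samples over one frame (one possible energy estimate)."""
    return sum(v * v for v in frame)


def is_double_talk(far_frame, mic_frame, threshold: float) -> bool:
    """Method 1 sketch: declare double talk if Ed > threshold * Ex."""
    ex = short_time_energy(far_frame)
    ed = short_time_energy(mic_frame)
    return ed > threshold * ex


# Hypothetical frames: the echo alone is weak, near-end speech raises Ed
echo_only = is_double_talk([1.0] * 4, [0.1] * 4, threshold=0.25)
with_near_end = is_double_talk([1.0] * 4, [1.0] * 4, threshold=0.25)
```

The threshold effectively encodes the expected attenuation of the echo path: a microphone energy well above what the far-end signal alone could produce is attributed to a near-end talker.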

Post-processing
• Error suppression obtained in practice will be limited to approximately 30 dB, due to
  – non-linearities in the signal path (loudspeakers)
  – time-variations of the acoustic impulse responses
  – finite length of the adaptive filter
  – local background noise
  – failing double-talk detection
  – …
• A post-processing unit is added to further reduce the residual signal, e.g. 'center clipping'
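Center clipping exists in several variants. The sketch below shows one simple form, which zeroes residual samples whose magnitude stays below a threshold and passes the rest unchanged; the function name and the threshold value are assumptions for illustration.

```python
import numpy as np


def center_clip(e, threshold: float):
    """Zero out residual samples with magnitude at or below the threshold."""
    e = np.asarray(e, dtype=float)
    return np.where(np.abs(e) > threshold, e, 0.0)


residual = [0.01, -0.5, 0.2, -0.05]
clipped = center_clip(residual, threshold=0.1)
```

Low-level residual echo is removed this way, at the risk of also suppressing quiet near-end speech, so the threshold has to be set with care.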

Loudspeaker Non-linearity
If loudspeaker non-linearity is significant (e.g. consumer applications), then this should be compensated for
• Solution-1: Non-linear model (fixed) in the cancellation path
[Block diagram: signals x, y, d, e; blocks 'Non-linear model' and 'Adaptive filter']

Loudspeaker Non-linearity
• Solution-2: Inverse non-linear model in the forward path
Advantage = if successful, this also improves the loudspeaker characteristic/sound quality
[Block diagram: signals x, y, d, e; blocks 'Inverse non-linear model' and 'Adaptive filter']

Stereo AEC (S-AEC): Problem Statement
Multi-microphone/multi-loudspeaker systems: the complexity of 'prewhitening' (APA, RLS) of x can be shared amongst microphone channels. Apart from this, the different microphone signals are processed independently.
Hence from now on consider S-AEC on one microphone only. The other microphone(s) are processed similarly (but independently).

S-AEC Problem Statement
Conditioning Problem: S-AEC input vectors are [formula not reproduced]
– Mono: autocorrelation of the x-signal (e.g. speech) has an impact on convergence (see DSP-CIS)
– Stereo: the cross-correlation between signals x1 and x2 now also plays a role
Large(r) eigenvalue spread (large(r) condition number) of the correlation matrix → large(r) impact on convergence!

S-AEC Problem Statement
Non-uniqueness Problem: consider transmission room impulse responses G1, G2 (length Q)
Assume [condition not reproduced], then: [relation not reproduced]
Hence the filter input data matrix X will be singular (with a 'null-space') → the LS solution is non-unique, and the solutions depend on (changes in) the transmission room (G1, G2)!
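The non-uniqueness can be checked numerically. The sketch below assumes that both loudspeaker signals are generated from a single source s through G1 and G2, and that the adaptive filter length N per channel is at least Q; these assumptions, and all variable names, are illustrative. Under them, filtering x1 with G2 gives the same signal as filtering x2 with G1, so the stacked vector [G2; -G1] lies in the null-space of X.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, N, T = 3, 4, 200                     # assumed sizes, with N >= Q
s = rng.standard_normal(T)              # single far-end source
g1 = rng.standard_normal(Q)             # transmission room paths
g2 = rng.standard_normal(Q)
x1 = np.convolve(s, g1)[:T]             # two loudspeaker signals
x2 = np.convolve(s, g2)[:T]


def regressors(x, num_taps):
    """Rows [x[k], x[k-1], ..., x[k-N+1]] for k = N-1 .. len(x)-1."""
    return np.array([x[k - num_taps + 1:k + 1][::-1]
                     for k in range(num_taps - 1, len(x))])


# Stereo input data matrix and a candidate null-space vector
X = np.hstack([regressors(x1, N), regressors(x2, N)])
v = np.concatenate([np.pad(g2, (0, N - Q)), -np.pad(g1, (0, N - Q))])

residual = np.max(np.abs(X @ v))        # numerically zero
rank = np.linalg.matrix_rank(X)         # strictly less than 2N
```

Because the null-space vector is built from G1 and G2, any change in the transmission room changes which solutions the adaptive filter may drift to, even if the receiving room stays fixed.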

S-AEC Problem Statement
In practice: [condition not reproduced], hence [relation not reproduced], so that X will be (only) ill-conditioned (instead of rank-deficient), which however is still bad news.

S-AEC Fixes
– Reduce correlation between the loudspeaker signals by
  • Complementary comb filters
  • White noise insertion (naive solution, large distortion)
  • Colored (masked) noise insertion
  • Non-linear processing
  Disadvantages:
  • Signal distortion
  • Stereo perception may be affected
– In addition: use algorithms that are less sensitive to the condition number than NLMS, e.g. RLS, APA, ...

S-AEC Fixes: Complementary comb filters
– Comb-1 for x1, comb-2 for x2
– The two channels are decorrelated, BUT the stereo image is distorted if applied below 1 kHz (= psycho-acoustics)
– Can be combined with another technique below 1 kHz

S-AEC Fixes: Noise insertion
– Remove all signal content below the masking threshold
– Fill with noise (both channels independently)
– Correlation between input channels decreases
• Poor performance for speech
• Good performance for music
• Computationally intensive

S-AEC Fixes: Non-linear processing
[Formula not reproduced]
– The non-linearity is often a half-wave rectifier
– [Parameter setting not reproduced] is necessary for good performance, but audible
– Good results for speech, audible artifacts in music
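The slide's formula is not available here. A commonly used form of this fix adds a scaled half-wave rectified copy of each channel to itself, using the positive half-wave for one channel and the negative half-wave for the other. The sketch below follows that form; the function name, the gain name `alpha` and its example value are assumptions, not values taken from the lecture.

```python
import numpy as np


def decorrelate_stereo(x1, x2, alpha: float = 0.5):
    """Add a scaled half-wave rectified copy to each channel.

    alpha is a hypothetical gain controlling the amount of non-linear
    distortion; larger values decorrelate more but are more audible.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y1 = x1 + alpha * 0.5 * (x1 + np.abs(x1))   # positive half-wave
    y2 = x2 + alpha * 0.5 * (x2 - np.abs(x2))   # negative half-wave
    return y1, y2


y1, y2 = decorrelate_stereo([1.0, -1.0], [1.0, -1.0], alpha=0.5)
```

In this example, identical inputs on both channels come out different after processing, which is the point: the two loudspeaker signals are no longer linearly related.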

S-AEC Fixes: Non-linear processing
[Plot: mismatch versus time, for loudspeakers playing the original signal and loudspeakers playing the processed signal]