Real-time Enhancement of Noisy Speech Using Spectral Subtraction
M.Tech. project, EE Dept., IIT Bombay, Jun. 2013
Santosh K. Waddi (10307932), wsantosh@ee.iitb.ac.in
Supervisor: Prof. P. C. Pandey
Overview
1. Introduction
2. Speech Enhancement Using Spectral Subtraction
3. Investigations Using Offline Processing
4. Implementation for Real-time Processing
5. Summary & Conclusions
1. Introduction
Sensorineural hearing loss
– Increased hearing thresholds and high frequency loss
– Decreased dynamic range & abnormal loudness growth
– Reduced speech perception due to increased spectral & temporal masking
→ Decreased speech intelligibility in noisy environment
Signal processing in hearing aids
– Frequency selective amplification
– Automatic volume control
– Multichannel dynamic range compression (settable attack time, release time, and compression ratios)
Processing for reducing the effect of increased spectral masking in sensorineural loss
– Binaural dichotic presentation (Lunner et al. 1993, Kulkarni et al. 2012)
– Spectral contrast enhancement (Yang et al. 2003)
– Multiband frequency compression (Arai et al. 2004, Kulkarni et al. 2012)
Techniques for reducing the background noise
– Directional microphone
– Adaptive filtering (a second microphone needed for noise reference)
– Single-channel noise suppression using spectral subtraction (Boll 1979, Berouti et al. 1979, Martin 1994, Kamath & Loizou 2002, Loizou 2007, Lu & Loizou 2008, Paliwal et al. 2010)
Processing steps
• Dynamic estimation of non-stationary noise spectrum
  - During non-speech segments using voice activity detection
  - Continuously using statistical techniques
• Estimation of noise-free speech spectrum
  - Spectral noise subtraction
  - Multiplication by noise suppression function
• Speech resynthesis (using enhanced magnitude and noisy phase)
Research objective
Real-time single-input speech enhancement for use in hearing aids and other sensory aids (cochlear prostheses, etc.) for hearing-impaired listeners and in communication devices
Main challenges
• Noise estimation without voice activity detection, to avoid errors under low SNR & during long speech segments
• Low signal delay (algorithmic + computational) for real-time application
• Low computational complexity & memory requirement for implementation on a low-power processor
2. Speech Enhancement Using Spectral Subtraction
• Dynamic estimation of non-stationary noise spectrum
• Estimation of noise-free speech spectrum
• Speech resynthesis
Generalized spectral subtraction (Berouti et al. 1979)
Power subtraction
Windowed speech spectrum = $X_n(k)$
Estimated noise magnitude spectrum = $D_n(k)$
Estimated speech spectrum: $Y_n(k) = \left[\,|X_n(k)|^{2} - (D_n(k))^{2}\,\right]^{0.5} e^{j\angle X_n(k)}$
Problems: residual noise due to under-subtraction; distortion in the form of musical noise & clipping due to over-subtraction.
Generalized form:
$|Y_n(k)| = \left[\,|X_n(k)|^{\gamma} - \alpha (D_n(k))^{\gamma}\,\right]^{1/\gamma}$ if $|X_n(k)| > (\alpha + \beta)^{1/\gamma} D_n(k)$
$|Y_n(k)| = \beta^{1/\gamma} D_n(k)$ otherwise
γ = exponent factor (2: power subtraction, 1: magnitude subtraction)
α = over-subtraction factor (for limiting the effect of short-term variations in noise spectrum)
β = floor factor to mask the musical noise due to over-subtraction
Re-synthesis with noisy phase without explicit phase calculation:
$Y_n(k) = |Y_n(k)| \, X_n(k) / |X_n(k)|$
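The subtraction rule and the phase-free resynthesis above can be sketched for a single frame as follows. This is only an illustrative sketch in plain Python, not the project code: the function name `generalized_spectral_subtraction` and the list-based interface are assumptions, and the default values of α, β, and γ are taken from the parameter ranges reported later in these slides.

```python
def generalized_spectral_subtraction(X, D, alpha=2.0, beta=0.01, gamma=1.0):
    """Apply generalized spectral subtraction to one frame.

    X: complex spectral values X_n(k) of the windowed noisy frame.
    D: estimated noise magnitudes D_n(k), one per frequency bin.
    Returns the enhanced complex spectrum Y_n(k) with the noisy phase.
    (Hypothetical helper for illustration only.)
    """
    Y = []
    for x, d in zip(X, D):
        mag = abs(x)
        if mag > (alpha + beta) ** (1.0 / gamma) * d:
            # Subtraction branch: remove the scaled noise estimate.
            out_mag = (mag ** gamma - alpha * d ** gamma) ** (1.0 / gamma)
        else:
            # Floor branch: keep a small fraction of the noise estimate.
            out_mag = beta ** (1.0 / gamma) * d
        # Noisy phase is reused through X/|X|, without computing an angle.
        Y.append(out_mag * x / mag if mag > 0 else 0j)
    return Y
```

With γ = 1, α = 2, β = 0.01, a bin with X = 3 + 4j and D = 1 is reduced from magnitude 5 to magnitude 3 with its phase unchanged, while a bin with |X| = 0.1 falls to the floor value 0.01.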
Multi-band spectral subtraction (Kamath & Loizou 2002)
• Noise does not affect the spectrum uniformly
• Speech spectrum divided into B non-overlapping bands; spectral subtraction is performed independently in each band
• Test material: 10 sentences from HINT database, noise: speech-shaped noise, noisy speech: 0 dB and 5 dB SNR
• Evaluation: Itakura-Saito (IS) distance as an objective measure
• Improvement over the conventional power spectral subtraction, with very little trace of musical noise
Geometric approach to spectral subtraction (Lu & Loizou 2008)
• Without assuming the cross-terms as zero
• Test material: NOIZEUS database, noise: babble, street, car, white, noisy speech: 0 dB, 5 dB and 15 dB SNR
• Evaluation: mean square error (MSE), PESQ, log likelihood ratio
• Cross terms can be ignored at very low and high SNRs but not near 0 dB
• Processed output: no audible musical noise, smooth and pleasant residual noise
• Performed significantly better than power spectral subtraction in all
Noise estimation
Minimal-tracking algorithms
Minimum statistics (Martin 1994)
• Tracks the noise as minima of past frames
Minimum tracking (Doblinger 1995)
• Smoothing noisy speech power spectra in each frequency bin using a non-linear smoothing
Time-recursive averaging algorithms
SNR-dependent recursive averaging (Lin 2003)
• Noisy speech decomposed into sub-band signals; noisy signal power is smoothed and noise estimated adaptively
• Smoothing parameter: function of estimated SNR
Weighted spectral averaging (Hirsch & Ehrlicher 1995)
• First-order recursive weighted average of past spectral magnitude values over 400 ms which are below an adaptive threshold
Improved minima-controlled recursive averaging (Cohen 2003)
• Two iterations of smoothing and tracking
• First iteration: rough voice activity detection is provided in each frequency band
• Second iteration: smoothing excludes strong speech components, makes the minimum tracking robust during speech activity
• Smoothing parameter: frequency-dependent & dynamically adjusted by signal presence probability
• Lower estimation error than minimum statistics
• Method is combined with log-spectral amplitude estimator
• Higher segmental SNR improvement than minimum statistics
Histogram-based technique (Hirsch & Ehrlicher 1995)
• Histogram: noisy speech over 400 ms
• Noise estimated: maximum of distribution in each sub-band
• Avoid spikes: estimated values smoothed along time axis
• Objective evaluation: relative error
• Relative error is low compared to weighted spectral average method (Hirsch & Ehrlicher 1995)
Quantile-based noise estimation (Stahl 2000)
• Speech signal energy: low in most of the frames, high in only 10 – 20% of the frames
• Noise estimation: selecting a certain quantile value from previous frames of the noisy speech spectrum
• Quantile selection: frequency-dependent and SNR-dependent
• Median-based noise estimation works robustly
Cascaded-median based estimation (Basha & Pandey 2012)
Moving median approximated by p-point q-stage cascaded-median, with a saving in memory & computation for real-time implementation.
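As a rough illustration of the quantile idea, the sketch below picks, for each frequency bin, a fixed quantile of the magnitudes seen over the stored past frames; a quantile of 0.5 gives median-based noise estimation. The function name `quantile_noise_estimate`, the list-of-lists layout, and the index convention used to pick the quantile are assumptions made for this example, not details taken from the cited papers.

```python
def quantile_noise_estimate(past_mags, quantile=0.5):
    """Estimate the noise magnitude per frequency bin from past frames.

    past_mags: list of frames, each a list of spectral magnitudes (one per bin).
    quantile:  fraction in [0, 1]; 0.5 selects the median.
    (Hypothetical helper; the index convention below is an assumption.)
    """
    n_frames = len(past_mags)
    n_bins = len(past_mags[0])
    # Position of the chosen quantile in the sorted history of each bin.
    idx = min(n_frames - 1, int(quantile * n_frames))
    estimate = []
    for k in range(n_bins):
        history = sorted(frame[k] for frame in past_mags)
        estimate.append(history[idx])
    return estimate
```

Sorting the full history for every bin in every frame is what makes a direct moving median costly, which is the motivation for the cascaded approximation.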
MBNE vs CMBNE
Comparison                                 M-point median    p-point q-stage cascaded median (M = p^q)
Storage per freq. bin                      2M                pq
No. of sortings per frame per freq. bin    (M – 1)/2         p(p – 1)/2
Condition for reducing sorting operations and storage: low p, q ≈ ln(M)
p = 3 → code simplification for sorting operations
Project objective
Implementation of generalized spectral subtraction along with cascaded-median based noise estimation for real-time processing using a low-power DSP
• Selection of optimal set of processing steps and parameters, using offline processing
• Implementation on a DSP board with a 16-bit fixed-point processor & evaluation
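One way to realize a p-point q-stage cascaded median for a single frequency bin is sketched below. It is a simplified reading of the idea, not the implementation of Basha & Pandey or of this project: the class name `CascadedMedian` and the update schedule (each stage emits a median only after collecting p new values, so the estimate refreshes every p^q frames) are assumptions. The storage is q buffers of p values, matching the pq figure in the comparison above.

```python
from statistics import median


class CascadedMedian:
    """p-point q-stage cascaded median for one frequency bin (illustrative)."""

    def __init__(self, p=3, q=4):
        self.p = p
        self.q = q
        self.stages = [[] for _ in range(q)]  # q buffers of at most p values
        self.estimate = None                  # None until the last stage fills once

    def update(self, value):
        """Feed one new magnitude value; return the current noise estimate."""
        for i, stage in enumerate(self.stages):
            stage.append(value)
            if len(stage) < self.p:
                break                         # this stage is not full yet
            value = median(stage)             # p-point median of this stage
            stage.clear()
            if i == self.q - 1:
                self.estimate = value         # output of the final stage
        return self.estimate
```

Feeding the values 0, 1, ..., 80 into a 3-point 4-stage instance yields 40 after the 81st value, which equals the true 81-point median for this monotonic input; for general inputs the cascaded value is only an approximation of the moving median.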
3. Investigations Using Offline Processing
Test material
• Speech material 1: Recording with three isolated vowels, a Hindi sentence, and an English sentence (-/a/-/i/-/u/– “aayiye aap kaa naam kyaa hai?” – “Where were you a year ago?”) from a male speaker. Referred to as "VHSES"
• Speech material 2: Six sentences from NOIZEUS database of one male speaker
• Noise: white, pink, street, babble, car, and train noises
• SNR: ∞, 18, 15, 12, 9, 6, 3, 0, -3, -6 dB
Evaluation methods
• Informal listening
• Objective evaluation using PESQ measure (0 – 4.5)
Investigations (fs = 10 kHz)
• Overlap of 50% & 75%: indistinguishable outputs
• γ = 1 (magnitude subtraction): higher tolerance to variation in α, β
Investigation on noise estimation
Scatter plots for magnitude spectra. Speech material: VHSES
(a) Clean speech signal (b) White noise (c) Noisy speech: white noise, 3 dB SNR (d) Noisy speech: white noise, 0 dB SNR
Scatter plots for magnitude spectra. Speech material: NOIZEUS
(a) Clean speech signal (b) White noise (c) Noisy speech: white noise, 3 dB SNR (d) Noisy speech: white noise, 0 dB SNR
Mean, median and minimum of magnitude spectra of clean speech signal, noise and noisy speech (white, SNR: 0 dB), speech material: VHSES
(a) Mean (b) Median (c) Minimum
• Noisy signal median tracks the noise median & noisy signal minimum tracks the noise minimum at almost all the frequencies
Mean, median and minimum of magnitude spectra of clean speech signal, noise and noisy speech (white, SNR: 0 dB), speech material: NOIZEUS
(a) Mean (b) Median (c) Minimum
Relative RMS error (dB)
• Objective evaluation of the accuracy of noise estimation
• Relative RMS error (dB) decreases as SNR decreases
(a) Speech material: VHSES (b) Speech material: NOIZEUS
Effect of window length and noise estimation duration
• Processing: Magnitude spectral subtraction with median based noise estimation
• High PESQ score: Noise estimation across 81 past frames & 20 – 40 ms window length
• 30 ms window length was chosen (approximately 1.2 s duration)
(a) Speech: VHSES, noise: white, SNR: 0 dB (b) Speech: NOIZEUS, noise: white, SNR: 0 dB
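The duration of about 1.2 s can be checked from the frame shift, assuming the 50% overlap that the slides use for analysis-synthesis with the 30 ms window:

```latex
\[
T_{\text{shift}} = 0.5 \times 30~\text{ms} = 15~\text{ms},
\qquad
81 \times 15~\text{ms} = 1215~\text{ms} \approx 1.2~\text{s}
\]
```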
Comparison of enhanced speech using MBNE and CMBNE
• MBNE requires large memory & is computation-intensive
• 3-point 4-stage cascaded-median significantly reduces memory requirement & computations
• Reduction in storage requirement per freq. bin: from 162 to 12 samples
• Reduction in number of sorting operations per frame per freq. bin: from 40 to 3
• Informal listening: Perceptually same
• Objective evaluation: Almost same in most cases and maximum difference of 0.06
PESQ score of the enhanced speech. Speech: NOIZEUS, SNR: 0 dB
Noise type   Unproc.   Proc. using MBNE   Proc. using CMBNE
white        1.55      1.84               –
babble       1.75      1.80               1.81
street       1.83      2.08               2.04
pink         1.60      2.00               1.98
train        2.05      2.40               2.35
car          1.72      1.95               1.89
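The storage and sorting figures follow from the expressions in the MBNE vs CMBNE comparison, taking M = 81 frames for the median and p = 3, q = 4 for the cascaded median (so that M = p^q = 81):

```latex
\[
\text{MBNE: } 2M = 2 \times 81 = 162, \qquad \frac{M-1}{2} = \frac{80}{2} = 40
\]
\[
\text{CMBNE: } pq = 3 \times 4 = 12, \qquad \frac{p(p-1)}{2} = \frac{3 \times 2}{2} = 3
\]
```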
Effect of spectral subtraction parameters
• Processing: Magnitude spectral subtraction using 3-point 4-stage CMBNE
• Analysis-synthesis: 30 ms window length & 50% overlap
• Spectral floor factor β: 0.01 appropriate for all the cases
• Subtraction factor α: in 2 – 2.5 for VHSES and in 1.2 – 1.4 for NOIZEUS speech material
Phase estimation for spectral subtraction
• Processing: Magnitude spectral subtraction using 3-point 4-stage CMBNE
• Phase: zero, cepstrum (Rabiner & Schafer 1978), Quatieri & Oppenheim 1981, Nawab et al. 1983
• Analysis-synthesis: 50% overlap rect. win., 75% overlap rect. win., Griffin-Lim method (Griffin & Lim 1984)
• Informal listening: No improvement from using estimated phase over using noisy phase
• Objective evaluation: signal estimated using noisy phase has higher PESQ score
Comparison of proposed method with other methods
• Proposed method: Magnitude spectral subtraction with cascaded-median based noise estimation. Analysis-synthesis with 30 ms window length and 50% overlap
• Comparison: spectral-subtractive, statistical-model based, and subspace algorithms (implementations available on CD accompanying Loizou 2007: specsub, mband, ga, wiener_iter, wiener_as, wiener_wt, mt_mask, audnoise, mmse, logmmse_spu, stsa_weuclid, stsa_wcosh, stsa_mis, klt, pklt)
Comparison of PESQ scores for VHSES speech material, 0 dB SNR
Enhancement method   white   babble   street   pink   train   car
Unproc.              1.54    1.73     1.78     1.59   2.00    1.67
specsub              1.78    1.74     1.85     2.01   2.33    1.91
mband                1.43    1.88     2.06     1.72   2.61    2.09
ga                   1.82    2.05     2.42     2.11   2.67    2.26
mmse                 1.95    1.86     1.99     2.22   2.69    2.24
klt                  2.51    1.89     1.84     2.27   2.32    1.91
Proposed method      2.14    1.93     2.16     2.31   2.63    2.15
Comparison of PESQ scores for NOIZEUS speech material, 0 dB SNR
Enhancement method   white   babble   street   pink   train   car
Unproc.              1.55    1.75     1.83     1.60   2.05    1.72
specsub              1.63    1.58     1.81     1.78   2.04    1.77
mband                1.59    1.84     2.02     1.73   2.30    1.91
ga                   1.61    1.82     2.07     1.86   2.35    1.96
mmse                 1.82    1.78     2.02     2.01   2.43    1.99
klt                  1.89    1.71     1.87     1.97   2.07    1.78
Proposed method      1.83    1.81     2.03     1.95   2.33    1.90
• Observation: Comparable to the best ones
Discussion
• FFT length N = 512 & higher: indistinguishable outputs
• Processing: Magnitude spectral subtraction with 3-point 4-stage CMBNE, analysis-synthesis with 30 ms window length & 50% overlap
• Informal listening: Significant enhancement for all noises with different SNRs
• Spectral subtraction parameters: β = 0.01 appropriate for all the cases, α in 2 – 2.5 for VHSES and in 1.2 – 1.4 for NOIZEUS speech material
• SNR advantage: 4 – 13 dB for VHSES & 2 – 7 dB for NOIZEUS speech material
             Material: VHSES               Material: NOIZEUS
Noise type   SNR advantage   Optimal α     SNR advantage   Optimal α
white        13 dB           2.0           7 dB            1.4
babble       4 dB            2.0           2 dB            1.2
street       5 dB            2.5           4 dB            1.4
pink         11 dB           2.0           7 dB            1.4
train        7 dB            2.0           5 dB            1.4
car          6 dB            2.0           5 dB            1.4
4. Implementation for Real-time Processing
16-bit fixed-point DSP: TI/TMS320C5515
• 16 MB memory space: 320 KB on-chip RAM with 64 KB dual access RAM, 128 KB on-chip ROM
• Three 32-bit programmable timers, 4 DMA controllers each with 4 channels
• FFT hardware accelerator (8 to 1024-point FFT)
• Max. clock speed: 120 MHz
DSP board: eZdsp
• 4 MB on-board NOR flash for user program
• Codec TLV320AIC3204: stereo ADC & DAC, 16/20/24/32-bit quantization, 8 – 192 kHz sampling
• Development environment for C: TI's 'CCStudio, ver. 4.0'
Implementation
• One codec channel (ADC and DAC) with 16-bit quantization
• Sampling frequency: 10 kHz
• Window length of 30 ms (L = 300) with 50% overlap, FFT length N = 512
• Storage of input samples, spectral values, processed samples: 16-bit real & 16-bit imaginary parts
Data transfers and buffering operations (S = L/2)
DMA cyclic buffers (each block with S samples)
– 3 block input buffer
– 2 block output buffer
Pointers (incremented cyclically on DMA)
– current input block
– just-filled input block
– current output block
– write-to output block
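The buffering scheme can be imitated in software as sketched below. This is a guess at one arrangement consistent with the listed buffers and pointers, not the project's DSP code: the function name `simulate_block_processing`, the use of a periodic Hann window, and the identity "processing" step are all assumptions made so that the data flow can be checked. Each frame of L = 2S samples is formed from the previously filled and the just-filled input blocks, and 50% overlap-add produces one output block per input block.

```python
import math


def simulate_block_processing(blocks, S):
    """Simulate 50% overlap-add with cyclic 3-block input / 2-block output buffers.

    blocks: list of input blocks, each a list of S samples.
    Returns the blocks handed to the output side, one per input block.
    (Hypothetical simulation; no spectral modification is applied.)
    """
    L = 2 * S
    # Periodic Hann window: w[n] + w[n + S] == 1, so overlap-add is transparent.
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / L) for n in range(L)]
    in_buf = [[0.0] * S for _ in range(3)]    # 3-block cyclic input buffer
    out_buf = [[0.0] * S for _ in range(2)]   # 2-block cyclic output buffer
    cur_in = 0                                # block being filled by the "DMA"
    write_out = 0                             # block being written by processing
    overlap = [0.0] * S                       # second half of the previous frame
    produced = []
    for block in blocks:
        in_buf[cur_in] = list(block)          # input block has just been filled
        just_filled = cur_in
        cur_in = (cur_in + 1) % 3             # pointers advance cyclically
        prev = (just_filled - 1) % 3
        frame = in_buf[prev] + in_buf[just_filled]    # L = 2S samples
        y = [w[n] * frame[n] for n in range(L)]       # enhancement would go here
        out_buf[write_out] = [overlap[n] + y[n] for n in range(S)]
        overlap = y[S:]
        produced.append(list(out_buf[write_out]))
        write_out = (write_out + 1) % 2
    return produced
```

In this simulation each output block reproduces the input block that arrived one block period earlier, which illustrates why a third input block is needed: the DMA can fill it while the other two are being read for the current frame.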
Results
PESQ score vs SNR for noisy and enhanced speech using offline and real-time processing
(a) Speech: VHSES (b) Speech: NOIZEUS
• Offline proc. improvement: 0.57 – 0.80 for VHSES & 0.28 – 0.44 for NOIZEUS
• Real-time proc. improvement: 0.39 – 0.71 for VHSES & 0.22 – 0.32 for NOIZEUS
Example of processing: -/a/-/i/-/u/– "aayiye aap kaa naam kyaa hai?" – "Where were you a year ago?", with white noise at 3 dB SNR
(a) Clean speech (b) Noisy speech (c) Offline processed (d) Real-time processed
Comparison of enhanced speech between offline and real-time processing. Speech: VHSES, SNR: 0 dB
Noise type   Unproc.   Offline proc.   Real-time proc.
white        1.54      2.10            –
babble       1.73      1.93            1.87
street       1.78      2.16            1.92
pink         1.59      2.31            2.20
train        2.00      2.63            2.45
car          1.67      2.15            2.09
• Real-time processing tested using white, babble, car, pink, train noises: real-time processed output perceptually similar to the offline processed output
• Signal delay = 48 ms
• Lowest clock for satisfactory operation = 16.4 MHz → Processing capacity used ≈ 1/7 of the capacity with the highest clock
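The one-seventh figure is consistent with the ratio of the lowest usable clock to the maximum clock speed of 120 MHz listed for the processor:

```latex
\[
\frac{16.4~\text{MHz}}{120~\text{MHz}} \approx 0.137 \approx \frac{1}{7.3}
\]
```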
5. Summary & Conclusions
• Investigation & implementation of spectral subtraction for real-time operation: Magnitude spectrum subtraction and resynthesis using noisy phase, along with cascaded-median based dynamic noise estimation for reducing computation and memory requirement
• Enhancement of speech with different types of additive stationary and non-stationary noise: SNR advantage: 4 – 13 dB for VHSES & 2 – 7 dB for NOIZEUS
• Implementation for real-time operation using 16-bit fixed-point processor TI/TMS320C5515: Implementation with 10 kHz sampling using 1/7 of processing capacity, signal delay = 48 ms
• Further work
– Frequency & a posteriori SNR-dependent subtraction & spectral floor factors
– Combination of speech enhancement technique with other processing techniques in the sensory aids
– Implementation using other processors
– Subjective evaluation of intelligibility and quality of enhanced speech
Thank You
Abstract
Sensorineural loss is generally associated with increased spectral masking due to widened auditory filters, and listeners having this kind of hearing impairment often experience great difficulty when the speech is contaminated by noise. This thesis presents investigations for real-time enhancement of noisy speech using spectral subtraction for suppressing the external noise. Investigation using offline processing for enhancing the noisy speech with different types of noise and SNR values is carried out to select the optimal set of steps and parameters for real-time processing. PESQ score is used for objective comparison of quality of the enhanced speech. Results show that median based noise estimation is effective in estimating noise from noisy speech without a voice activity detector, for a wide variety of stationary and non-stationary noises and range of SNR values, and that a cascaded-median can be used as an approximation to median for significantly reducing the computation and memory requirement, without adversely affecting the noise estimation. Speech enhancement using magnitude spectrum subtraction with 3-point 4-stage cascaded median for noise estimation and resynthesis using noisy phase resulted in improvements in PESQ scores in the range 0.28 – 0.44 for speech material from NOIZEUS database with added white noise. Resynthesis using phase estimated from the enhanced magnitude spectrum did not result in any further improvement in the scores. The processing technique is implemented and tested for satisfactory operation, with sampling frequency of 10 kHz, 30 ms analysis window with 50% overlap, using a DSP board based on 16-bit fixed-point DSP processor TMS320C5515 with on-chip FFT hardware. The implementation uses data transfer and buffering operations devised for an efficient realization of analysis-synthesis, and codec and DMA for acquisition of the input signal and outputting of the processed output signal. The real-time operation is achieved with signal delay of approximately 48 ms and using about one-seventh of the computing capacity of the processor.
References
[1] H. Levitt, J. M. Pickett, and R. A. Houde, Eds., Sensory Aids for the Hearing Impaired. New York: IEEE Press, 1980, pp. 3–10.
[2] J. M. Pickett, The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Boston, Mass.: Allyn Bacon, 1999, pp. 289–323.
[3] H. Dillon, Hearing Aids. New York: Thieme Medical, 2001.
[4] B. C. J. Moore, An Introduction to the Psychology of Hearing. London, UK: Academic, 1997, pp. 66–107.
[5] T. Lunner, S. Arlinger, and J. Hellgren, “8-channel digital filter bank for hearing aid use: preliminary results in monaural, diotic, and dichotic modes,” Scand. Audiol. Suppl., vol. 38, pp. 75–81, 1993.
[6] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Binaural dichotic presentation to reduce the effects of spectral masking in moderate bilateral sensorineural hearing loss,” Int. J. Audiol., vol. 51, no. 4, pp. 334–344, 2012.
[7] J. Yang, F. Luo, and A. Nehorai, “Spectral contrast enhancement: Algorithms and comparisons,” Speech Commun., vol. 39, no. 1–2, pp. 33–46, 2003.
[8] T. Arai, K. Yasu, and N. Hodoshima, “Effective speech processing for various impaired listeners,” in Proc. 18th Int. Cong. Acoust. (ICA 2004), Kyoto, Japan, 2004, pp. 1389–1392.
[9] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss,” Speech Commun., vol. 54, no. 3, pp. 341–350, 2012.
[10] P. C. Loizou, Speech Enhancement: Theory and Practice. New York: CRC, 2007.
[11] S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, pp. 113–120, 1979.
[12] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in Proc. IEEE ICASSP 1979, Washington, DC, pp. 208–211.
[13] S. Kamath and P. Loizou, “A multi-band spectral subtraction method for enhancing speech corrupted by colored noise,” in Proc. IEEE ICASSP, 2002, Orlando, Florida, vol. 4, pp. IV–4164.
[14] Y. Lu and P. C. Loizou, “A geometric approach to spectral subtraction,” Speech Commun., vol. 50, no. 6, pp. 453–466, 2008.
[16] R. Martin, “Spectral subtraction based on minimum statistics,” in Proc. Eur. Signal Process. Conf., 1994, pp. 1182–1185.
[17] I. Cohen, “Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging,” IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 466–475, 2003.
[18] H. Hirsch and C. Ehrlicher, “Noise estimation techniques for robust speech recognition,” in Proc. IEEE ICASSP, 1995, Detroit, MI, pp. 153–156.
[19] V. Stahl, A. Fisher, and R. Bipus, “Quantile based noise estimation for spectral subtraction and Wiener filtering,” in Proc. IEEE ICASSP, 2000, Istanbul, Turkey, pp. 1875–1878.
[20] G. Doblinger, “Computationally efficient speech enhancement by spectral minima tracking in subbands,” in Proc. 4th Eur. Conf. Speech Commun. and Technology (EUROSPEECH’95), Madrid, Spain, 1995, pp. 1513–1516.
[21] L. Lin, W. H. Holmes, and E. Ambikairajah, “Adaptive noise estimation algorithm for speech enhancement,” Electronics Letters, vol. 39, no. 9, pp. 754–755, 2003.
[22] C. Ris and S. Dupont, “Assessing local noise level estimation methods: application to noise robust ASR,” Speech Commun., vol. 34, no. 1–2, pp. 141–158, 2001.
[23] S. K. Basha and P. C. Pandey, “Real-time enhancement of electrolaryngeal speech by spectral subtraction,” in Proc. Nat. Conf. on Commun. 2012 (NCC 2012), Kharagpur, India, 2012, pp. 516–520.
[24] S. K. Waddi, P. C. Pandey, and N. Tiwari, “Speech enhancement using spectral subtraction and cascaded-median based noise estimation for hearing impaired listeners,” in Proc. Nat. Conf. Commun. (NCC 2013), Delhi, India, 2013, paper no. 1569696063.
[25] ITU, “Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs,” ITU-T Rec. P.862, 2001.
[26] Y. Hu and P. C. Loizou, “Subjective evaluation and comparison of speech enhancement algorithms,” Speech Communication, vol. 49, pp. 588–601, 2007.
[27] T. F. Quatieri and A. V. Oppenheim, “Iterative techniques for minimum phase signal reconstruction from phase or magnitude,” IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 6, pp. 1187–1193, 1981.
[28] S. H. Nawab, T. F. Quatieri, and J. S. Lim, “Signal-reconstruction from short time Fourier transform magnitude,” IEEE Trans. Acoust., Speech, Signal Process., vol. 31, no. 4, pp. 986–
[29] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, New Jersey: Prentice Hall, 1978, pp. 356–362.
[30] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoust., Speech, and Signal Process., vol. 32, no. 2, pp. 236–243, 1984.
[31] Spectrum Digital, Inc. (2010) TMS320C5515 eZdsp USB Stick Technical Reference. [online]. Available: support.spectrumdigital.com/boards/usbstk5515/reva/files/usbstk5515_TechRef_RevA.pdf
[32] Texas Instruments, Inc. (2011) TMS320C5515 Fixed-Point Digital Signal Processor. [online]. Available: focus.ti.com/lit/ds/symlink/tms320c5515.pdf
[33] Texas Instruments, Inc. (2008) TLV320AIC3204 Ultra Low Power Stereo Audio Codec. [online]. Available: focus.ti.com/lit/ds/symlink/tlv320aic3204.pdf