Speech Processing: Short-Time Fourier Transform Analysis and Synthesis




![Fourier-Transform View u Recall (from Chapter 3): u w[n] is a finite-length, symmetrical sequence Fourier-Transform View u Recall (from Chapter 3): u w[n] is a finite-length, symmetrical sequence](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-5.jpg)
![Fourier-Transform View u x[n] – time-domain signal u fn[m]=x[m]w[n-m] - Denotes short-time section of Fourier-Transform View u x[n] – time-domain signal u fn[m]=x[m]w[n-m] - Denotes short-time section of](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-6.jpg)

![Fourier-Transform View a) Speech waveform x[n] (blue) Window function w[n] (red) b) Windowed section Fourier-Transform View a) Speech waveform x[n] (blue) Window function w[n] (red) b) Windowed section](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-8.jpg)
![Example 7. 1 u Let x[n] be a periodic impulse train sequence: … -P Example 7. 1 u Let x[n] be a periodic impulse train sequence: … -P](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-9.jpg)

![Example 7. 1 u Since windows w[n] do not overlap, |X(n, )| = constant Example 7. 1 u Since windows w[n] do not overlap, |X(n, )| = constant](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-11.jpg)

![Analysis window x[m] p=1 30 October 2020 L p=2 w[p. L-m] p=3 Veton Këpuska Analysis window x[m] p=1 30 October 2020 L p=2 w[p. L-m] p=3 Veton Këpuska](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-13.jpg)


![Filtering View u In this interpretation w[n] is considered to be a filter whose Filtering View u In this interpretation w[n] is considered to be a filter whose](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-16.jpg)
![Filtering View u The product: x[n]e-j on Modulation of x[n] up to frequency o. Filtering View u The product: x[n]e-j on Modulation of x[n] up to frequency o.](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-17.jpg)


![Filtering View u General Properties: 1. If x[n] has the length N & w[n] Filtering View u General Properties: 1. If x[n] has the length N & w[n]](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-20.jpg)












![Short-Time Synthesis n Note if there are frequency components of x[n] which do not Short-Time Synthesis n Note if there are frequency components of x[n] which do not](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-33.jpg)
![L > Nw x[m] L w[p. L-m] Nw 30 October 2020 Veton Këpuska 34 L > Nw x[m] L w[p. L-m] Nw 30 October 2020 Veton Këpuska 34](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-34.jpg)




![Filter Bank Summation (FBS) Method 1 u From Figure 7. 5 x[n] Analysis followed Filter Bank Summation (FBS) Method 1 u From Figure 7. 5 x[n] Analysis followed](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-39.jpg)

![Filter Bank Summation (FBS) Method u Thus: y[n] is the output of the convolution Filter Bank Summation (FBS) Method u Thus: y[n] is the output of the convolution](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-41.jpg)



![Generalized FBS Method u Note: u “Smoothing” function f[n, m] is referred to as Generalized FBS Method u Note: u “Smoothing” function f[n, m] is referred to as](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-45.jpg)


![Generalized FBS Method Interested in L>1 case and in using f[n] as interpolator. Interpolation Generalized FBS Method Interested in L>1 case and in using f[n] as interpolator. Interpolation](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-48.jpg)


![Overlap-Add (OLA) Method u Recall the short-time synthesis relation: u If x[n] is averaged Overlap-Add (OLA) Method u Recall the short-time synthesis relation: u If x[n] is averaged](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-51.jpg)







![Time-Frequency Sampling u Consider windowed/short-time signal: n n n fn[m]=w[m]x[n-m], and X(n, ) – Time-Frequency Sampling u Consider windowed/short-time signal: n n n fn[m]=w[m]x[n-m], and X(n, ) –](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-59.jpg)









![Short-Time Fourier Transform Magnitude (STFTM) u Furthermore, the autocorrelation r[n, m] is given by Short-Time Fourier Transform Magnitude (STFTM) u Furthermore, the autocorrelation r[n, m] is given by](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-69.jpg)

![Signal Representation u u u Suppose x[n] is the sum of two signals: x Signal Representation u u u Suppose x[n] is the sum of two signals: x](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-71.jpg)




![Signal Representation u Note that r[n, Nw-1], the maximum lag of autocorrelation, is given Signal Representation u Note that r[n, Nw-1], the maximum lag of autocorrelation, is given](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-76.jpg)

![Signal Representation u Sequential extrapolation algorithm 1. Initialize with x[0] 2. Update time n Signal Representation u Sequential extrapolation algorithm 1. Initialize with x[0] 2. Update time n](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-78.jpg)










![Example 7. 7 (cont. ) u Where is periodic extension of h[n], over N, Example 7. 7 (cont. ) u Where is periodic extension of h[n], over N,](https://slidetodoc.com/presentation_image/2b7871d4a2c6aaabb61354399e3706d1/image-89.jpg)








Speech Processing Short-Time Fourier Transform Analysis and Synthesis
Short-Time Fourier Transform Analysis and Synthesis: Minimum-Phase Synthesis
- Speech and audio signals are time-varying and can be considered stochastic signals that carry information.
- This necessitates short-time analysis, since a single Fourier transform (FT) cannot characterize changes in spectral content over time (i.e., time-varying formants and harmonics).
  - The discrete-time short-time Fourier transform (STFT) consists of a separate FT of the signal in the neighborhood of each time instant.
  - When the FT in the STFT analysis is replaced by the discrete FT (DFT), the resulting STFT is discrete in both time and frequency. This discrete STFT is distinct from the discrete-time STFT, which is continuous in frequency.
- In linear prediction and homomorphic processing, an underlying source/filter model is assumed. This leads to:
  - Model-based analysis/synthesis. Note that both of those analysis methods implicitly used short-time analysis (to be presented).
  - In short-time analysis systems no such model restrictions apply.

30 October 2020, Veton Këpuska
Short-Time Analysis (STFT)
- Two views of the STFT are explored:
  1. the Fourier-transform view
  2. the filter-bank view
Fourier Transform View
Fourier-Transform View
- Recall (from Chapter 3): $w[n]$ is a finite-length, symmetric sequence (i.e., a window) of length $N_w$.
  - $w[n] \neq 0$ for $n \in [0, N_w-1]$
  - $w[n]$ is called the analysis window or analysis filter.
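Fourier-Transform View
- $x[n]$: time-domain signal.
- $f_n[m] = x[m]\,w[n-m]$: the short-time section of $x[m]$ at time $n$, i.e., the signal at frame $n$.
- $X(n,\omega)$: Fourier transform of the short-time windowed data $f_n[m]$:
  $$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\omega m}$$
- Computing the DFT amounts to sampling this transform in frequency: $X(n,k) = X(n,\omega)\big|_{\omega = 2\pi k/N}$.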
Fourier-Transform View
- Thus $X(n,k)$ is the STFT evaluated at every $\omega = (2\pi/N)k$.
  - Frequency sampling interval: $2\pi/N$
  - Frequency sampling factor: $N$
- DFT:
  $$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j(2\pi/N)km}$$
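The definition of $X(n,k)$ can be evaluated directly. The following is a minimal NumPy sketch, not code from the slides: the function names `stft_frame` and `discrete_stft`, the zero extension of `x` outside its range, and the choice of frame times $n = 0, L, 2L, \dots$ are assumptions made for illustration.

```python
import numpy as np

def stft_frame(x, w, n, N):
    """X(n, k) = sum_m x[m] w[n-m] exp(-j 2 pi k m / N) for k = 0..N-1."""
    Nw = len(w)
    m = np.arange(n - Nw + 1, n + 1)           # support of w[n-m], since w lives on [0, Nw-1]
    m = m[(m >= 0) & (m < len(x))]             # assumption: x is zero outside its range
    f = x[m] * w[n - m]                        # short-time section f_n[m]
    k = np.arange(N)[:, None]
    return (f * np.exp(-2j * np.pi * k * m / N)).sum(axis=1)

def discrete_stft(x, w, N, L=1):
    """Stack X(n, k) for frame times n = 0, L, 2L, ... (L is the temporal decimation factor)."""
    frame_times = range(0, len(x) + len(w) - 1, L)
    return np.array([stft_frame(x, w, n, N) for n in frame_times])

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
w = np.hamming(8)
X = discrete_stft(x, w, N=16, L=4)             # one row per frame, one column per frequency bin
```

Note that the phase is referenced to absolute time $m$, as in the definition, rather than to the start of each frame as many FFT-based implementations do.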
Fourier-Transform View (figure)
- (a) Speech waveform $x[n]$ (blue) and window function $w[n]$ (red)
- (b) Windowed section of speech
- (c) Its magnitude spectrum
Example 7.1
- Let $x[n]$ be a periodic impulse train with period $P$: $x[n] = \sum_{l=-\infty}^{\infty} \delta[n - lP]$.
- Also let $w[n]$ be a triangle of length $P$ ($P$ non-zero points).
- Then
  $$X(n,\omega) = \sum_{m}\Big[\sum_{l}\delta[m-lP]\Big] w[n-m]\,e^{-j\omega m} = \sum_{l} w[n-lP]\,e^{-j\omega lP}$$
  - The inner sum is non-zero only for $m = lP$.
  - Each term is a window located at $lP$ with linear phase $-\omega lP$.
- Since the shifted windows do not overlap, $|X(n,\omega)|$ is constant in $\omega$ and $\angle X(n,\omega)$ is linear.
- Computing the DFT for $N = P$ gives
  $$X(n,k) = \sum_{l} w[n-lP]\,e^{-j(2\pi/P)k\,lP} = \sum_{l} w[n-lP]$$
  i.e., the translated, non-overlapping windows with a phase shift of zero (due to the frequency sampling).
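Example 7.1 can be checked numerically. This is a small sketch under stated assumptions: the window is placed on $[0, P-1]$ (the causal convention used for $w[n]$ earlier in the deck, rather than a window centered at zero), `np.bartlett` is used to obtain a triangle with $P$ non-zero points, and the variable names are illustrative.

```python
import numpy as np

P = 8
x = np.zeros(5 * P)
x[::P] = 1.0                                   # impulses at m = 0, P, 2P, ...
w = np.bartlett(P + 2)[1:-1]                   # triangle with P non-zero points on [0, P-1]
N = P                                          # DFT length equal to the period

def stft_frame(n):
    m = np.arange(max(0, n - P + 1), n + 1)    # support of w[n-m]
    k = np.arange(N)[:, None]
    return (x[m] * w[n - m] * np.exp(-2j * np.pi * k * m / N)).sum(axis=1)

X = np.array([stft_frame(n) for n in range(len(x))])

# Each frame sees exactly one impulse, so X(n, k) = w[n - lP] for every k.
expected = w[np.arange(len(x)) % P][:, None]
```

Comparing `X` with `expected` shows a spectrum that is flat across $k$ with zero phase, and whose height traces out the window shape as $n$ advances.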
Spectrogram $|X(n,\omega)|^2$
- If the analysis window length is at most one pitch period ⇒ wideband spectrogram ⇒ vertical striations.
- Otherwise ⇒ narrowband spectrogram ⇒ horizontal striations.
- How often should the analysis window be applied to the signal?
  - $X(n,k)$ is decimated by a temporal decimation factor $L$:
    - $X(nL,k) = \mathrm{DFT}\{f_{nL}[m]\}$
    - The sections $f_{nL}[m]$ are a subset of the sections $f_n[m]$.
  - How to choose the sampling rates in time ($L$) and frequency ($N$, the FFT length) is addressed in a later section.
Figure: analysis window $w[pL-m]$ positioned along $x[m]$ at $p = 1, 2, 3$, advancing by $L$ samples per frame.
Spectrogram $|X(n,\omega)|^2$ (figure)
Fourier-Transform View
- As a function of $\omega$, $X(n,\omega)$ is periodic with period $2\pi$ (as is the Fourier transform) and is Hermitian symmetric for real sequences:
  - $\mathrm{Re}\{X(n,\omega)\}$ and $|X(n,\omega)|$ are symmetric.
  - $\mathrm{Im}\{X(n,\omega)\}$ and $\arg\{X(n,\omega)\}$ are anti-symmetric.
- A time shift results in a linear phase shift (as in the Fourier transform):
  $$x[n-n_0] \;\leftrightarrow\; e^{-j\omega n_0}\,X(n-n_0,\omega)$$
- Thus a shift by $n_0$ in the original time sequence introduces a linear phase, but also a shift in time, corresponding to a shift of each short-time section by $n_0$.
Filtering View
- In this interpretation $w[n]$ is considered to be the impulse response of a filter.
- Thus $w[n]$ is referred to as the analysis filter.
- Fix the value $\omega = \omega_0$:
  $$X(n,\omega_0) = \sum_{m=-\infty}^{\infty}\big(x[m]\,e^{-j\omega_0 m}\big)\,w[n-m]$$
- This equation represents the convolution of the sequence $x[n]e^{-j\omega_0 n}$ with the sequence $w[n]$. Thus:
  $$X(n,\omega_0) = \big(x[n]\,e^{-j\omega_0 n}\big) * w[n]$$
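Filtering View
- The product $x[n]e^{-j\omega_0 n}$ is a modulation of $x[n]$ by the frequency $\omega_0$: it shifts the spectral content of $x[n]$ at $\omega_0$ down to the origin, where the lowpass analysis filter $w[n]$ selects it.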
Filtering View
- Alternate view:
  $$X(n,\omega_0) = e^{-j\omega_0 n}\big(x[n] * w[n]\,e^{j\omega_0 n}\big)$$
- The discrete STFT can also be interpreted from the filtering viewpoint:
  $$X(n,k) = e^{-j(2\pi/N)kn}\big(x[n] * w[n]\,e^{j(2\pi/N)kn}\big)$$
- This equation gives the interpretation of the discrete STFT as the output of the filter bank shown in the next slide.
Filtering View (slide 19; Figure 7.5: filter-bank interpretation of the discrete STFT)
Filtering View
- General properties:
  1. If $x[n]$ has length $N$ and $w[n]$ has length $M$, then $X(n,\omega)$ has length $N+M-1$ along $n$ (the length of a linear convolution).
  2. The bandwidth of $X(n,\omega_0)$ is less than or equal to that of $w[n]$.
  3. The sequence $X(n,\omega_0)$ has its spectrum centered at the origin.
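The two filtering interpretations and the length property can be verified with a short NumPy sketch. The signal, window, and the value of `omega0` below are arbitrary illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(20)                    # signal of length N = 20
w = np.hamming(6)                              # analysis filter of length M = 6
omega0 = 2 * np.pi * 3 / 16                    # fixed analysis frequency

n_x = np.arange(len(x))
n_w = np.arange(len(w))
n_out = np.arange(len(x) + len(w) - 1)

# View 1: demodulate x by exp(-j omega0 n), then lowpass filter with w[n].
X_filter = np.convolve(x * np.exp(-1j * omega0 * n_x), w)

# View 2: bandpass filter with w[n] exp(j omega0 n), then demodulate the output.
X_bandpass = np.exp(-1j * omega0 * n_out) * np.convolve(x, w * np.exp(1j * omega0 * n_w))

# Direct evaluation of sum_m x[m] w[n-m] exp(-j omega0 m).
direct = np.array([sum(x[m] * w[n - m] * np.exp(-1j * omega0 * m)
                       for m in range(len(x)) if 0 <= n - m < len(w))
                   for n in n_out])
```

All three arrays agree, and each has length $N + M - 1 = 25$ along $n$.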
Example 7.2
- Consider a Gaussian analysis window $w[n]$.
- The discrete STFT with DFT length $N$ can therefore be considered as a bank of filters with impulse responses
  $$h_k[n] = w[n]\,e^{j(2\pi/N)kn}$$
- For $x[n] = \delta[n]$: $x[n] * h_k[n] = h_k[n]$.
- If $N = 50$, corresponding to bandpass filters spaced by 200 Hz at a sampling rate of 10000 samples/s, then $h_k[n] = w[n]\,e^{j(2\pi/50)kn}$.
Example 7.2
- Figure: the responses obtained for $k = 0, 5, 10, 15$.
Example 7.3
- Consider the filter bank of Example 7.2, designed with a Gaussian window.
- Figure 7.7 shows the Fourier transform magnitudes of the outputs of the four complex bandpass filters $h_k[n]$ for $k = 0, 5, 10, 15$, as presented on the previous slide and depicted in Figure 7.6.
Example 7.3
- After demodulation, the resulting bandpass outputs have the same spectral shape as in the figure, but centered at the origin.
Time-Frequency Resolution Tradeoffs
- As seen in Chapter 3, the basic issue in analysis window selection is the compromise between a long window, which shows signal detail in frequency, and a short window, which represents fine temporal structure:
  $$X(n,\omega) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(\theta)\,e^{j\theta n}\,X(\omega+\theta)\,d\theta$$
  - Since both $X(\omega)$ and $W(\omega)$ are periodic with period $2\pi$, this linear convolution is essentially circular.
- From the equation above:
  - $W(\omega)$ smears (smooths) $X(\omega)$.
  - For good frequency resolution $W(\omega)$ should be as narrow as possible, ideally $W(\omega) = \delta(\omega)$.
  - $W(\omega) = \delta(\omega)$ implies an infinitely long $w[n]$, hence poor time resolution.
  - The two goals conflict.
Example 7.4
- Figure 7.8 depicts the time-frequency resolution tradeoff.
Time-Frequency Resolution Tradeoffs
- As the previous example shows, the smoothing interpretation of the STFT is not valid for non-stationary sequences.
- For steady signals, long analysis windows are appropriate and yield good frequency resolution, as depicted in the next figure.
- For short and transient signals (plosive speech, flaps, diphthongs, etc.), short windows are preferred in order to capture temporal events.
- Shorter windows yield poorer frequency resolution.
Short-Time Synthesis
- How can the original sequence be recovered from its discrete-time STFT?
- The inversion is represented mathematically by a synthesis equation, which expresses a sequence in terms of its discrete-time STFT.
- Recall that for $f_n[m] = x[m]\,w[n-m]$:
  $$f_n[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega m}\,d\omega$$
- Thus, if $w[n] \neq 0$, recovery is complete.
Short-Time Synthesis
- For each $n$, taking the inverse Fourier transform of the corresponding function of frequency yields the sequence $f_n[m]$.
- Evaluating $f_n[m]$ at $m = n$ gives $x[n]\,w[0]$.
  - For $w[0] \neq 0$, $x[n]$ is obtained as $f_n[n]/w[0]$.
- Taking the inverse Fourier transform of $X(n,\omega)$ for a specific $n$ and then dividing by $w[0]$ gives the synthesis equation for the discrete-time STFT:
  $$x[n] = \frac{1}{2\pi\,w[0]}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega n}\,d\omega$$
Short-Time Synthesis
- In contrast to the discrete-time STFT $X(n,\omega)$, the discrete STFT $X(n,k)$ is not always invertible.
- Example 1:
  - Consider the case when $w[n]$ is bandlimited with bandwidth $B$.
Short-Time Synthesis
  - If there are frequency components of $x[n]$ that do not pass through any of the filter regions of the discrete STFT, then the discrete STFT is not a unique representation of $x[n]$, and $x[n]$ cannot be recovered from it.
- Example 2:
  - Consider $X(n,k)$ decimated in time by a factor $L$, i.e., the STFT is applied every $L$ samples.
  - $w[n]$ is non-zero over its length $N_w$.
    - If $L > N_w$, there are gaps in time where $x[n]$ is not represented.
    - In such cases, again, $x[n]$ cannot be recovered.
Figure: the case $L > N_w$; successive windows $w[pL-m]$ of length $N_w$ leave uncovered gaps in $x[m]$ between frames.
Short-Time Synthesis
- Conclusion: constraints must be adopted to ensure uniqueness and invertibility:
  1. Proper (adequate) frequency sampling: $B \geq 2\pi/N_w$, where $B$ is the window bandwidth.
  2. Proper temporal decimation: $L \leq N_w$.
Filter Bank Summation Method
Filter Bank Summation (FBS) Method
- The traditional short-time synthesis method is commonly referred to as Filter Bank Summation (FBS).
- FBS is best described in terms of the filtering interpretation of the discrete STFT.
  - The discrete STFT is considered to be the set of outputs of a bank of filters.
  - The output of each filter is modulated with a complex exponential.
  - The modulated filter outputs are summed at each instant of time to obtain the corresponding time sample of the original sequence (see Figure 7.5(b) on slide 19).
Filter Bank Summation (FBS) Method
- Recall the synthesis equation given earlier:
  $$x[n] = \frac{1}{2\pi\,w[0]}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega n}\,d\omega$$
- The FBS method carries out a discrete version of this equation using the discrete STFT $X(n,k)$:
  $$y[n] = \frac{1}{N\,w[0]}\sum_{k=0}^{N-1} X(n,k)\,e^{j(2\pi/N)kn}$$
- Under what conditions is $y[n] = x[n]$?
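Filter Bank Summation (FBS) Method
- From Figure 7.5: $x[n]$ → analysis followed by synthesis → $y[n]$.
- Thus:
  $$y[n] = \frac{1}{N\,w[0]}\sum_{k=0}^{N-1}\Big[\sum_{m} x[m]\,w[n-m]\,e^{-j(2\pi/N)km}\Big]e^{j(2\pi/N)kn}$$
- Interchanging the summations, this reduces to:
  $$y[n] = \frac{1}{N\,w[0]}\sum_{m} x[m]\,w[n-m]\sum_{k=0}^{N-1} e^{j(2\pi/N)k(n-m)}$$
- Furthermore:
  $$\sum_{k=0}^{N-1} e^{j(2\pi/N)kn} = N\sum_{r=-\infty}^{\infty}\delta[n-rN]$$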
Filter Bank Summation (FBS) Method
- Thus $y[n]$ is the output of the convolution of $x[n]$ with the product of the analysis window and a periodic impulse sequence:
  $$y[n] = \frac{1}{w[0]}\;x[n] * \Big(w[n]\sum_{r=-\infty}^{\infty}\delta[n-rN]\Big)$$
- Note: $w[n]\sum_r \delta[n-rN]$ reduces to $w[0]\,\delta[n]$, so that $y[n] = x[n]$, if:
  - the window length satisfies $N_w \leq N$, or
  - for $N_w > N$, $w[rN] = 0$ for $r \neq 0$, that is,
    $$w[n]\sum_{r=-\infty}^{\infty}\delta[n-rN] = w[0]\,\delta[n]$$
Filter Bank Summation (FBS) Method
- This constraint is known as the FBS constraint.
- It must be fulfilled to ensure exact signal synthesis with the FBS method.
- The constraint is commonly expressed in the frequency domain:
  $$\sum_{k=0}^{N-1} W\!\Big(\omega - \frac{2\pi}{N}k\Big) = N\,w[0]$$
- This expression states that the frequency responses of the analysis filters should sum to a constant across the entire bandwidth.
- In conclusion, a filter bank with $N$ filters, based on an analysis filter of length less than or equal to $N$, is always an all-pass system.
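The FBS constraint can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the deck's implementation: `fbs_resynthesis` is a hypothetical helper that evaluates $X(n,k)$ at every time sample (no temporal decimation), and `x` is taken as zero for negative time.

```python
import numpy as np

def fbs_resynthesis(x, w, N):
    """y[n] = (1 / (N w[0])) * sum_k X(n, k) exp(j 2 pi k n / N)."""
    Nw = len(w)
    k = np.arange(N)
    y = np.zeros(len(x), dtype=complex)
    for n in range(len(x)):
        m = np.arange(max(0, n - Nw + 1), n + 1)          # support of w[n-m]
        f = x[m] * w[n - m]                               # short-time section f_n[m]
        X_nk = (f * np.exp(-2j * np.pi * np.outer(k, m) / N)).sum(axis=1)
        y[n] = (X_nk * np.exp(2j * np.pi * k * n / N)).sum() / (N * w[0])
    return y.real

rng = np.random.default_rng(2)
x = rng.standard_normal(40)
w = np.hamming(12)                        # Nw = 12, and w[0] = 0.08 is non-zero

y_ok = fbs_resynthesis(x, w, N=16)        # Nw <= N: FBS constraint holds
y_bad = fbs_resynthesis(x, w, N=8)        # Nw > N and w[8] != 0: constraint violated
```

With $N = 16$ the output reproduces `x`; with $N = 8$ the term $w[8]\,x[n-8]/w[0]$ leaks into the output, which is the time-domain aliasing the constraint rules out.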
Generalized FBS Method
Generalized FBS Method
- Note: the "smoothing" function $f[n,m]$ is referred to as the time-varying synthesis filter.
- It can be shown (Exercise 7.6) that any $f[n,m]$ that fulfills a normalization condition with respect to the analysis window makes the generalized synthesis equation valid.
- Note also that the basic FBS method is obtained by setting the synthesis filter to a non-smoothing filter: $f[n,m] = \delta[m]$.
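One way to see where such a condition comes from is sketched below. The form of the generalized synthesis equation in the first line is an assumption, reconstructed from the definitions on the earlier slides rather than copied from this slide; the remaining steps only use $\frac{1}{2\pi}\int X(r,\omega)e^{j\omega n}d\omega = x[n]\,w[r-n]$.

```latex
\begin{aligned}
y[n] &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\Big[\sum_{r} f[n,\,n-r]\;X(r,\omega)\Big]e^{j\omega n}\,d\omega \\
     &= \sum_{r} f[n,\,n-r]\;x[n]\,w[r-n] \\
     &= x[n]\sum_{m} f[n,\,-m]\;w[m], \qquad m = r-n .
\end{aligned}
```

Under this assumed form, $y[n] = x[n]$ whenever $\sum_m f[n,-m]\,w[m] = 1$. With $f[n,m] = \delta[m]$ the sum collapses to $w[0]$, which matches the $1/w[0]$ normalization of the basic FBS method.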
Generalized FBS Method
- Consider the discrete STFT with decimation factor $L$. The generalized FBS expression for the synthesized signal $y[n]$ is then written in terms of the decimated samples $X(rL,k)$.
- Furthermore, consider a time-invariant smoothing filter: $f[n,m] = f[m]$.
- That is: $f[n,\,n-rL] = f[n-rL]$.
- The resulting synthesis equation holds when a constraint is satisfied jointly by the analysis and synthesis filters, the temporal decimation factor, and the frequency sampling factor.
- For $f[m] = \delta[m]$ and $L = 1$ this method reduces to the basic FBS method.
Generalized FBS Method
- The case of interest is $L > 1$, with $f[n]$ used as an interpolator.
- Interpolation FBS methods:
  1. Helical interpolation (Portnoff)
  2. Weighted overlap-add method (Crochiere)
Overlap-Add (OLA) Method
Overlap-Add (OLA) Method
- The FBS method was motivated by the filtering view of the STFT.
- The OLA method is motivated by the Fourier-transform view of the STFT.
- In the OLA method:
  1. The inverse DFT is taken for each fixed time in the discrete STFT.
  2. An overlap-and-add operation between the short-time sections is performed.
- This works provided that the analysis window is designed such that the overlap-and-add operation effectively eliminates the analysis window from the synthesized sequence. The basic idea is that the redundancy within overlapping segments, and the averaging of the redundant samples, removes the effect of windowing.
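Overlap-Add (OLA) Method
- Recall the short-time synthesis relation:
  $$x[n] = \frac{1}{2\pi\,w[0]}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega n}\,d\omega$$
- If $x[n]$ is averaged over many short-time segments and normalized by $W(0)$, then
  $$x[n] = \frac{1}{2\pi\,W(0)}\int_{-\pi}^{\pi}\sum_{p=-\infty}^{\infty} X(p,\omega)\,e^{j\omega n}\,d\omega, \quad\text{where}\quad W(0) = \sum_{n=-\infty}^{\infty} w[n]$$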
Overlap-Add (OLA) Method
- The discretized version of OLA is given by:
  $$y[n] = \frac{1}{W(0)}\sum_{p=-\infty}^{\infty}\Big[\frac{1}{N}\sum_{k=0}^{N-1} X(p,k)\,e^{j(2\pi/N)kn}\Big]$$
- The inner IDFT equals $f_p[n] = x[n]\,w[p-n]$ provided that $N > N_w$. The expression for $y[n]$ thus becomes:
  $$y[n] = x[n]\,\frac{1}{W(0)}\sum_{p=-\infty}^{\infty} w[p-n]$$
- Provided that $\sum_p w[p-n] = W(0)$, then $y[n] = x[n]$.
  - This is always true, because the sum of the values of a sequence equals the value of its Fourier transform at $\omega = 0$ (the DC value).
Overlap-Add (OLA) Method
- For decimation in time by a factor $L$, it can be shown (Exercise 7.4) that, for a suitably designed window:
  $$\sum_{p=-\infty}^{\infty} w[pL-n] = \frac{W(0)}{L}$$
- Then $x[n]$ can be synthesized using:
  $$x[n] = \frac{L}{W(0)}\sum_{p=-\infty}^{\infty}\Big[\frac{1}{N}\sum_{k=0}^{N-1} X(pL,k)\,e^{j(2\pi/N)kn}\Big]$$
- The first relation is the general constraint imposed by the OLA method. It requires the sum of all the analysis windows (obtained by sliding $w[n]$ in $L$-point increments) to add up to a constant, as shown in the next figure.
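The OLA synthesis with decimation can be sketched in NumPy as follows. Several choices here are assumptions for illustration: the helper name `ola_resynthesis`, zero padding at both ends so that every sample is covered by a full set of frames, a periodic Hamming window built by hand, and frame-relative FFTs (these differ from the absolute-time definition of $X(pL,k)$ only by a linear phase that cancels on inversion).

```python
import numpy as np

def ola_resynthesis(x, w, L, N):
    """Analyse every L samples with an N-point DFT, invert each frame, overlap-add."""
    Nw = len(w)
    assert N >= Nw, "N-point DFT must hold the whole windowed section"
    xp = np.concatenate([np.zeros(Nw - 1), x, np.zeros(Nw)])   # padding (assumption)
    y = np.zeros(len(xp))
    for end in range(Nw - 1, len(xp), L):          # 'end' plays the role of n = pL
        start = end - Nw + 1
        seg = xp[start:end + 1] * w[::-1]          # f_n[m] = x[m] w[n-m]
        X = np.fft.fft(seg, N)                     # discrete STFT of this frame
        y[start:end + 1] += np.fft.ifft(X)[:Nw].real
    y *= L / w.sum()                               # normalise by W(0)/L
    return y[Nw - 1:Nw - 1 + len(x)]

rng = np.random.default_rng(3)
x = rng.standard_normal(50)
Nw = 16
w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(Nw) / Nw)   # periodic Hamming window

y_ok = ola_resynthesis(x, w, L=4, N=32)    # shifted windows sum to W(0)/L
y_bad = ola_resynthesis(x, w, L=6, N=32)   # shifted windows do not sum to a constant
```

For $L = 4$ the shifted windows add up to the constant $W(0)/L$ and the signal is recovered; for $L = 6$ the sum varies with $n$, so the output is an amplitude-modulated version of `x`.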
Overlap-Add (OLA) Method (figures: analysis windows shifted in $L$-point increments and their sum)
Overlap-Add (OLA) Method
- Duality of the OLA constraint and the FBS constraint:
  - The FBS method requires that finite-length windows have a length $N_w$ no greater than the number of analysis filters $N$ to satisfy the FBS constraint ($N \geq N_w$).
  - Analogously, for the OLA method it can be shown that its constraint is satisfied by all finite-bandwidth analysis windows whose maximum frequency is less than $2\pi/L$ (where $L$ is the temporal decimation factor).
    - In addition, this finite-bandwidth constraint can be relaxed by allowing the shifted window transform replicas to take on the value zero at the frequency origin $\omega = 0$: $W(\omega - 2\pi k/L)\big|_{\omega=0} = 0$ for $k \neq 0$.
    - This is analogous to the FBS constraint for $N_w > N$, where the window $w[n]$ is required to take on the value zero at $n = N, 2N, 3N, \dots$
Time-Frequency Sampling
- This section gives a different, qualitative view of the time-frequency sampling concepts behind the OLA and FBS constraints, from the perspective of classical time-domain and frequency-domain aliasing.
- The discussion also serves as a summary of the sampling issues for the two methods, and motivates the earlier statement that sufficient but not necessary conditions for invertibility of the discrete STFT are:
  1. The analysis window is non-zero over its finite length $N_w$.
  2. The temporal decimation factor satisfies $L \leq N_w$.
  3. The frequency sampling interval satisfies $2\pi/N \leq 2\pi/N_w$.
Time-Frequency Sampling
- Consider the windowed (short-time) signal:
  - $f_n[m] = w[m]\,x[n-m]$,
  - $X(n,\omega)$: the Fourier transform of $f_n[m]$,
  - an analysis window of duration $N_w$.
- From the Fourier-transform point of view:
  - Reconstruction of $f_n[m]$ from $X(n,k)$ requires a frequency sampling interval of $2\pi/N_w$ or finer.
- From the time-domain point of view:
  - The time decimation interval $L$ must meet the Nyquist criterion based on the bandwidth of the window $w[n]$.
  - This implies sampling $X(n,k)$ at a time interval $L \leq 2\pi/\omega_c$ to avoid frequency-domain aliasing of the time sequence $X(n,\omega)$.
  - $\omega_c$ is the bandwidth of $W(\omega)$, as marked in the slide's figure.
Time-Frequency Sampling
- Sufficient (but not necessary) conditions for signal reconstruction are:
  1. The window is non-zero over its length $N_w$.
  2. The temporal decimation factor satisfies $L \leq N_w$ ($2\pi/\omega_c$).
  3. The frequency sampling interval satisfies $2\pi/N \leq 2\pi/N_w$.
- To avoid aliasing:
  - in the time domain: ensure condition 2;
  - in the frequency domain: ensure condition 3.
Time Decimation Sampling
- Implications for the use of practical windows:
  1. Rectangular window of length $N_w$: assuming the bandwidth equals the extent of the main lobe, $B = [-2\pi/N_w,\; 2\pi/N_w] = 4\pi/N_w$; this requires 50% overlap between windows.
  2. Hamming window of length $N_w$: bandwidth $B = 8\pi/N_w$; this requires 75% overlap between windows.
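The overlap figures follow from a short calculation, sketched here under the assumption that the Nyquist criterion is applied to the full main-lobe width $B$, i.e. the frame rate $2\pi/L$ must be at least $B$. The value $N_w = 256$ is only an illustrative choice.

```latex
\begin{aligned}
L \le \frac{2\pi}{B}:\qquad
&\text{rectangular: } B = \frac{4\pi}{N_w} \;\Rightarrow\; L \le \frac{N_w}{2}
  \quad (\text{overlap} = 1 - L/N_w \ge 50\%),\\
&\text{Hamming: } B = \frac{8\pi}{N_w} \;\Rightarrow\; L \le \frac{N_w}{4}
  \quad (\text{overlap} \ge 75\%).\\[4pt]
\text{For } N_w = 256:\qquad &L \le 128 \ \text{(rectangular)}, \qquad L \le 64 \ \text{(Hamming)}.
\end{aligned}
```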
Summary
- OLA method (DFT of order $N$):
  1. No time aliasing occurs if the window length $N_w$ is such that $2\pi/N \leq 2\pi/N_w$.
  2. No frequency-domain aliasing occurs if the decimation factor $L$ is small enough that the filter bandwidth satisfies $\omega_c \leq 2\pi/L$.
  3. If zeros are allowed in $W(\omega)$, condition 2 can be relaxed. In this case we can under-sample in frequency and still recover the sequence.
Summary
- FBS method:
  1. No frequency-domain aliasing occurs if the decimation factor $L$ meets the Nyquist criterion, i.e., $L \leq N_w$ ($2\pi/\omega_c$), where $\omega_c$ is the bandwidth of $w[n]$.
  2. No time-domain aliasing occurs if $2\pi/N \leq 2\pi/N_w$, i.e., $N_w \leq N$.
  3. If zeros in $w[n]$ are allowed, condition 2 can be relaxed. In this case we can under-sample in time and still recover the sequence.
Short-Time Fourier Transform Magnitude (STFTM)
Short-Time Fourier Transform Magnitude (STFTM)
- The spectrogram is a major tool in speech applications.
- The spectrogram is the squared STFT magnitude (STFTM).
  - It has been suggested that the human ear extracts perceptual information strictly from a spectrogram-like representation of speech (J. C. Anderson, "Speech Analysis/Synthesis Based on Perception", Ph.D. Thesis, MIT, 1984).
  - Experienced speech researchers have trained themselves to "read" the spectrogram itself (Victor Zue, MIT).
- This is the primary topic of FIT-ece 5528, "Acoustics of American Speech".
Short-Time Fourier Transform Magnitude (STFTM)
- The STFTM discards phase information, which makes it useful in various application areas such as:
  - time-scale modification
  - speech enhancement
- In all these applications, estimating the phase of the speech is difficult (e.g., because of noise in the signal).
- Furthermore, a number of techniques have been developed to obtain a phase estimate from an STFT magnitude.
- This section introduces the STFTM as an alternative time-frequency signal representation.
- In addition, analysis and synthesis techniques will be developed for the STFTM.
Short-Time Fourier Transform Magnitude (STFTM)
- Squared-magnitude and autocorrelation relationship:
  $$r[n,m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(n,\omega)|^2\,e^{j\omega m}\,d\omega$$
  - $r[n,m]$: short-time autocorrelation
  - $|X(n,\omega)|$: short-time magnitude
  - $m$: autocorrelation lag
- Furthermore, the autocorrelation $r[n,m]$ is given by the convolution of the short-time signal with its time reversal:
  $$r[n,m] = f_n[m] * f_n[-m], \quad\text{where}\quad f_n[m] = x[m]\,w[n-m]$$
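The relationship can be checked for a single short-time section. In this sketch the section `f` is just a random windowed segment, and the DFT length is chosen as $N \geq 2N_w - 1$ so that the circular autocorrelation returned by the inverse DFT does not wrap; both are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
Nw = 8
f = rng.standard_normal(Nw) * np.hamming(Nw)   # one short-time section f_n[m]
N = 16                                         # N >= 2*Nw - 1 avoids lag wrap-around

S = np.abs(np.fft.fft(f, N)) ** 2              # squared STFT magnitude of this frame
r_from_mag = np.fft.ifft(S).real[:Nw]          # lags m = 0 .. Nw-1

# Direct autocorrelation f[m] * f[-m]; index Nw-1 of the 'full' output is lag 0.
r_direct = np.correlate(f, f, mode="full")[Nw - 1:]
```

Both routes give the same lags, and the largest lag, `r_direct[Nw - 1]`, equals the product of the first and last samples of the section.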
Signal Representation
- Under what conditions can the STFTM be used to represent a sequence uniquely?
- Note that $|\mathcal{F}\{x[n]\}| = |\mathcal{F}\{-x[n]\}|$ ⇒ ambiguity; thus the STFTM is not a unique representation in all cases.
- However, by imposing certain mild restrictions on
  - the analysis window and
  - the signal,

  a unique signal representation is indeed possible with the discrete-time STFTM.
Signal Representation
- Suppose $x[n]$ is the sum of two signals, $x_1[n]$ and $x_2[n]$, occupying different regions of the $n$-axis.
- Furthermore, suppose that the gap of zeros between $x_1[n]$ and $x_2[n]$ is large enough that there is no analysis window position for which the corresponding short-time section includes non-zero samples of both $x_1[n]$ and $x_2[n]$.
- Because of the sign ambiguity, the STFTM of
  - $x_1[n] + x_2[n]$,
  - $x_1[n] - x_2[n]$, and
  - $-x_1[n] + x_2[n]$

  is the same.
Signal Representation
- Any uniqueness conditions must include a restriction on the length of zero gaps between non-zero portions of the signal $x[n]$.
- Sufficient uniqueness conditions are the following:
  1. The analysis window $w[n]$ is a known sequence of finite length $N_w$, with no zeros over its duration.
  2. The sequence $x[n]$ is one-sided with at most $N_w - 2$ consecutive zero samples, and the sign of its first non-zero value is known.
Signal Representation
- If successive STFTM frames correspond to overlapping signal segments, then:
  - if the short-time spectral magnitude of the signal segment at time $n$ is known,
  - the spectral magnitude of the adjacent section at time $n+1$ must be consistent, in the region of overlap, with the known short-time section.
- ⇒ If the analysis window is non-zero and of length $N_w$, then after dividing out the analysis window, the first $N_w - 1$ samples of the segment at time $n+1$ must equal the last $N_w - 1$ samples of the segment at time $n$ (illustrated in the slide's figure).
- ⇒ If the last sample of a segment can be extrapolated from its first $N_w - 1$ values, the process can be repeated to obtain the entire signal $x[n]$.
Signal Representation
- To develop the procedure for extrapolating the next sample of a sequence from its STFTM, assume that the first $N_w - 1$ samples under the analysis window positioned at time $n$ are known.
  - That is, the sequence $x[n]$ has been obtained up to time $n-1$ from its STFTM.
- The goal is to compute the sample $x[n]$ from these initial samples and the STFT magnitude $|X(n,\omega)|$, or equivalently $r[n,m]$.
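Signal Representation
- Note that $r[n, N_w-1]$, the maximum lag of the autocorrelation, is given by the product of the first and last values of the segment:
  $$r[n, N_w-1] = \big(w[0]\,x[n]\big)\big(w[N_w-1]\,x[n-(N_w-1)]\big)$$
  $$\Rightarrow\quad x[n] = \frac{r[n, N_w-1]}{w[0]\,w[N_w-1]\,x[n-(N_w-1)]}$$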
Signal Representation
- Note that if the first value of the short-time section, $x[n-(N_w-1)]$, happens to be zero, one must find the first non-zero value within the section and again use the product relation of the last expression, at the correspondingly shorter lag.
- Such a sample can always be found, because it was assumed that there are at most $N_w - 2$ consecutive zero samples between any two non-zero samples of $x[n]$.
Signal Representation
- Sequential extrapolation algorithm:
  1. Initialize with $x[0]$.
  2. Update the time $n$.
  3. Compute $r[n, N_w-1]$ from the inverse DFT of $|X(n,k)|^2$.
  4. Compute $x[n] = \dfrac{r[n, N_w-1]}{w[0]\,w[N_w-1]\,x[n-(N_w-1)]}$.
  5. Return to step 2 and repeat.
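The steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the deck's code: `stftm` and `sequential_extrapolation` are hypothetical helper names, $L = 1$ (a frame at every sample), the DFT length satisfies $N \geq 2N_w - 1$, `x[0]` is known and non-zero, and a small tolerance `tol` decides which already-recovered samples count as zero. When the first sample of a section is zero, the lag to the first non-zero sample is used instead of $N_w - 1$.

```python
import numpy as np

def stftm(x, w, N):
    """|X(n, k)| for n = 0..len(x)-1, with x taken as zero for negative time."""
    Nw = len(w)
    xp = np.concatenate([np.zeros(Nw - 1), x])
    frames = np.array([xp[n:n + Nw] * w[::-1] for n in range(len(x))])
    return np.abs(np.fft.fft(frames, N, axis=1))

def sequential_extrapolation(mag, w, x0, tol=1e-9):
    Nw = len(w)
    r = np.fft.ifft(mag ** 2, axis=1).real        # r[n, m] from the squared magnitude
    x = np.zeros(mag.shape[0])
    x[0] = x0                                     # step 1: initialise
    for n in range(1, len(x)):                    # step 2: update time
        lo = max(0, n - Nw + 1)
        nz = np.flatnonzero(np.abs(x[lo:n]) > tol)
        if nz.size == 0:
            raise ValueError("zero gap longer than the window allows")
        m0 = lo + nz[0]                           # first non-zero sample in the section
        d = n - m0                                # lag whose product has a single term
        x[n] = r[n, d] / (w[0] * w[d] * x[m0])    # steps 3-4
    return x

rng = np.random.default_rng(5)
x = rng.uniform(0.5, 1.5, 40) * rng.choice([-1.0, 1.0], 40)
x[5:7] = 0.0                                      # a short zero gap (<= Nw - 2 samples)
w = np.hamming(8)
x_hat = sequential_extrapolation(stftm(x, w, N=16), w, x0=x[0])
```

Only the magnitude and the first sample are given to the recovery routine; the sign of every later sample follows from the product relation.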
Reconstruction from Time-Frequency Samples
- To carry out STFTM analysis on a digital computer, the discrete STFTM must be used.
- The uniqueness theory of the STFTM extends easily to the discrete STFTM.
  - Uniqueness of the STFTM is based on the short-time autocorrelation functions.
  - The autocorrelation functions can be obtained even if the STFTM is sampled in frequency (discrete STFTM), given adequate frequency sampling.
- To consider the effects of temporal decimation by a factor $L$, note that adjacent short-time sections now have an overlap of $N_w - L$ instead of $N_w - 1$.
Reconstruction from Time-Frequency Samples
- Sufficient uniqueness conditions for the partial-overlap case:
  1. The analysis window $w[n]$ is a known sequence of finite length $N_w$, with no zeros over its duration.
  2. The sequence $x[n]$ is one-sided with at most $N_w - 2L$ consecutive zero samples.
  3. $L$ consecutive samples of $x[n]$ (starting from the first non-zero sample) are known.
- These conditions are sufficient but not necessary.
Signal Estimation from the Modified STFT or STFTM
- Synthesis of a signal from a time-frequency function, a modified STFT or STFTM, is required in many applications.
- Modification may arise from:
  1. Quantization errors (e.g., from speech coding)
  2. Time-varying filtering
  3. Speech enhancement
  4. Signal rate modifications
- Limitations:
  - Modifications in frequency should result in time modifications that are restricted to within an analysis window (Figure 7.18, next slide).
  - Overlapping sections must undergo similar modifications (Figure 7.19).
Signal Estimation from the Modified STFT or STFTM
- Example 7.5: removal of an interfering tone.
  - Consider modifying a valid $X(n,\omega)$ of the short-time segment $f_n[m] = x[m]\,w[n-m]$ by inserting a zero gap where an unwanted interfering sine-wave component is known to lie.
  - The interfering signal is removed with $H(n,\omega)$.
  - The resulting frequency representation is $Y(n,\omega) = X(n,\omega)\,H(n,\omega)$.
  - Inverse transforming gives a modified short-time sequence $g_n[m]$ that is non-zero beyond the extent of the original short-time segment $f_n[m] = x[m]\,w[n-m]$.
Signal Estimation from the Modified STFT or STFTM
- Example 7.6:
  - At time $n$: suppose a time-decimated STFT, $X(nL,\omega)$, is multiplied by a linear phase factor $e^{j\omega n_0}$ to obtain $Y(nL,\omega) = X(nL,\omega)\,e^{j\omega n_0}$.
  - At time $n+1$: $X((n+1)L,\omega)$ is multiplied by the negative of this linear phase factor, $e^{-j\omega n_0}$, to obtain $Y((n+1)L,\omega) = X((n+1)L,\omega)\,e^{-j\omega n_0}$.
  - The overlapping sections of the inverse Fourier transforms, denoted $g_{nL}[m]$ and $g_{(n+1)L}[m]$, are then not consistent.
Heuristic Application of STFT Synthesis Methods
- Although modifications of the STFT or STFTM may violate some principles, the results may still be "reasonable".
- The effect of modifying the STFT with another time-frequency function, followed by FBS or OLA synthesis, can be shown to be a time-varying convolution between $x[n]$ and a function $\hat{h}[n,m]$: $x[n] * \hat{h}[n,m]$.
- Let $X(n,\omega)$ be modified by a function $H(n,\omega)$: $Y(n,\omega) = X(n,\omega)\,H(n,\omega)$.
- This corresponds to a new short-time segment: $g_n[m] = f_n[m] * h[n,m]$.
- $h[n,m]$ is a time-varying system impulse response (Chapter 2).
Heuristic Application of STFT Synthesis Methods
- Consider the FBS method, with discretization in frequency: $Y(n,k) = X(n,k)\,H(n,k)$.
- With $h[n,m]$ the $N$-point IDFT of $H(n,k)$, the resulting sequence can be written as a time-varying convolution of $x[n]$ with a modified impulse response $\hat{h}[n,m]$.
- Using the OLA method, it can be shown (see Exercise 7.11) that the result is again a time-varying convolution, with a different $\hat{h}[n,m]$.
- Contrasting FBS with OLA:
  - FBS: multiplication, giving an instantaneous change.
  - OLA: convolution, giving smoothing.
Heuristic Application of STFT Synthesis Methods
- Example 7.7:
  - Suppose we want to deliberately introduce reverberation into a signal $x[n]$ by convolution with the filter $h[n] = \delta[n] + \delta[n-n_0]$.
  - Its Fourier transform is $H(\omega) = 1 + e^{-j\omega n_0}$.
  - The STFT of the resulting signal is given by $Y(n,\omega) = X(n,\omega)\,H(\omega)$.
Example 7.7 (cont.)
- Using the OLA method (7.21), it is possible to express $y[n]$ in terms of the original sequence $x[n]$ and $\tilde{h}[n]$.
- Here $\tilde{h}[n]$ is the periodic extension of $h[n]$ with period $N$, of which only the interval $[0, N-1]$ is considered.
- This implies that the intended reverberated signal is obtained only when $n_0 < N$; otherwise temporal aliasing occurs (as illustrated in Figure 7.20).
Time-Scale Modification and Enhancement of Speech
Time-Scale Modification and Enhancement of Speech
- The signal construction methods presented in this chapter can be applied in a variety of speech applications.
- Time-scale modification:
  - For speech, the aim is to change the articulation rate (faster or slower) without changing the pitch.
Time-Scale Modification
- Methods:
  - Cut and paste (Fairbanks method):
    - Discard or duplicate frames in order to speed up or slow down the articulation, respectively.
    - Problem: pitch period mismatch at adjacent frames causes distortion.
  - Pitch-synchronous OLA (Scott & Gerber):
    - Select frame size and location synchronously with the pitch periods, which avoids the pitch period mismatch.
    - Problem: pitch synchronization is not always easy.
  - STFTM synthesis:
    - To avoid pitch synchronization problems, use only the magnitude of the STFT (i.e., the STFTM):
      1. Compute $|X(nL,\omega)|$ at an appropriate frame interval, the decimation rate $L$ (e.g., $L = 128$ at $F_s = 10000$ Hz, with $N$ several pitch periods $T_0$ long).
      2. Replace the decimation rate with a new rate $M$ (e.g., $M = L/2$, which halves the duration): $|Y(nM,\omega)| = |X(nL,\omega)|$.
      3. Apply the least-squared-error iterative estimation algorithm until $|Y(nM,\omega)|$ has converged.
    - Problem: an occasional reverberant character of the synthesized signal is perceived, due to the lack of STFT phase control.
Noise Reduction
- A number of techniques have been developed to remove or reduce additive noise.
- The noise-corrupted signal is given by $y[n] = x[n] + b[n]$.
  - STFT synthesis:
    - Subtract an estimate of the noise spectrum, $\hat{S}_b(\omega)$, scaled by a factor $\alpha$.
    - The original phase spectrum $\angle Y(nL,\omega)$ is retained, because the phase of the noise cannot be reliably estimated in general.
    - The factor $\alpha$ controls the degree of noise reduction.
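A minimal sketch of this idea is given below. It is an assumption-based illustration, not the slide's formula: it assumes power-domain subtraction $|\hat{X}|^2 = \max(|Y|^2 - \alpha \hat{S}_b,\, 0)$ with the noisy phase kept, OLA synthesis with a periodic Hamming window, and a noise power estimate `Sb` supplied per DFT bin in the same units as $|Y|^2$. The names `subtract_noise` and `enhance` are hypothetical.

```python
import numpy as np

def subtract_noise(Y, Sb, alpha=1.0):
    """Magnitude sqrt(max(|Y|^2 - alpha*Sb, 0)) combined with the original phase of Y."""
    power = np.maximum(np.abs(Y) ** 2 - alpha * Sb, 0.0)
    return np.sqrt(power) * np.exp(1j * np.angle(Y))

def enhance(y, Sb, alpha=1.0, Nw=64, L=16):
    """Frame-by-frame spectral subtraction followed by overlap-add synthesis."""
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(Nw) / Nw)   # periodic Hamming
    yp = np.concatenate([np.zeros(Nw), y, np.zeros(Nw)])       # padding (assumption)
    out = np.zeros(len(yp))
    for start in range(0, len(yp) - Nw + 1, L):
        Y = np.fft.fft(yp[start:start + Nw] * w)               # Y(nL, k) for this frame
        out[start:start + Nw] += np.fft.ifft(subtract_noise(Y, Sb, alpha)).real
    return out[Nw:Nw + len(y)] * L / w.sum()                   # OLA normalisation L / W(0)

rng = np.random.default_rng(6)
y = np.sin(2 * np.pi * 0.05 * np.arange(400)) + 0.1 * rng.standard_normal(400)
Sb = np.full(64, 0.5)                                          # illustrative noise estimate
y_same = enhance(y, Sb, alpha=0.0)                             # no subtraction: identity
y_clean = enhance(y, Sb, alpha=1.0)
```

With `alpha = 0` the chain reduces to plain OLA analysis/synthesis and returns the input, which is a useful sanity check before raising `alpha`.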
Noise Reduction
  - STFTM synthesis:
    - Ignore the phase and use the sequential extrapolation or least-squared-error estimation method to construct the clean signal.