- Slides: 69
Microcomputer Systems 2
Time Stretching & Pitch Shifting of Audio Signals
Veton Këpuska
Time Stretching & Pitch Shifting of Audio Signals
Outline
- Introduction
- Techniques Used for Time Compression/Expansion and Pitch Shifting
- Comparison
- Timbre and Formants
Outline
- Introduction
  - Frequency Shift vs. Pitch Shift: Audio Examples
  - Time Compression/Expansion
- Techniques Used for Time Compression/Expansion and Pitch Shifting
  - The Phase Vocoder
    - Related Topics
    - Why Phase
  - Time Domain Harmonic Scaling (TDHS)
  - More recent approaches
- Comparison
  - Which Method to Use
  - Pitch Shifting Considerations
  - Audio Examples
- Timbre and Formants
  - Phase Vocoder and Formants
  - Time Domain Harmonic Scaling and Formants
11/25/2020 Veton Këpuska
Introduction
- Time Stretching & Pitch Shifting
  - These are two dominant techniques used for speech and sound manipulation.
  - Typical applications entail:
    - Changing the speed of playback (altering the length of the signal) without altering the pitch of the voice and/or instruments.
    - Changing the pitch of the voice and/or instruments without changing the length of the signal.
Pitch Shifting
Pitch Shifting
- As opposed to pitch transposition achieved using a simple sample rate conversion, Pitch Shifting changes the pitch of a signal without changing its length.
- In practical applications, this is usually achieved by changing the length of a sound using one of the methods discussed next and then performing a sample rate conversion to change the pitch.
Introduction
- Pitch Shifting is NOT Frequency Shifting:
  - There is some confusion in terminology in the literature, as Pitch Shifting is often incorrectly called 'Frequency Shifting'.
  - A true Frequency Shift (obtainable by modulating an analytic signal by a complex exponential) moves every component of the spectrum by the same constant offset, while Pitch Shifting dilates the spectrum, preserving the harmonic relationships of the sound.
  - Frequency Shifting yields a metallic, inharmonic sound, which may well be an interesting special effect but is a totally inadequate process for changing the pitch of any harmonic sound except a single sine wave.
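
A small numeric illustration of the difference may help. The values below (a 100 Hz tone with four partials, a 50 Hz offset, a ratio of 1.5) are hypothetical and do not come from the slides. Adding a constant offset breaks the integer ratios between partials, while multiplying by a constant ratio keeps them.

```matlab
% Harmonic partials of a hypothetical 100 Hz tone (illustrative values only).
f0 = 100;
partials = f0 * (1:4);                 % 100 200 300 400 Hz

% Frequency shift: add the same offset to every partial.
shift_hz = 50;
freq_shifted = partials + shift_hz;    % 150 250 350 450 Hz

% Pitch shift: multiply every partial by the same ratio.
ratio = 1.5;
pitch_shifted = partials * ratio;      % 150 300 450 600 Hz

% Ratios to the lowest partial. Integer ratios mean the sound is still harmonic.
freq_ratios  = freq_shifted  / freq_shifted(1);    % about 1, 1.67, 2.33, 3
pitch_ratios = pitch_shifted / pitch_shifted(1);   % 1, 2, 3, 4
disp(freq_ratios);
disp(pitch_ratios);
```

The non-integer ratios in the frequency-shifted case are what the ear hears as a metallic, inharmonic timbre.
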
Audio Examples of Pitch Shifting vs. Frequency Shifting
- Original Sound: [sound file]
- Pitch Shifted: [sound file]
- Frequency Shifted: [sound file]
Time Compression/Expansion
Time Compression/Expansion
- Time Compression/Expansion, also known as "Time Stretching", is the reciprocal process to Pitch Shifting.
  - It leaves the pitch of the signal intact while changing its speed (tempo).
  - This is useful when you wish to change the speed of a voiceover without altering the timbre of the voice.
Time Compression/Expansion
- There are several fairly good methods for time compression/expansion and pitch shifting, but most of them do not perform well on all kinds of signals or for every desired shift/stretch ratio.
- Typically, good algorithms allow pitch shifting by up to 5 semitones on average, or stretching the length by 130%.
- When time stretching and pitch shifting single-instrument recordings, you might even be able to achieve a 200% time stretch or a one-octave pitch shift with no audible loss in quality.
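
For reference, assuming twelve-tone equal temperament (an assumption; the slides do not name a tuning system), a shift of n semitones corresponds to a frequency ratio of 2^(n/12). A quick check suggests that 5 semitones is a ratio of roughly 1.33, which is about the same size as the 130% stretch figure, and that one octave is a ratio of 2, matching the 200% figure.

```matlab
% Semitone-to-ratio conversion, assuming 12-tone equal temperament.
semitones = 5;
ratio = 2^(semitones / 12);        % about 1.335

octave_ratio = 2^(12 / 12);        % one octave = 12 semitones = ratio 2

fprintf('%d semitones -> frequency ratio %.3f\n', semitones, ratio);
fprintf('12 semitones -> frequency ratio %.3f\n', octave_ratio);
```
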
Time Compression/Expansion of Speech
- Typical Goals
  - To either speed up or slow down a speech signal while maintaining the approximate pitch
- Applications
  - Change voice mail playback speed
  - Court stenographers: play back proceedings more quickly
  - Sound effects
  - Etc.
Techniques Used for Time Compression/Expansion & Pitch Shifting
- Option 1: Change the sample rate
  - If you modify the playback sample rate, you change the speed, but the pitch also changes
    - Increase sample rate = higher pitch (chipmunk sound)
    - Decrease sample rate = lower pitch (drawn-out echo sound)
- Option 2: Decimate or interpolate the signal
  - If you change the number of samples, the result is the same as modifying the sample rate
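
The effect of Option 2 can be checked numerically. The sketch below uses a hypothetical 200 Hz test tone at an assumed 8 kHz sample rate and keeps every other sample; played back at the same rate, the result is half as long and one octave higher. It uses plain sample dropping with no anti-aliasing filter, which is only safe here because the test tone is far below the new Nyquist frequency.

```matlab
fs = 8000;                          % assumed sample rate in Hz
n = 0:fs-1;                         % one second of samples
x = sin(2*pi*200*n/fs);             % hypothetical 200 Hz test tone

y = x(1:2:end);                     % keep every other sample (decimate by 2)

% Strongest FFT bin in the lower half of the spectrum, converted to Hz,
% treating both signals as being played back at fs.
Xmag = abs(fft(x));
Ymag = abs(fft(y));
[~, kx] = max(Xmag(1:numel(x)/2));
[~, ky] = max(Ymag(1:numel(y)/2));
fx = (kx - 1) * fs / numel(x);      % frequency of the original tone
fy = (ky - 1) * fs / numel(y);      % frequency after decimation

fprintf('duration: %.2f s -> %.2f s\n', numel(x)/fs, numel(y)/fs);
fprintf('frequency: %.0f Hz -> %.0f Hz\n', fx, fy);
```
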
Techniques Used for Time Compression/Expansion & Pitch Shifting
- Option 3: Use more complex methods
  - These change the speed of the signal while preserving the pitch data
    - Short Time Fourier Transform Magnitude
    - Sinusoidal Synthesis
    - Linear Prediction Synthesis
Techniques Used for Time Compression/Expansion & Pitch Shifting
- Currently, two principal time compression/expansion and pitch shifting schemes are employed in most of today's applications:
  - Phase Vocoder
  - Time Domain Harmonic Scaling (TDHS)
Phase Vocoder
Phase Vocoder
- This method was introduced by Flanagan and Golden in 1966 and digitally implemented by Portnoff ten years later.
  - Portnoff, M. R. 1981a. "Short-Time Fourier Analysis of Sampled Speech." IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-29(3): 364-373.
  - Portnoff, M. R. 1981b. "Time-Scale Modification of Speech Based on Short-Time Fourier Analysis." IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-29(3): 374-390.
Phase Vocoder
- It uses a Short Time Fourier Transform (STFT) to convert the audio signal to the complex Fourier representation.
- Since the STFT returns the frequency domain representation of the signal on a fixed frequency grid, the actual frequencies of the partial bins have to be found by converting the relative phase change between two STFT outputs to actual frequency changes.
  - Note that the term 'partial' here has nothing to do with the signal harmonics. In fact, an STFT will never readily give you any information about true harmonics unless you match the STFT length to the fundamental frequency of the signal, and even then the frequency domain resolution is quite different from what our ear and auditory system perceive.
- The timebase of the signal is changed by calculating the frequency changes in the Fourier domain on a different time basis, and then an inverse STFT (iSTFT) is done to regain the time domain representation of the signal.
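
The phase-to-frequency conversion described above can be sketched as follows. This is a minimal illustration rather than a full phase vocoder: the sample rate, FFT size, hop size, test frequency, and the hand-built Hann window are all assumptions chosen for the example. The idea is to compare the measured phase advance of the peak bin between two frames with the advance expected for a sinusoid exactly at the bin centre, and to turn the wrapped difference into a frequency correction.

```matlab
fs = 8000; N = 1024; hop = 256;          % assumed sample rate, FFT size, hop
f_true = 440;                            % hypothetical test frequency in Hz
n = (0:N+hop-1)';
x = cos(2*pi*f_true*n/fs);
w = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);      % Hann window built from base functions

X1 = fft(x(1:N) .* w);                   % frame starting at sample 0
X2 = fft(x(hop+1:hop+N) .* w);           % frame starting at sample hop

[~, k] = max(abs(X1(1:N/2)));            % strongest bin (1-based index)
bin = k - 1;
f_bin = bin * fs / N;                    % bin centre: only a coarse estimate

omega_bin = 2*pi*bin/N;                  % bin centre in radians per sample
dphi = angle(X2(k)) - angle(X1(k));      % measured phase advance over one hop
dev = dphi - omega_bin*hop;              % deviation from the expected advance
dev = mod(dev + pi, 2*pi) - pi;          % wrap into [-pi, pi)
f_est = (omega_bin + dev/hop) * fs / (2*pi);

fprintf('bin centre %.2f Hz, phase-based estimate %.2f Hz\n', f_bin, f_est);
```

The bin centre is several Hz away from the true frequency, while the phase-based estimate lands much closer. The wrap step is only unambiguous when the deviation over one hop stays within half a cycle, which is one reason phase vocoders use heavily overlapping frames.
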
Phase Vocoder
- Phase vocoder algorithms are used mainly in scientific and educational software products (to show the use and limitations of the Fourier Transform) but have gained in popularity over the past few years due to improvements that made it possible to greatly reduce the artifacts of the "original" phase vocoder algorithm.
- The basic phase vocoder suffers from a severe drawback: it introduces a considerable amount of artifacts, audible as 'smearing' and 'reverberation' (even at low expansion ratios), due to the “non-synchronized vertical coherence of the sine and cosine basis functions” that are used to change the timebase.
Phase Vocoder
- Puckette, Laroche and Dolson have shown that the phasiness can be greatly reduced by picking peaks in the Fourier spectrum and keeping the relative phases around the peaks unchanged. Even though this improves the quality considerably, it still renders the result somewhat phasey and diffuse when compared to time domain methods.
- Current research focuses on improving the phase vocoder by applying intra-frame sinusoidal sweep and ramp rate correction (Bristow-Johnson and Bogdanowicz) and multi-resolution phase vocoder concepts (Bonada).
Links to Publicly Available Vocoders
- Pointers: Phase Vocoder
  - The MIT Lab Phase Vocoder
  - WaveMasher: GPL/Open Source Phase Vocoder by Kenneth Sturgis
  - Sculptor: A Real Time Phase Vocoder by Nick Bailey
  - A Phase Vocoder implementation using Matlab
  - More reading on the Phase Vocoder
  - The IRCAM "Super Phase Vocoder"
  - S. M. Bernsee's Pitch Shifting Using The Fourier Transform article (with C code)
Time Domain Harmonic Scaling (TDHS)
Time Domain Harmonic Scaling (TDHS)
- This is based on a method proposed by Rabiner and Schafer in 1978.
- It relies heavily on a correct estimate of the fundamental frequency of the sound being processed.
Theory
- Short Time Fourier Transform Methods
  - Chapter 7 in our text (Discrete-Time Speech Signal Processing)
  - Refer to the in-class notes for the mathematical theory of operation
  - I will pick up from where Dr. Këpuska stopped in his notes
How is the Speech/Sound Signal Processed
- Link: Ch7-Short-Time_Fourier_Transform_Analysis_and_Synthesis.ppt
Terminology & Basic Idea
- [Figure: frame rate and window size]
Short Time Fourier Transform
- Short Time Fourier Transform
  - Also called the Fairbanks method
  - Extract successive short-time segments and then discard the following ones
- Signal -> STFT -> Decimate Samples -> IFFT -> OLA -> Output
Short Time Fourier Transform
- Frame rate factor L
  - In the frequency domain, after taking the STFT, you get
    - X(nL, ω)
  - Form a new signal by
    - Y(nL, ω) = X(snL, ω)
    - where s = compression factor
  - Take the Inverse Fourier Transform
  - Use the Overlap and Add (OLA) method to form the new signal
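
The steps on this slide can be sketched as below. This is a minimal illustration under stated assumptions, not the code from the later slides: the test tone, the 10 kHz sample rate, and the hand-built triangular window are hypothetical, and the frame FFT length simply equals the window length. Every s-th analysis frame is kept, Y(nL, ω) = X(snL, ω), and the kept frames are overlap-added at the original step L.

```matlab
L = 100;                      % frame step in samples
win_len = 2 * L;              % window length, giving 50% overlap
s = 2;                        % compression factor: keep every s-th frame
fs = 10000;                   % assumed sample rate in Hz
x = sin(2*pi*100*(0:4999)'/fs);          % hypothetical 100 Hz tone, 0.5 s

% Triangular window built by hand; copies shifted by L sum to 1.
n = (0:win_len-1)';
w = 1 - abs((n - (win_len-1)/2) / (win_len/2));

% Analysis: column m+1 of X is the FFT of the frame starting at sample m*L.
num_frames = floor((numel(x) - win_len) / L) + 1;
X = zeros(win_len, num_frames);
for m = 0:num_frames-1
    X(:, m+1) = fft(x(m*L + (1:win_len)) .* w);
end

% Synthesis: Y(nL, w) = X(s*n*L, w), inverse FFT, then overlap-add at step L.
num_out = floor((num_frames - 1) / s) + 1;
y = zeros((num_out - 1)*L + win_len, 1);
for m = 0:num_out-1
    seg = real(ifft(X(:, s*m + 1)));
    idx = m*L + (1:win_len);
    y(idx) = y(idx) + seg;
end

fprintf('input %d samples, output %d samples\n', numel(x), numel(y));
```

The test tone was chosen so that its period (100 samples) equals the frame step, which means the kept segments happen to line up and the output is a clean tone of the same pitch. For real speech, the pitch period generally does not divide the frame step, and adjacent kept segments meet out of phase.
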
Short Time Fourier Transform
- [Figure: X(nL, ω) and Y(nL, ω) = X(2nL, ω)]
- [Figure: original windowed sequence and new sequence]
Short Time Fourier Transform
- Problems
  - Pitch synchronization
    - It is highly likely that the pitch periods will not line up properly
Short Time Fourier Transform Magnitude
- Short Time Fourier Transform Magnitude (STFTM)
  - Problems with the STFT method relate directly to the linear phase component of the STFT
    - Time shift = phase change
  - An alternate approach is to use only the magnitude portion of the STFT: the Short Time Fourier Transform Magnitude
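
The "time shift = phase change" relation can be checked directly with the DFT. In the sketch below (the test sequence, its length, and the delay are arbitrary illustrative choices), circularly delaying a sequence by d samples multiplies bin k of its DFT by exp(-j*2*pi*k*d/N), a phase that is linear in k, and leaves the magnitude untouched.

```matlab
N = 64;                              % DFT length (illustrative)
d = 5;                               % delay in samples (illustrative)
x = mod((0:N-1)' * 7, 11);           % arbitrary deterministic test sequence

y = circshift(x, d);                 % y(n) = x(n - d), circularly

k = (0:N-1)';
X = fft(x);
Y = fft(y);
Y_pred = X .* exp(-1j*2*pi*k*d/N);   % linear-phase prediction

phase_err = max(abs(Y - Y_pred));         % close to zero
mag_err   = max(abs(abs(Y) - abs(X)));    % close to zero
fprintf('max prediction error %.2e, max magnitude change %.2e\n', ...
        phase_err, mag_err);
```

Because the magnitude does not depend on where the segment sits in time, a magnitude-only representation avoids having to keep this linear phase term consistent from frame to frame.
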
Short Time Fourier Transform Magnitude
- Compression
  - With the Fairbanks method, time slices were discarded
  - Now we can just compress the time slices
  - Form a new signal by
    - |Y(nM, ω)| = |X(nL, ω)|
    - where M = compression factor = L / speed
    - i.e., for speeding up by two, M = L/2
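
The bookkeeping implied by M = L / speed can be sketched as follows. This only shows where frames are taken from and where they are added back; it does not perform the magnitude-only reconstruction itself. The frame count and window length are hypothetical values chosen for illustration.

```matlab
L = 100;                 % analysis frame step in samples
win_len = 200;           % window length in samples (hypothetical)
num_frames = 49;         % hypothetical number of analysis frames
speed = 2;               % 2 = play twice as fast

M = L / speed;                               % synthesis frame step: 50 samples
in_starts  = (0:num_frames-1) * L + 1;       % where each frame was taken from
out_starts = (0:num_frames-1) * M + 1;       % where each frame is added back

in_len  = in_starts(end)  + win_len - 1;     % samples spanned by the input
out_len = out_starts(end) + win_len - 1;     % samples spanned by the output
frames_per_sample = win_len / M;             % overlapping frames per output sample

% Slowing down instead: the step grows, and once it exceeds the window
% length there is nothing to fill the space between consecutive frames.
M_slow = L / 0.25;                           % 400 samples
has_gaps = M_slow > win_len;                 % true

fprintf('input %d samples, output %d samples, %d frames overlap\n', ...
        in_len, out_len, frames_per_sample);
```

All frames are kept here, unlike the Fairbanks method. The price is heavier overlap when speeding up and, for large slow-down factors, output positions that no frame covers.
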
Short Time Fourier Transform Magnitude
- Compression
  - Take the Inverse Fourier Transform
  - Use the Overlap and Add (OLA) method to form the new signal
- [Figure: X(nL, ω) and Y(nM, ω) = X(nL, ω), with M = L/2]
- [Figure: original windowed sequence and new sequence]
Other Methods
- Sinusoidal Synthesis (Chapter 9)
  - Time-warp the sinewave frequency track and the amplitude function
  - This technique has been successful not only with speech but also with music, biological, and mechanical signals
  - Problems
    - Does not maintain the original phase relations
    - Suffers from reverberance
Other Methods
- Linear Prediction Synthesis
  - Use homomorphic and linear prediction results to modify the time base
  - The book briefly mentions that this is possible, but I ran out of time before I could investigate the process further
Other Methods
- New Techniques
  - An Internet search showed several methods that try to improve on what is currently available
- Software
  - Various software programs will change the speed for you
  - Adobe Audition is one of the most all-encompassing right now
Matlab Code: Prepare the Workspace
    %%%%%%%%
    % Prepare Workspace
    %%%%%%%%
    close all;
    clear all;
    window_size_1 = 200;   % window length in samples
    frame_rate_1 = 100;    % step between frames in samples
    % Speed factor (2 = play back twice as fast)
    speed = 2;
Matlab Code: Load the Speech Signal
    %%%%%%%%
    % Load Data File
    %%%%%%%%
    filename = input('Please enter the file name to be used. ', 's');
    % wavread has been removed from recent MATLAB releases; audioread replaces it
    [sample_data, sample_rate] = audioread(filename);
    info = audioinfo(filename);
    nbits = info.BitsPerSample;
    loop_time = floor(length(sample_data)/frame_rate_1);
    % Zero-pad so that the last frame is complete (mono file assumed)
    sample_data(length(sample_data)+1 : (loop_time+1)*frame_rate_1) = 0;
Matlab Code: Develop the Window
    %%%%%%%%
    % Create Windows
    %%%%%%%%
    % Want windows of 25 ms
    % File sampled at 10,000 samples/sec
    % Want a window of size 10000 * 25 ms (10 ms)
    % Note: window_size_1 = 200 samples is 20 ms at 10,000 samples/sec
    triangle_30ms = triang(window_size_1);
    %triangle_30ms = hamming(window_size_1);
    W0 = sum(triangle_30ms);
Matlab Code: Window the Entire Speech Signal
    %%%%%%%%
    % Window the speech
    %%%%%%%%
    for i = 0:loop_time-1
        window_data(:, i+1) = sample_data((frame_rate_1*i)+1 : ((i+2)*frame_rate_1)) .* triangle_30ms;
    end
Matlab Code: Perform the Fast Fourier Transform
    %%%%%%%%
    % Create FFT
    %%%%%%%%
    for i = 1:loop_time
        window_data_fft(:, i) = fft(window_data(:, i), 1024);
    end
Matlab Code: Recreate the Modified Signal
    %%%%%%%%
    % Recreate Original Signal
    %%%%%%%%
    % Initialize the recreated signals
    reconstructed_signal(1:(loop_time+1)*frame_rate_1) = 0;
    real_reconstructed_signal(1:(loop_time+1)*frame_rate_1) = 0;
    modified_reconstructed_signal(1:(loop_time+3)*(frame_rate_1/speed)) = 0;
    modified_reconstructed_signal_compressed(1:(loop_time+3)*(frame_rate_1/speed)) = 0;
Matlab Code: Recreate the Modified Signal
    % Perform the ifft
    for i = 1:loop_time
        recreated_data_ifft(:, i) = ifft(window_data_fft(:, i), 1024);
        real_recreated_data_ifft(:, i) = ifft(abs(window_data_fft(:, i)), 1024);
        truncated_recreated_data_ifft(:, i) = recreated_data_ifft(1:window_size_1, i) .* (frame_rate_1/W0);
        real_truncated_recreated_data_ifft(:, i) = real_recreated_data_ifft(1:window_size_1, i) .* (frame_rate_1/W0);
    end
Matlab Code: Recreate the Modified Signal
    % Get back to the original signal
    for i = 0:loop_time-1
        reconstructed_signal((frame_rate_1*i)+1 : ((i+2)*frame_rate_1)) = ...
            reconstructed_signal((frame_rate_1*i)+1 : ((i+2)*frame_rate_1)) + ...
            truncated_recreated_data_ifft(:, i+1)';
        real_reconstructed_signal((frame_rate_1*i)+1 : ((i+2)*frame_rate_1)) = ...
            real_reconstructed_signal((frame_rate_1*i)+1 : ((i+2)*frame_rate_1)) + ...
            real_truncated_recreated_data_ifft(:, i+1)';
    end
Matlab Code: Recreate the Modified Signal
    % Get a modified signal by deleting certain parts (STFT)
    for i = 0:(loop_time-1)/speed
        modified_reconstructed_signal((frame_rate_1*i)+1 : ((i+2)*frame_rate_1)) = ...
            modified_reconstructed_signal((frame_rate_1*i)+1 : ((i+2)*frame_rate_1)) + ...
            real_truncated_recreated_data_ifft(:, i*speed+1)';
    end
Matlab Code: Recreate the Modified Signal
    % Initialize the compressed sequence (STFTM)
    modified_reconstructed_signal_compressed(1 : frame_rate_1+frame_rate_1/speed+1) = ...
        truncated_recreated_data_ifft(frame_rate_1-frame_rate_1/speed : window_size_1, 1)';
    % Get a modified signal by compressing
    for i = 0:(loop_time-2)
        modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1 : (frame_rate_1/speed*i)+window_size_1) = ...
            modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1 : (frame_rate_1/speed*i)+window_size_1) + ...
            real_truncated_recreated_data_ifft(:, i+2)';
    end
Matlab Code: Plot Results
    %%%%%%%%
    % Plot Results
    %%%%%%%%
    figure;
    subplot(211)
    plot(sample_data)
    title('Original Speech');
    v1 = axis;
    hold on;
    subplot(212)
    plot(real(modified_reconstructed_signal))
    title(['STFT Synthesis w/ Speed = ', num2str(speed), 'X']);
    v2 = axis;
    % Use the same axis limits in both subplots
    if speed > 1
        subplot(211); axis(v1)
        subplot(212); axis(v1)
    else
        subplot(211); axis(v2)
        subplot(212); axis(v2)
    end
Matlab Code: Write Sound Files
    %%%%%%%%
    % Write sound files
    %%%%%%%%
    % wavwrite has been removed from recent MATLAB releases; audiowrite replaces it
    audiowrite('C:\Classes\ECE_5525\tea party fairbanks 2x.wav', ...
        real(modified_reconstructed_signal), sample_rate, 'BitsPerSample', nbits)
Examples
- Baseline samples: original file, sample rate 2X, sample rate 0.5X [sound files]
- STFT: speed 0.5X, 2X, 4X [sound files]
- STFTM: speed 0.5X, 2X, 4X [sound files]
More Results
- Change in window size
  - If the window size becomes too small, a change in pitch will occur
  - The window needs to be 2 to 3 pitch periods long
  - I generally used 20 to 30 ms windows
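
The 20 to 30 ms figure follows from the "2 to 3 pitch periods" rule for a low-pitched voice. The arithmetic below assumes a 100 Hz fundamental (a hypothetical value, not taken from the slides) and the 10,000 samples/sec rate mentioned in the code comments.

```matlab
fs = 10000;                          % samples/sec, as in the code comments
f0 = 100;                            % assumed fundamental frequency in Hz

period_samples = fs / f0;            % one pitch period: 100 samples
win_samples = [2 3] * period_samples;    % 2 to 3 periods: 200 to 300 samples
win_ms = 1000 * win_samples / fs;        % 20 to 30 ms

fprintf('window: %d to %d samples (%g to %g ms)\n', ...
        win_samples(1), win_samples(2), win_ms(1), win_ms(2));
```

Under these assumptions, the 200-sample window used in the Matlab code sits at the lower end of that range. A higher-pitched voice has shorter periods and would tolerate a shorter window.
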
More Results
- Change in frame rate
  - If the frame rate (the step between frames, in samples) decreases too much, too many samples overlap to get an intelligible signal
More Results
- Change in window type
  - Tried Hamming: not much perceptual difference
  - Accounting for the window sum W0 becomes important here
  - Frame Rate/W0 is not equal to one
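
Why Frame Rate/W0 matters can be seen by overlap-adding the windows themselves. The sketch below builds a triangular and a Hamming window from base functions (hand-written formulas, used here so that no toolbox is needed; they are intended to match the usual symmetric definitions) and adds shifted copies at a step of L samples. Scaling by L/W0 brings the overlapped sum back to about one in both cases, but the factor itself is 1 only for the triangular window.

```matlab
L = 100;                             % frame step in samples
win_len = 200;                       % window length in samples
n = (0:win_len-1)';

tri = 1 - abs((n - (win_len-1)/2) / (win_len/2));   % triangular window
ham = 0.54 - 0.46*cos(2*pi*n/(win_len-1));          % symmetric Hamming window

% Overlap-add ten shifted copies of each window.
num = 10;
acc_tri = zeros((num-1)*L + win_len, 1);
acc_ham = acc_tri;
for m = 0:num-1
    idx = m*L + (1:win_len);
    acc_tri(idx) = acc_tri(idx) + tri;
    acc_ham(idx) = acc_ham(idx) + ham;
end

gain_tri = L / sum(tri);             % Frame Rate / W0 for the triangle: 1
gain_ham = L / sum(ham);             % Frame Rate / W0 for Hamming: about 0.93

mid = L+1 : num*L;                   % region covered by two windows
fprintf('triangle: gain %.4f, scaled sum %.4f to %.4f\n', gain_tri, ...
        min(acc_tri(mid))*gain_tri, max(acc_tri(mid))*gain_tri);
fprintf('hamming:  gain %.4f, scaled sum %.4f to %.4f\n', gain_ham, ...
        min(acc_ham(mid))*gain_ham, max(acc_ham(mid))*gain_ham);
```

Without the L/W0 factor, a Hamming-windowed reconstruction would come out roughly 8% too loud. With it, the remaining ripple is on the order of one percent.
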
Conclusion
- Optimum area
  - Frame rate is one half of the window size
  - Window size needs to be 2 to 3 pitch periods long
- It is possible to easily change the time scale and still maintain the original pitch, although the result is not always natural sounding
Conclusion
- Further investigation
  - What to do when you want to slow down by more than half
    - Using the STFTM means there will be gaps between the sequences
    - Could replicate windowed segments
Conclusion
- Further investigation
  - Use the other methods to determine quality
    - Implement Sinusoidal Synthesis
    - Implement Linear Predictive Synthesis using linear prediction and homomorphic methods
  - Work on synchronizing pitch periods
    - Shift samples so that the peaks line up
      - Scott and Gerber: Synchronized Overlap and Add (SOLA)
      - Cross-correlation of two segments to find the peak
      - Use the peaks to line up samples
    - Align the window at the same relative location within a pitch period
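
The cross-correlation step could look roughly like the sketch below. It is only the alignment search, not a complete SOLA implementation, and the test signal, pitch period, overlap length, and search range are hypothetical values. The end of the output written so far is compared with candidate positions for the next segment, and the lag with the highest normalized cross-correlation is kept.

```matlab
period = 80;                          % assumed pitch period in samples
n = (0:399)';
x = sin(2*pi*n/period) + 0.5*sin(4*pi*n/period);   % hypothetical periodic signal

overlap = 100;                        % samples compared in the overlap region
tail = x(101:200);                    % end of the output written so far
nominal_start = 250;                  % where time scaling alone would cut next
max_lag = 40;                         % search range in samples

best_lag = 0;
best_score = -Inf;
for k = 0:max_lag
    cand = x(nominal_start + k : nominal_start + k + overlap - 1);
    % Normalized cross-correlation between the tail and this candidate.
    score = (tail' * cand) / sqrt((tail' * tail) * (cand' * cand));
    if score > best_score
        best_score = score;
        best_lag = k;
    end
end

fprintf('best lag %d samples, correlation %.4f\n', best_lag, best_score);
```

With these values, the chosen lag moves the cut to a point a whole number of pitch periods after the start of the tail, so the two segments can be cross-faded in phase.
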
Questions
- Are there any questions?
References
- Quatieri, Thomas F. Discrete-Time Speech Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2002.
- Rabiner, L. R. and Schafer, R. W. Digital Processing of Speech Signals. Prentice Hall, Upper Saddle River, NJ, 1978.
- Oppenheim, A. V. and Schafer, R. W. Digital Signal Processing. Prentice Hall, Englewood Cliffs, NJ, 1975.
- Scott, R. and Gerber, S. "Pitch Synchronous Time Compression of Speech," Proc. Conf. Speech Communications Processing, pp. 63-85, April 1972.
- Fairbanks, G., Everitt, W. L., and Jaeger, R. P. "Method for Time or Frequency Compression-Expansion of Speech," IEEE Transactions on Audio and Electroacoustics, vol. AU-2, pp. 7-12, Jan. 1954.
Reference Material
- http://www.dspdimension.com/ (Stephan M. Bernsee)