Audio Processing
Mitch Parry

Similar to Image Processing?
• For images, a pixel is the smallest unit.
• The color is a distribution over the spectrum of visible light.
• Video samples at ~30 frames per second.
[Figure: amplitude vs. wavelength for one color (blue, green, yellow, red), with R, G, and B components. Image sources: http://www.chemistryland.com, www.jiscdigitalmedia.ac.uk]

Similar to Image Processing?
• Each pixel contains R, G, and B values, corresponding to the three types of cones that perceive color.
• A frame is a picture from “one instant of time.”
[Figure: amplitude vs. wavelength for one color (blue, green, yellow, red), with R, G, and B components. Image sources: http://www.chemistryland.com, www.jiscdigitalmedia.ac.uk]

Resource!

Chapter 2: Sound Waves
• Sound Waves and Harmonic Motion
• Properties of Sine Waves
• Resonance as Harmonic Frequencies
• Nonsinusoidal Waves

Chapter 5: Digitization
• Sampling and Aliasing
• Quantization
• Dynamic Range
• Nyquist and Aliasing

Spectral Domain
• One color in one pixel in one frame of video
• One audio frame
  – Hundredths of a second
[Figures: power spectra; visible light from 400 nm (blue) to 700 nm (red), and audio from 20 Hz to 20 kHz]

Audacity: Plot Spectrum

Audio Mixing
• Free Multitrack Downloads
• http://www.cambridge-mt.com/ms-mtk.htm

“Stop Messing with Me” by Sven Bornemark
• Steinberg Grand Piano
• Acoustic Guitar
• Bass
• Drums Overhead
• Electric Guitar
• Ambience
• Kick Drum
• Vocal

Audacity: Mixing Tutorial
• Mixing Tutorial

Simple Unmixing
• Left: Drums + 0.5 * Vocal
• Right: Guitar + 0.5 * Vocal
• Remove vocals:
  – Karaoke track = Left - Right = (Drums + 0.5 * Vocal) - (Guitar + 0.5 * Vocal) = Drums - Guitar
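The subtraction above can be sketched in a few lines of Python. This is a toy illustration, not the Audacity workflow: the stem values are made up, and `remove_center` is a hypothetical helper name. It only works because the vocal is identical in both channels.

```python
def remove_center(left, right):
    """Subtract the right channel from the left, sample by sample."""
    return [l - r for l, r in zip(left, right)]

# Toy stems, four samples each (hypothetical values).
drums = [0.5, -0.25, 0.0, 0.75]
guitar = [0.25, 0.5, -0.5, 0.0]
vocal = [1.0, -1.0, 0.5, 0.25]

# Mix as on the slide: the vocal sits in both channels at half gain.
left = [d + 0.5 * v for d, v in zip(drums, vocal)]
right = [g + 0.5 * v for g, v in zip(guitar, vocal)]

karaoke = remove_center(left, right)
print(karaoke)  # [0.25, -0.75, 0.5, 0.75], i.e. drums - guitar
```

Note that the result is mono and the guitar comes out phase-inverted, which is usually inaudible on its own.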

Audacity: Let’s try it. Real example: Norah Jones

Removing Hiss
• Hiss_*.wav

Removing Clicks

Short-Time Fourier Transform
• Each frame contributes one column of the spectrogram
[Figure: frames of the signal pass through an FFT to form the columns of a spectrogram]
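A minimal sketch of the frame-to-column idea, using only the standard library. For brevity it uses a naive DFT in place of a real FFT, and the Hann window, frame length, and hop size are assumptions chosen for the example rather than values from the slides.

```python
import cmath
import math

def stft_magnitude(signal, frame_len, hop):
    """Return one column of DFT magnitudes per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        column = []
        for k in range(frame_len // 2 + 1):  # keep non-negative frequencies
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            column.append(abs(total))
        columns.append(column)
    return columns

sample_rate = 8000
# 256 samples of a 1000 Hz tone.
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spec = stft_magnitude(tone, frame_len=64, hop=32)

peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "columns,", len(spec[0]), "bins each")
print("peak at", peak_bin * sample_rate / 64, "Hz")
```

Each entry of `spec` is one spectrogram column; plotting the columns side by side gives the usual time-frequency image.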

Audacity: Let’s try it.

Changing Speed
• Downsample (and play back at the original sample rate)
  – Shortens the clip
  – Increases its pitch
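A small sketch of this effect, assuming the downsampled clip is played back at the original sample rate. `estimate_pitch` is a hypothetical helper that counts upward zero crossings, which is only a reasonable pitch estimate for a clean sine tone. No anti-aliasing filter is applied, which a practical resampler would need.

```python
import math

def downsample(samples, factor):
    """Keep every `factor`-th sample. No anti-aliasing filter is applied."""
    return samples[::factor]

def estimate_pitch(samples, sample_rate):
    """Rough pitch estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)

sample_rate = 8000
# 0.1 s of a 100 Hz tone; the 0.5 rad phase offset keeps samples off exact zeros.
tone = [math.sin(2 * math.pi * 100 * n / sample_rate + 0.5) for n in range(800)]
fast = downsample(tone, 2)

print(len(tone), estimate_pitch(tone, sample_rate))  # 800 100.0
print(len(fast), estimate_pitch(fast, sample_rate))  # 400 200.0
```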

Changing Tempo
• Change the length of the clip without changing its pitch
• Split into frames, then repeat or remove frames
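The frame-based idea can be sketched as follows. This is a deliberately naive version: `change_tempo` and its `stretch` parameter are hypothetical names, and frames are simply copied or skipped with no windowing or overlap, so a real signal would click at frame boundaries. Practical tools typically overlap and cross-fade the frames.

```python
def change_tempo(samples, frame_len, stretch):
    """Naive time-stretch: repeat frames (stretch > 1) or drop frames (stretch < 1)."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    n_out = round(len(frames) * stretch)
    out = []
    for j in range(n_out):
        # Map each output frame back to a source frame.
        src = min(int(j / stretch), len(frames) - 1)
        out.extend(frames[src])
    return out

clip = list(range(8))  # stand-in for audio samples
slower = change_tempo(clip, frame_len=2, stretch=2.0)
faster = change_tempo(clip, frame_len=2, stretch=0.5)
print(slower)  # [0, 1, 0, 1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 6, 7]
print(faster)  # [0, 1, 4, 5]
```

Because the samples inside each frame are untouched, the local waveform (and so the pitch) is preserved while the clip length changes.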

Changing Pitch
• Change pitch without changing length
  – Increase pitch: Repeat frames and downsample
  – Decrease pitch: Remove frames and upsample
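A sketch of the "repeat frames and downsample" recipe for raising pitch by an octave. The helper names are hypothetical, and the example cheats in one respect that should be stated loudly: the frame length (80 samples) is exactly one period of the 100 Hz test tone, so repeated frames join seamlessly. Real audio needs windowed, overlapping frames.

```python
import math

def repeat_frames(samples, frame_len, times):
    """Lengthen the clip by playing each frame `times` times in a row."""
    out = []
    for start in range(0, len(samples), frame_len):
        out.extend(samples[start:start + frame_len] * times)
    return out

def downsample(samples, factor):
    """Keep every `factor`-th sample. No anti-aliasing filter is applied."""
    return samples[::factor]

def estimate_pitch(samples, sample_rate):
    """Rough pitch estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)

sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate + 0.5) for n in range(800)]

stretched = repeat_frames(tone, frame_len=80, times=2)  # twice as long, same pitch
shifted = downsample(stretched, 2)                      # original length, double pitch

print(len(tone), estimate_pitch(tone, sample_rate))        # 800 100.0
print(len(shifted), estimate_pitch(shifted, sample_rate))  # 800 200.0
```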

Beats
• Amplitude Envelope
  – Filterbank
  – Full-wave rectify
  – Low-pass filter
  – Differentiate/Half-wave rectify
Scheirer. JASA 1998. Tzanetakis. AMTA 2001. IPEM Toolbox.

Beats
• Beat Envelope
  – Filterbank (Discrete Wavelet Transform)
  – Full-wave rectify
  – Low-pass filter
  – Differentiate/Half-wave rectify
  – Low-pass filter
  – Sum
• Peak detection
Scheirer. JASA 1998. Tzanetakis. AMTA 2001. IPEM Toolbox.
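The chain of steps above can be sketched as follows. This is not the implementation from the cited papers: the filterbank stage is assumed to have already happened (the input is a list of band signals), the low-pass filters are simple one-pole smoothers with made-up coefficients, and the threshold in `pick_peaks` is arbitrary.

```python
def full_wave_rectify(x):
    return [abs(v) for v in x]

def low_pass(x, alpha):
    """One-pole smoother: y[n] = (1 - alpha) * x[n] + alpha * y[n-1]."""
    y, prev = [], 0.0
    for v in x:
        prev = (1 - alpha) * v + alpha * prev
        y.append(prev)
    return y

def diff_half_wave(x):
    """First difference, keeping only increases (onsets)."""
    return [0.0] + [max(b - a, 0.0) for a, b in zip(x, x[1:])]

def beat_envelope(bands):
    """Per band: rectify, smooth, differentiate, rectify, smooth; then sum bands."""
    total = [0.0] * len(bands[0])
    for band in bands:
        env = low_pass(full_wave_rectify(band), alpha=0.9)
        onset = low_pass(diff_half_wave(env), alpha=0.5)
        total = [t + o for t, o in zip(total, onset)]
    return total

def pick_peaks(x, threshold):
    """Indices of local maxima above the threshold."""
    return [n for n in range(1, len(x) - 1)
            if x[n] > threshold and x[n] > x[n - 1] and x[n] >= x[n + 1]]

# Hypothetical input: two "bands" with a 10-sample burst every 100 samples.
band = [0.0] * 400
for start in (50, 150, 250, 350):
    for k in range(10):
        band[start + k] = 1.0 if k % 2 == 0 else -1.0
bands = [band, [0.5 * v for v in band]]

beats = pick_peaks(beat_envelope(bands), threshold=0.05)
print(beats)  # [52, 152, 252, 352]
```

The detected peaks trail each burst onset by two samples because of the smoothing filters; the spacing of 100 samples is what carries the tempo.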

Audacity: Beat Detection
• Drum track

Audacity
• Audacity Manual
• More Effects and Analyzers

Musical Features
• Visualizing Structure
• Rhythm/Tempo
• Melody
• Timbre

Visualizing Structure
• Compute any features
• Choose a similarity metric
• Visualize self-similarity
Foote & Cooper. ICMC 2001.
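A minimal sketch of the three steps, assuming cosine similarity as the metric and two-dimensional toy feature vectors standing in for real per-frame features. The text rendering at the end is only a stand-in for an image plot.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def self_similarity(features):
    """S[i][j] = similarity between frame i and frame j."""
    return [[cosine_similarity(a, b) for b in features] for a in features]

# Toy "song" with structure A A B B A A (hypothetical feature vectors).
A, B = [1.0, 0.0], [0.0, 1.0]
features = [A, A, B, B, A, A]

S = self_similarity(features)
for row in S:
    print(" ".join("#" if v > 0.5 else "." for v in row))
```

Repeated sections show up as bright blocks off the main diagonal, which is what makes the structure visible.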

Visualizing Structure
• High-level segmentation based on novelty score
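The slide does not say how the novelty score is computed. One common formulation, assumed here, slides a checkerboard kernel along the diagonal of the self-similarity matrix: similarity within each side of a candidate boundary counts positively, similarity across the boundary counts negatively. The kernel half-width and the toy matrix are illustrative choices.

```python
def novelty_scores(S, half_width):
    """Correlate a +/-1 checkerboard kernel along the diagonal of S."""
    n = len(S)
    scores = []
    for i in range(n):
        total = 0.0
        for a in range(-half_width, half_width):
            for b in range(-half_width, half_width):
                r, c = i + a, i + b
                if 0 <= r < n and 0 <= c < n:
                    # Same side of the boundary: +1. Opposite sides: -1.
                    sign = 1.0 if (a < 0) == (b < 0) else -1.0
                    total += sign * S[r][c]
        scores.append(total)
    return scores

# Toy similarity matrix for a clip with structure A A A B B B.
labels = "AAABBB"
S = [[1.0 if p == q else 0.0 for q in labels] for p in labels]

scores = novelty_scores(S, half_width=2)
boundary = scores.index(max(scores))
print(boundary)  # 3: the first frame of the B section
```

Peaks in the score mark candidate segment boundaries; frames near the start and end of the clip see only part of the kernel, so their scores are less reliable.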

Tempo
• Beat Spectrum
  – Diagonal Sums
  – Autocorrelation
Foote & Cooper. ICMC 2001.
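A sketch of the diagonal-sums variant only, assuming a precomputed self-similarity matrix. Summing the diagonal at lag l measures how similar the signal is to itself l frames later. The toy matrix below comes from a made-up repeating pattern, not from real audio.

```python
def beat_spectrum(S):
    """B[lag] = sum of the similarity matrix along the diagonal at that lag."""
    n = len(S)
    return [sum(S[i][i + lag] for i in range(n - lag)) for lag in range(n)]

# Toy pattern that repeats every 4 frames.
labels = "ABCD" * 3
S = [[1.0 if p == q else 0.0 for q in labels] for p in labels]

B = beat_spectrum(S)
period = max(range(1, len(B)), key=lambda lag: B[lag])
print(B)       # [12.0, 0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
print(period)  # 4
```

The raw sums shrink with lag because longer lags have shorter diagonals; dividing each sum by its diagonal length is one way to compensate.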

Identifying Identical Audio
• Segmentation
  – 0.37-second frames
  – Overlapping by 31/32
• FFT
  – Band Division
  – Energy computed for 33 non-overlapping, logarithmically spaced frequency bands (300-2000 Hz)
  – E(n, m) = energy of band m of frame n
Haitsma & Kalker. ISMIR 2002.

Identifying Identical Audio 2
• A 32-bit sub-fingerprint represents the increase/decrease in energy between neighboring frequency bands and neighboring frames
• F(n, m) = 1 if [E(n-1, m+1) + E(n, m)] - [E(n-1, m) + E(n, m+1)] > 0, and 0 otherwise
[Figure: energy grid, Time (Frames) n-1, n, …, 257 versus Frequency Bands m, m+1, …, 33, with +/- signs marking the terms of F(n, m)]
Haitsma & Kalker. ISMIR 2002.
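The bit rule F(n, m) can be sketched directly from the band energies. The function names are hypothetical, packing band 0 into the most significant bit is an arbitrary choice made here, and the toy input uses 5 bands (giving 4 bits) instead of 33 bands (giving 32 bits) so the result can be checked by hand.

```python
def sub_fingerprint(prev_frame, curr_frame):
    """Pack F(n, m) for m = 0 .. bands-2 into an integer, band 0 first."""
    bits = 0
    for m in range(len(curr_frame) - 1):
        diff = ((prev_frame[m + 1] + curr_frame[m])
                - (prev_frame[m] + curr_frame[m + 1]))
        bits = (bits << 1) | (1 if diff > 0 else 0)
    return bits

def fingerprint(energies):
    """One sub-fingerprint per frame n >= 1, from E(n-1, .) and E(n, .)."""
    return [sub_fingerprint(energies[n - 1], energies[n])
            for n in range(1, len(energies))]

# Toy band energies E[n][m]: 3 frames, 5 bands each.
energies = [
    [0, 0, 0, 0, 0],
    [1, 3, 2, 5, 4],
    [5, 4, 3, 2, 1],
]

fp = fingerprint(energies)
print([format(word, "04b") for word in fp])  # ['0101', '1010']
```

With 33 bands per frame the same function yields 32-bit words, and 257 frames yield the 256 sub-fingerprints of one fingerprint block.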

Identifying Identical Audio 3
• Similarity is the bit error rate (BER) between two fingerprints
• Each fingerprint covers approximately 3 seconds of audio (256 sub-fingerprints at a hop of 0.37/32 ≈ 11.6 ms)
• 256 × 32-bit = 1 KB per fingerprint
Haitsma & Kalker. ISMIR 2002.
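A sketch of the BER comparison, assuming each fingerprint is a list of 256 integers holding 32 bits each. The function name and the example data are made up; the decision threshold for calling two clips identical is not given on the slide and is left out.

```python
def bit_error_rate(fp_a, fp_b, bits_per_word=32):
    """Fraction of bits that differ between two equal-length fingerprints."""
    errors = sum(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))
    return errors / (len(fp_a) * bits_per_word)

reference = [0] * 256          # hypothetical fingerprint block
query = list(reference)
query[0] = 0xFF                # flip 8 of the 8192 bits

print(bit_error_rate(reference, reference))  # 0.0
print(bit_error_rate(reference, query))      # 0.0009765625
print(256 * 32 // 8, "bytes per fingerprint")  # 1024 bytes per fingerprint
```

A BER near 0 indicates the same recording, while two unrelated clips are expected to disagree on about half their bits.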

Timbre Similarity
• Timbre = “Color” of sound
• Timbre = Type of instrument, voice
• Similarity decreases in order:
  – Same recording
  – Same artist
  – Same genre
• Useful for finding different live performances of the same song by an artist
Aucouturier & Klapuri. ISMIR 2002.

Timbre Similarity 2
• Timbre Features
  – Low-order MFCCs account for timbre.
  – High-order MFCCs account for pitch.
  – Only use the first 8 MFCCs (out of 13).
• Feature Extraction:
  – Segment the signal into 0.05-second non-overlapping frames.
  – Compute the first 8 MFCCs for each frame.
  – Yields ~3600 feature vectors (3600 × 8 = 28,800 scalars) per song, which corresponds to about 3 minutes of audio.
Aucouturier & Klapuri. ISMIR 2002.

Timbre Similarity 3
• Gaussian Mixture Model (GMM)
  – Approximates the distribution of features as a weighted sum of M Gaussian distributions
  – M = 3
• Learn a timbre model for each song
• Timbre similarity between song A and song B is the likelihood that the model for song A generated the features in song B.
Aucouturier & Klapuri. ISMIR 2002.
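A sketch of the likelihood step only. It assumes a model for song A already exists: the weights, means, and variances below are hand-set for illustration, whereas in practice they would be fitted to the song's MFCC frames (typically with EM). Diagonal covariances and 2-D features are simplifying assumptions, and the score is averaged per frame in the log domain.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
               for xi, mu, v in zip(x, mean, var))

def gmm_avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of `frames` under the mixture."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, mu, var)
                 for w, mu, var in zip(weights, means, variances)]
        peak = max(comps)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)

# Hypothetical model for song A with M = 3 components.
weights = [0.5, 0.3, 0.2]
means = [[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

song_b = [[0.1, -0.2], [2.9, 3.2], [-3.1, 1.8]]     # frames close to the model
song_c = [[8.0, 8.0], [9.0, -7.0], [-10.0, 10.0]]   # frames far from the model

score_b = gmm_avg_log_likelihood(song_b, weights, means, variances)
score_c = gmm_avg_log_likelihood(song_c, weights, means, variances)
print(score_b > score_c)  # True: song B is the closer timbre match to song A
```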

Timbre Similarity Examples
• http://www.csl.sony.fr/~jj/Timbre/timbre.html

Audio Textures
• Generate new audio given examples
• Analysis
  – Segment into frames
  – Extract MFCCs
  – Similarity
    • Window Weighted Cosine Distance
  – Transition probabilities proportional to exponential similarity
  – Segment into sub-clips according to novelty score
Lu et al. ICASSP 2002.
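The transition-probability step can be sketched as a row-wise normalization of exponentiated similarities. The scaling parameter `sigma` and the toy similarity matrix are assumptions for illustration, and the exact indexing used in the paper may differ from this simplified version.

```python
import math
import random

def transition_probabilities(S, sigma):
    """P[i][j] proportional to exp(S[i][j] / sigma), each row summing to 1."""
    P = []
    for row in S:
        weights = [math.exp(s / sigma) for s in row]
        total = sum(weights)
        P.append([w / total for w in weights])
    return P

# Toy similarity matrix between three frames (hypothetical values).
S = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]

P = transition_probabilities(S, sigma=0.5)
print([round(p, 3) for p in P[0]])

# Generate a new frame sequence by walking the transition matrix.
random.seed(0)
sequence = [0]
for _ in range(9):
    sequence.append(random.choices(range(3), weights=P[sequence[-1]])[0])
print(sequence)
```

A smaller `sigma` concentrates probability on the most similar frames, giving smoother but more repetitive output; a larger one allows more varied jumps.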

References
• Aucouturier, J-J., and Klapuri, A. (2002). "Music Similarity Measures: What's the Use?" Proc. of Int'l Conference on Music Information Retrieval, 3, (pp. 157-163).
• Foote, J. and Cooper, M. (2001). "Visualizing Musical Structure and Rhythm via Self-Similarity." Proc. of Int'l Computer Music Conference, 27, (pp. 419-422).
• Haitsma, J. and Kalker, T. (2002). "A Highly Robust Audio Fingerprinting System." Proc. of Int'l Conference on Music Information Retrieval, 3, (pp. 107-115).
• Lu, L., Li, S., Liu, W., and Zhang, H. (2002). "Audio Textures." Proc. of IEEE Int'l Conference on Acoustics, Speech and Signal Processing.
• Paulus, J. and Klapuri, A. (2002). "Measuring the Similarity of Rhythmic Patterns." Proc. of the International Conference on Music Information Retrieval, 3, (pp. 150-156). Paris: IRCAM Centre Pompidou.
• Scheirer, E. (1998). "Tempo and Beat Analysis of Acoustic Musical Signals." Journal of the Acoustical Society of America, 103(1), 588-601.
• Tzanetakis, G., Essl, G., and Cook, P. (2001). "Audio Analysis using the Discrete Wavelet Transform." Proc. of WSES International Conference on Acoustics and Music: Theory and Applications.
• Tzanetakis, G., Ermolinskiy, A., and Cook, P. (2002). "Pitch Histograms in Audio and Symbolic Music Information Retrieval." Proc. of Int'l Conference on Music Information Retrieval, 3, (pp. 31-38).