
Basic Features of Audio Signals (音訊的基本特徵)
Jyh-Shing Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan

Audio Features
- Commonly used audio features include volume, pitch, zero-crossing rate, spectrum, etc.
  - These features can be perceived subjectively.
- Our goals
  - To define formulas for computing these features
  - To compute these features for further analysis and recognition of audio signals

General Steps for Audio Analysis
1. Frame blocking
   - Frame duration of about 20 to 40 ms
2. Frame-based feature extraction
   - Volume, zero-crossing rate, pitch, MFCC (mel-frequency cepstral coefficients), etc.
3. Frame-based analysis
   - Pitch vector for QBSH (query by singing/humming) comparison
   - MFCC for speech recognition via HMM (hidden Markov model) training and evaluation
   - …

Frame Blocking (Quiz!)
- Sample rate = 16 kHz
- Frame size = 512 samples
- Frame duration = 512/16000 = 0.032 s = 32 ms
- Overlap = 192 samples
- Hop size = frame size - overlap = 512 - 192 = 320 samples
- Frame rate = 16000/320 = 50 frames/sec
(Figure: consecutive overlapping frames; frame size = hop size + overlap.)
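
The frame-blocking arithmetic above can be checked with a few lines of code. This is a minimal Python sketch; the function name `frame_params` is hypothetical and not part of the course toolbox. It simply encodes "frame size = hop size + overlap" and reproduces the 16 kHz / 512-sample / 192-overlap example.

```python
def frame_params(fs, frame_size, overlap):
    """Return (frame duration in s, hop size in samples, frame rate in frames/s)."""
    hop_size = frame_size - overlap   # frame size = hop size + overlap
    duration = frame_size / fs        # length of one frame in seconds
    frame_rate = fs / hop_size        # a new frame starts every hop_size samples
    return duration, hop_size, frame_rate


print(frame_params(16000, 512, 192))  # (0.032, 320, 50.0)
```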

Basic Features of Audio Signals (Quiz!)
- Volume (音量): the amplitude of audio signals
  - Also known as intensity or energy.
- Pitch (音高): fundamental frequency (the number of fundamental periods in a second)
  - Males usually have a lower pitch, while females usually have a higher one.
- Timbre (音色): the waveform within a fundamental period
  - Different vowels have different timbres.
  - Different singers also have different timbres.
- Check out the waveform of your recording!

Audio Features in the Time Domain
- Three of the most prominent audio features in a frame (aka analysis window):
  - Volume: waveform amplitude
  - Pitch: determined by the fundamental period (FP)
  - Timbre: waveform within an FP
- Quiz: How can you control these features when you speak?

Audio Features in the Frequency Domain
- Frequency-domain audio features in a frame:
  - Energy: sum of the power spectrum
  - Pitch: distance between harmonics
  - Timbre: smoothed spectrum, whose peaks are the formants (F1: first formant, F2: second formant)
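
"Energy: sum of the power spectrum" holds exactly once the DFT scaling is accounted for (Parseval's theorem: the time-domain energy equals the sum of |X[k]|^2 divided by the frame length N). The Python sketch below checks this on one made-up frame; it uses a naive O(N^2) DFT written with `cmath` so it needs no third-party library, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for one short frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def energy_time(frame):
    """Frame energy computed directly from the samples."""
    return sum(s * s for s in frame)


def energy_spectrum(frame):
    """Frame energy as the sum of the power spectrum |X[k]|^2, divided by N."""
    spectrum = dft(frame)
    return sum(abs(c) ** 2 for c in spectrum) / len(frame)


frame = [0.3, -0.1, 0.7, 0.2, -0.5, 0.0, 0.4, -0.6]
print(energy_time(frame), energy_spectrum(frame))  # equal up to rounding error
```

In practice an FFT would replace the naive DFT; the identity is the same.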

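Frame-based Manipulation
- How to pack frames into a matrix for easy manipulation in MATLAB:
    [y, fs] = audioread('file.wav');
    frameSize = 512; overlap = 192;             % e.g., the values from the frame-blocking example
    frameMat = enframe(y, frameSize, overlap);  % enframe is a toolbox function, not a MATLAB built-in
- (Figure: frameMat holds Frame 1, Frame 2, …, Frame n side by side, one frame per column.)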
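
For readers without the MATLAB toolbox, here is a hypothetical Python analogue of `enframe(y, frameSize, overlap)`. It is a sketch under stated assumptions, not the toolbox's implementation: it returns a list of frames (one list per frame) instead of a matrix, and it drops a trailing partial frame, whereas a real toolbox might zero-pad it.

```python
def enframe(y, frame_size, overlap):
    """Hypothetical Python analogue of enframe(y, frameSize, overlap).

    Returns a list of frames; frame k starts at sample k * hop.
    A trailing partial frame is dropped (an assumption; a real toolbox
    may zero-pad it instead).
    """
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    if len(y) < frame_size:
        return []
    n_frames = 1 + (len(y) - frame_size) // hop  # number of complete frames
    return [list(y[k * hop:k * hop + frame_size]) for k in range(n_frames)]


print(enframe(list(range(10)), 4, 2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With the 16 kHz example (512-sample frames, 192-sample overlap), one second of audio yields 49 complete frames, slightly fewer than the nominal 50 frames/sec because the last frame must fit entirely.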

Introduction to Volume (Quiz!)
- Loudness of audio signals
  - Visual cue: amplitude of vibration
  - AKA energy or intensity
- Two major ways to compute volume in a frame:
  - Volume: easy to compute
  - Energy (in decibels): better correlated with our perception

Volume: Perceived and Computed
- Perceived volume is influenced by
  - Frequency
  - Timbre
- Computed volume is influenced by
  - Microphone types
  - Microphone setups

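Volume Computation (Quiz!)
- To avoid DC bias (or DC drifting), subtract the frame mean $\mu$ before computing volume
  - DC bias: the vibration is not centered around zero
- Computation for a frame $s_1, \dots, s_n$ (assuming constant DC bias):
  - Volume: $\sum_{i=1}^{n} |s_i - \mu|$
  - Energy (in decibels): $10 \log_{10} \sum_{i=1}^{n} (s_i - \mu)^2$
- How can these identities be proved?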
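
A minimal Python sketch of DC-bias-free volume and energy, assuming the bias is removed by subtracting the frame mean (the course code may use the median instead). The key point is that a constant bias b cancels: (s_i + b) - (mu + b) = s_i - mu. The function names and the `eps` floor that avoids log10(0) on a perfectly constant frame are hypothetical choices.

```python
import math


def frame_volume(frame):
    """Volume: sum of absolute sample values after removing the frame mean."""
    mu = sum(frame) / len(frame)  # estimate of the constant DC bias
    return sum(abs(s - mu) for s in frame)


def frame_energy_db(frame, eps=1e-12):
    """Energy in decibels: 10*log10 of the sum of squared, mean-removed samples.

    eps is an assumed floor that avoids log10(0) for a constant (silent) frame.
    """
    mu = sum(frame) / len(frame)
    energy = sum((s - mu) ** 2 for s in frame)
    return 10 * math.log10(max(energy, eps))


clean = [0.5, -0.5, 0.5, -0.5]
biased = [s + 0.3 for s in clean]  # same vibration, shifted by a DC bias
print(frame_volume(clean), frame_volume(biased))        # both about 2.0
print(frame_energy_db(clean), frame_energy_db(biased))  # both about 0 dB
```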

Examples of Volume
- Functions for computing volume
  - Example: volume01
  - Example: volume02
  - Example: volume03
- Volume depends on…
  - Frequency
    - Equal loudness test
  - Timbre
    - Example: volume04

Zero Crossing Rate (Quiz!)
- Zero crossing rate (ZCR)
  - Number of zero crossings in a frame
- Characteristics:
  - Higher for noise and unvoiced sounds, lower for voiced sounds
  - Zero-justification (removing the frame's offset from zero) is required before computing ZCR.
- Usage
  - For endpoint detection, especially for detecting unvoiced sounds
  - To distinguish unvoiced sounds from noise, we usually add a shift before computing ZCR.
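
The ZCR definition, zero-justification, and the shift trick can be sketched in a few lines of Python. This is not the course's `zcrWithShift` example; the function name `zcr`, the use of the mean for zero-justification, and the toy sample values are assumptions. Crossings are counted as sign changes between consecutive samples, and samples landing exactly on the level are not counted (a simplification).

```python
def zcr(frame, shift=0.0):
    """Number of zero crossings in a frame.

    The frame is zero-justified first (its mean is subtracted). With a
    nonzero shift, crossings of the level `shift` are counted instead, so
    low-amplitude noise hovering around zero contributes few or no crossings.
    Samples that land exactly on the level are not counted (a simplification).
    """
    mu = sum(frame) / len(frame)
    x = [s - mu - shift for s in frame]
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


noise = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]  # weak background noise
unvoiced = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6]     # stronger, noise-like sound
print(zcr(noise), zcr(unvoiced))                  # 5 5: indistinguishable
print(zcr(noise, 0.05), zcr(unvoiced, 0.05))      # 0 5: the shift separates them
```

The shift value must exceed the noise amplitude but stay below that of the unvoiced sounds, so in practice it is tuned to the recording conditions.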

Examples of ZCR (1/2)
- ZCR computing
  - Example: zcr01
  - Example: zcr02

Examples of ZCR (2/2)
- Use ZCR to distinguish between unvoiced sounds and environmental noise
  - Example: zcrWithShift

Fundamental Frequency & Pitch (Quiz!)
- Fundamental frequency (FF)
  - The number of fundamental periods in a second
  - Unit: hertz (Hz); not related to the sample rate!
- Pitch
  - Can be converted from FF in hertz: $\text{pitch} = 69 + 12 \log_2(\text{FF}/440)$
  - Unit: semitone or MIDI number
- Piano roll via HTML5
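
A short Python sketch of the FF-to-pitch conversion, assuming the usual MIDI convention that A4 = 440 Hz corresponds to semitone 69, so each doubling of frequency adds 12 semitones. The function names are hypothetical.

```python
import math


def freq_to_semitone(f):
    """Convert a fundamental frequency in Hz to a semitone (MIDI-style) pitch."""
    return 69 + 12 * math.log2(f / 440.0)


def semitone_to_freq(p):
    """Inverse conversion: semitone pitch back to frequency in Hz."""
    return 440.0 * 2 ** ((p - 69) / 12)


print(freq_to_semitone(440.0))           # 69.0 (A4)
print(freq_to_semitone(880.0))           # 81.0 (one octave = 12 semitones higher)
print(round(freq_to_semitone(261.63)))   # 60 (middle C)
```

Because the mapping is logarithmic, equal frequency ratios give equal pitch differences, which matches how we perceive musical intervals.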

Pitch Computation for Tuning Forks
- Pitch of tuning forks (code)

Pitch Computation for Speech
- Pitch of speech (code)
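
The linked code is not reproduced in these notes. As a stand-in, here is a simple autocorrelation-based pitch estimator in Python: a common textbook technique, but not necessarily the method behind the links. The name `estimate_f0_acf` and the 50 to 1000 Hz search range are assumptions. It picks the lag with the largest autocorrelation within the allowed range and treats it as the fundamental period.

```python
import math


def estimate_f0_acf(frame, fs, f_min=50.0, f_max=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame via autocorrelation.

    The frame mean is removed, then the lag with the largest autocorrelation
    inside [fs/f_max, fs/f_min] is taken as the fundamental period.
    """
    n = len(frame)
    mu = sum(frame) / n
    x = [s - mu for s in frame]
    lag_min = max(1, int(fs / f_max))
    lag_max = min(n - 1, int(fs / f_min))
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        acf = sum(x[i] * x[i + lag] for i in range(n - lag))
        if acf > best_val:
            best_lag, best_val = lag, acf
    return fs / best_lag  # period in samples -> frequency in Hz


fs = 8000
fork = [math.sin(2 * math.pi * 440 * i / fs) for i in range(512)]  # ideal 440 Hz tuning fork
print(round(estimate_f0_acf(fork, fs), 1))  # close to 440; limited by 1-sample lag resolution
```

A tuning fork is nearly a pure sine, so this works well; for speech, the same idea can suffer octave errors (picking twice or half the true period), which practical pitch trackers correct with extra heuristics.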

Pitch Change due to Fast Forward (Quiz!)
- If audio is played at a higher sample rate:
  - Pitch is higher
  - Duration is shorter
- Pitch change due to a sample-rate change at playback (k times the original rate):
  - Sample rate: fs → k*fs (at playback)
  - Duration: d → d/k
  - Fundamental frequency: ff → k*ff
  - Pitch: pitch → pitch + 12*log2(k)
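
The four mappings above can be collected into one hypothetical Python helper. The pitch shift follows from the semitone formula: 12*log2(k*ff/440) - 12*log2(ff/440) = 12*log2(k), so doubling the playback rate (k = 2) raises the pitch by exactly one octave.

```python
import math


def playback_change(fs, duration, ff, k):
    """What happens when audio sampled at fs is played back at k*fs."""
    return {
        "sample_rate": k * fs,             # fs -> k*fs
        "duration": duration / k,          # d -> d/k
        "fundamental_freq": k * ff,        # ff -> k*ff
        "pitch_shift": 12 * math.log2(k),  # pitch -> pitch + 12*log2(k) semitones
    }


print(playback_change(16000, 3.0, 220.0, 2))
# {'sample_rate': 32000, 'duration': 1.5, 'fundamental_freq': 440.0, 'pitch_shift': 12.0}
```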

Pitch Perception
- Age-related hearing loss
- Frequencies vs. ages
  - As one grows older, the audible frequency range narrows, with high frequencies lost first.
  - Mosquito ringtone
    - Low to high, high to low
    - Applications
- (Figure: test tones at 8 kHz, 12 kHz, 15 kHz, 17.4 kHz, and 21 kHz.)

Tones in Mandarin Chinese
- Some statistics about Mandarin Chinese
  - 5401 characters; each character is associated with at least one base syllable and a tone
  - 411 base syllables; most syllables have 4 tones, giving 1501 tonal syllables
- Syllables with 3 or fewer tones
  - 趴爬怕, 當檔蕩, 嗲
- More examples
  - Tones 1-2-3-4: 三民主義, 三國演義, 優柔寡斷, 花明柳暗, 科學理論
  - ? ? ?: 美麗大教堂, 滷蛋有夠鹹 (Taiwanese)
  - Tone sandhi: 勇猛果敢

Features Related to Tones (Quiz!)
- Tone is characterized by the pitch curve:
  - Tone 1: high-high
  - Tone 2: low-high
  - Tone 3: high-low-high
  - Tone 4: high-low
  - (Put your hand on your throat and you can feel it…)
- Tone recognition is mostly based on features of pitch and volume.
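
The contour shapes above can be turned into a deliberately crude rule-based classifier, just to show how a pitch curve maps to a tone. This is a toy Python sketch, not a real tone recognizer (which, as noted above, would use both pitch and volume features and be trained on data); the function name `classify_tone` and the 1-semitone threshold are hypothetical.

```python
def classify_tone(contour, th=1.0):
    """Toy guess of the Mandarin tone (1-4) from a pitch contour in semitones.

    th is a hypothetical threshold (in semitones) for "noticeably different".
    Tone 1 is detected only by flatness; its height is not checked.
    """
    start, end = contour[0], contour[-1]
    low = min(contour)
    if max(contour) - low < th:
        return 1                    # high-high: nearly flat contour
    if start - low >= th and end - low >= th:
        return 3                    # high-low-high: falls, then rises again
    return 2 if end > start else 4  # low-high rises; high-low falls


for c in ([60, 60.2, 60.1, 60], [55, 56, 58, 60], [58, 55, 54, 57], [62, 60, 57, 55]):
    print(c, "-> tone", classify_tone(c))
```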

Mandarin Tone Practice
- Tone combinations of disyllabic words (雙音節詞連音組合)

Other Things about Pitch
- Some interesting phenomena about pitch
  - Beat
    - Music by beats
  - Doppler effect
  - Shepard tone
    - Auditory illusion of a tone that ascends or descends in pitch continuously
- Have you tried these?
  - Inhale helium to produce a squeaky voice (helium mainly raises the vocal-tract resonances, i.e., the formants, rather than the pitch itself)
  - Resonance: break a glass with the right pitch (just like pushing a swing at its natural rhythm)
  - Overtone singing
- How can these effects be created in MATLAB?

Beat (Quiz!)
- Beat: an interference between two sounds of slightly different frequencies f1 and f2
  - Audible beat frequency = |f1 - f2|, not |f1 - f2|/2!
- (Figure: signal 1 and signal 2; signal 1 plus signal 2)
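
Why the beat frequency is |f1 - f2| and not half of it follows from the sum-to-product identity:

```latex
\cos(2\pi f_1 t) + \cos(2\pi f_2 t)
  = 2\cos\!\left(2\pi \tfrac{f_1 - f_2}{2}\, t\right)
     \cos\!\left(2\pi \tfrac{f_1 + f_2}{2}\, t\right)
```

The second factor is a tone at the average frequency (f1 + f2)/2. The first factor varies slowly and acts as an envelope; it has frequency |f1 - f2|/2, but loudness depends only on its magnitude, which peaks twice per period of that cosine. The loudness therefore swells |f1 - f2| times per second: for 440 Hz and 444 Hz, 4 beats per second.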

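Experiments of Beats
- Beats in MATLAB (the two tones are added in the signal): y1+y2 gives a beat frequency of 4 Hz
- Beats in the air (one tone per stereo channel, mixed acoustically): left channel y1, right channel y2; with both playing, the beat frequency is also 4 Hz
    fs = 8000; duration = 5;          % sample rate (Hz) and length (s)
    t = (1:duration*fs)/fs;           % time axis
    y1 = 0.8*cos(2*pi*440*t)';        % 440 Hz tone (column vector)
    y2 = 0.8*cos(2*pi*444*t)';        % 444 Hz tone
    sound((y1+y2)/2, fs);             % beats in MATLAB; halved so the peak (1.6) stays within [-1, 1]
    pause(duration+1);                % let the first playback finish
    sound([y1, y2], fs);              % beats in the air: left = y1, right = y2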

Timbre
- Timbre is represented by
  - The waveform within a fundamental period
  - Frame-based energy distribution over frequencies
    - Power spectrum (over a single frame)
    - Spectrogram (over many frames)
  - Frame-based MFCC (mel-frequency cepstral coefficients)

Timbre Demo: Real-time Spectrogram
- Simulink model for real-time display of the spectrogram
  - dspstfft_audio (before MATLAB R2011a)
  - dspstfft_audioInput (R2012a or later)
- (Figure panels: spectrum; spectrogram)