Intro to Audio Signals, Jyh-Shing Roger Jang

Intro. to Audio Signals
Jyh-Shing Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan

What Are Audio Signals?
  • Audio signals are...
      - Signals that are audible to humans, such as speech and music (Quiz!)
      - The range of audible frequencies is about 20 to 20,000 Hz. (Quiz!)
          * The range is wider for young people and narrower for the elderly.
          * Ultrasound (above this range): heard by dogs, dolphins, and bats; example: dog whistle

Voice Generation & Reception
  • Steps in voice generation & reception
      - Vibration of the voice source
      - Resonance by surrounding objects
      - Traveling through air (or other media)
      - Reception by membranes and neurons in the inner ear
      - Recognition by the brain
  • Examples
      - Singing
      - Whistling
      - Guitar
      - Flute
  [Figure labels: pressure wave; sound waveform]

Categorization of Audio Signals
  • Number of sources
      - Monophonic: example
      - Polyphonic: example
  • Waveform
      - Quasi-periodic sound
          * Voiced sound of speech
      - Aperiodic sound
          * Unvoiced sound of speech
  • Source types
      - Sounds from animals (bioacoustics)
          * Dog barking, cat meowing, frog croaking, duck quacking, cow mooing...
      - Sounds from non-animals
          * Car engines, thunder, musical instruments

Parameters for Recording (Quiz!)
  • Three major parameters for recording audio files
      - Sample rate: number of samples per second
          * 8 kHz (phone quality)
          * 16 kHz (common for speech recognition)
          * 44.1 kHz (CD quality)
          * Hz = Hertz = samples/sec (also used for fundamental frequency) (Quiz!)
      - Bit resolution: number of bits used to represent a sample
          * 8-bit (uint8, range 0 to 255)
          * 16-bit (int16, range -32768 to 32767)
      - Number of channels
          * Mono: 1 channel
          * Stereo: 2 channels
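The three parameters above can be made concrete with a small sketch. It is only an illustration, not part of the original slides: the 440 Hz test tone, the 10 ms duration, and the variable names are assumptions, and the scaling used for 8-bit and 16-bit samples is one common convention among several.

```matlab
% Sketch (assumed test tone): how sample rate, bit resolution, and
% channel count shape the stored samples.
fs = 8000;                          % sample rate in Hz (phone quality)
duration = 0.01;                    % seconds; kept short on purpose
t = (0:round(duration*fs)-1)/fs;    % sample times in seconds
x = 0.5*sin(2*pi*440*t);            % mono tone with values in [-1, 1]

% 16-bit signed: scale to the int16 range and round
x16 = int16(round(x*32767));
% 8-bit unsigned: shift to [0, 1] first, then scale to 0..255
x8 = uint8(round((x + 1)/2*255));

% Stereo: one column per channel (here both channels carry the same tone)
stereo = [x16(:), x16(:)];

fprintf('samples = %d\n', numel(t));
fprintf('int16 range = [%d, %d]\n', intmin('int16'), intmax('int16'));
fprintf('uint8 range = [%d, %d]\n', intmin('uint8'), intmax('uint8'));
fprintf('stereo size = %d x %d\n', size(stereo, 1), size(stereo, 2));
```

Doubling the sample rate doubles the number of rows in `stereo`, while the bit resolution only changes the integer type of each entry.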

Live Recording
  • Three major parameters for recording audio files
      - Sample rate, bit resolution, and number of channels
      - Demo of recording via CoolEdit
          * Compare the waveform of a tuning fork with the waveform of human vowels. What is the major difference? Why? (Quiz!)

Tools for General Audio Processing
  • Tools for real-time recording and waveform display
      - Audacity
      - CoolEdit
      - GoldWave
      - MATLAB

S/U/V in Speech
  • Speech signals can be divided into S, U, V (Quiz!)
      - S (silence): no speech activity
      - U (unvoiced): speech activity without vibration of the vocal cords
      - V (voiced): speech activity with vibration of the vocal cords
  • How to detect S, U, V?
      - Put your hand on your throat to feel the vibration
      - Observe the waveform directly
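Observing the waveform can also be approximated in code. The sketch below is not the method from the slides: it swaps in a simple frame-based heuristic using volume and zero-crossing rate, run on a synthetic signal. The thresholds (0.01 and 0.2), the 20 ms frame size, and the noise and sine stand-ins for unvoiced and voiced speech are all assumptions chosen for illustration; real speech would need tuned thresholds.

```matlab
% Sketch (assumed heuristic): label frames as S, U, or V by volume
% and zero-crossing rate on a synthetic test signal.
fs = 16000;
n = 1600;                              % 0.1 s per segment
t = (0:n-1)/fs;
silence  = zeros(1, n);                % no activity
unvoiced = 0.1*randn(1, n);            % noise-like stand-in for a fricative
voiced   = 0.5*sin(2*pi*150*t);        % periodic stand-in for a vowel
y = [silence, unvoiced, voiced];

frameSize = 320;                       % 20 ms frames, no overlap
nFrames = floor(numel(y)/frameSize);
labels = repmat('S', 1, nFrames);
for i = 1:nFrames
    frame = y((i-1)*frameSize + (1:frameSize));
    volume = mean(abs(frame));                           % average magnitude
    zcr = sum(abs(diff(sign(frame))) > 0) / frameSize;   % sign changes per sample
    if volume < 0.01
        labels(i) = 'S';               % too quiet: silence
    elseif zcr > 0.2
        labels(i) = 'U';               % loud with many sign changes: unvoiced
    else
        labels(i) = 'V';               % loud with few sign changes: voiced
    end
end
disp(labels);
```

The idea behind the heuristic is that voiced frames are quasi-periodic with a low fundamental frequency, so they cross zero far less often than noise-like unvoiced frames.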

Speech Signal of “Sunday”
  • Unvoiced vs. voiced frames

Silence, Unvoiced and Voiced Sounds
  • Examples of S, U, V (Quiz!)
      - “Six”: s u v s u s
      - “資訊系” (zī xùn xì, the CSIE department): s u v u v s

Storage for Audio Files
  • Examples of storage requirements
      - 1 min. of recording with fs = 16000, nbits = 16, #channel = 1:
        60 (sec) * 16 (kHz) * 2 (bytes) * 1 (channel) = 1920 KB = 1.92 MB
      - 3 min. of CD music with fs = 44.1 kHz, nbits = 16, #channel = 2:
        180 (sec) * 44.1 (kHz) * 2 (bytes) * 2 (channels) = 31752 KB ≈ 32 MB
      - The MP3 compression ratio is about 10! (Quiz!)
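The same arithmetic can be checked with a short sketch. The helper name `storageBytes` is hypothetical, and the code follows the slide's convention of 1 KB = 1000 bytes and 1 MB = 10^6 bytes for uncompressed audio.

```matlab
% Sketch: bytes = duration * sample rate * bytes per sample * channels
% (storageBytes is a hypothetical helper, not a built-in function)
storageBytes = @(sec, fs, nbits, nch) sec * fs * (nbits/8) * nch;

speech  = storageBytes(60, 16000, 16, 1);    % 1 min of mono speech
cdMusic = storageBytes(180, 44100, 16, 2);   % 3 min of stereo CD music

fprintf('speech:   %d bytes = %.2f MB\n', speech, speech/1e6);
fprintf('CD music: %d bytes = %.2f MB\n', cdMusic, cdMusic/1e6);
% Rough size after a compression ratio of about 10, as quoted for MP3
fprintf('CD music at about 10:1: %.1f MB\n', cdMusic/10/1e6);
```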

Human Speech Production

Source-filter Model for Human Speech Production
  • Speech is split into a rapidly varying excitation signal and a slowly varying filter.
  • The envelope of the power spectrum contains the vocal tract information.
  • Two important characteristics of the model are the fundamental frequency (f0) and the formants (F1, F2, F3, ...).
  [Figure labels: unvoiced; voiced; pharyngeal cavity; nasal cavity; oral cavity]
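A minimal sketch of the source-filter idea: a pulse train at f0 plays the role of the excitation, and a cascade of two-pole resonators plays the role of the vocal tract filter. This is a toy illustration, not the model from the slides. The values f0 = 100 Hz, formants at 700 Hz and 1200 Hz, and a 100 Hz bandwidth are assumptions picked only to give a vowel-like spectrum.

```matlab
% Sketch (assumed parameters): pulse-train source through resonator filters.
fs = 16000;                          % sample rate in Hz
f0 = 100;                            % assumed fundamental frequency in Hz
duration = 0.5;                      % seconds
n = round(duration*fs);

% Source: one pulse every fs/f0 samples (rapidly varying excitation)
period = round(fs/f0);
source = zeros(1, n);
source(1:period:end) = 1;

% Filter: one two-pole resonator per assumed formant (slowly varying part)
formants = [700 1200];               % assumed F1, F2 in Hz
bw = 100;                            % assumed bandwidth in Hz
r = exp(-pi*bw/fs);                  % pole radius; r < 1 keeps the filter stable
y = source;
for F = formants
    a = [1, -2*r*cos(2*pi*F/fs), r^2];   % denominator with poles at angle 2*pi*F/fs
    y = filter(1, a, y);
end
y = y / max(abs(y));                 % normalize to [-1, 1]

fprintf('pulse period = %d samples, output length = %d\n', period, numel(y));
```

Changing `f0` changes the pitch of the result without moving the spectral envelope, while changing `formants` reshapes the envelope without changing the pitch. Playing the result with `sound(y, fs)` is optional.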

The Vocal Tract

Glottal Volume Velocity & Resulting Sound Pressure (Voiced)

Speech Production
  [Figure: glottal pulses + vocal tract = speech signal; (a) source spectrum + (b) filter function = (c) output energy spectrum]

Videos for Vocal Cords Movement
  • Movement of vocal cords
      - http://www.youtube.com/watch?v=mJedwz_r2Pc
      - http://www.youtube.com/watch?v=v9Wdf-RwLcs

Other Interesting Phenomena
  • Interesting phenomena about audio signals
      - Don’t trust what you have heard! (Vision rules)
      - Perceived speech is highly context dependent:

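Hints for Exercises
  • How to generate a sine wave signal:
      - Math formula: y(t) = sin(2*pi*f*t)
      - MATLAB code:
          duration = 3;                 % length in seconds
          f = 440;                      % frequency in Hz
          fs = 16000;                   % sample rate in Hz
          time = (0:duration*fs-1)/fs;  % sample times in seconds
          y = sin(2*pi*f*time);         % sine wave samples
          plot(time, y);                % show the waveform
          sound(y, fs);                 % play it back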

Voice Generation & Reception
  [Figure labels: pressure wave in air; audio waveform]