Chapter 2 Digital Models for the Speech Signal














- Slides: 14
Chapter 2 Digital Models for the Speech Signal
- 2.1 The production of the speech signal
- 2.2 The equations and solutions of the sound wave of speech
- 2.3 The digital models of the speech signal
- 2.4 Chinese acoustic phonetics
- 2.5 Speech waveform and spectrogram

2.1 The production of the speech signal (1)
- 2.1.1 The vocal organs
  - Vocal source: lungs, vocal cords and glottis
  - Vocal tract: runs from the glottis to the lips, including the throat tract and the air tract
  - Mouth cavity, nasal cavity and lips
- 2.1.2 The process and system of speech production
  - Excitation of the sound wave
  - Propagation of the sound wave
  - Radiation of the sound wave
  - System of speech production

2.2 The equations and solutions of the sound wave of speech (1)
- 2.2.1 The partial differential equations of the sound wave in the vocal tube
    $-\frac{\partial p}{\partial x} = \rho \frac{\partial (u/A)}{\partial t}$
    $-\frac{\partial u}{\partial x} = \frac{1}{\rho c^{2}} \frac{\partial (pA)}{\partial t} + \frac{\partial A}{\partial t}$
  where
  - $p = p(x, t)$ is the variation in sound pressure in the tube at position $x$ and time $t$
  - $u = u(x, t)$ is the variation in volume velocity flow at position $x$ and time $t$
  - $\rho$ is the density of air in the tube, and $c$ is the velocity of sound
  - $A = A(x, t)$ is the area function

The equations and solutions of the sound wave of speech (2)
- 2.2.2 Special solutions
  1. Lossless vocal tract model: $A(x, t) = A$ (constant)
     The equations simplify to the wave equations of a lossless transmission line. The solution is:
       $u(x, t) = u^{+}(t - x/c) - u^{-}(t + x/c)$
       $p(x, t) = \frac{\rho c}{A} \left[ u^{+}(t - x/c) + u^{-}(t + x/c) \right]$
  2. Cascaded lossless vocal tract model: $A_k = \text{constant}$ in section $k$
     The solutions are:
       $u_k(x, t) = u_k^{+}(t - x/c) - u_k^{-}(t + x/c)$
       $p_k(x, t) = \frac{\rho c}{A_k} \left[ u_k^{+}(t - x/c) + u_k^{-}(t + x/c) \right]$
     with the continuity conditions at each junction, where $l_k$ is the length of section $k$:
       $u_k(l_k, t) = u_{k+1}(0, t)$, $p_k(l_k, t) = p_{k+1}(0, t)$

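As a worked check of the lossless case, the sketch below sets the area to a constant in the two equations of 2.2.1 and verifies that the traveling-wave solution satisfies the result. The notation $\dot{u}^{\pm}$ (derivative of $u^{\pm}$ with respect to its argument) is introduced here for convenience and is not from the slides.

```latex
% Constant area A: \partial A / \partial t = 0, and A moves outside the derivatives.
\begin{align}
-\frac{\partial p}{\partial x} &= \frac{\rho}{A}\,\frac{\partial u}{\partial t}, &
-\frac{\partial u}{\partial x} &= \frac{A}{\rho c^{2}}\,\frac{\partial p}{\partial t}
\end{align}
% Insert u = u^{+}(t - x/c) - u^{-}(t + x/c) and
% p = (\rho c / A)[u^{+}(t - x/c) + u^{-}(t + x/c)]:
\begin{align}
-\frac{\partial u}{\partial x}
  &= \frac{1}{c}\left[\dot{u}^{+}(t - x/c) + \dot{u}^{-}(t + x/c)\right]
   = \frac{A}{\rho c^{2}}\,\frac{\partial p}{\partial t} \\
-\frac{\partial p}{\partial x}
  &= \frac{\rho}{A}\left[\dot{u}^{+}(t - x/c) - \dot{u}^{-}(t + x/c)\right]
   = \frac{\rho}{A}\,\frac{\partial u}{\partial t}
\end{align}
```

Both equations hold for any differentiable $u^{+}$ and $u^{-}$, which is why the solution can be read as a forward-traveling and a backward-traveling wave.
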
The equations and solutions of the sound wave of speech (3)
- 2.2.3 Some conclusions
- $r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k}$ is the reflection coefficient at junction $k$, and $|r_k| \le 1$
- If N tube sections are cascaded from the glottis to the lips, the system function is
    $V(z) = \frac{0.5 (1 + r_G) \prod_{k=1}^{N} (1 + r_k) \, z^{-N/2}}{1 - \sum_{k=1}^{N} a_k z^{-k}} = \frac{K z^{-N/2}}{1 - \sum_{k=1}^{N} a_k z^{-k}}$
  where $r_G$ is the reflection coefficient at the glottis and $K$ collects the constant gain factors
  - This is an all-pole model
  - The pole pairs of V(z) correspond to the formants (F1, F2, F3, F4, F5, ...)
  - For a concatenation of N tubes there are at most N/2 complex-conjugate pole pairs, that is, at most N/2 resonance frequencies (formants). Given the total length $L = lN$, the sampling period $T = 2\tau$ and the propagation delay in each tube $\tau = l/c$, we find $N = 2 L F_s / c$, where $F_s = 1/T$ is the sampling frequency

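The two formulas above, r_k = (A_{k+1} - A_k)/(A_{k+1} + A_k) and N = 2 L Fs / c, can be tried out with a minimal sketch. The function names, the four-section area list, and the numerical values (17.5 cm tract, 350 m/s, 10 kHz) are illustrative assumptions, not values from the slides.

```python
def reflection_coefficients(areas):
    """r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for each junction between adjacent sections."""
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]


def num_sections(length_m, fs_hz, c_m_per_s):
    """N = 2 L Fs / c, from T = 2 * tau and tau = l / c with L = l * N."""
    return 2.0 * length_m * fs_hz / c_m_per_s


# Hypothetical area function (arbitrary units), glottis end first.
areas = [1.0, 3.0, 3.0, 1.0]
r = reflection_coefficients(areas)
print(r)  # every value lies in [-1, 1] because areas are positive

# Assumed typical values: 17.5 cm tract, c = 350 m/s, Fs = 10 kHz.
n_sections = num_sections(0.175, 10000.0, 350.0)
print(f"N = {n_sections:.1f}, at most {n_sections / 2:.1f} formants")
```

Under these assumed values the model needs about ten sections, which allows at most five formants below Fs/2.
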
2.3 The digital models of the speech signal (1)
- 2.3.1 Model of the vocal source
  - Unvoiced: random noise generator
  - Voiced: quasi-periodic $\delta$ pulses
  - Glottal pulse wave:
      $g(n) = \begin{cases} \frac{1}{2} \left[ 1 - \cos(\pi n / N_1) \right], & 0 \le n \le N_1 \\ \cos\left[ \pi (n - N_1) / (2 N_2) \right], & N_1 \le n \le N_1 + N_2 \\ 0, & \text{otherwise} \end{cases}$
- 2.3.2 Model of the vocal tract (all-pole model)
    $V(z) = \frac{0.5 (1 + r_G) \prod_{k=1}^{N} (1 + r_k) \, z^{-N/2}}{1 - \sum_{k=1}^{N} a_k z^{-k}}$
  Very successful in practice, but zeros are needed for nasal sounds
- 2.3.3 Model of the lips
    $R(z) = R_0 (1 - z^{-1})$

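The glottal pulse of 2.3.1 can be generated with a short sketch. It assumes that each cosine argument carries a factor of π, so that the opening phase rises smoothly from 0 to 1 over N1 samples and the closing phase falls back to 0 over N2 samples. The function name and the lengths 40 and 10 are hypothetical.

```python
import math


def glottal_pulse(n1, n2):
    """Return g(n) for n = 0 .. n1 + n2; the pulse is zero outside this range."""
    pulse = []
    for n in range(n1 + n2 + 1):
        if n <= n1:
            # Opening phase: half of a raised cosine, 0 -> 1.
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n1)))
        else:
            # Closing phase: quarter of a cosine, 1 -> 0.
            pulse.append(math.cos(math.pi * (n - n1) / (2.0 * n2)))
    return pulse


# Hypothetical opening and closing lengths in samples.
g = glottal_pulse(n1=40, n2=10)
print(len(g), max(g))
```

Repeating this pulse once per pitch period gives a voiced excitation; the two branches agree at n = N1, so the pulse has no jump there.
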
The digital models of the speech signal (2)
- 2.3.4 The complete digital model of speech signal production
    $H(z) = G(z) V(z) R(z)$
  The voiced signal is generated by the system H(z) when the input is a train of quasi-periodic $\delta$ pulses. See the picture for the whole model.
- 2.3.5 The short-time property of the speech signal
  - The system is a time-varying linear system; the parameters of V(z) are approximately constant over 5-50 ms (typically 10-30 ms)
  - Windowing techniques: rectangular window, Hamming window, Hanning window
  - The framing concept: frame width, frame shift

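Framing and windowing can be sketched as follows. The 25 ms frame width, 10 ms frame shift, 8 kHz sampling rate and the 100 Hz test tone are assumptions chosen to fall inside the ranges quoted above; the function names are hypothetical.

```python
import math


def hamming(width):
    """Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (width - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (width - 1)) for n in range(width)]


def frames(signal, width, shift):
    """Split signal into windowed frames; trailing samples that do not fill a frame are dropped."""
    window = hamming(width)
    out = []
    for start in range(0, len(signal) - width + 1, shift):
        out.append([signal[start + n] * window[n] for n in range(width)])
    return out


fs = 8000                  # assumed sampling frequency in Hz
width = 25 * fs // 1000    # 25 ms frame width in samples
shift = 10 * fs // 1000    # 10 ms frame shift in samples
signal = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(fs)]  # 1 s test tone
framed = frames(signal, width, shift)
print(len(framed), "frames of", width, "samples")
```

A frame shift smaller than the frame width makes adjacent frames overlap, which compensates for the attenuation the window applies at the frame edges.
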
2.4 Chinese acoustic phonetics (1)
- 2.4.1 Phonemes (minimal basic speech unit)
  - Consonants of Chinese
    - b, p, d, t, g, k, h, j, q, x, z, c, s, r, zh, ch, sh, m, f, n, l, ng
    - Shengmu: initials (of a syllable) are consonants (all of the above except ng)
    - Types of Chinese consonants
  - Vowels of Chinese (v denotes ü)
    - Single: a, o, e, i, u, v, i1, i2, e'
    - Yunmu: finals (of a syllable)
    - Compound (Yunmu):
      - Bi-vowels: ai, ei, ao, ou, ia, ie, ua, uo, ve, er
      - Tri-vowels: iao, iou, uai, uei
    - Nasal (Yunmu):
      - an, ian, uan, van, en, in, uen, vn
      - ang, iang, uang, eng, ing, ueng, ong, iong

Chinese acoustic phonetics (2)
- 2.4.2 Syllable structure
  - Syllable (minimal meaningful speech unit)
  - General: CV, VC, CVC, ...
  - Chinese: (C)V(C) or (C1)V(C2), where ( ) means optional
  - C1 = { all consonants } - { ng }, C2 = { n, ng }
- 2.4.3 Pitch and four tones
  - Pitch $f_p$ or $F_0$ is the frequency of the glottal pulses; sometimes the pitch period $T_p = 1/F_0$ is used instead
  - The average pitch is about 80-100 Hz for male speakers and about 120-200 Hz for female speakers and children

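The relation Tp = 1/F0 is easy to turn into numbers. The sketch below converts the pitch values quoted above into pitch periods in milliseconds and in samples; the 8 kHz sampling frequency and the function names are assumptions.

```python
def pitch_period_ms(f0_hz):
    """Pitch period Tp = 1 / F0, expressed in milliseconds."""
    return 1000.0 / f0_hz


def pitch_period_samples(f0_hz, fs_hz):
    """Pitch period expressed in samples at sampling frequency fs_hz."""
    return fs_hz / f0_hz


fs = 8000.0  # assumed sampling frequency in Hz
for f0 in (80.0, 100.0, 120.0, 200.0):
    print(f"F0 = {f0:5.1f} Hz  Tp = {pitch_period_ms(f0):5.2f} ms  "
          f"= {pitch_period_samples(f0, fs):5.1f} samples")
```

At the low end of the male range one pitch period already spans 12.5 ms, so an analysis frame of 10-30 ms covers only one or two periods.
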
Chinese acoustic phonetics (3)
  - Chinese (Mandarin) is a tonal language: the tone of every syllable (vowel) is part of the meaning of the syllable
  - Tone is the pattern of pitch change during the voiced part of the syllable. There are 4 (5) tones in Chinese. See Fig. 2-19 (page 21) for the typical four tones of Chinese
  - There are about 400 syllables without tones in Chinese
  - There are about 1200 syllables with tones in Chinese

Chinese acoustic phonetics (4)
- The tones of Chinese (tone values are rough and for reference only)
    Tone 1: flat tone, tone value 5-5
    Tone 2: rising tone, tone value 3-5
    Tone 3: falling-rising tone, tone value 2-1-5
    Tone 4: falling tone, tone value 5-1
    Tone 5: light (neutral) tone; weak intensity, flat pitch, short duration
- Tones vary when different tones are connected together

Chinese acoustic phonetics (5)
- 2.4.4 Four elements of sound
  - Pitch of sound
  - Duration of sound
  - Intensity of sound
  - Timbre of sound
- 2.4.5 Prosody
  - Variation of tones, duration and intensity
  - Stress and intonation

2.5 Speech waveform and spectrogram (sonogram)
- 2.5.1 The recording of the speech signal
  - Sampling frequency: 8 kHz, 16 kHz, 11 kHz, 22 kHz, 44 kHz
  - Sample resolution: 8-bit, 16-bit
  - Start and end of the utterance (need to be detected in some cases)
  - Speech data tools and files (.wav, .adc, .pcm, ...)
- 2.5.2 The sonogram
  - Analog sonogram
  - Digital sonogram

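A digital sonogram is, in essence, a sequence of short-time magnitude spectra. The sketch below is one minimal way to compute it with a Hamming window and a direct DFT, using only the standard library; it is slow and meant for illustration. The 1 kHz test tone, the 64-sample frame width and the function name are assumptions.

```python
import cmath
import math


def spectrogram(signal, width, shift):
    """Return one magnitude spectrum (bins 0 .. width // 2) per Hamming-windowed frame."""
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (width - 1)) for n in range(width)]
    rows = []
    for start in range(0, len(signal) - width + 1, shift):
        frame = [signal[start + n] * window[n] for n in range(width)]
        row = []
        for k in range(width // 2 + 1):
            # Direct DFT of the frame at bin k (frequency k * fs / width).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / width) for n in range(width))
            row.append(abs(acc))
        rows.append(row)
    return rows


fs = 8000        # sampling frequency in Hz
tone_hz = 1000   # assumed test tone
width = 64
signal = [math.sin(2.0 * math.pi * tone_hz * n / fs) for n in range(512)]
spec = spectrogram(signal, width=width, shift=32)

peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print("frames:", len(spec), "peak at", peak_bin * fs / width, "Hz")
```

Plotting the rows against time, with magnitude shown as darkness, gives the familiar sonogram picture; a longer frame sharpens the frequency axis at the cost of time resolution.
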