Speaking Style Conversion Dr. Elizabeth Godoy Speech Processing Guest Lecture December 11, 2012
Apply voice conversion (VC) principles to a different problem…
Speech Intelligibility Context
- Speech is often heard in adverse conditions
  - Noisy environments
  - Listener has difficulty hearing/understanding
- (Sound samples: noise, no noise)
- Example of speech with environmental barriers: the speech is not very intelligible!
- How to transform speech to make it more intelligible…?
  - To make speech synthesis systems more effective
Intelligible Speaking Styles
I. Lombard speech
- Speaker is immersed in noise
- Human reflex to increase the speech loudness
- (Sound samples: normal, Lombard)
II. Clear speech
- Listener faces barrier (noise, hearing, language, …)
- Speaker adapts strategy to increase speech clarity
- (Sound samples: casual, clear)
VC to improve speech intelligibility?
Voice Conversion
- Modify speech to change the speaker identity
- Learn transformation from source-to-target speaker
Speaking Style Conversion
- Modify speech to improve intelligibility
- Determine transformation from normal-to-intelligible style
Spectral Envelope: still very important!
Overview: Analyses-to-Modifications
I. Acoustic analyses to identify (mainly spectral) characteristics of Lombard & Clear styles
   i. Average Spectra
   ii. Vowel Spaces
II. Results of analyses inspire spectral modifications to improve intelligibility
   i. Spectral energy band boosting (corrective filters)
   ii. Formant shifting (frequency warping)
Corpora
Lombard-normal: Grid
- 8 speakers (4 male, 4 female)
- 50 sentences each
- Lombard Ninf 96: most extreme (Lu & Cooke)
Clear-casual: LUCID read sentences
- 8 speakers (4 male, 4 female)
- 50 sentences each
- Read speech: most exaggerated (Baker & Hazan)
Average Relative Spectra
- Recall Amplitude Scaling in DFWA
- The average relative spectrum is similar:
  - Difference between normal (X) and intelligible (Y) style
  - Average across all frames
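As a rough illustration of the definition above, the sketch below computes a per-bin relative spectrum in dB. It assumes the quantity is the difference between the frame-averaged log-magnitude spectra of the two styles (Y minus X); the slide does not give the exact formula, so the function names, the dB scale, and the toy data are all assumptions.

```python
import math

def mean_db_spectrum(frames, eps=1e-12):
    """Average log-magnitude spectrum in dB over all frames.

    `frames` is a list of frames, each a list of linear magnitudes
    sampled at the same frequency bins.
    """
    n_bins = len(frames[0])
    return [
        sum(20.0 * math.log10(max(frame[k], eps)) for frame in frames) / len(frames)
        for k in range(n_bins)
    ]

def average_relative_spectrum(normal_frames, intelligible_frames):
    """Per-bin difference in dB: mean spectrum of Y minus mean spectrum of X."""
    x_db = mean_db_spectrum(normal_frames)
    y_db = mean_db_spectrum(intelligible_frames)
    return [y - x for x, y in zip(x_db, y_db)]

# Toy data (made up): two bins; the first bin is 10x larger in style Y.
normal = [[1.0, 1.0], [1.0, 1.0]]
intelligible = [[10.0, 1.0], [10.0, 1.0], [10.0, 1.0]]
relative_db = average_relative_spectrum(normal, intelligible)
```

Averaging each style separately means the two recordings do not need the same number of frames or a frame-by-frame alignment.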
Average Relative Spectra (by Speaker)
(Figure panels: Lombard-normal, Clear-casual)
Average Relative Spectra (Overall)
- Lombard speech: spectral energy boosting “where formants are” (~500-4500 Hz)
- Clear speech: varies depending on speaker strategy; extent of differences mild overall
Vowel Spaces (average for all speakers)
- Lombard speech: Vowel Space Translation
- Clear speech: Vowel Space Expansion
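One way to tell translation from expansion numerically is to compare the centroid and the area of the vowel polygon in the (F1, F2) plane. The sketch below is only an illustration: the shoelace-area measure, the function names, and the formant values are assumptions and are not taken from the lecture's analyses.

```python
def centroid(points):
    """Mean (F1, F2) position of the vowel points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def polygon_area(points):
    """Shoelace formula; points must be ordered around the perimeter."""
    area2 = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        area2 += x1 * y2 - x2 * y1
    return abs(area2) / 2.0

def compare_vowel_spaces(reference, styled):
    """Return (centroid shift in Hz, area ratio styled/reference)."""
    c_ref, c_sty = centroid(reference), centroid(styled)
    shift = (c_sty[0] - c_ref[0], c_sty[1] - c_ref[1])
    area_ratio = polygon_area(styled) / polygon_area(reference)
    return shift, area_ratio

# Made-up corner vowels as (F1, F2) in Hz, for illustration only.
normal = [(300.0, 2300.0), (700.0, 1200.0), (300.0, 800.0)]
# Pure translation: every vowel moves by the same offset.
translated = [(f1 + 100.0, f2 + 50.0) for f1, f2 in normal]
# Pure expansion: every vowel moves away from the centroid by 20%.
c = centroid(normal)
expanded = [(c[0] + 1.2 * (f1 - c[0]), c[1] + 1.2 * (f2 - c[1])) for f1, f2 in normal]
```

Under these assumptions, a translation shows up as a nonzero centroid shift with an area ratio near 1, while an expansion leaves the centroid in place and gives an area ratio above 1.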
Inspiration for Speech Modifications
1. Spectral energy band boosting (Lombard)
2. Vowel space expansion (Clear)
- Features credited with increased speech intelligibility
- Though not observed together in human speech production…
- Signal processing algorithms can accomplish both!
Spectral Energy Band Boosting
- Corrective Filters: Lombard-inspired & Enhanced (high SII)
- Corrective Filter: Varying Gain
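A minimal sketch of a band-boosting corrective filter applied to a magnitude spectrum follows. The band edges use the ~500-4500 Hz region mentioned for Lombard speech; the 6 dB gain, the linear 200 Hz ramps at the band edges, and the function names are assumptions chosen for illustration, not the filters shown in the lecture.

```python
def band_boost_gain_db(freq_hz, low_hz=500.0, high_hz=4500.0,
                       gain_db=6.0, ramp_hz=200.0):
    """Gain in dB at one frequency: flat inside the band, linear ramps outside."""
    if low_hz <= freq_hz <= high_hz:
        return gain_db
    if low_hz - ramp_hz < freq_hz < low_hz:
        return gain_db * (freq_hz - (low_hz - ramp_hz)) / ramp_hz
    if high_hz < freq_hz < high_hz + ramp_hz:
        return gain_db * ((high_hz + ramp_hz) - freq_hz) / ramp_hz
    return 0.0

def apply_corrective_filter(freqs_hz, magnitudes, **kwargs):
    """Scale each linear magnitude by the band-boost gain at its frequency."""
    return [
        m * 10.0 ** (band_boost_gain_db(f, **kwargs) / 20.0)
        for f, m in zip(freqs_hz, magnitudes)
    ]

# Flat toy spectrum (made up) sampled at a few frequencies.
freqs = [100.0, 400.0, 1000.0, 4600.0, 6000.0]
flat = [1.0] * len(freqs)
boosted = apply_corrective_filter(freqs, flat, gain_db=6.0)
```

Changing `gain_db` gives the "varying gain" family of filters. In practice the output energy is often renormalized after boosting so that any benefit comes from spectral shape rather than level; that step is left out here.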
Frequency Warping for Vowel Space (VS) Expansion
- Curve fitting formant shifts inspires warping…
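To make the warping idea concrete, here is a sketch of a piecewise-linear frequency warp applied to a sampled spectral envelope. The anchor frequencies are hypothetical; the actual warping curves come from fitting the measured formant shifts, which this transcript does not reproduce.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing.

    Values outside [xs[0], xs[-1]] are clamped to the end points.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])

def warp_envelope(freqs_hz, envelope, src_anchors, dst_anchors):
    """Move envelope content from src_anchors to dst_anchors.

    For each output frequency, look up the source frequency through the
    inverse warp, then read the original envelope at that frequency.
    """
    warped = []
    for f in freqs_hz:
        f_src = interp(f, dst_anchors, src_anchors)  # inverse warp
        warped.append(interp(f_src, freqs_hz, envelope))
    return warped

# Toy envelope (made up) with a single peak at 1000 Hz.
freqs = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
envelope = [0.0, 0.0, 1.0, 0.0, 0.0]
# Hypothetical warp: shift content at 1000 Hz up to 1500 Hz, keep end points fixed.
warped = warp_envelope(freqs, envelope, [0.0, 1000.0, 2000.0], [0.0, 1500.0, 2000.0])
```

Keeping the end points fixed guarantees the warp stays within the analysis bandwidth, and monotonic anchors keep the mapping invertible.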
Sound Samples
- With Noise (SSN, 0 dB): Original, Warp, Boost, BW
- No Noise: Original, Warp, Boost, BW
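For reference, mixing a speech signal with a noise masker at a chosen signal-to-noise ratio can be sketched as below. It assumes SNR is defined on the RMS levels of the whole signals; how the lecture's samples were actually mixed is not stated, and the toy signals are made up.

```python
import math

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

def mix_at_snr(speech, noise, snr_db=0.0):
    """Scale `noise` so that speech RMS / noise RMS matches `snr_db`, then add.

    Returns (mixture, scaled_noise). Both inputs must have the same length.
    """
    gain = rms(speech) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled

# Toy signals (made up), four samples each.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixture_0db, noise_0db = mix_at_snr(speech, noise, snr_db=0.0)
mixture_20db, noise_20db = mix_at_snr(speech, noise, snr_db=20.0)
```

At 0 dB the scaled noise has the same RMS level as the speech, which is the condition labelled on the slide.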
Want more? See Maria’s presentation for more details…
Voice & Speaking Style Conversion Parallels
Voice Conversion
- Dynamic Frequency Warping + Amplitude Scaling (based on acoustic-phonetic spaces of source & target speakers)
Speaking Style Conversion
- Frequency Warping + Corrective Filter
  1. Clear-speech inspired frequency warping for vowel space expansion
  2. Lombard-speech inspired corrective filters to increase loudness
Thank you! More Questions?
Extras…
Objective Metrics for Evaluation
I. Loudness
- Energy in frequency bands weighted based on human hearing
II. Speech Intelligibility Index (SII)
- Energy & modulations in frequency bands relative to a noise masker
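The band-energy part of an SII-style metric can be sketched as a weighted sum of per-band audibilities, where each band's speech-to-masker ratio is clipped to the range -15 to +15 dB and mapped linearly onto 0 to 1. This is a heavily simplified illustration: it ignores modulations, spread of masking, hearing thresholds, and level distortion, and the uniform band weights are a placeholder assumption rather than a standard band-importance function.

```python
def band_audibility(speech_db, noise_db):
    """Map a band's speech-to-noise ratio (dB) onto [0, 1]."""
    snr = max(-15.0, min(15.0, speech_db - noise_db))
    return (snr + 15.0) / 30.0

def simplified_sii(speech_band_db, noise_band_db, weights=None):
    """Weighted sum of band audibilities.

    `weights` should sum to 1; uniform weights are used if none are given
    (an assumption for illustration only).
    """
    n = len(speech_band_db)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(
        w * band_audibility(s, d)
        for w, s, d in zip(weights, speech_band_db, noise_band_db)
    )

# Toy band levels in dB (made up): one band at 0 dB SNR, one far above the masker.
index = simplified_sii([60.0, 60.0], [60.0, 30.0])
```

Because the index depends only on band levels relative to the masker, boosting speech energy in masked bands raises it directly.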
Loudness Distributions
- Lombard speech: “louder” for voiced (bi-modal)
- Clear speech: not “louder” than casual speech
- Transients: neither style distinguishes on average
Extended SII Distributions
- ext. SII highly correlated with average loudness
- Lombard speech objectively more intelligible
- Clear speech intelligibility gain not captured by ext. SII
  - Limitations of objective intelligibility metrics
Observations from Analyses
Lombard Speech
- Spectral boosting in inclusive formant region
  -> Increase in loudness (also ext. SII)
- Vowel space translation, but no expansion
Clear Speech
- Small changes in average spectra (slight spectral “flattening”)
- Consistent vowel space expansion
  -> Greater vowel discrimination
Comparison between styles
- Acoustic differences translate into perceptual distinctions linked to intelligibility gains
- Spectral boosting & vowel space expansion: mutually exclusive