Swedish National Graduate School of Language Technology
Formant Synthesis of Speaker Age (+ a question regarding prosodic timing)
Susanne Schötz
Centre for Languages and Literature, Lund University
susanne.schotz@ling.lu.se

A Prototype System for Analysis of Speaker Age by Formant Synthesis
2021-09-05

background (1): speaker age
  • acoustic cues to speaker age in almost every phonetic dimension
  • relative importance of these cues not fully explored
  • one reason: the lack of an adequate analysis tool (where a large number of potential age parameters can be varied systematically and studied in detail)

background (2): formant synthesis
  • robust & flexible, but not as natural-sounding as concatenation synthesis
  • GLOVE (OVE III with improved glottal source, developed at the Royal Institute of Technology, Stockholm)
  • used in experiments on voice variation since 1989 (Carlson et al., 1991; Karlsson, 1992)
  • data-driven formant synthesis (GLOVE) (Sjölander, 2001; Sigvardson, 2002; Öhlin, 2004)

purpose & aim
  • develop a prototype system for analysis of speaker age by data-driven formant synthesis
  • use as a tool to analyse, model and synthesize speaker age

material
  • 1 word: ‘själen’ [ˈɧɛːlən] (the soul)
  • 4 speakers (same dialect & family)
      speaker 1: granddaughter (aged 6)
      speaker 2: daughter (aged 36)
      speaker 3: mother (aged 66)
      speaker 4: grandmother (aged 91)

material: acoustic “pre-analysis”
  • F0 contours for [ˈɧɛːlən] + F1-F2 plot for the steady-state part of [ɛː]

method (1)
  • data-driven analysis by synthesis (GLOVE, Praat, Perl)
  • automatic extraction of 23 GLOVE parameters (once every 10 ms)
  • formant synthesis (GLOVE)
  • audio-visual comparison of synthesis to original
  • parameter adjustment rules

results: speaker 1 (6 years)
  (figure panels: original, synthesis)
  + fricative, creak
  - formant error, amplitude

results: speaker 2 (36 years)
  (figure panels: original, synthesis)
  + formants, F0
  - dull

results: speaker 3 (66 years)
  (figure panels: original, synthesis)
  + formants, F0
  - amplitude, dull

results: speaker 4 (91 years)
  (figure panels: original, synthesis)
  + formants, F0 (incl. creak)
  - amplitude, fricative, dull

method (2)
  • weighted linear interpolation between two source speakers to synthesize a target age (Praat, Java)
  • example: target age 51
      source speakers: 2 (36) and 3 (66)
      age weights for each source speaker: 0.5
      duration interpolation for segment 1 (source speaker 2 dur = 100 ms, source speaker 3 dur = 200 ms):
      target dur = 100 x 0.5 + 200 x 0.5 = 150 ms
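The duration example can be sketched in a few lines of Python. This is only an illustration, not the Praat/Java implementation used in the prototype: the function names are hypothetical, and the assumption that each weight is derived from the distance between the target age and the two source ages is mine (it does reproduce the weights of 0.5 given for target age 51).

```python
def age_weights(target_age, age_a, age_b):
    """Return (w_a, w_b): weights that are linear in age distance and sum to 1.

    Assumption: the slide only states the weights (0.5 each) for one example;
    deriving them from age distance is an illustrative guess.
    """
    if age_a == age_b:
        raise ValueError("source ages must differ")
    w_b = (target_age - age_a) / (age_b - age_a)
    return 1.0 - w_b, w_b


def interpolate(value_a, value_b, w_a, w_b):
    """Weighted linear interpolation of one parameter value (e.g. a duration in ms)."""
    return value_a * w_a + value_b * w_b


# Example from the slide: target age 51, source speakers 2 (36) and 3 (66).
w2, w3 = age_weights(51, 36, 66)
target_dur_ms = interpolate(100, 200, w2, w3)  # segment 1 durations in ms
print(w2, w3, target_dur_ms)
```

The same `interpolate` call could in principle be applied to any other numeric parameter of the two source speakers, provided their values have first been aligned segment by segment.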

results (interpolation)
  • at first glance: similarities…
  • …but linear interpolation is not optimal: aging is not linear!
  • only a first attempt
  • how to evaluate? perception tests (31 students): age (naturalness)

evaluation results: natural speakers
  CA    PA
  6     7
  36    30
  66    35/36
  91    74

evaluation results: data-driven synthesis
  ”CA”  PA
  6     12
  36    41
  66    44
  91    69

evaluation results: linear age interpolation
  ”CA”  PA
  10    13
  20    63
  30    54
  40    42
  50    42
  60    54
  70    58
  80    70
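One way to read these numbers is to look at how far each PA value lies from the intended ”CA”. The sketch below is not part of the original evaluation: the function names are hypothetical, and the pairing of ”CA” and PA values is my reading of the flattened slide.

```python
# (intended age "CA", PA) pairs for the interpolated stimuli, as read from the slide.
# Assumption: the numbers alternate "CA", PA, "CA", PA, ...
interpolated = [(10, 13), (20, 63), (30, 54), (40, 42),
                (50, 42), (60, 54), (70, 58), (80, 70)]


def signed_errors(pairs):
    """Return (CA, PA - CA) for each stimulus; positive means it was judged older."""
    return [(ca, pa - ca) for ca, pa in pairs]


def mean_absolute_error(pairs):
    """Average distance in years between intended age and PA."""
    return sum(abs(pa - ca) for ca, pa in pairs) / len(pairs)


for ca, err in signed_errors(interpolated):
    print(f"target {ca:2d}: {err:+d} years")
print("mean absolute error:", mean_absolute_error(interpolated))
```

Under this reading the largest deviations are at target ages 20 and 30, which fits the remark that linear interpolation is not optimal.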

summary & discussion (1)
  • both similarities and differences between original and synthesis
  • a good start, but needs more work:
      formants
      amplitudes
      voice source parameters
  • try other interpolation algorithms (spline?)
  • if developed further, may be used to model and synthesize speaker age

Question about prosodic timing
  • Bruce (1983): strong Swedish dialect -> later tonal peaks
  • older speakers often sound more dialectal (Stölten & Engstrand, 2003)
  • could prosodic timing be age-related?
  • my hypothesis: older speakers -> later tonal peaks
  • investigated 1 word: ‘nordanvinden’ (the north wind) (4 speakers)

F0 contours for ‘nordanvinden’
  (figure: one panel per speaker, aged 6, 36, 66 and 91)

F0 contours for ‘nordanvinden’
  (figure: one panel per speaker, aged 6, 36, 66 and 91, each with an H label)

Questions
  • What have I missed? synchronisation to segments? context effects? …?
  • How do I proceed? suggestions?
  • Do you (as prosodic experts) believe that there may be a relation between age and prosodic timing?

thank you! questions, answers and comments welcome!