Alcoholized Speech: F0 and Rhythm (Florian Schiel)

Alcoholized Speech: F0 and Rhythm
Florian Schiel
Bavarian Archive for Speech Signals, Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität München, Germany
Special thanks to: Chr. Heinrich, S. Barfüßer, I. Dhillon, Prof. Th. Gilg, Ra. OLG Tourneur
VIU Seminar, 14-17 April 2009

Motivation: Overview
- Motivation, Goals and Earlier Work
- ALC Corpus
- F0 Analysis
- Rhythm Analysis
- Discussion: Prosodic Features for ALC

Motivation: Why is alcoholized speech interesting?
Phonetic forensics:
- speaker identification from alcoholized speech samples
- determining alcoholization from traffic control recordings (for example, the Exxon Valdez grounding in 1989)
- traffic accidents: determining alcoholization from in-car recordings if blood samples are not available

Motivation: Why is alcoholized speech interesting?
Speech production:
- How does intoxication influence planning and motor control?
Speech perception:
- Can listeners judge the degree of alcoholization from a speech sample?
- Which features do listeners use for their judgement?

Motivation: Why is alcoholized speech interesting?
Traffic safety:
- Can a voice-controlled car judge the alcoholization of its driver (and then take measures)?
On focus / Off focus

Motivation: What has been done already?
- Forensic studies (2)
- Perception studies (3)
- Phonetic features (10)
- Recognition (2)
Common problems:
- mostly male speakers
- low number of speakers (<40), so the statistics are not valid
- intoxication measured by breath alcohol concentration (BRAC)
- lab speech ('The North Wind and the Sun' etc.)
- results partly contradictory

Motivation: What features have been investigated?
- F0 parameters
- formant parameters
- RMS / loudness
- spectral tilt of signal or source signal
- speech rate parameters
- pause length, number of pauses
- mispronunciations: deletions, insertions, repairs, stutter
- errors in phonetic gestures:
  - incomplete gestures (measurement?)
  - lateralization /r/ -> /l/ (measurement?)
  - shift of place /s/ -> /S/ or /s/ -> /T/
  - nasalization, de-nasalization (?)

Motivation: ... and what has not been investigated?
- dysfluencies
- centralization of vowels
- rhythm
- prosodic contours
- female speech
- 'outside the lab' speech
- command & control speech
- dialogue speech
- statistically valid data (>100 speakers, >2 million phonemes)
... so let's do it! (Yes, we can!)

Motivation: Our goals
- verify/falsify reported findings on a larger database
- check for rhythm parameters
- check for prosodic contours (with Uwe's help?)
- check for centralization of vowels
- check for 'linguistic irregularities'
- check for gender / age / speech type influences
- check on a sober control group
- perception experiments: which features are important? Help wanted!

ALC Corpus
- alcoholization experiments at the Institute of Legal Medicine
- blood alcohol concentration: 0.05 to 0.2%
- breath sample (BRAC) and blood sample test (BAC)
- 15 minutes of recording in two cars, SpeechRecorder, 2 mics
- speech types: read, monologue, dialogue, command & control (with engine)
- annotation: SpeechDat, extended by Verbmobil tags
- export into BAS Partitur Format; canonical pronunciation by BALLOON; MAUS segmentation
- import into an Emu hierarchy; F0, formants, RMS
- analysis using R

ALC Corpus: Examples

ALC Corpus: Time line and estimates
- first contact with Legal Medicine; DFG application
- rhythm features analysis (LREC 2008)
- first recordings, Nov 2007: 14 speakers recorded
- 61 speakers recorded
- 2008: first F0 analysis; first perception tests; analysis of irregularities; 82 speakers recorded
- 2009: 150 speakers recorded (75 female + 75 male speakers, age 22 to 75, BAC 0.00 to 0.20%)

ALC Corpus: Problems
- 2nd sober recording: loss rate of 20%
- MAUS segmentation of dialogues is unreliable; solution: pre-segmentation into speaker and non-speaker parts, then MAUS on each speaker part
- gender balance: we need more male speakers
- age balance: very few speakers above 50

Analysis
RM-ANOVA requires one measurement per speaker and within-factor combination.
- between-factors: sex, age, (drinking habits)
- within-factors: alc, speech type, (content, car noise)
Definition: an utterance group (UG) comprises all utterances of one speaker in one within-factor combination.
Example: UG(speaker=006, alc=a, type=spont) = 3 monologues, 2 dialogues and 5 spontaneous commands
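
A minimal Python sketch of how utterance groups could be formed and reduced to the one value per cell that RM-ANOVA needs. It is an illustration, not the project's R/Emu code: the `Utterance` record, its field names, the level labels ("a", "n", "spont") and the use of the median as the reducing statistic are assumptions. `complete_speakers` keeps only speakers with every within-factor cell filled, because a speaker with a missing cell (e.g. a lost second sober recording) cannot enter a repeated-measures design.

```python
from collections import defaultdict
from itertools import product
from statistics import median
from typing import Callable, Dict, Iterable, List, NamedTuple, Sequence, Tuple


class Utterance(NamedTuple):
    """One recorded utterance (hypothetical record layout)."""
    speaker: str
    alc: str             # within-factor level, e.g. "a" (alcoholized) or "n" (sober)
    speech_type: str     # within-factor level, e.g. "read", "spont", "command"
    values: List[float]  # per-utterance measurements, e.g. voiced F0 samples in Hz


UGKey = Tuple[str, str, str]  # (speaker, alc, speech_type)


def group_utterances(utts: Iterable[Utterance]) -> Dict[UGKey, List[Utterance]]:
    """Collect all utterances of one speaker and one within-factor combination."""
    groups: Dict[UGKey, List[Utterance]] = defaultdict(list)
    for u in utts:
        groups[(u.speaker, u.alc, u.speech_type)].append(u)
    return dict(groups)


def one_value_per_ug(groups: Dict[UGKey, List[Utterance]],
                     stat: Callable[[List[float]], float] = median) -> Dict[UGKey, float]:
    """Pool the measurements of each UG and reduce them to a single number."""
    return {key: stat([v for u in us for v in u.values]) for key, us in groups.items()}


def complete_speakers(groups: Dict[UGKey, List[Utterance]],
                      alc_levels: Sequence[str],
                      type_levels: Sequence[str]) -> List[str]:
    """Speakers that have a UG for every within-factor combination."""
    speakers = {spk for spk, _, _ in groups}
    return sorted(
        spk for spk in speakers
        if all((spk, a, t) in groups for a, t in product(alc_levels, type_levels))
    )
```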

F0 Analysis
F0 from the Vincent-Schaefer pitch period detector (Emu)
1. F0 median Fm over the utterance group (UG)

F0 Analysis
2. F0 quarter-quantile distance Fqq over the UG
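
Fm and Fqq for one UG could be computed as follows, assuming the F0 values of all utterances in the UG are pooled, unvoiced frames are coded as 0, and "quarter-quantile distance" means the interquartile range (75% quantile minus 25% quantile). The slides do not spell out these details, so treat this as a sketch.

```python
from statistics import median, quantiles
from typing import Iterable, Tuple


def f0_median_and_qq(f0_values: Iterable[float]) -> Tuple[float, float]:
    """Return (Fm, Fqq) for all F0 values pooled over one utterance group.

    Unvoiced frames are assumed to be coded as 0 and are dropped.
    Fqq is taken here to be the interquartile range (assumption).
    """
    voiced = [f for f in f0_values if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced F0 values")
    # "inclusive" treats the data as the whole population (min = 0%, max = 100%)
    q1, _, q3 = quantiles(voiced, n=4, method="inclusive")
    return median(voiced), q3 - q1
```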

F0 Analysis
3. F0 in lexically accented vowels /a: e: E: i: u: o:/ in the same context in read speech (22 female / 24 male speakers)
Results: median and quarter-quantile distance of F0 behave like the global values, with the following exceptions:
- no significant increase of Fm for male speakers in the back vowels /o:/ and /u:/
- no significant increase of Fqq in /a:/, /o:/ and /u:/

F0 Analysis
4. F0 change per speaker, read speech: Fm(alc) - Fm(non-alc)
45 female, 37 male speakers
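
The per-speaker change is a paired difference. The sketch below assumes Fm has already been computed per speaker and condition for read speech; speaker IDs and values in the test are toy data, not corpus results. Speakers lacking either condition are skipped.

```python
from typing import Dict


def f0_change_per_speaker(fm_alc: Dict[str, float],
                          fm_sober: Dict[str, float]) -> Dict[str, float]:
    """Fm(alc) - Fm(non-alc) for every speaker measured in both conditions."""
    common = sorted(fm_alc.keys() & fm_sober.keys())
    return {spk: fm_alc[spk] - fm_sober[spk] for spk in common}


def count_increases(changes: Dict[str, float]) -> int:
    """Number of speakers whose median F0 rose under alcohol."""
    return sum(1 for d in changes.values() if d > 0)
```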

F0 Analysis
5. Hypothesis: F0 and energy contours differ
Example: simple declarative sentences with a single phrase
- calculate F0 with the Vincent-Schaefer detector
- linearly interpolate F0 gaps
- calculate the 2nd (tilt) and 3rd (curvature) coefficients of the Discrete Cosine Transform (DCT)
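
The contour parameterization (gap interpolation followed by the first DCT coefficients) can be sketched as below. The 1/N scaling, which makes DCT-0 the mean F0, and holding the nearest voiced value at leading and trailing unvoiced frames are assumptions; the Emu implementation may scale the coefficients differently. Unvoiced frames are assumed to be coded as `None` or 0.

```python
import math
from typing import List, Optional, Sequence


def interpolate_gaps(f0: Sequence[Optional[float]]) -> List[float]:
    """Fill unvoiced frames (None or 0) by linear interpolation between the
    neighbouring voiced frames; edge gaps take the nearest voiced value."""
    voiced = [i for i, v in enumerate(f0) if v]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    out = [float(v) if v else 0.0 for v in f0]
    first, last = voiced[0], voiced[-1]
    for i in range(first):                 # leading gap (assumption: hold)
        out[i] = out[first]
    for i in range(last + 1, len(f0)):     # trailing gap (assumption: hold)
        out[i] = out[last]
    for a, b in zip(voiced, voiced[1:]):   # interior gaps: straight line
        for i in range(a + 1, b):          # empty when a and b are adjacent
            t = (i - a) / (b - a)
            out[i] = out[a] + t * (out[b] - out[a])
    return out


def dct_coefficients(x: Sequence[float], k_max: int = 2) -> List[float]:
    """DCT-II coefficients 0..k_max scaled by 1/N:
    0 = bias (mean), 1 = tilt, 2 = curvature."""
    n = len(x)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(x)) / n
        for k in range(k_max + 1)
    ]
```

With this sign convention a falling contour (typical for declaratives) yields a positive DCT-1, and a straight line yields a DCT-2 of zero.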

F0 Analysis
Figure: F0 contours of two examples (blue: raw F0; red: linear interpolation; green: DCT coefficients 0-2)
- Example 1: DCT-0 = 338.17 (bias), DCT-1 = 31.01 (tilt), DCT-2 = -3.92 (curvature)
- Example 2: DCT-0 = 313.62 (bias), DCT-1 = 35.73 (tilt), DCT-2 = -0.93 (curvature)
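
The green curve on this slide is a contour rebuilt from DCT-0 to DCT-2 only. Assuming a DCT-II scaled by 1/N (so that DCT-0 is the mean F0; the slides do not state the scaling), the truncated inverse can be sketched as below. The frame count of 50 is arbitrary, so the resulting Hz values only illustrate the shape, not the slide's exact curve.

```python
import math
from typing import List, Sequence


def dct_reconstruct(coeffs: Sequence[float], n: int) -> List[float]:
    """Smoothed contour of n frames from the first DCT coefficients.

    Inverse of c_k = (1/N) * sum_i x_i * cos(pi*k*(2i+1)/(2N)):
    x_i = c_0 + 2 * sum_{k>=1} c_k * cos(pi*k*(2i+1)/(2N)).
    """
    out = []
    for i in range(n):
        x = coeffs[0]
        for k in range(1, len(coeffs)):
            x += 2 * coeffs[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
        out.append(x)
    return out


# Coefficients of the first example on the slide; 50 frames is an arbitrary choice.
smooth = dct_reconstruct([338.17, 31.01, -3.92], 50)
```

Regardless of DCT-1 and DCT-2, the mean of the rebuilt contour equals DCT-0, because each cosine term sums to zero over the frames.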

F0 Analysis
2-dim. plot of DCT-1 (tilt) vs. DCT-2 (curvature): the centroids are identical, but the variation increases for alcoholized speech
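
One way to quantify "same centroid, larger variation" is to compute, per condition, the mean (DCT-1, DCT-2) point and the mean Euclidean distance of the points from it. The slides do not say which spread measure was used; the distance-based measure and the condition labels here are assumptions.

```python
from math import hypot
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (DCT-1 tilt, DCT-2 curvature)


def centroid_and_spread(points: List[Point]) -> Tuple[Point, float]:
    """Centroid of the points and their mean Euclidean distance from it."""
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    spread = mean(hypot(x - cx, y - cy) for x, y in points)
    return (cx, cy), spread


def compare_conditions(by_condition: Dict[str, List[Point]]) -> Dict[str, Tuple[Point, float]]:
    """Centroid and spread for each condition, e.g. sober vs. alcoholized."""
    return {cond: centroid_and_spread(pts) for cond, pts in by_condition.items()}
```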

Rhythm
In this context, rhythm means the segmental structure of V, C and P clusters (voiced, unvoiced and pause intervals); syllable nucleus = middle of a V cluster
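
A sketch of how V, C and P clusters and syllable nuclei could be derived from a frame-wise labelling, assuming each analysis frame has already been labelled 'V' (voiced), 'C' (unvoiced) or 'P' (pause), for example from the F0 track and a silence detector. The frame step (e.g. 0.01 s) and the label alphabet are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

Cluster = Tuple[str, float, float]  # (label, start in s, end in s)


def clusters_from_frames(labels: str, step: float) -> List[Cluster]:
    """Merge runs of identical frame labels into clusters.
    Times come from integer frame indices to avoid accumulating float error."""
    clusters: List[Cluster] = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        clusters.append((label, start * step, (start + length) * step))
        start += length
    return clusters


def syllable_nuclei(clusters: List[Cluster]) -> List[float]:
    """Syllable nucleus = middle of each V cluster."""
    return [(s + e) / 2 for label, s, e in clusters if label == "V"]
```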

Rhythm features
Two basic types of measurement:
- counts (normalized over time or over the number of syllables) or proportions, calculated across the UG -> one measurement per UG: <feature>
- multiple measurements (e.g. per syllable) pooled across the UG, usually expressed as mean (.m) and standard deviation (.sd) -> two values per UG: <feature>.m, <feature>.sd
Usually the initial and final silence intervals of a recording are disregarded.
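
The two measurement types can be expressed as two small helpers. This sketch assumes "calculated across the UG" means pooling the raw counts, durations and per-item values of all utterances in the UG before dividing or averaging, rather than averaging per-utterance results; the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def count_feature(counts: Sequence[int], normalizers: Sequence[float]) -> float:
    """Count-type feature for one UG: total count divided by total time (s)
    or total number of syllables, pooled over all utterances of the UG."""
    return sum(counts) / sum(normalizers)


def mean_sd_feature(name: str, per_utterance: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Multi-measurement feature for one UG: pool the per-item values of all
    utterances, then report <feature>.m and <feature>.sd (sample SD)."""
    pooled: List[float] = [v for utt in per_utterance for v in utt]
    return {f"{name}.m": mean(pooled), f"{name}.sd": stdev(pooled)}
```

Pooling weights long utterances more heavily than averaging per-utterance values would; which variant was used is not stated on the slide.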

Rhythm feature overview
Voicing
- %V: proportion (time) of voiced signal
Speech rate
- sylrate: number of syllables (nuclei) per second
Silence intervals
- ps-persyl: number of short pauses (<1 sec) per syllable
- ps-persec: number of short pauses per second
- pl-persyl: number of long pauses (>1 sec) per syllable
- pl-persec: number of long pauses per second
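
The features on this slide can be computed from a cluster list of `(label, start, end)` tuples, as in this sketch. Assumptions not stated on the slide: %V uses the total duration after trimming edge silence (internal pauses included) as denominator, each V cluster counts as one syllable, and a pause of exactly 1 s counts as long.

```python
from typing import Dict, List, Sequence, Tuple, Union

Cluster = Tuple[str, float, float]  # (label 'V'/'C'/'P', start in s, end in s)
LONG_PAUSE_MIN = 1.0  # seconds; exactly 1 s counts as long here (assumption)


def trim_edge_pauses(clusters: Sequence[Cluster]) -> List[Cluster]:
    """Drop the initial and final silence intervals of a recording."""
    start, end = 0, len(clusters)
    while start < end and clusters[start][0] == "P":
        start += 1
    while end > start and clusters[end - 1][0] == "P":
        end -= 1
    return list(clusters[start:end])


def rhythm_overview(clusters: Sequence[Cluster]) -> Dict[str, Union[float, List[float]]]:
    cl = trim_edge_pauses(clusters)
    n_syl = sum(1 for lab, _, _ in cl if lab == "V")  # one nucleus per V cluster
    if n_syl == 0:
        raise ValueError("no voiced cluster, so no syllables")
    total = cl[-1][2] - cl[0][1]  # includes internal pauses (assumption)
    voiced = sum(e - s for lab, s, e in cl if lab == "V")
    pauses = [e - s for lab, s, e in cl if lab == "P"]
    short = [d for d in pauses if d < LONG_PAUSE_MIN]
    long_ = [d for d in pauses if d >= LONG_PAUSE_MIN]
    return {
        "%V": voiced / total,
        "sylrate": n_syl / total,
        "ps-persyl": len(short) / n_syl,
        "ps-persec": len(short) / total,
        "pl-persyl": len(long_) / n_syl,
        "pl-persec": len(long_) / total,
        "durs": short,  # short-pause lengths; summarize as durs.m / durs.sd
    }
```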

Rhythm feature overview (cont.)
Silence dimensions
- durs: length of short pauses (<1 sec)
Cluster dimensions
- delta.V, delta.C (Ramus et al. 1999): voiced and unvoiced cluster lengths
- delta.SN: distances between syllable nuclei
Cluster structure
- n.PVI-V, n.PVI-C (Grabe & Low 2004): length difference of consecutive clusters, normalized to the average length of both clusters
- n.PVI-SN: difference of consecutive syllable-nucleus distances, normalized to the average of both distances
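
Possible inputs and summaries for these features, as a sketch: delta.V/delta.C take the lengths of voiced/unvoiced clusters, delta.SN the distances between consecutive nuclei, and the n.PVI features the per-pair normalized differences of consecutive clusters of the same type (or of consecutive nucleus distances). Each list is then summarized as `.m` and `.sd`; delta.V.sd corresponds to Ramus et al.'s ΔV, here computed on voiced rather than vocalic intervals. The factor 100 follows the usual nPVI definition and may differ from the slide's scaling; the helper names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Cluster = Tuple[str, float, float]  # (label 'V'/'C'/'P', start in s, end in s)


def cluster_lengths(clusters: Sequence[Cluster], label: str) -> List[float]:
    """Input for delta.V (label 'V') and delta.C (label 'C')."""
    return [e - s for lab, s, e in clusters if lab == label]


def nucleus_distances(nuclei: Sequence[float]) -> List[float]:
    """Input for delta.SN: time between consecutive syllable nuclei."""
    return [b - a for a, b in zip(nuclei, nuclei[1:])]


def pairwise_pvi(durations: Sequence[float]) -> List[float]:
    """Per-pair normalized difference |d_k - d_k+1| / ((d_k + d_k+1) / 2),
    times 100; input for n.PVI-V, n.PVI-C and n.PVI-SN."""
    return [100 * abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]


def m_sd(values: Sequence[float]) -> Tuple[float, float]:
    """<feature>.m and <feature>.sd (sample standard deviation)."""
    return mean(values), stdev(values)
```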

Rhythm: Some results (45 female + 37 male speakers, read + command speech)
Columns: RM-ANOVA | post hoc speech type | post hoc gender
Values: p = 0.0014; p > 0.05; p < 0.001; p > 0.05; p = 0.049; only command; only read; no
No interaction with gender in any feature.

Conclusion
Work in progress, therefore: no conclusions! But ...

Discussion: Prosodic Features for ALC?
Thank you!