Alcoholized Speech: F0 and Rhythm
Florian Schiel
Bavarian Archive for Speech Signals, Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität München, Germany
Special thanks to: Chr. Heinrich, S. Barfüßer, I. Dhillon, Prof. Th. Gilg, RaOLG Tourneur
VIU Seminar, 14. - 17. April 2009

Motivation: Overview
- Motivation, Goals and Earlier Work
- ALC Corpus
- F0 Analysis
- Rhythm Analysis
- Discussion: Prosodic Features for ALC

Motivation: Why is alcoholized speech interesting?
Phonetic Forensics:
- Speaker identification from alcoholized speech samples
- Determine alcoholization from radio traffic recordings (for example, the Exxon Valdez grounding in 1989)
- Traffic accidents: determine alcoholization from in-car recordings if blood samples are not available

Motivation: Why is alcoholized speech interesting?
- Speech Production: How does intoxication influence planning and motor control?
- Speech Perception: Can listeners judge alcoholization from a speech sample? Which features do listeners use for their judgement?

Motivation: Why is alcoholized speech interesting?
- Traffic security: Can a voice-controlled car judge the alcoholization of its driver (and then take measures)? (OnFocus / OffFocus)

Motivation: What has been done already?
- Forensic studies (2)
- Perception studies (3)
- Phonetic Features (10)
- Recognition (2)
Common problems:
- mostly male speakers
- number of speakers is low (<40), statistics not valid
- intoxication measured by breath alcohol concentration (BRAC)
- lab speech ('The North Wind and the Sun' etc.)
- results partly contradictory

Motivation: What features have been investigated?
- F0 parameters
- formant parameters
- RMS / loudness
- spectral tilt of signal or source signal
- speech rate parameters
- pause length, number
- mispronunciations: deletions, insertions, repairs, stutter
- errors in phonetic gestures
  - incomplete gestures (measurement?)
  - lateralisation /r/ -> /l/ (measurement?)
  - shift of place /s/ -> /S/ or /s/ -> /T/
  - nasalisation, de-nasalisation (?)

Motivation: ... and what has not been investigated?
- dysfluencies
- centralisation of vowels
- rhythm
- prosodic contours
- female speech
- 'outside the lab' speech
- command & control speech
- dialogue speech
- statistically valid data (>100 speakers, >2 million phonemes)
... so let's do it! (Yes, we can!)

Motivation: Our goals
- verify/falsify reported findings on a larger database
- check for rhythm parameters
- check for prosodic contours (with Uwe's help?)
- check for centralization of vowels
- check for 'linguistic irregularities'
- check for gender / age / speech type influences
- check on sober control group
- perception experiments: what features are important?
Help wanted!

ALC Corpus
- alcoholization experiments at the Institute of Legal Medicine
- blood alcohol concentration: 0.05 – 0.2%
- breath sample (BRAC) and blood sample test (BAC)
- 15 minutes recording in two cars, SpeechRecorder, 2 mics
- read, monologue, dialogue, command & control (with engine)
- annotation: SpeechDat extended by Verbmobil tags
- export into BAS Partitur Format, canonical pronunciation by BALLOON, MAUS segmentation
- import into Emu hierarchy, F0, formants, RMS
- analysis using R

ALC Corpus: Examples

ALC Corpus: Time line and estimates
First contact with Legal Medicine, DFG application, Rhythm features, Analysis, LREC 2008
First recordings Nov 2007 | 14 speakers recorded, 61 speakers recorded
2008 First F0 Analysis, First perception tests, Analysis of irregularities, 82 speakers recorded | 150 speakers recorded
2009 | 75 female + 75 male speakers, age 22 – 75, BAC 0.00 – 0.20%

ALC Corpus: Problems
- 2nd sober recording: loss rate of 20%
- MAUS segmentation of dialogues unreliable
  - solution: pre-segmentation into speaker and non-speaker parts, then MAUS on each speaker part
- gender balance: we need more male speakers
- age balance: very few speakers above 50

Analysis
RM-ANOVA requires one measurement per speaker and within-factor combination.
- between-factors: sex, age, (drinking habits)
- within-factors: alc, speech type, (content, car noise)
Definition: utterance group (UG): all utterances of one speaker and one within-factor combination
Example: UG(speaker=006, alc=a, type=spont) = 3 monologues, 2 dialogues and 5 spontaneous commands
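
The UG definition can be sketched as a simple grouping step. This is only an illustration: the record layout, the labels "a" / "na" / "read" and the numeric values are made up for the example and are not taken from the ALC corpus.

```python
from collections import defaultdict

# Hypothetical utterance records: (speaker, alc, speech type, measurement).
# All values are invented for illustration.
utterances = [
    ("006", "a", "spont", 212.0),
    ("006", "a", "spont", 220.0),
    ("006", "a", "read", 205.0),
    ("006", "na", "read", 198.0),
]


def group_into_ugs(records):
    """Collect measurements per (speaker, alc, type) key, i.e. one list per UG."""
    ugs = defaultdict(list)
    for speaker, alc, speech_type, value in records:
        ugs[(speaker, alc, speech_type)].append(value)
    return dict(ugs)


ugs = group_into_ugs(utterances)
for key, values in sorted(ugs.items()):
    print(key, values)
```

Each UG then has to be reduced to a single number (for example a median) before it can enter the RM-ANOVA.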

F0 Analysis
F0 from Schäfer-Vincent pitch period detector (Emu)
1. F0 median Fm over utterance group (UG)

F0 Analysis
2. F0 quarter-quantile distances Fqq over UG
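
A minimal sketch of how Fm and Fqq could be computed for one UG. It assumes that the "quarter-quantile distance" is the distance between the 25% and 75% quantiles (the interquartile range); the slides do not define it further. The function names and the sample values are hypothetical, and only voiced frames are expected as input.

```python
import statistics


def f0_median(f0_values):
    """Fm: median of all voiced F0 values (Hz) in one utterance group."""
    return statistics.median(f0_values)


def f0_quartile_distance(f0_values):
    """Fqq, assumed here to be the 75% quantile minus the 25% quantile."""
    q1, _median, q3 = statistics.quantiles(f0_values, n=4, method="inclusive")
    return q3 - q1


# Invented F0 samples (Hz) pooled over one UG.
ug_f0 = [100.0, 110.0, 120.0, 130.0, 140.0]
print(f0_median(ug_f0), f0_quartile_distance(ug_f0))
```

Both measures are robust against isolated octave errors of the pitch detector, which may be one reason to prefer them over mean and standard deviation.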

F0 Analysis
3. F0 in lexically accented vowels /a: e: E: i: u: o:/ in same context in read speech
22 female / 24 male speakers
Results: Median and quarter-quantile distance of F0 behave like global values with the following exceptions:
- no significant increase of Fm for male speakers in back vowels /o:/ and /u:/
- no significant increase of Fqq in /a:/, /o:/ and /u:/

F0 Analysis
4. F0 change per speaker, read speech: Fm(alc) – Fm(non-alc)
45 female, 37 male

F0 Analysis
5. Hypothesis: F0 + energy contours differ
Example: simple declarative sentences with single phrase
- calculate F0 by Schäfer-Vincent
- linearly interpolate F0 gaps
- calculate 2nd (tilt) and 3rd (curvature) coefficients of Discrete Cosine Transform (DCT)
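
The contour parametrisation can be sketched in a few lines. This is an illustration under stated assumptions, not the Emu/R code used for the study: unvoiced frames are marked with None, gaps at the edges are filled with the nearest voiced value, and the DCT-II is scaled by 1/N so that coefficient 0 equals the mean F0. The scaling behind the values reported on the slides is not known.

```python
import math


def interpolate_gaps(f0):
    """Fill unvoiced frames (None) by linear interpolation between voiced frames."""
    voiced = [i for i, v in enumerate(f0) if v is not None]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    out = list(f0)
    # Assumption: leading and trailing gaps copy the nearest voiced value.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    for left, right in zip(voiced, voiced[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            out[i] = (1 - w) * f0[left] + w * f0[right]
    return out


def dct_coefficients(x, count=3):
    """First `count` DCT-II coefficients, scaled by 1/N (assumed scaling)."""
    n = len(x)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x)) / n
        for k in range(count)
    ]


# Invented falling contour (Hz) with unvoiced gaps.
contour = [200.0, None, 180.0, None, None, 150.0]
filled = interpolate_gaps(contour)
bias, tilt, curvature = dct_coefficients(filled)
print(filled, bias, tilt, curvature)
```

With this sign convention a falling contour gives a positive coefficient 1, and a purely linear contour gives a coefficient 2 of zero.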

F0 Analysis
[Figure: two example F0 contours. blue: raw F0; red: linear interpolation; green: DCT coefficients 0-2]
- Example 1: DCT-0 = 338.17 (bias), DCT-1 = 31.01 (tilt), DCT-2 = -3.92 (curvature)
- Example 2: DCT-0 = 313.62 (bias), DCT-1 = 35.73 (tilt), DCT-2 = -0.93 (curvature)

F0 Analysis
2-dim. plot of DCT-1 (tilt) vs. DCT-2 (curvature) -> centroids identical, variation increases for alcoholized speech

Rhythm
in this context: the segmental structure of V, C and P clusters
syllable nuclei = middle of V cluster

Rhythm features
Two basic types of measurements:
- counts (normalized over time or on number of syllables) or proportions, calculated across the UG -> one measurement per UG: <feature>
- multiple measurements (e.g. per syllable) averaged across UG, usually expressed as mean (.m) and standard deviation (.sd) -> two values per UG: <feature>.m, <feature>.sd
Usually the initial and final silence interval of a recording is disregarded.

Rhythm feature overview
Voicing
- %V: proportion (time) of voiced signal
Speech rate
- sylrate: number of syllables (nuclei) per sec
Silence intervals
- ps-persyl: number of short pauses (<1 sec) per syllable
- ps-persec: number of short pauses per sec
- pl-persyl: number of long pauses (>1 sec) per syllable
- pl-persec: number of long pauses per sec
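
The count and proportion features can be sketched from a sequence of labelled clusters. Several points are assumptions made for this example: the input format (label, duration in seconds), counting one syllable per V cluster, measuring time over everything between the initial and final silence, and treating a pause of exactly 1 sec as long. The function name and the sample data are hypothetical.

```python
def rhythm_counts(clusters):
    """Compute count/proportion features from (label, duration_sec) pairs.

    Labels: "V" (voiced cluster), "C" (unvoiced cluster), "P" (pause).
    """
    # Disregard the initial and final silence interval.
    start, end = 0, len(clusters)
    while start < end and clusters[start][0] == "P":
        start += 1
    while end > start and clusters[end - 1][0] == "P":
        end -= 1
    core = clusters[start:end]

    total = sum(d for _, d in core)
    n_syl = sum(1 for lab, _ in core if lab == "V")  # one nucleus per V cluster
    if total <= 0 or n_syl == 0:
        raise ValueError("no speech material")
    n_short = sum(1 for lab, d in core if lab == "P" and d < 1.0)
    n_long = sum(1 for lab, d in core if lab == "P" and d >= 1.0)

    return {
        "%V": sum(d for lab, d in core if lab == "V") / total,
        "sylrate": n_syl / total,
        "ps-persyl": n_short / n_syl,
        "ps-persec": n_short / total,
        "pl-persyl": n_long / n_syl,
        "pl-persec": n_long / total,
    }


# Invented cluster sequence for one recording.
example = [
    ("P", 0.5), ("C", 0.1), ("V", 0.2), ("C", 0.1), ("V", 0.1),
    ("P", 0.5), ("V", 0.25), ("P", 1.5), ("C", 0.05), ("V", 0.2), ("P", 0.8),
]
feats = rhythm_counts(example)
print(feats)
```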

Rhythm feature overview (cont.)
Silence dimensions
- durs: length of short pauses (<1 sec)
Cluster dimensions
- deltaV, deltaC (Ramus et al 1999): voiced and unvoiced cluster lengths
- deltaSN: nuclei distances
Cluster structure
- nPVI-V, nPVI-C (Grabe & Low 2004): length difference of consecutive clusters normalized to average length of both clusters
- nPVI-SN: distance difference of consecutive syllable nuclei normalized to average length of both distances
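
A minimal sketch of the two duration-based measures, assuming a list of cluster durations in seconds. The delta measures are taken here as the standard deviation of the durations, as in Ramus et al.; the choice of the population form is an assumption. The nPVI follows the description above, with the conventional factor of 100. Function names and sample values are hypothetical.

```python
import statistics


def delta(durations):
    """Standard deviation of cluster durations (population form assumed)."""
    return statistics.pstdev(durations)


def npvi(durations):
    """Normalized pairwise variability index of consecutive durations."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [
        abs(a - b) / ((a + b) / 2)  # difference normalized to the pair's mean
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


# Invented V cluster durations (sec).
v_durations = [0.1, 0.2, 0.1]
print(delta(v_durations), npvi(v_durations))
```

Because each difference is divided by the local mean of the pair, nPVI is largely insensitive to overall speech rate, unlike the delta measures.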

Rhythm: Some results
(45 female + 37 male, read + command speech)
RM-ANOVA: p = 0.0014
Post hoc speech type / post hoc gender: p > 0.05, p < 0.001, p > 0.05, p = 0.049 (- only command, - only read)
No interaction in gender in all features

Conclusion
Work in progress, therefore: no conclusions! But ...

Discussion
Prosodic Features for ALC?
Thank you!