Prosodic Cues to Disengagement and Uncertainty in Physics Tutorial Dialogues

Prosodic Cues to Disengagement and Uncertainty in Physics Tutorial Dialogues
Diane Litman, Heather Friedberg, Kate Forbes-Riley
University of Pittsburgh, PA, USA

2 Outline
• Background
• ITSPOKE: A Spoken Dialogue System for STEM
• Extracting Prosodic Features from Affect Annotations
• Characterizing and Predicting Affect
• Conclusions & Current Directions

3 Background
Tutorial dialogue systems for STEM domains may close the gap between human and computer tutors
• Dialogue is a natural and hands-free interaction modality
• Only a few computer tutors are dialogue-based (Forbes-Riley & Litman 2011; D’Mello et al. 2010; Pon-Barry et al. 2006)
Performance can be further improved by responding to affect
• Focus has been on affect common in customer care and information-seeking applications (e.g., annoyance and frustration (Ang et al. 2002))
• Less research on student affect that occurs in tutoring (e.g., boredom, confusion, flow (D’Mello et al. 2010))

4 This Paper
Speech-based detection of student uncertainty and disengagement in qualitative physics tutorial dialogue
• Both states negatively correlate with student learning and user satisfaction (Forbes-Riley & Litman 2012)
• Both states are a focus of speech and language research (e.g., Pon-Barry & Shieber 2011; Schuller et al. 2010; Paek & Ju 2008)
Compare and contrast the role of prosody in characterization and prediction
• UNC (uncertain) versus CER (certain)
• DISE (disengaged) versus ENG (engaged)

6 Spoken Dialogue Computer Tutor
• ITSPOKE: speech-enhanced and revised version of Why2-Atlas qualitative physics tutor (VanLehn, Jordan, Rose et al. 2002)

8 Spoken Dialogue Corpus
• Collected in user study evaluating utility of detecting and adapting to uncertainty (Forbes-Riley & Litman 2011)
• 7216 turns, 432 dialogues (6 per student), 72 native English-speaking college students with no college physics
• Turns labeled by 1 annotator. Agreement studies in ITSPOKE corpora on par with prior work (cf. D’Mello et al. 2008)
Turn Label              Total    Percent    Kappa
Disengaged (DISE)       1170     16%        .55
Uncertain (UNC)         1483     21%        .62
Uncertain+Disengaged     373      5%        --

9 Annotated Dialogue Example
ITSPOKE 1: Let’s begin by looking at the motion of the man and his keys while he’s holding them. How does his velocity compare to that of his keys?
Student 1: same [Disengaged, Certain]
…
ITSPOKE 12: What are the forces exerted on the man after he releases his keys?
Student 12: gravity?? [Engaged, Uncertain]

11 Annotation Distribution over Time
• UNC is highest at beginning of each dialogue
• DISE increases as session progresses

12 Observations
• Student uncertainty (UNC) and disengagement (DISE) are common in ITSPOKE dialogues
• Different features and models will be needed to best characterize and predict UNC/CER, and DISE/ENG

14 Prosodic Features
From the speech file of each student turn:
Feature Type    Features (normalized)
Temporal        turn duration, prior pause duration
Pitch           max, min, mean, std. deviation
Energy          max, min, mean, std. deviation
Experiments with other real-time openSMILE toolkit features (cf. Interspeech Paralinguistic Challenge, 2011) have yielded no performance improvements to date
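As a rough illustration of the temporal and energy rows, the sketch below summarizes one turn's waveform. It is not the authors' extraction pipeline: the input format (floats in [-1, 1]), the 16 kHz sample rate, the 400-sample frame, and the function name `energy_features` are all assumptions, and the normalization step mentioned on the slide is not specified in the source, so it is left out.

```python
import math
import statistics

def energy_features(samples, sample_rate=16000, frame_len=400):
    """Summarize turn duration and frame-level RMS energy for one turn.

    samples: sequence of floats in [-1, 1] (assumed input format).
    sample_rate, frame_len: assumed values (400 samples = 25 ms at 16 kHz).
    """
    # Split the turn into non-overlapping frames, dropping a short tail.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("turn is shorter than one frame")
    # Root-mean-square energy of each frame.
    rms = [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]
    return {
        "turn_duration": len(samples) / sample_rate,  # seconds
        "max_rms": max(rms),
        "min_rms": min(rms),
        "mean_rms": statistics.fmean(rms),
        "std_rms": statistics.pstdev(rms),
    }

# Toy turn: one loud frame followed by one quiet frame.
toy_turn = [0.5] * 400 + [-0.1] * 400
features = energy_features(toy_turn)
```

Pitch (f0) statistics would follow the same max/min/mean/std pattern once a pitch tracker supplies per-frame f0 values.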

16 Descriptive Analysis
Hypothesis: prosodic differences exist between UNC versus CER turns, and between DISE versus ENG turns
• For each student:
  • For each feature:
    • Calculate mean over UNC, CER, DISE, ENG turns
• Paired T-Tests across students for feature means
  • UNC versus CER
  • DISE versus ENG
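The procedure on this slide can be sketched as follows, assuming each turn is a dict with a student id, a label, and a feature value. The field names, the helper `paired_t_across_students`, and the toy numbers are hypothetical. Only the t statistic and degrees of freedom are computed; a p-value would additionally need the t distribution.

```python
import math
import statistics
from collections import defaultdict

def paired_t_across_students(turns, feature, label_key, first, second):
    """Paired t-test on per-student feature means (first minus second).

    Returns (mean difference, t statistic, degrees of freedom).
    Students lacking turns for either label are skipped.
    """
    values = defaultdict(list)
    for turn in turns:
        values[(turn["student"], turn[label_key])].append(turn[feature])
    students = sorted({s for s, _ in values})
    diffs = [
        statistics.fmean(values[(s, first)]) - statistics.fmean(values[(s, second)])
        for s in students
        if (s, first) in values and (s, second) in values
    ]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_err, n - 1

# Toy data (invented): prior pause in seconds for three students.
toy_turns = [
    {"student": "s1", "label": "ENG", "prior_pause": 1.0},
    {"student": "s1", "label": "ENG", "prior_pause": 3.0},
    {"student": "s1", "label": "DISE", "prior_pause": 4.0},
    {"student": "s2", "label": "ENG", "prior_pause": 2.0},
    {"student": "s2", "label": "DISE", "prior_pause": 3.0},
    {"student": "s2", "label": "DISE", "prior_pause": 5.0},
    {"student": "s3", "label": "ENG", "prior_pause": 1.0},
    {"student": "s3", "label": "DISE", "prior_pause": 2.0},
]
mean_diff, t_stat, df = paired_t_across_students(
    toy_turns, "prior_pause", "label", "ENG", "DISE")
```

A negative ENG - DISE mean difference for prior pause, as in the toy data, would correspond to students pausing longer before disengaged turns.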

17 Temporal Descriptive Analysis
Significant (*) prosodic differences exist between UNC versus CER turns, and between DISE versus ENG turns
Temporal Feature    Mean Diff ENG - DISE    Mean Diff CER - UNC
Turn duration         .08                     -.03
Prior pause         -1.66*                  -3.08*
• Students take significantly longer to answer when DISE versus ENG, and when UNC versus CER

18 Temporal Descriptive Analysis
Significant (*) prosodic differences exist between UNC versus CER turns, and between DISE versus ENG turns
Temporal Feature    Mean Diff ENG - DISE    Mean Diff CER - UNC    Sig. Diff Across Affect
Turn duration         .08                     -.03
Prior pause         -1.66*                  -3.08*                 *
• Difference significantly greater for UNC/CER than DISE/ENG

19 Pitch Descriptive Analysis
Pitch Feature    Mean Diff ENG - DISE    Mean Diff CER - UNC
max f0           10.91*                  9.97*
min f0            1.15                   1.25
mean f0           4.76*                  4.91*
std. dev. f0      2.89*                  5.18*
• Students have lower max and mean pitch, and pitch is more constant, when DISE versus ENG, and when UNC versus CER

20 Pitch Descriptive Analysis
Pitch Feature    Mean Diff ENG - DISE    Mean Diff CER - UNC    Sig. Diff Across Affect
max f0           10.91*                  9.97*
min f0            1.15                   1.25
mean f0           4.76*                  4.91*
std. dev. f0      2.89*                  5.18*                  *
• Difference in pitch constancy is significantly greater for UNC/CER than DISE/ENG

21 Energy Descriptive Analysis
Energy Feature    Mean Diff ENG - DISE    Mean Diff CER - UNC
max RMS           .005                    .011*
min RMS           <.001*
mean RMS          .001                    .002*
std. dev. RMS     .001*                   .003*
• Students have lower min energy, and energy is more constant, when DISE versus ENG, and when UNC versus CER

22 Energy Descriptive Analysis
Energy Feature    Mean Diff ENG - DISE    Mean Diff CER - UNC
max RMS           .005                    .011*
min RMS           <.001*
mean RMS          .001                    .002*
std. dev. RMS     .001*                   .003*
• Only UNC turns are softer than CER turns

23 Energy Descriptive Analysis
Energy Feature    Mean Diff ENG - DISE    Mean Diff CER - UNC    Sig. Diff Across Affect
max RMS           .005                    .011*
min RMS           <.001*
mean RMS          .001                    .002*
std. dev. RMS     .001*                   .003*                  *
• Difference in energy constancy is greater for UNC/CER

24 Affect Prediction
Hypothesis: prosodic features have differing combined utility for predicting UNC/CER and DISE/ENG turns
• Machine learning of manual labels with WEKA software
  • J48 decision tree algorithm
  • Cost matrix penalizes for labeling true DISE/UNC as false
• Analyses
  • Feature usage as % of decisions for which feature queried in learned models
  • Unweighted avg precision / recall via 10-fold cross validation
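To make the evaluation metric concrete, the sketch below computes unweighted (macro) average precision and recall from gold and predicted labels. It does not reproduce WEKA, J48, the cost matrix, or the 10-fold cross validation; the function name and toy labels are hypothetical, and scoring a never-predicted class with precision 0 is an assumed convention.

```python
def unweighted_avg_precision_recall(y_true, y_pred, labels):
    """Average per-class precision and recall, weighting each class equally."""
    precisions, recalls = [], []
    for label in labels:
        true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        # Assumed convention: an empty denominator scores 0.
        precisions.append(true_pos / predicted if predicted else 0.0)
        recalls.append(true_pos / actual if actual else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)

# Toy labels (invented): two UNC turns and four CER turns.
gold = ["UNC", "UNC", "CER", "CER", "CER", "CER"]
pred = ["UNC", "CER", "CER", "CER", "CER", "UNC"]
avg_precision, avg_recall = unweighted_avg_precision_recall(
    gold, pred, ["UNC", "CER"])
```

Because each class counts equally, a model that ignores the rarer UNC or DISE class is penalized even when its overall accuracy is high.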

25 Temporal Feature Usage
• In both models, temporal features are most highly queried; prior pause is the root of both trees
• DISE model uses temporal features more heavily than UNC
• Only DISE model includes turn duration, which was not discriminative in isolation (for either state)
Feature            Uncertainty    Disengaged
Temporal           50%            72%
  turn duration     0%            23%
  prior pause      50%            49%

26 Pitch Feature Usage
• UNC model uses pitch features more heavily than DISE model, and in different relative proportions
• Min f0 isn’t discriminative in isolation for either state, but is included in both predictive models
• Std. dev. and mean f0 are discriminative in isolation for both states, but are not included in both predictive models
Feature           UNC    DISE
Pitch             34%    16%
  max f0           9%     4%
  min f0          19%     4%
  mean f0          0%     8%
  std. dev. f0     6%     0%

27 Energy Feature Usage
• UNC and DISE models use energy features in different relative proportions
• Max RMS is discriminative in isolation for UNC model, but isn’t included in either predictive model
• Std. dev. RMS is discriminative in isolation for both states, but is only included in DISE model
Feature            UNC    DISE
Energy             15%    11%
  max RMS           0%     0%
  min RMS           6%     1%
  mean RMS          9%     4%
  std. dev. RMS     0%     6%

28 Quantitative Results
• UNC/CER
  • unweighted avg precision=63%, recall=61%
• DISE/ENG
  • unweighted avg precision=61%, recall=56%
• Majority Class Baselines
  • unweighted avg precision=40%, recall=50% (CER)
  • unweighted avg precision=42%, recall=50% (ENG)
• NOTE: deployed ITSPOKE model adds non-prosodic features, which further improves performance
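The majority-class baselines can be checked against the corpus counts (7216 turns, 1483 UNC, 1170 DISE). The sketch assumes that each count includes the turns labeled both uncertain and disengaged, and that the never-predicted minority class contributes a precision of 0; neither assumption is stated in the slides, though both reproduce the reported figures.

```python
TOTAL_TURNS = 7216
UNC_TURNS = 1483   # assumed to include turns that are also DISE
DISE_TURNS = 1170  # assumed to include turns that are also UNC

def majority_baseline(total, minority):
    """Unweighted avg precision/recall when every turn gets the majority label."""
    majority = total - minority
    precision_majority = majority / total  # all turns are predicted as majority
    precision_minority = 0.0               # never predicted: scored 0 by assumption
    recall_majority, recall_minority = 1.0, 0.0
    return ((precision_majority + precision_minority) / 2,
            (recall_majority + recall_minority) / 2)

cer_precision, cer_recall = majority_baseline(TOTAL_TURNS, UNC_TURNS)
eng_precision, eng_recall = majority_baseline(TOTAL_TURNS, DISE_TURNS)
print(f"CER baseline: precision={cer_precision:.0%}, recall={cer_recall:.0%}")
print(f"ENG baseline: precision={eng_precision:.0%}, recall={eng_recall:.0%}")
```

Under these assumptions, recall is fixed at 50% for any always-majority classifier, so the learned models' recall of 61% and 56% is the more direct comparison.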

30 Conclusions
Uncertain and Disengaged turns differ prosodically from Certain or Engaged turns, but the differences depend somewhat on the affect dimension
• Disengaged turns have longer response times, lower pitch values, and less pitch and energy variation than engaged turns
• Uncertain turns also are not as loud as certain turns
The best combination of prosodic features for affect prediction also depends on the affect dimension
• Temporal features are most prominent in both models but DISE model uses them more heavily than UNC, while UNC model uses pitch features more heavily than DISE

31 Current Directions
Replicate prosodic analyses on other corpora to explore whether our findings generalize to other domains
• Level of Interest (Schuller et al. 2010)
• Uncertainty (Pon-Barry and Shieber 2011)
Implement best predictive models in ITSPOKE to evaluate the utility of detecting and adapting to uncertainty and disengagement

32 Thank You!
Questions?
Further Information? www.cs.pitt.edu/~litman/itspoke.html