A Tutorial Dialogue System that Adapts to Student Uncertainty

Diane Litman
Computer Science Department & Intelligent Systems Program & Learning Research and Development Center

Outline
• Motivation
• The ITSPOKE System and Corpora
• Detecting and Adapting to Student Uncertainty (joint work with Kate Forbes-Riley)
  – Uncertainty Detection and Adaptation
  – Experimental Evaluation
    » Wizard-of-Oz
    » Fully-Automated
• Summing Up

Tutorial Dialogue Systems
• Why is one-on-one tutoring so effective?
  – “...there is something about discourse and natural language (as opposed to sophisticated pedagogical strategies) that explains the effectiveness of unaccomplished human [tutors].” [Graesser, Person et al. 2001]
• Goal: improve Intelligent Tutoring Systems using Natural Language Processing

More generally...
Natural Language Processing and Tools for
• Learning Language (reading, writing, speaking)
  – Tutors
  – Scoring
• Using Language (to teach everything else)
  – Conversational Tutors / Peers
  – CSCL
• Processing Language
  – Readability
  – Questioning & Answering
  – Discourse Coding
  – Lecture Retrieval

Outline
• Motivation
• The ITSPOKE System and Corpora
• Detecting and Adapting to Student Uncertainty
  – Uncertainty Detection and Adaptation
  – Experimental Evaluation
• Summing Up

ITSPOKE: Intelligent Tutoring Spoken Dialogue System
• Back-end is Why2-Atlas [VanLehn, Jordan, Rose et al. 2002]
• Speech
  – Enhanced Sphinx2 speech recognition
  – Cepstral text-to-speech
• Reimplemented, other changes

ITSPOKE Corpora
• Wizard Tutoring (ITSPOKE-WOZ)
  – 81 students / 405 dialogues
  – human performs speech recognition, semantic analysis
  – computer performs dialogue management
• Computer Tutoring (ITSPOKE-AUTO)
  – 72 students / 360 dialogues

Experimental Procedure
• College students without physics
  – Read a small background document
  – Took a multiple-choice Pretest
  – Worked 5 problems (dialogues) with ITSPOKE
  – Took an isomorphic Posttest
• Goal was to optimize Learning Gain
  – e.g., Posttest – Pretest

Outline
• Motivation
• The ITSPOKE System and Corpora
• Detecting and Adapting to Student Uncertainty
  – Uncertainty Detection and Adaptation
  – Experimental Evaluation
• Summing Up

Why Uncertainty?
• Most frequent student state in our dialogue corpora [Litman and Forbes-Riley 2004]
• Focus of other learning sciences, speech and language processing, and psycholinguistic studies [Craig et al. 2004; Liscombe et al. 2005; Pon-Barry et al. 2006; Dijkstra et al. 2006]
• .73 Kappa [Forbes-Riley et al. 2008]

Corpus-Based Detection Methodology
• Learn detection models from training corpora
  – Use spoken language processing to automatically extract features from user turns
  – Use extracted features (e.g., prosodic, lexical) to predict uncertainty annotations
• Evaluate learned models on testing corpora
  – Significant reduction of error compared to baselines [Litman and Forbes-Riley 2006; Litman et al. 2007]
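The evaluation step above compares a learned model against a baseline. A minimal sketch of one common way to do that follows, assuming a majority-class baseline and relative error reduction as the comparison; the slides do not say which baseline or error measure was used, and the labels below are made-up illustration data, not corpus annotations.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def accuracy(predicted, gold):
    """Fraction of turns where the prediction matches the annotation."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def relative_error_reduction(model_acc, baseline_acc):
    """Share of the baseline's error that the model removes."""
    baseline_err = 1.0 - baseline_acc
    model_err = 1.0 - model_acc
    return (baseline_err - model_err) / baseline_err


# Hypothetical test-set annotations and model predictions (not real data).
gold = ["nonU", "nonU", "U", "nonU", "U", "nonU", "nonU", "nonU"]
pred = ["nonU", "nonU", "U", "nonU", "nonU", "nonU", "nonU", "nonU"]

base = majority_baseline_accuracy(gold)
acc = accuracy(pred, gold)
rer = relative_error_reduction(acc, base)
print(f"baseline={base:.3f} model={acc:.3f} relative error reduction={rer:.3f}")
```

Whether such a reduction is significant would still need a statistical test over cross-validation folds or held-out data, which this sketch leaves out.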

System Adaptation: How to Respond?
• Theory-based
  – [VanLehn et al. 2003; Craig et al. 2004]
• Corpus-based
  – How do humans respond? e.g. [Forbes-Riley, Rotaru, Litman, and Tetreault 2007] *
  – What are optimal responses? e.g. [Chi, VanLehn and Litman 2010] *
* Best paper awards

Theory-Based Adaptation: Uncertainty as Learning Opportunity
• Uncertainty represents one type of learning impasse, and is also associated with cognitive disequilibrium
  – An impasse motivates a student to take an active role in constructing a better understanding of the principle. [VanLehn et al. 2003]
  – A state of failed expectations causing deliberation aimed at restoring equilibrium. [Craig et al. 2004]
• Hypothesis: The system should adapt to uncertainty in the same way it responds to other impasses (e.g., incorrectness)

Outline
• Motivation
• The ITSPOKE System and Corpora
• Detecting and Adapting to Student Uncertainty
  – Uncertainty Detection and Adaptation
  – Experimental Evaluation
• Summing Up

Adaptation to Student Uncertainty in ITSPOKE
• Most systems respond only to (in)correctness
• Literature suggests uncertain as well as incorrect student answers signal learning impasses
• Experimentally manipulate tutor responses to student uncertainty, over and above correctness, and investigate impact on learning
  – Platform: Adaptive version(s) of ITSPOKE

Normal (non-adaptive) ITSPOKE
• System Initiative Dialogue Format:
  – Tutor Question – Student Answer – Tutor Response
• Tutor Response Types:
  – to Corrects (C): positive feedback (e.g. “Fine”)
  – to Incorrects (I): negative feedback (e.g. “Well…”) and
    » Bottom Out: correct answer with reasoning
    » Subdialogue: questions walk through reasoning

Adaptive ITSPOKE(s)
• Our Prior Work: Rank correctness (C, I) + uncertainty (U, nonU) states in terms of impasse severity

  State:      I+nonU   I+U    C+U     C+nonU
  Severity:   most     less   least   none

• Adaptation Hypothesis:
  – ITSPOKE already resolves I impasses (I+nonU, I+U), but it ignores one type of U impasse (C+U)
  – Performance improvement if ITSPOKE provides additional content to resolve all impasses

Two Uncertainty Adaptations
• Simple Adaptation
  – Same response for all 3 impasses
  – Feedback on only (in)correctness
• Complex Adaptation
  – Different responses for the 3 impasses
  – Feedback on both uncertainty and (in)correctness

Simple Adaptation Example: C+U
TUTOR 1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT 1: The force of the car hitting it?? [C+U]
TUTOR 2: Fine. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]
• Same TUTOR 2 subdialogue if student was I+U or I+nonU
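The difference between normal ITSPOKE and the Simple Adaptation can be sketched as a small response policy. This is an illustrative simplification, not the system's actual dialogue manager: the function name, the move labels, and the choice to model remediation as a subdialogue only (omitting the Bottom Out alternative) are assumptions made for the example.

```python
def tutor_response(correct, uncertain, condition="simple"):
    """Return the tutor moves for one student answer.

    condition: "normal" (original ITSPOKE) or "simple" (Simple Adaptation).
    Hypothetical helper; names and move labels are illustrative only.
    """
    # In both conditions, feedback depends only on (in)correctness.
    feedback = "Fine." if correct else "Well..."
    moves = [("FEEDBACK", feedback)]

    # Normal ITSPOKE treats only incorrect answers as impasses.
    impasse = not correct
    if condition == "simple":
        # Simple Adaptation also treats correct-but-uncertain (C+U) as an impasse.
        impasse = impasse or uncertain

    if impasse:
        # Same remediation content for every impasse type.
        moves.append(("SUBDIALOGUE", "walk through the reasoning"))
    return moves


# C+U answer: only the adaptive condition adds the subdialogue.
print(tutor_response(correct=True, uncertain=True, condition="normal"))
print(tutor_response(correct=True, uncertain=True, condition="simple"))
```

Keeping the feedback text tied to correctness alone is what separates this from the Complex Adaptation, which also comments on the student's uncertainty.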

Experiment 1: ITSPOKE-WOZ
• Wizard of Oz version of ITSPOKE
  – Human recognizes speech, annotates correctness and uncertainty
  – Provides upper-bound language performance
• Conditions
  – Simple Adaptation: used same response for all impasses
  – Complex Adaptation: used different responses for each impasse
  – Normal Control: used original system (no adaptation)
  – Random Control: gave Simple Adaptation to random 20% of correct answers (to control for additional tutoring)

Results I: Learning
Metric: Learning Gain (Posttest – Pretest)

Condition            N    Mean   Diff                  p
Normal Control       21   .183   < Simple Adaptation   .03
Random Control       20   .269
Simple Adaptation    20   .307
Complex Adaptation   20   .213   -

F(3, 77) = 3.275, p = 0.02
• Simple Adaptation yields more student learning than Normal Control (original ITSPOKE) [Forbes-Riley and Litman 2010]
• Similar results for learning efficiency [Forbes-Riley and Litman 2009]

Additional Evaluations - Metacognition
• Do metacognitive performance measures differ across experimental conditions?
  – e.g., Monitoring Accuracy [Nietfeld et al. 2006]
• Do metacognitive and cognitive performance measures (i.e. learning) correlate?

Metacognitive Results
• Simple (and random) increased monitoring accuracy compared to normal (p < .06 in paired contrasts)
• Monitoring Accuracy is positively correlated with learning [Litman and Forbes-Riley 2009]

Experiment 2: ITSPOKE-AUTO
• Fully automated ITSPOKE
  – Sphinx2 speech recognizer / TuTalk semantic analyzer
    » Correctness Accuracy of 85%
  – Weka uncertainty model
    » Logistic regression (includes lexical, prosodic, dialogue features)
    » Uncertainty Accuracy of 80%
• Only 3 Conditions
  – Simple Adaptation
  – Normal Control
  – Random Control

Preliminary Results: ITSPOKE-AUTO
• Simple Adaptation yields more student learning than Normal and Random Controls
  – Differences only significant for a subset of students
  – Noisy uncertainty detection is the system bottleneck
• 3 of the 4 metacognitive metrics remain correlated with learning [Forbes-Riley and Litman, 2010]

Current and Future Research
• More sophisticated ITSPOKE adaptations
  – User modeling (domain knowledge, gender)
  – Multiple student states (disengagement)
  – Motivation [Ward 2010]
• Remediate metacognition, not just domain content

Summing Up
• Spoken dialogue contributes to the success of human tutors
• Using presently available technology, successful tutorial dialogue systems can also be built
• Adapting to uncertainty can further improve performance
  – Learning gains, efficiency, metacognition
• Tutors can serve as platforms for learning science studies

Related Projects
Natural Language Processing and Tools for
• Learning Language (reading, writing, speaking)
• Using Language (to teach everything else)
  – Conversational Tutors: Tutor Abstraction and Specialization during Reflective Conversation [Katz/Jordan/Litman poster]
  – Computer-Supported Peer Review for Writing [Xiong/Litman/Schunn poster]
• Processing Language
  – Semantic Class Acquisition via Web-Learning [Lipschultz/Litman poster]

Acknowledgements
• ITSPOKE group past and present
  – Hua Ai, Min Chi, Joanna Drummond, Kate Forbes-Riley, Heather Friedberg, Alison Huettner, Michael Lipschultz, Beatriz Maeireizo-Tokeshi, Greg Nicholas, Amruta Purandare, Mihai Rotaru, Scott Silliman, Joel Tetreault, Art Ward, Wenting Xiong
• NLP@Pitt
  – Jan Wiebe, Rebecca Hwa, Wendy Chapman
• Why2-Atlas and Human Tutoring groups
  – Kurt VanLehn, Pamela Jordan, Carolyn Rose
  – Micki Chi, Scotty Craig, Bob Hausmann, Margueritte Roy, Sandra Katz

Thank You! Questions?
• Further Information
  – http://www.cs.pitt.edu/~litman/itspoke.html

The End

Example Student States in ITSPOKE
ITSPOKE: What else do you need to know to find the box's acceleration?
Student: the direction [UNCERTAIN]
ITSPOKE: If you see a body accelerate, what caused that acceleration?
Student: force [CERTAIN]
ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
Student: velocity [UNCERTAIN]
ITSPOKE: Could you please repeat that?
Student: velocity [ANNOYED]

WOZ-TUT Screenshot

Bigram Dependency Analysis
“Student Certainness – Tutor Positive Feedback” Bigrams
χ2 = 225.92 (critical χ2 value at p = .001 is 16.27)

OBSERVED     Tutor Includes Pos   Tutor Omits Pos
neutral      252                  2517
certain      273                  832
uncertain    185                  631
mixed        71                   161

EXPECTED     Tutor Includes Pos   Tutor Omits Pos
neutral      439.46               2329.54
certain      175.21               928.79
uncertain    129.51               686.49
mixed        36.82                195.18

• Less Tutor Positive Feedback after Student Neutral turns
• More Tutor Positive Feedback after “Emotional” turns
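The χ2 statistic on this slide can be recomputed from the observed counts. The sketch below uses the standard Pearson chi-square test of independence for a contingency table, with expected counts derived from row and column totals; it takes the observed counts exactly as printed on the slide, so small rounding differences from the slide's expected values and χ2 are possible.

```python
# Observed counts as printed: (tutor includes positive feedback, tutor omits it).
observed = {
    "neutral": (252, 2517),
    "certain": (273, 832),
    "uncertain": (185, 631),
    "mixed": (71, 161),
}


def chi_square(table):
    """Pearson chi-square statistic, degrees of freedom, and expected counts."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    chi2 = 0.0
    expected = {}
    for label, row in table.items():
        row_total = sum(row)
        # Expected count under independence: row total * column total / N.
        exp_row = tuple(row_total * c / grand_total for c in col_totals)
        expected[label] = exp_row
        chi2 += sum((o - e) ** 2 / e for o, e in zip(row, exp_row))
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected


chi2, dof, expected = chi_square(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}")
for label, (inc, om) in expected.items():
    print(f"{label:10s} expected includes = {inc:.2f}, omits = {om:.2f}")
```

With four student states and two tutor outcomes the test has 3 degrees of freedom, which matches the critical value of 16.27 at p = .001 quoted on the slide.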

Survey Tutoring Uncertainty Spoken Dialogue

Learning Efficiency Results
Metric: Normalized learning gain / total tutoring time in minutes

Condition            N    Mean   Diff             p
Normal Control       21   .010   < Simple Adapt   .004
Random Control       20   .014
Simple Adaptation    20   .016
Complex Adaptation   20   .011   < Simple Adapt   .013

F(3, 77) = 3.56, p = 0.02
• Given same amount of tutoring time, Simple Adaptation yields more student learning than either Normal Control or Complex Adaptation
• Results also hold using raw learning gain, and total number of student turns
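The efficiency metric divides a learning gain by tutoring time. A minimal sketch follows. The slide does not define "normalized learning gain", so the formula used here, (posttest − pretest) / (1 − pretest) with scores expressed as fractions, is an assumption based on a common convention; the function names and example scores are hypothetical.

```python
def normalized_learning_gain(pretest, posttest):
    """ASSUMED definition: gain relative to the room left for improvement.

    Scores are fractions in [0, 1]; the slide does not give this formula.
    """
    if pretest >= 1.0:
        raise ValueError("normalized gain is undefined for a perfect pretest")
    return (posttest - pretest) / (1.0 - pretest)


def learning_efficiency(pretest, posttest, tutoring_minutes):
    """Normalized learning gain per minute of tutoring."""
    if tutoring_minutes <= 0:
        raise ValueError("tutoring time must be positive")
    return normalized_learning_gain(pretest, posttest) / tutoring_minutes


# Hypothetical student: 50% on the pretest, 75% on the posttest, 50 minutes.
eff = learning_efficiency(0.5, 0.75, 50.0)
print(f"efficiency = {eff:.3f} normalized gain per minute")
```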

Bias

               Correct   Incorrect
NonUncertain   CnonU     InonU
Uncertain      CU        IU

• Bias scores greater than and less than zero indicate over-confidence and under-confidence, respectively, with zero indicating best performance

Discrimination

               Correct   Incorrect
NonUncertain   CnonU     InonU
Uncertain      CU        IU

• Discrimination scores greater than zero indicate higher metacognitive performance, in terms of certainty for correct responses and uncertainty for incorrect responses

Results I: Means across Conditions

Metacognitive Measure      Complex Adaptation, Simple Adaptation, Random Control (20)   Normal Control (21)
Average Impasse Severity   .59  .60                                                     .73
Monitoring Accuracy        .58  .62                                                     .52
Bias                       -.01  -.03  -.01                                             -.02
Discrimination             .34  .46  .48                                                .41

• No statistically significant differences or trends for bias
• Trend for discrimination differences overall (p = .09)
• However, contrary to our predictions, complex reduced discrimination ability, compared to random and simple (p < .04 in paired contrasts)

Intelligent Tutoring

Corpus-Based Adaptation: How Do Human Tutors Respond?
• An empirical method for designing dialogue systems adaptive to student state
  – extraction of “dialogue bigrams” from annotated human tutoring corpora
  – χ2 analysis to identify dependent bigrams
  – generalizable to any domain with corpora labeled for user state and system response
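The bigram extraction step can be sketched as counting (student state, tutor act) pairs over adjacent turns. This is an illustrative reading of "dialogue bigrams", not the authors' extraction code: the turn representation and function name are assumptions, and the sample turns mirror the annotation tags of the human tutoring excerpt in this deck.

```python
from collections import Counter


def extract_bigrams(turns):
    """Count (student state, tutor act) pairs for each student turn
    that is immediately followed by a tutor turn.

    turns: list of (speaker, labels) in dialogue order, speaker "S" or "T".
    Hypothetical representation chosen for this example.
    """
    bigrams = Counter()
    for (spk_a, labels_a), (spk_b, labels_b) in zip(turns, turns[1:]):
        if spk_a == "S" and spk_b == "T":
            for state in labels_a:
                for act in labels_b:
                    bigrams[(state, act)] += 1
    return bigrams


# Labels taken from the annotated excerpt; the turn text itself is omitted.
turns = [
    ("S", ["Uncertain"]),
    ("T", ["Restatement", "Expansion"]),
    ("S", ["Neutral"]),
    ("T", ["Short Answer Question"]),
    ("S", ["Certain"]),
    ("T", ["Positive Feedback", "Restatement"]),
]

for (state, act), count in sorted(extract_bigrams(turns).items()):
    print(f"{state:10s} -> {act:22s} {count}")
```

Counts like these, accumulated over a whole corpus, are what fill the observed contingency table that the χ2 analysis then tests for dependency.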

Example Human Tutoring Excerpt
S: So the- when you throw it up the acceleration will stay the same? [Uncertain]
T: Acceleration uh will always be the same because there is- that is being caused by force of gravity which is not changing. [Restatement, Expansion]
S: mm-k. [Neutral]
T: Acceleration is– it is in- what is the direction uh of this acceleration- acceleration due to gravity? [Short Answer Question]
S: It’s- the direction- it’s downward. [Certain]
T: Yes, it’s vertically down. [Positive Feedback, Restatement]

Findings
• Statistically significant dependencies exist between students’ state of certainty and the responses of an expert human tutor
  – After uncertain, tutor Bottoms Out and avoids expansions
  – After certain, tutor Restates
  – After any emotion, tutor increases Feedback
• Dependencies suggest adaptive strategies for implementation in our computer tutor [Forbes-Riley and Litman 2010]

Complex Adaptation Example: C+U
TUTOR 2: That’s exactly right, but you seem unsure, so let’s sum up. [FEEDBACK] The net force on the truck is equal to the impact force on it. We can prove this just like we did for the car. First, we know that gravity and the normal force on the truck must cancel each other, otherwise the truck would not be at rest vertically. Second, we know that the impact force is the only horizontal force exerted on the truck. [LONG BOTTOM OUT]

Complex Adaptation Example: I+U
TUTOR 1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT 1: Zero?? [I+U]
TUTOR 2: That’s not correct, but don’t worry. You seem to be aware of your mistake. Let’s resolve it. [FEEDBACK] The net force on the truck is equal to the impact force on it. Let’s walk through this answer step by step. [SHORT BOTTOM OUT] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]

Complex Adaptation Example: I+nonU
TUTOR 1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT 1: The force of gravity. [I+nonU]
TUTOR 2: I’m sorry, but I see an error in your answer. Let’s fix it. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]
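The three Complex Adaptation examples differ in which dialogue moves follow the feedback. One way to summarize them is a lookup table keyed by student state, sketched below. The move sequences are read off the bracketed tags in the examples; the table name, the function, and the treatment of C+nonU (feedback only, since it is not an impasse) are assumptions for illustration.

```python
# Move sequences taken from the bracketed tags in the three examples.
COMPLEX_RESPONSES = {
    ("C", "U"): ["FEEDBACK", "LONG BOTTOM OUT"],
    ("I", "U"): ["FEEDBACK", "SHORT BOTTOM OUT", "SUBDIALOGUE"],
    ("I", "nonU"): ["FEEDBACK", "SUBDIALOGUE"],
}


def complex_adaptation_moves(correctness, uncertainty):
    """Return the tutor move sequence for a (correctness, uncertainty) state.

    ASSUMPTION: C+nonU is not an impasse, so it gets feedback only.
    """
    return COMPLEX_RESPONSES.get((correctness, uncertainty), ["FEEDBACK"])


for state in [("C", "U"), ("I", "U"), ("I", "nonU"), ("C", "nonU")]:
    print("+".join(state), "->", complex_adaptation_moves(*state))
```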

Discussion
• Predictions versus results:
  – Complex Adaptation > Simple Adaptation > Random Control > Normal Control
• Why didn’t Simple Adaptation and Complex Adaptation outperform Random Control?
  – Random Control adapted to some C+U, diminishing differences
  – Adapting to C+nonU may increase certainty
• Why didn’t Complex Adaptation outperform Simple Adaptation?
  – Complex Adaptation’s human-based content responses were based on frequency, not effectiveness

Complex Adaptation to Uncertainty
• Depending on whether the answer is C+U, I+nonU:
  – ITSPOKE gives same content but varies dialogue act
    » Based on human tutor responses significantly associated with C+U, I+nonU answers
  – ITSPOKE gives complex feedback on uncertainty and (in)correctness
    » Based on empathetic computer tutor literature (Wang et al., 2005; Hall et al., 2004; Burleson et al., 2004)

Impasse Severity
• Use the scalar value associated with each student turn to compute an average impasse severity, per student

  Nominal State:   I+nonU   I+U    C+U     C+nonU
  Scalar State:    3        2      1       0
  Severity:        most     less   least   none
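The per-student average can be sketched as follows, assuming the four states map to the scalars 3, 2, 1, 0 in the order I+nonU, I+U, C+U, C+nonU. The function name and the example turns are hypothetical.

```python
# Scalar severity per (correctness, uncertainty) state; higher is more severe.
SEVERITY = {
    ("I", "nonU"): 3,
    ("I", "U"): 2,
    ("C", "U"): 1,
    ("C", "nonU"): 0,
}


def average_impasse_severity(turns):
    """Mean severity over one student's annotated turns."""
    if not turns:
        raise ValueError("student has no annotated turns")
    return sum(SEVERITY[state] for state in turns) / len(turns)


# Hypothetical student with one turn in each state: (0 + 1 + 2 + 3) / 4.
student_turns = [("C", "nonU"), ("C", "U"), ("I", "U"), ("I", "nonU")]
avg = average_impasse_severity(student_turns)
print(f"average impasse severity = {avg}")
```

Lower averages mean the student spent more turns in the less severe states.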

Results II
Correlations of Metacognitive Measures with Posttest, after controlling for Pretest (n=81)

Metacognitive Measure      R      p
Average Impasse Severity   -.56   .00
Monitoring Accuracy        .42    .00

• Average Impasse Severity (where smaller is better) is negatively correlated with learning [Litman and Forbes-Riley 2009]

Additional Results II
Correlations of Metacognitive Measures with Posttest, after controlling for Pretest (n=81)

Metacognitive Measure      R      p
Average Impasse Severity   -.56   .00
Monitoring Accuracy        .42    .00

• Monitoring Accuracy (where higher is better) is positively correlated with learning [Litman and Forbes-Riley 2009]
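A correlation with posttest "after controlling for pretest" is commonly computed as a first-order partial correlation. The sketch below shows that standard formula in plain Python. That the slides used exactly this procedure is an assumption, and the per-student numbers are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def partial_correlation(x, y, z):
    """Correlation of x and y with the linear effect of z removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented per-student values, for illustration only.
severity = [0.9, 0.7, 0.8, 0.4, 0.5, 0.3]
posttest = [0.50, 0.65, 0.55, 0.80, 0.70, 0.90]
pretest = [0.40, 0.45, 0.50, 0.55, 0.45, 0.60]

r = partial_correlation(severity, posttest, pretest)
print(f"partial r(severity, posttest | pretest) = {r:.2f}")
```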

Preliminary Results: ITSPOKE-AUTO

                           WOZ            AUTO
Metacognitive Measure      R      p       R      p
Average Impasse Severity   -.56   .00     -.40   .00
Monitoring Accuracy        .42    .00     .35    .00

• Impasse Severity and Monitoring Accuracy remain correlated with learning in ITSPOKE-AUTO corpus [Forbes-Riley and Litman, submitted]

Monitoring Accuracy

               Correct   Incorrect
NonUncertain   CnonU     InonU
Uncertain      CU        IU

• The wizard's annotations for each student are first represented in an array, where each cell represents a mutually exclusive option
  – motivated by Feeling of (Another’s) Knowing [Smith and Clark 1993; Brennan and Williams 1995], which is closely related to uncertainty [Dijkstra et al. 2006]
• The array is then used to compute monitoring accuracy
• Ranges from -1 (no monitoring accuracy) to 1 (perfect monitoring accuracy)

Metacognitive Performance Metrics
• Knowledge monitoring accuracy (HC) (Nietfeld et al., 2006)
• Monitoring one’s own knowledge ≈ one’s Certainty level ≈ one’s Feeling of Knowing (FOK)
  – HC has been used to measure FOK accuracy (Smith & Clark, 1993): the accuracy with which certainty corresponds to correctness
• Feeling of Another’s Knowing (FOAK): inferring the FOK of someone else (Brennan & Williams, 1995)
  – We use HC to measure FOAK accuracy (our uncertainty is inferred)

$$HC = \frac{(\mathrm{COR\_CER} + \mathrm{INC\_UNC}) - (\mathrm{INC\_CER} + \mathrm{COR\_UNC})}{(\mathrm{COR\_CER} + \mathrm{INC\_UNC}) + (\mathrm{INC\_CER} + \mathrm{COR\_UNC})}$$

• Denominator sums over all cases
• (COR_CER + INC_UNC): cases where (un)certainty and (in)correctness agree
• (INC_CER + COR_UNC): cases where (un)certainty and (in)correctness are at odds
• Scores range from -1 (no accuracy) to 1 (perfect accuracy)
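The HC formula translates directly into code. The sketch below assumes the four counts correspond to the cells of the 2×2 array on the Monitoring Accuracy slides, reading "certain" as NonUncertain (COR_CER = CnonU, INC_UNC = IU, INC_CER = InonU, COR_UNC = CU). The function name and example counts are hypothetical.

```python
def monitoring_accuracy(cor_cer, inc_unc, inc_cer, cor_unc):
    """HC: (agreeing cases - conflicting cases) / all cases.

    cor_cer: correct and certain      inc_unc: incorrect and uncertain
    inc_cer: incorrect and certain    cor_unc: correct and uncertain
    """
    agree = cor_cer + inc_unc      # certainty matches correctness
    at_odds = inc_cer + cor_unc    # certainty conflicts with correctness
    total = agree + at_odds
    if total == 0:
        raise ValueError("no annotated turns")
    return (agree - at_odds) / total


# Hypothetical student: 40 agreeing turns, 10 conflicting turns.
hc = monitoring_accuracy(cor_cer=30, inc_unc=10, inc_cer=5, cor_unc=5)
print(f"HC = {hc}")
```

A student whose certainty always matches correctness scores 1, and one whose certainty never matches scores -1, in line with the range given on the slide.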