Cohesion and Learning in a Tutorial Spoken Dialog System

Art Ward, Diane Litman


Outline
• Tutoring
• Goals
• 4 issues in measuring cohesion
  - Why they’re interesting
  - How we test them
• Results

Natural Language Dialog Tutoring
• Human tutors are better than classroom instruction (Bloom 84)
• Intelligent Tutoring Systems (ITSs) hope to replicate this advantage
• Is dialog important to learning?
  - Dialog acts: question answering, explanatory reasoning, deep student answers (Graesser et al. 95, Forbes-Riley et al. 05)
  - Difficult to automatically tag dialog input, so:
  - Automatically detectable dialog features: average turn length, etc. (Litman et al. 04)
  - We look at cohesion: lexical co-occurrence between turns

Goals and Results
Goals
• Want to find if cohesion is correlated with learning in our tutoring dialogs
  - If it is, may inform ITS design
• Want to find a computationally tractable measure of cohesion
  - So it can be used in a real-time tutor
Results
• Do find strong correlations with learning
  - For low pre-testers
  - For interactive (tutor to student) measures of cohesion
  - Robust to multiple measures of lexical cohesion

4 Issues
• Why and how to identify cohesion in dialogs?
• Do students of different skill levels respond to cohesion in the same way? (Is there an aptitude/treatment interaction?)
• Is interactivity important?
• What other processing steps help?

Issue 1: How to identify cohesion in dialogs?
• Why might cohesion be important in tutoring?
• McNamara & Kintsch (96)
  - Students read high & low coherence text
  - High coherence text was the low coherence version altered to:
    - Use consistent referring expressions
    - Identify anaphora
    - Supply background information
  - Interaction between pre-test score & response to textual coherence
    - Low pre-testers learned more from more coherent text
    - High pre-testers learned LESS from more coherent text

Measuring Cohesion
• Measurements from Computational Linguistics
  - Hearst (94): topic segmentation, text
    - Word-count similarity of spans of text
  - Olney & Cai (05): topic segmentation, tutorial dialog
    - Several measures, including Hearst’s
  - Morris & Hirst (91): Lexical Chains
    - Thesaurus entries
  - Barzilay & Elhadad (97): Automatic Lexical Chains
    - WordNet senses
• We develop measures similar to Hearst’s
  - But novel in that: applied to dialog rather than text, used to find correlations with learning

Issue 1: How to identify cohesion in dialogs? Defining Cohesion
• Halliday and Hasan (76)
  - Grammatical vs Lexical Cohesion
  - Reiteration
    - Exact word repetition
    - Synonym repetition
    - Near-synonym repetition
    - Super-ordinate class
    - General referring noun
• Cohesion measured by counting “cohesive ties”
  - Two words joined by a cohesive device (i.e. reiteration)


Issue 1: How to identify cohesion in dialogs?
• How we measure Lexical Cohesion
  - We count cohesive ties between turns, at three levels:
    - Tokens (with stop words); token = “word”
    - Tokens (stop words removed); stops = high-frequency, low-information words
    - Stems (stop words removed)

Stems
• Stem = non-inflected core of a word
  - Porter Stemmer
  - Allows us to find ties between various inflected forms of the same word in adjacent turns.
• “Turns” are tutor and student contributions to tutoring dialogs collected by the ITSPOKE group.

Applying Cohesion Measures to Our Corpora: Example

Turn: Student Essay
Contribution: No. The airplane and the packet have the same horizontal velocity. When the packet is dropped, the only force acting on it is g, and the net force is zero. The packet accelerates vertically down, but does not accelerate horizontally. The packet keeps moving at the same velocity while it is falling as it had when it was on the airplane. There will be displacement because the packet still moves horizontally after it is dropped. The packet will keep moving past the center of the swimming pool because of its horizontal velocity.

Turn: ITSPOKE
Contribution: Uh huh. There is more still that your essay should cover. Maybe this will help you remember some of the details need in the explanation. After the packet is released, the only force acting on it is gravitational force, which acts in the vertical direction. What is the magnitude of the acceleration of the packet in the horizontal direction?

Cohesive ties between the two turns:
• Token w/stop (count 14): packet, horizontal, the, it, is, of, only, force, acting, on, there, will, still, after
• Token, no stop (count 9): packet, horizontal, only, force, acting, there, will, still, after
• Stem, no stop (count 11): packet, horizont, onli, forc, act, acceler, vertic, there, will, still, after
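The token-level counts in this example can be reproduced with a small sketch. It assumes that a cohesive tie is a word type shared by the two turns and that tokens are lowercased alphabetic strings; the names `tokenize`, `cohesive_ties`, and `STOP_WORDS` are hypothetical, and the stop-word list is an illustrative stand-in because the slides do not give the real one.

```python
import re

# Hypothetical stop-word list for illustration only; the slides do not list the real one.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "it", "of", "on", "the", "to"})


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cohesive_ties(turn_a, turn_b, stop_words=frozenset(), normalize=None):
    """Return the sorted word types that occur in both turns.

    stop_words: word types dropped before matching.
    normalize:  optional function applied to each remaining word (e.g. a stemmer).
    """
    def word_types(text):
        kept = (w for w in tokenize(text) if w not in stop_words)
        return {normalize(w) if normalize else w for w in kept}

    return sorted(word_types(turn_a) & word_types(turn_b))


student_essay = (
    "No. The airplane and the packet have the same horizontal velocity. "
    "When the packet is dropped, the only force acting on it is g, and the net force is zero. "
    "The packet accelerates vertically down, but does not accelerate horizontally. "
    "The packet keeps moving at the same velocity while it is falling as it had when it was on the airplane. "
    "There will be displacement because the packet still moves horizontally after it is dropped. "
    "The packet will keep moving past the center of the swimming pool because of its horizontal velocity."
)
tutor_turn = (
    "Uh huh. There is more still that your essay should cover. "
    "Maybe this will help you remember some of the details need in the explanation. "
    "After the packet is released, the only force acting on it is gravitational force, "
    "which acts in the vertical direction. "
    "What is the magnitude of the acceleration of the packet in the horizontal direction?"
)

with_stops = cohesive_ties(student_essay, tutor_turn)
without_stops = cohesive_ties(student_essay, tutor_turn, STOP_WORDS)
print(len(with_stops), len(without_stops))  # 14 9
```

Under these assumptions the sketch gives the same 14 and 9 ties as the slide. The stem-level count would need a Porter stemmer passed as `normalize`, which is left out here to keep the example free of third-party dependencies.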


Issue 2: Is there an aptitude/treatment interaction?
• Why there might be:
  - McNamara & Kintsch
• How we test it:
  - Mean pre-test split
    - All students
    - Above-mean pre-test students (“high” pre-testers)
    - Below-mean pre-test students (“low” pre-testers)
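A mean pre-test split can be sketched as follows. The function name, the student ids, and the scores are made up for illustration, and placing a student who scores exactly at the mean in the low group is an assumption; the slides do not say how such ties were handled.

```python
from statistics import mean


def mean_pretest_split(pretest_scores):
    """Split student ids into (low, high) groups around the mean pre-test score.

    Assumption: a score exactly equal to the mean counts as "low".
    """
    cutoff = mean(pretest_scores.values())
    low = sorted(s for s, score in pretest_scores.items() if score <= cutoff)
    high = sorted(s for s, score in pretest_scores.items() if score > cutoff)
    return low, high


# Made-up pre-test scores (percent correct), not data from the study.
scores = {"s1": 30, "s2": 45, "s3": 60, "s4": 85}
low, high = mean_pretest_split(scores)
print(low, high)  # ['s1', 's2'] ['s3', 's4']
```

Correlations would then be computed three times: over all students, over `high` only, and over `low` only.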

Issue 3: Is interactivity important?
• Why it might be:
  - Chi et al. (01)
    - Tutor centered, student centered, interactive
    - Deep learning through self construction
    - Not tutor actions alone
  - Litman & Forbes-Riley (05)
    - Learning correlated with both:
      - student utterances that display reasoning
      - tutor questions that require reasoning
• How we test it:
  - Interactive corpus: compare tutor to student turns
  - Tutor-only corpus
  - Student-only corpus
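One way to picture the three corpora is as three ways of pairing up turns from the same dialog. The sketch below is an assumption about how such pairs could be formed, not the authors' procedure: the function `adjacent_pairs` and the speaker labels are hypothetical.

```python
def adjacent_pairs(turns, speaker=None):
    """Return (earlier_text, later_text) pairs of adjacent turns.

    turns:   list of (speaker, text) tuples in dialog order.
    speaker: None for the interactive condition (adjacent turns by different
             speakers); otherwise keep only that speaker's turns and pair
             each with that speaker's next turn.
    """
    if speaker is None:
        return [(a[1], b[1]) for a, b in zip(turns, turns[1:]) if a[0] != b[0]]
    own = [t for t in turns if t[0] == speaker]
    return [(a[1], b[1]) for a, b in zip(own, own[1:])]


# Toy dialog with placeholder utterances.
dialog = [
    ("tutor", "t1"),
    ("student", "s1"),
    ("tutor", "t2"),
    ("student", "s2"),
]

print(adjacent_pairs(dialog))             # interactive: tutor-to-student and student-to-tutor pairs
print(adjacent_pairs(dialog, "tutor"))    # tutor-only
print(adjacent_pairs(dialog, "student"))  # student-only
```

Cohesive ties would then be counted within each pair, so the same tie-counting code can serve all three conditions.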

Issue 4: What other processing steps help?
• Tried several on training corpus:
  - Removing stop words
  - N-turn spans
  - Selecting “substantive” turns
  - TF-IDF normalization
  - Turn-normalized counts
    - (Raw tie count / # of turns in dialog)
• Found final options on training corpus:
  - One-turn spans, turn normalization, no TF-IDF, no substantive turn selection
  - All reported results use these options
• Tested options on new corpus
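The turn-normalized count with one-turn spans can be sketched as below. It assumes ties are shared word types between each turn and the next one, counted at the token level with stop words kept; `shared_types` and `turn_normalized_cohesion` are hypothetical names.

```python
import re


def shared_types(turn_a, turn_b):
    """Count the distinct lowercase word types that appear in both turns."""
    def types(text):
        return set(re.findall(r"[a-z]+", text.lower()))
    return len(types(turn_a) & types(turn_b))


def turn_normalized_cohesion(turns):
    """Raw tie count over adjacent turns, divided by the number of turns in the dialog."""
    if not turns:
        return 0.0
    raw_ties = sum(shared_types(a, b) for a, b in zip(turns, turns[1:]))
    return raw_ties / len(turns)


# Toy dialog: 2 ties (the, packet) + 1 tie (moves) = 3 ties over 3 turns.
toy_dialog = ["The packet falls", "The packet moves", "It moves fast"]
print(turn_normalized_cohesion(toy_dialog))  # 1.0
```

Dividing by the number of turns keeps long dialogs from scoring higher simply because they offer more opportunities for ties.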

Where did the corpora come from?
• ITSPOKE is a speech-enabled version of Why2-Atlas (VanLehn et al. 02)
  - Qualitative physics
• Tutoring cycle
  - Student reads instructional materials
  - Takes a pre-test
  - Starts interactive tutoring cycle
    - Problem
    - Essay
    - Tutor evaluates essay, engages in dialog
    - Revise essay
    - Repeat
  - Takes a post-test

Tutoring Corpora
• Transcripts of tutoring sessions
• Training corpus (fall 2003):
  - 20 students, 5 problems each
  - 95 dialogs (5 had no dialog)
  - 13 low pre-testers, 7 high pre-testers
• Testing corpus (spring 2005):
  - 34 students, 5 problems each
  - 163 dialogs (7 had no dialog)
  - 18 low pre-testers, 16 high pre-testers

Results: Aptitude/Treatment
• Test: partial correlation of post-test & cohesion count, controlling for pre-test
• Cohesion correlated with learning for low pre-test students
• Not for high pre-test students
• Little difference between types of measurement
• Less significant on testing data; “token with stops” level reduced to a trend

Measure | Students | Train (2003) R | Train (2003) P-Value | Test (2005) R | Test (2005) P-Value
Token (with stop words) | All Students | 0.380 | 0.098 | 0.207 | 0.239
Token (with stop words) | Low Pretest | 0.614 | 0.026 | 0.448 | 0.062
Token (with stop words) | High Pretest | 0.509 | 0.244 | 0.014 | 0.958
Token (stop words removed) | All Students | 0.431 | 0.058 | 0.269 | 0.124
Token (stop words removed) | Low Pretest | 0.676 | 0.011 | 0.481 | 0.043
Token (stop words removed) | High Pretest | 0.606 | 0.149 | 0.132 | 0.627
Stem (stop words removed) | All Students | 0.423 | 0.063 | 0.261 | 0.135
Stem (stop words removed) | Low Pretest | 0.685 | 0.010 | 0.474 | 0.047
Stem (stop words removed) | High Pretest | 0.633 | 0.127 | 0.121 | 0.655
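For readers unfamiliar with the test, a first-order partial correlation can be computed from three ordinary Pearson correlations using the standard formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The sketch below is a generic illustration with made-up numbers, not the authors' analysis code, and it does not compute p-values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for the single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up values for four students, chosen so that pre-test is uncorrelated
# with both other variables; they are not data from the study.
cohesion = [1, 2, 3, 4]
post_test = [2, 1, 4, 3]
pre_test = [2, 0, 0, 2]

print(round(partial_correlation(cohesion, post_test, pre_test), 3))  # 0.6
```

Here x would be the turn-normalized cohesion count, y the post-test score, and z the pre-test score. In this toy case the control variable is uncorrelated with both, so the partial correlation equals the plain correlation of 0.6.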


Results: Aptitude/Treatment (2003 data)
• No significant difference between amounts of (turn-normalized) cohesion in high and low pre-test groups.
• Difference in correlation between high and low pre-testers not due to different amounts of cohesion.

Results: Interactivity (2003)
• Cohesion between tutor utterances is not correlated with learning

Results: Interactivity (2003)
• No evidence that cohesion between student productions is correlated with learning (but student utterances are very short with the computer tutor)

Discussion
• Both high and low pre-testers successfully learned from these dialogs.
• Our measure of lexical cohesion seems to reflect only what the low pre-testers do to learn; it is not correlated with what high pre-testers do.
• McNamara & Kintsch also found a positive correlation for low pre-testers, but a negative correlation for high pre-testers.

Discussion
• Our measures are slightly different:
  - McNamara & Kintsch: manipulated coherence in text
    - Reader does not contribute to coherence
    - Coherence is the extent to which semantic relations are spelled out in the text, rather than inferred by the reader.
    - Low pre-testers probably learned because high coherence text allowed them to make inferences they couldn’t make from the low coherence text.
      - Low pre-testers & low coherence: didn’t know the terms
      - High coherence may allow a greater number of successful inferences for their low pre-testers
  - Our work: dialog
    - Student does contribute to cohesion
    - Higher cohesion means using more of the same terms
    - Speculation: high cohesion may indicate the number of successful inferences our low pre-testers already made.
    - High pre-testers already know the terms, so new inferences are not involved in using them.

Summary
• We have taken automatically computable measures of cohesion from computational linguistics
  - Applied them to tutorial dialog
  - Found correlations with student learning

Conclusions
• Simple, automatically computable measures of lexical cohesion correlate with learning
  - But only for students with low pre-test scores, even though low and high pre-testers showed similar amounts of cohesion.
  - Correlation is robust to differences in type of measurement
  - It’s the cohesion between student and tutor that’s important

Future Work
• Short term:
  - Cohesion may also be related to learning in high pre-testers, but we’re measuring the wrong kind of cohesion
    - Work underway to try “sense” level measures
      - Halliday & Hasan’s “synonym” levels of reiteration
      - “Acceleration” & “speeding up”
    - New issues:
      - Word sense disambiguation (one sense per discourse?)
  - Or measuring it in the wrong places
    - Try finding cohesion at impasses (VanLehn 03)
    - Try finding change in cohesion over time (Pickering & Garrod 04)
    - Is it the dialog, or the essay?
• Long term:
  - Test by manipulating cohesion in ITSPOKE

Thanks
• Diane Litman
• ITSPOKE group

Questions?


Cohesion vs Coherence
• Cohesive devices
  - Things that “tie” different parts of a discourse together:
    - Anaphora, repetition, etc.
  - But still may not make sense:
    - John hid Bill’s car keys. He likes spinach. (Jurafsky & Martin 00)
• Coherence relations
  - Semantic relations between utterances.
    - Result, explanation, elaboration, etc. (Hobbs 79)

Britton & Gulgoz 91
• Original text: Air war in the North, 1965
  By the fall of 1964, Americans in both Saigon and Washington had begun to focus on Hanoi as the source of the continuing problem in the south.
• Modified text: Air war in North Vietnam, 1965
  By the beginning of 1965, Americans in both Saigon and Washington had begun to focus on Hanoi, capital of North Vietnam, as the source of the continuing problems in the south.