Some Useful Design Tactics for Mining ITS Data
Jack Mostow
Project LISTEN (www.cs.cmu.edu/~listen)
Carnegie Mellon University
Funding: National Science Foundation
ITS 04 Workshop on Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes, Maceió, Brazil

Outline
1. Project LISTEN’s Reading Tutor
2. Modify tutor to get mineable data
3. Map data stream to analyzable data set
4. Mine data set to discover insights

Project LISTEN’s Reading Tutor (video)

Project LISTEN’s Reading Tutor (video)
John Rubin (2002). The Sounds of Speech (Show 3). On Reading Rockets (Public Television series commissioned by U.S. Department of Education). Washington, DC: WETA. Available at www.cs.cmu.edu/~listen.

Thanks to fellow LISTENers
Tutoring:
• Dr. Joseph Beck, mining tutorial data
• Prof. Albert Corbett, cognitive tutors
• Prof. Rollanda O’Connor, reading
• Prof. Kathy Ayres, stories for children
• Joe Valeri, activities and interventions
• Becky Kennedy, linguist
Listening:
• Dr. Mosur Ravishankar, recognizer
• Dr. Evandro Gouvea, acoustic training
• John Helman, transcriber
Programmers:
• Andrew Cuneo, application
• Karen Wong, Teacher Tool
Field staff:
• Dr. Roy Taylor
• Kristin Bagwell
• Julie Sleasman
Grad students:
• Hao Cen, HCI
• Cecily Heiner, MCALL
• Peter Kant, Education
• Shanna Tellerman, ETC
Plus:
• Advisory board
• Research partners: DePaul, UBC, U. Toronto
• Schools

Project LISTEN’s Reading Tutor: A rich source of experimental data
2003-2004 database:
• 9 schools
• > 200 computers
• > 50,000 sessions
• > 1.5M tutor responses
• > 10M words recognized
Embedded experiments:
• Randomized trials

Modify tutor to get mineable data
Log operations at the grain size and level of interest
• Click <x, y> at time t: motor control
• Click “Goldilocks”: item selection
Reify operations to log them analyzably
• Handwriting or speech → typed input
• Freehand drawing → graphical palette (Geometry Tutor)
• Free-form responses → menu selection (Self 88)
• Natural language → sentence starters (Goodman 03)
Time student and tutor actions
• Time allocation reflects motivation (ITS 02)
• Hasty responses indicate guessing (TICL 04)
• Latency reflects automaticity (TICL 04)
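
A minimal sketch of the “time student and tutor actions” tactic, assuming a hypothetical in-memory log. The `Event` record, its field names, and the 1-second `HASTY_SECONDS` cutoff are illustrative assumptions, not the Reading Tutor’s actual log format or threshold. Each student action is paired with the tutor action it answers, so latency and hastiness become explicit fields that later mining can use.

```python
from dataclasses import dataclass

# Hypothetical log record: one row per logged operation.
@dataclass
class Event:
    time: float      # seconds since session start
    actor: str       # "tutor" or "student"
    action: str      # e.g. "prompt", "answer"
    item: str = ""   # word or choice involved

# Hypothetical cutoff (an assumption): faster responses are flagged as likely guesses.
HASTY_SECONDS = 1.0

def response_latencies(events):
    """Pair each student action with the most recent unanswered tutor action.

    Returns a list of (item, latency_seconds, is_hasty) tuples.
    """
    results = []
    pending = None  # latest tutor event still awaiting a student response
    for ev in sorted(events, key=lambda e: e.time):
        if ev.actor == "tutor":
            pending = ev
        elif ev.actor == "student" and pending is not None:
            latency = ev.time - pending.time
            results.append((ev.item, latency, latency < HASTY_SECONDS))
            pending = None
    return results

log = [
    Event(0.0, "tutor", "prompt", "cat"),
    Event(0.4, "student", "answer", "cat"),
    Event(5.0, "tutor", "prompt", "dog"),
    Event(8.5, "student", "answer", "dog"),
]
for item, latency, hasty in response_latencies(log):
    print(f"{item}: {latency:.1f}s, hasty={hasty}")
```

Logging the raw timestamps and deriving latency afterward keeps the cutoff adjustable: a different threshold can be tried without re-collecting data.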

Modify tutor: add relevant data
Randomize tutorial decisions
• What skill to test, what help to give
Probe skills
• Assess cognitive development (Arroyo 00)
• Test vocabulary words (IJAIE 01)
• Insert automated comprehension questions (TICL 04)
Import student data
• Gender, age, IQ (Shute 96)
• Prior knowledge (Corbett 00)
• Pretest scores (TICL 04)
Hand-label when appropriate
• Transcribe (some) spoken input (FLET 04)
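
The “randomize tutorial decisions” tactic can be sketched as follows, under stated assumptions: `choose_help`, the dictionary record layout, and the short list of help types (named after help types that appear later in this talk) are hypothetical. The point is that the random choice is made and logged at decision time, so later analysis can compare outcomes across conditions that were assigned without bias.

```python
import random

# Hypothetical help types, named after ones discussed later in this talk.
HELP_TYPES = ["Say Word", "Say In Context", "Rhymes With", "Sound Out"]

def choose_help(word, rng, decision_log):
    """Pick a help type uniformly at random and log the decision with its word."""
    help_type = rng.choice(HELP_TYPES)
    decision_log.append({"word": word, "decision": help_type, "randomized": True})
    return help_type

rng = random.Random(42)  # seeded only so the example is reproducible
decisions = []
for w in ["read", "book", "people"]:
    choose_help(w, rng, decisions)
for d in decisions:
    print(d)
```

Passing the random generator in explicitly makes the assignment reproducible in tests while still being random in deployment.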

Modify tutor: an example
Randomize: explain some new words but not others.
Probe: test each new word the next day.
Did kids do better on explained vs. unexplained words?
• Overall: NO; 38% vs. 36%, N = 3,171 trials (IJAIE 01).
• Rare, 1-sense words tested 1-2 days later: YES! 44% vs. 26%, N = 189.

Map data stream to data set: structure data into a single type
• Data stream: heterogeneous events over time
• Data set: elements with the same features
Segment into shorter episodes
• Tutorial action(s) + student response (Beck 00)
Slice into narrower strands
• Successive encounters of a specific word (AMLDP 98)
• Successive instances of a specific skill (learning curves)
Measure aggregated events
• Allocation of time among activities (ITS 02)
Formulate data as experimental trials
• Context where the trial occurred
• Decision made in this trial
• Outcome based on subsequent events
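
Segmenting and slicing can be sketched on a toy stream. The tuple layout `(time, actor, kind, word)`, the event kinds, and the rule “a new episode starts at each tutor action” are assumptions made for illustration, not the Reading Tutor’s actual schema or segmentation rule. Segmenting yields episodes of tutor action plus student response; slicing yields, for each word, the times of its successive encounters.

```python
from collections import defaultdict

# Hypothetical event stream: (time_seconds, actor, kind, word).
stream = [
    (0.0, "tutor", "show_sentence", None),
    (1.2, "student", "read_word", "people"),
    (2.0, "student", "click_word", "read"),
    (2.1, "tutor", "give_help", "read"),
    (3.0, "student", "read_word", "read"),
    (9.0, "student", "read_word", "read"),
]

def segment(events):
    """Segment: start a new episode at each tutor action; the student
    events that follow belong to that episode."""
    episodes, current = [], []
    for ev in events:
        if ev[1] == "tutor" and current:
            episodes.append(current)
            current = []
        current.append(ev)
    if current:
        episodes.append(current)
    return episodes

def slice_by_word(events):
    """Slice: times of successive student readings of each word."""
    strands = defaultdict(list)
    for time, actor, kind, word in events:
        if actor == "student" and kind == "read_word":
            strands[word].append(time)
    return dict(strands)

for ep in segment(stream):
    print([kind for _, _, kind, _ in ep])
print(slice_by_word(stream))
```

Both functions turn the heterogeneous stream into collections of same-shaped elements, which is the property the slide asks of a data set.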

Map data stream to data set: formulate data as experimental trials
Data stream:
• Context: Student is reading a story: ‘People sit down and …’ Student needs help on a word. Student clicks ‘read.’
• Decision (randomized): Tutor chooses what help to give.
• Student continues reading: ‘… read a book.’
• Time passes…
• Student sees the word in a later sentence: ‘I love to read stories.’
• Outcome: read fluently?
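
The trial formulation above can be sketched as a join between help events and later encounters of the same word. The `Trial` record, the tuple layouts, and the rule that the outcome is the first later encounter in a *different* sentence are assumptions modeled on the ‘read’ example, not the Reading Tutor’s actual definitions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trial:
    word: str
    context: str              # sentence being read when help was requested
    decision: str             # help type the tutor chose (randomized)
    outcome: Optional[bool]   # read fluently later? None if never seen again

def make_trials(help_events, word_encounters):
    """help_events: (time, word, sentence, help_type) tuples.
    word_encounters: (time, word, sentence, read_fluently) tuples.

    Assumption: the outcome is the first later encounter of the same word
    in a different sentence from the one where help was given.
    """
    trials = []
    for t_help, word, sentence, help_type in help_events:
        later = [(t, ok) for (t, w, s, ok) in word_encounters
                 if w == word and t > t_help and s != sentence]
        outcome = min(later, key=lambda p: p[0])[1] if later else None
        trials.append(Trial(word, sentence, help_type, outcome))
    return trials

helps = [(10.0, "read", "People sit down and read a book.", "Rhymes With")]
encounters = [
    (10.5, "read", "People sit down and read a book.", False),  # same sentence: ignored
    (95.0, "read", "I love to read stories.", True),             # later sentence: outcome
]
print(make_trials(helps, encounters))
```

Skipping the encounter in the same sentence keeps the outcome from simply echoing the help the student just received.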

Map data stream to data set: trials
(Figure: trials labeled Context, Decision, and Outcome; image not reproduced.)

Mine data set to make discoveries
Count outcome frequency
• Success rate of each help type (ICALL 04)
Fit a parametric model
• Knowledge tracing (Corbett 95)
Train a model
• Statistics, e.g., regression (TICL 04)
• Machine learning, e.g., decision trees (AIED 01)
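
Knowledge tracing (Corbett 95) is the example of a parametric model here. The sketch below implements the standard Bayesian knowledge tracing update: condition the probability that the skill is known on whether the response was correct, using slip and guess probabilities, then add the chance of learning at this opportunity. The parameter values are arbitrary placeholders, and fitting them to data (the “fit” step itself) is not shown.

```python
def bkt_update(p_know, correct, p_slip, p_guess, p_learn):
    """One Bayesian knowledge tracing step.

    Condition P(skill known) on the observed response, then allow for
    learning at this opportunity.
    """
    if correct:
        num = p_know * (1 - p_slip)
        den = num + (1 - p_know) * p_guess
    else:
        num = p_know * p_slip
        den = num + (1 - p_know) * (1 - p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * p_learn

def trace(responses, p_init=0.3, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """Return P(known) after each response; parameter defaults are placeholders."""
    p, history = p_init, []
    for correct in responses:
        p = bkt_update(p, correct, p_slip, p_guess, p_learn)
        history.append(p)
    return history

print([round(p, 3) for p in trace([True, True, False, True])])
```

Fitting would choose the four parameters per skill to best predict observed responses, for example by maximizing likelihood over a student-by-opportunity data set like the one built by slicing.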

Count outcome frequency: which help types worked best?
• Best: Rhymes With, 69.2% ± 0.4%
• Worst: Recue, 55.6% ± 0.4%
Compare within level to control for word difficulty:

Word level       Same day          Later day
Grade 1 words    Say In Context    Onset Rime
Grade 2 words    Say In Context    Rhymes With
Grade 3 words    Say In Context    Rhymes With, One Grapheme

Supplying the word helped best in the short term…
But rhyming hints had longer-lasting benefits.
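
Comparing within level can be sketched as grouping trials by word level and help type, then ranking success rates inside each level. The `(level, help_type, success)` tuples and the toy data are hypothetical illustrations, not Reading Tutor results.

```python
from collections import defaultdict

def best_help_by_level(trials):
    """trials: iterable of (level, help_type, success) tuples.
    Returns {level: (help_type, success_rate)} for the best help type per level."""
    counts = defaultdict(lambda: [0, 0])  # (level, help_type) -> [successes, total]
    for level, help_type, success in trials:
        c = counts[(level, help_type)]
        c[0] += int(success)
        c[1] += 1
    best = {}
    for (level, help_type), (succ, total) in counts.items():
        rate = succ / total
        if level not in best or rate > best[level][1]:
            best[level] = (help_type, rate)
    return best

# Hypothetical toy trials, not the Reading Tutor data.
toy = [
    (1, "Say In Context", True), (1, "Say In Context", True), (1, "Onset Rime", False),
    (2, "Rhymes With", True), (2, "Say Word", False), (2, "Rhymes With", False),
]
print(best_help_by_level(toy))
```

Ranking only within a level keeps a help type from looking good merely because the tutor tended to give it on easier words. A real comparison would also check the number of trials and standard error per cell before naming a winner.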

Summary: modify, map, mine.
1. Modify tutor to make data mineable.
   • Log, reify, time, hand-label, import, probe, randomize.
2. Map data streams to data sets.
   • Segment, slice, measure.
3. Mine data set to make discoveries.
   • Count, fit, train.
See videos, papers, etc. at www.cs.cmu.edu/~listen.
Thank you! Questions?

Modify tutor to get mineable data
(Figure: word features; image not reproduced.)

Structure of Reading Tutor database

Reading Tutor                  Student            Database
List readers                   Login              Student, Session
List stories                   Pick stories       Story Encounter
Show one sentence at a time    Read sentence      Sentence Encounter
Listens and helps              Read each word     Word Encounter
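
The nested structure (Student, Session, Story Encounter, Sentence Encounter, Word Encounter) can be sketched with Python dataclasses. The class and field names, including `accepted`, are hypothetical and do not reflect the real database schema. Flattening the hierarchy into one row per word encounter produces the kind of same-shaped data set the mining steps expect.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WordEncounter:
    word: str
    accepted: bool  # hypothetical field: was the reading of this word accepted?

@dataclass
class SentenceEncounter:
    text: str
    words: List[WordEncounter] = field(default_factory=list)

@dataclass
class StoryEncounter:
    title: str
    sentences: List[SentenceEncounter] = field(default_factory=list)

@dataclass
class Session:
    session_id: str
    stories: List[StoryEncounter] = field(default_factory=list)

@dataclass
class Student:
    student_id: str
    sessions: List[Session] = field(default_factory=list)

def all_word_encounters(student):
    """Flatten the hierarchy into one row per word encounter."""
    rows = []
    for sess in student.sessions:
        for story in sess.stories:
            for sent in story.sentences:
                for w in sent.words:
                    rows.append((student.student_id, sess.session_id,
                                 story.title, sent.text, w.word, w.accepted))
    return rows

# Hypothetical example data.
s = Student("s1", [Session("session-1", [StoryEncounter("Story A", [
    SentenceEncounter("People sit down and read a book.",
                      [WordEncounter("read", True), WordEncounter("book", False)]),
])])])
for row in all_word_encounters(s):
    print(row)
```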

Map data stream to data set: formulate data as experimental trials
• Context where the trial occurred
• Decision made in this trial
• Outcome based on subsequent events

Context              Decision              Outcome                     Source
Student is stuck     Prompt or cough?      Next event in dialog        FF 2000
Before a new word    Explain it or not?    Test word next day          IJAIE 01
Click on word        What help to give?    Word read OK next time?     SSSR 04

Learning curves for students’ help requests
Try to predict subset
Selected data:
• Grade 1-2 level
• 1-6 prior encounters
• 53 students
• 175,961 words
• 29,278 help requests
Train predictive model:
• Count help requests
• 5x: predict other kids’ data
• 71% accuracy
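
A learning curve for help requests can be computed by tallying the help-request rate at each number of prior encounters (1-6 here, as on the slide). The second function reflects one possible reading of “5x: predict other kids’ data”, namely five folds split by student so the model is always tested on students it never trained on; that interpretation, the tuple layout, and the toy data are all assumptions.

```python
from collections import defaultdict

def learning_curve(encounters, max_prior=6):
    """encounters: iterable of (student, word, prior_encounters, asked_help).
    Returns {prior_encounters: help-request rate} for 1..max_prior."""
    tally = defaultdict(lambda: [0, 0])  # prior -> [help requests, words]
    for _student, _word, prior, asked in encounters:
        if 1 <= prior <= max_prior:
            tally[prior][0] += int(asked)
            tally[prior][1] += 1
    return {k: h / n for k, (h, n) in sorted(tally.items())}

def student_folds(students, k=5):
    """Assign whole students to k folds, so each test fold contains only
    students that are absent from the corresponding training data."""
    ordered = sorted(set(students))
    return {s: i % k for i, s in enumerate(ordered)}

# Hypothetical toy data: (student, word, prior encounters, asked for help?).
data = [
    ("s1", "cat", 1, True), ("s1", "cat", 2, False),
    ("s2", "cat", 1, True), ("s2", "cat", 2, True),
    ("s3", "dog", 1, False), ("s3", "dog", 2, False),
]
print(learning_curve(data))
folds = student_folds([d[0] for d in data], k=2)
for f in range(2):
    test_ids = {s for s, fold in folds.items() if fold == f}
    train = [d for d in data if d[0] not in test_ids]
    print(f"fold {f}: train on {len(train)} rows, test on students {sorted(test_ids)}")
```

Splitting by student rather than by row matters because rows from the same student are correlated; a row-level split would let the model see part of each test student’s behavior during training.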

Count outcome frequency (average success rate 66.1%)
Example: ‘People sit down and read a book.’
Whole word:
• 24,841 Say In Context
• 56,791 Say Word
Decomposition:
• 6,280 Syllabify
• 14,223 Onset Rime
• 19,677 Sound Out
• 22,933 One Grapheme
Analogy:
• 13,165 Rhymes With
• 13,671 Starts Like
Semantic:
• 14,685 Recue
• 2,285 Show Picture
• 488 Sound Effect
Which types stood out?
• Best: Rhymes With 69.2% ± 0.4%
• Worst: Recue 55.6% ± 0.4%
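
The ± figures on this slide are consistent with the binomial standard error of a proportion, sqrt(p(1 - p)/n), computed from the listed trial counts. The slide does not say how they were derived, so treat this as a plausibility check rather than the original computation.

```python
import math

def proportion_se(p, n):
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1 - p) / n)

# Success rates and trial counts as reported on the slide.
for name, p, n in [("Rhymes With", 0.692, 13165), ("Recue", 0.556, 14685)]:
    print(f"{name}: {p:.1%} ± {proportion_se(p, n):.1%}")
```

Both lines print ± 0.4%, matching the slide. With more than 13,000 trials per help type, the 13.6-point gap between best and worst is many standard errors wide.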