Learning decomposition WARNING

Goals • Understand what learning decomposition is – And basic intuition • See how it was applied to a variety of problems • Think about how to apply it to your data

Introduction to Reading Tutor • More free-form than most of the cognitive tutors • Random interventions • Kids or tutor can initiate help • Turn taking • Never quite sure what student is trying to do

Project LISTEN’s Reading Tutor

What is a practice opportunity? (And are they all equally valuable?) • Sequence of events: before the story, tutor teaches ‘elephant’; student sees word ‘elephant’ in sentence; student clicks for help on it; student reads it; ‘elephant’ occurs twice in the next sentence • How many practice opportunities? • Did instruction have any benefit? • Did seeing word immediately afterwards help?

Procedure • Determine (hopefully motivated) learning decompositions • Find data that reflect learning • Solve as a non-linear regression model – Fit model to each student • Interpret model coefficient

Question: Does learner control result in more learning? • In Reading Tutor, students pick the story half the time; the tutor picks the other half • Tutor selects stories much faster than students do • Suspect motivational benefit from learner control (willing to tolerate system over a school year) • Is there a cognitive benefit? • Compare learning of words that occur in student- vs. tutor-selected stories

Find data that reflect learning • Students perform many actions • Only want those that indicate “real” learning to count • Assumptions – First opportunity each day is purest marker • Albert, Ken, and Joe have all observed difficulties with closely spaced trials – Don’t count stories student has already read • Need outcome measure – Fuse accuracy, speed, and help performance
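The two filtering assumptions above can be sketched in a few lines. This is only an illustration: the `Encounter` record and its field names are hypothetical, and it assumes that a reread-story encounter is skipped and the next encounter of the word on that day is used instead, which the slides do not specify.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Encounter:
    # Hypothetical record layout; the real Reading Tutor logs may differ.
    student: str
    word: str
    day: int                  # day index within the school year
    order: int                # position of the encounter within that day
    story_already_read: bool  # True if the student has read this story before

def learning_evidence(encounters):
    """Keep the first encounter of each word per student per day,
    ignoring encounters that occur in stories the student already read."""
    kept = {}
    ordered = sorted(encounters, key=lambda e: (e.student, e.word, e.day, e.order))
    for e in ordered:
        if e.story_already_read:
            continue  # assumption: reread stories are not used as outcomes
        # setdefault keeps only the earliest surviving encounter for the key
        kept.setdefault((e.student, e.word, e.day), e)
    return list(kept.values())
```

Only the outcome observations are filtered here; every encounter, kept or not, would presumably still count toward the prior-opportunity totals used as model inputs.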

Approach • Table columns: Day | Asked for help? | Reading time (sec) | Prior opportunities, student selected | Prior opportunities, tutor selected | Outcome • Cell values: 1 1 1 Yes No No 0.4 0.5 0 1 2 0 0 0 3.0 2 No - 3 0 3.0 2 3 No No 0.4 0.3 3 3 1 2 0.3

Learning curves (plot; axis labeled Better / Worse) • Performance = $A e^{-bt}$ • Input: number of prior trials ($t$) • Output: expected performance

What if all trials aren’t equal? • Normal model: Performance = $A e^{-bt}$ • Think about student- vs. tutor-chosen stories – $t_1$ = trials where student chose story – $t_2$ = trials where tutor chose story • Learning decomp model: Performance = $A e^{-b(t_1 + B t_2)}$ – $B$ determines relative efficacy of trials of type $t_1$ and $t_2$
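As a minimal sketch, the two curves can be written as plain functions. The function names and the parameter values in the comments are illustrative assumptions, not values from the study.

```python
import math

def standard_curve(A, b, t):
    """Ordinary exponential learning curve: every prior trial counts equally."""
    return A * math.exp(-b * t)

def decomposed_curve(A, b, B, t1, t2):
    """Learning decomposition curve: each tutor-chosen trial (t2) counts as
    B student-chosen trials (t1)."""
    return A * math.exp(-b * (t1 + B * t2))

# With B = 1 the decomposed model reduces to the standard curve on t1 + t2.
# With B = 0.5, two tutor-chosen trials are worth one student-chosen trial,
# so decomposed_curve(A, b, 0.5, 2, 2) equals standard_curve(A, b, 3).
```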

Use regression to find relative weight of tutor-selected prior opportunities • Table columns: Day | Asked for help? | Reading time (sec) | Prior opportunities, student selected | Prior opportunities, tutor selected | Outcome • Cell values: 1 1 1 Yes No No 0.4 0.5 0 1 2 0 0 0 3.0 2 No - 2 0 3.0 2 3 No No 0.4 0.3 2 2 1 2 0.3

Fit model to each student’s data
Student           B
Chris Smith       0.3
Pat Johnson       1.2
Sam Jackson       0.5
Jessie Stevens    0.9
Reagan Ronald     0.7
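One way to obtain a per-student B is sketched below. It is not the authors' fitting procedure: instead of a general non-linear regression routine, it uses a grid search over B combined with ordinary least squares on the log of the outcome, which works because the model is linear in log space once B is fixed. It assumes strictly positive outcomes; the function name and the grid range are hypothetical.

```python
import math

def fit_student(rows, b_grid=None):
    """Fit Performance = A * exp(-b * (t1 + B * t2)) for one student.

    rows: list of (t1, t2, outcome) tuples with outcome > 0.
    Returns (A, b, B) minimising squared error in log space.
    """
    if b_grid is None:
        b_grid = [i / 100 for i in range(0, 301)]  # candidate B: 0.00 .. 3.00
    logs = [math.log(y) for _, _, y in rows]
    n = len(rows)
    mean_y = sum(logs) / n
    best = None
    for B in b_grid:
        xs = [t1 + B * t2 for t1, t2, _ in rows]  # effective prior exposure
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            continue  # no variation in exposure: slope is undefined
        sxy = sum((x - mean_x) * (ly - mean_y) for x, ly in zip(xs, logs))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        sse = sum((ly - (intercept + slope * x)) ** 2 for x, ly in zip(xs, logs))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), -slope, B)
    if best is None:
        raise ValueError("need at least two distinct exposure values")
    _, A, b, B = best
    return A, b, B
```

Calling `fit_student` once per student on that student's rows yields one B per student, as in the table above. Fitting in log space weights errors differently from fitting the raw outcome, so the estimates can differ from a true non-linear least-squares fit on noisy data.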

Interpret B parameter • B is scaling parameter – B > 1: students benefit from tutor control – B ≈ 1: no benefit either way – B < 1: student control is better • B ≈ 0.8 for tutor-chosen stories (median) – Students learn more from student-chosen stories (not my H0) • What could be other causes of result?

Which students benefit? • Top-down approach: – Think of plausible subgroups – See how/if B varies among them • E.g., 1st graders had 0.98, 2nd graders 0.89, and 3rd graders 0.49 – Suggests older kids benefit more from choosing their own stories (getting pickier?) • Many possibilities, want to avoid fishing expedition

Which students benefit? Bottom-up approach • Use regression results as training labels for classifier • Predictors: – Gender – Grade – Test score (grade normed) – Disability status • Boys benefit from learner control
Student           B      Benefits from tutor control?
Chris Smith       0.3    No
Pat Johnson       1.2    Yes
Sam Jackson       0.5    No
Jessie Stevens    0.9    ?
Reagan Ronald     0.7    No
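Turning fitted B values into training labels might look like the sketch below. The tolerance band around 1 is a hypothetical choice, picked only so that the labels match the table above (0.9 stays undecided); the slides do not say how the "?" cases were determined.

```python
def tutor_control_label(B, tolerance=0.15):
    """Map a fitted B to a training label.

    tolerance is an assumed cutoff: values within it of 1 are undecided.
    """
    if B > 1 + tolerance:
        return "Yes"  # tutor-chosen trials are worth more
    if B < 1 - tolerance:
        return "No"   # student-chosen trials are worth more
    return "?"        # too close to 1 to call

# Per-student B values from the table above.
fits = {
    "Chris Smith": 0.3,
    "Pat Johnson": 1.2,
    "Sam Jackson": 0.5,
    "Jessie Stevens": 0.9,
    "Reagan Ronald": 0.7,
}
labels = {name: tutor_control_label(B) for name, B in fits.items()}
```

Students labeled "?" would presumably be left out of classifier training, leaving the predictors (gender, grade, test score, disability status) paired with Yes/No labels.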

Other learning decompositions: practice effects • Open debate if more learning from rereading stories or reading new stories • Generally believed spaced practice better for long-term retention (but not short) • Results – Reading new material better than rereading old stories (B = 0.5) – Later practice opportunities on same day are ineffective (B = 0.2)

Other learning decompositions: impact of instruction • Reading Tutor has a bunch of random bits of instruction • Do they do anything? – Solution: model instruction as an encounter and give it a weight • Impact of instruction (in progress) – Spelling intervention worth 0.75 exposures – Word ID intervention worth 0.36 exposures – Neither is particularly effective – (but, first analytic approach to find any effect)
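Read as weights on an encounter count, the two reported values could enter the learning curve as below. The symbols $n_{\text{read}}$, $n_{\text{spell}}$, and $n_{\text{wordID}}$ are introduced here for illustration, and the exact form of the authors' model is an assumption.

```latex
\[
x = n_{\text{read}} + 0.75\, n_{\text{spell}} + 0.36\, n_{\text{wordID}},
\qquad
\text{Performance} = A e^{-b x}
\]
% Worked example: 3 reading encounters, 1 spelling intervention,
% and 1 word ID intervention give
\[
x = 3 + 0.75 \cdot 1 + 0.36 \cdot 1 = 4.11
\]
```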

Using learning decomposition to model transfer (Xiaonan Zhang) • How do students represent words? – Naïve model: words are independent – What about “cat” vs. “cats”? • Alternate models: – Word roots (cats, cat → CAT) – Rimes (bat, cat → AT) • $T_1$ = # prior times have read word • $T_2$ = # prior times have read root • $T_3$ = # prior times have read rime • Substantial transfer at level of word root – 55% as good as seeing the word itself
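The three counts can be combined into a single effective exposure, sketched below. It assumes that the root and rime counts exclude encounters of the word itself, so nothing is counted twice. The 0.55 root weight comes from the slide; no rime weight is reported, so the caller must supply one.

```python
def effective_exposures(t_word, t_root, t_rime, *, rime_weight, root_weight=0.55):
    """Effective prior exposure to a word under the transfer model.

    t_word: prior readings of the word itself (T1)
    t_root: prior readings of other words sharing its root (T2)
    t_rime: prior readings of other words sharing its rime (T3)
    rime_weight: not reported in the slides; the caller must choose a value.
    """
    return t_word + root_weight * t_root + rime_weight * t_rime

# Example: "cat" read twice, "cats" read four times, rime transfer ignored:
# 2 + 0.55 * 4 = 4.2 effective exposures.
example = effective_exposures(2, 4, 0, rime_weight=0.0)
```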

Hopefully • Understand approach – Think of two types of learning that may have unequal impact – Divide up trials – Perform curve fitting • See that it applies to variety of problems • But…

Concerns • We say things like “rereading is not as effective as reading different stories” – Is it safe to make causal inference from observational data? • Wide- vs. re-reading: troublesome – What if lower proficiency is true cause? • Massed vs. distributed practice: ok (?) • Student vs. tutor control: ok • Interventions: ok • What about student-initiated help?

Interesting view (Jack Mostow) • Each student has a B parameter • E.g., Chris Smith has B = 0.3 for rereading – Chris Smith learns 30% as much from rereading as wide reading – Impossible for traits of Chris Smith to be a confound (proficiency, disability, etc.) – But, states could still be a problem • E.g., Chris only rereads after sleeping poorly

Compare LFA and Learning Decomposition • Similar: – Use learning curves and performance data – Insight: a model that better predicts student performance is a better model of student’s mental processes (modulo complexity) • Different: – Bottom-up vs. top-down – Each manipulates different aspect of representation

Bottom-up vs. Top-down • Learning decomposition – Start with theory-driven idea – Estimate effect (if any) – No search • LFA – Start with variety of factors – Perform search – Might not correspond to higher level construct • Not necessarily a bad thing

Consider transfer at level of word roots • Learning decomp: – Student exposure to words of same root is 55% as good as seeing the word • i.e. cats, cat, cat • i.e. accepts, accept, accept • LFA – Cats and cat are same skill (perfect transfer) • i.e. cats, cat > cat, cat – Accepts and accept are different skills • i.e. accepts, accept < accept, accept

Consider transfer at level of word roots • Learning decomp: – Student exposure to words of same root is 55% as good as seeing the word • i.e. cats, cat = cat, cat • i.e. accepts, accept = accept, accept • LFA – Cats and cat are same skill (perfect transfer) • i.e. cats, cat > cat, cat – Achieve and achieving are different skills • i.e. accepts, accept < accept, accept

Student learning history
Skill      Prior practice opportunities
Skill 1    0
Skill 1    1
Skill 1    2
Skill 2    0
Skill 3    0
Skill 1    3
Skill 2    1
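The prior-opportunity column in this table is a running count per skill. A minimal sketch of that bookkeeping follows; the function name is hypothetical.

```python
from collections import Counter

def prior_opportunities(skill_sequence):
    """Pair each practice event with the number of earlier events
    that involved the same skill."""
    seen = Counter()
    rows = []
    for skill in skill_sequence:
        rows.append((skill, seen[skill]))  # count before this event
        seen[skill] += 1
    return rows

# The learning history from the table above, in order of occurrence.
history = ["Skill 1", "Skill 1", "Skill 1", "Skill 2",
           "Skill 3", "Skill 1", "Skill 2"]
table = prior_opportunities(history)
```

This plain count treats every earlier event as worth exactly 1, which is the assumption that learning decomposition relaxes by weighting some events differently.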

Learning factors analysis
Skill      Prior practice opportunities
Skill 1    0
Skill 1    1
Skill 1    2
Skill 2    0
Skill 3    0
Skill 1    3
Skill 2    1
Did student utilize skill 1 here? Is it better to think of it as skill 1’?

Learning decomposition
Skill      Prior practice opportunities
Skill 1    0
Skill 1    1
Skill 1    2
Skill 2    0
Skill 3    0
Skill 1    3
Skill 2    1
Did the student really have 3 prior practice opportunities? 1+1+1 = 3, but is there a better way of counting?

Wrapup • Why model individual points • Scope of learning decomposition • How learning decomp differs from LFA