Learning decomposition WARNING

Goals
• Understand what learning decomposition is
  – And the basic intuition
• See how it was applied to a variety of problems
• Think about how to apply it to your data

Introduction to Reading Tutor
• More free-form than most of the cognitive tutors
• Random interventions
• Kids or tutor can initiate help
• Turn taking
• Never quite sure what the student is trying to do

Project LISTEN’s Reading Tutor

What is a practice opportunity? (And are they all equally valuable?)
• Before the story, the tutor teaches ‘elephant’
• Student sees the word ‘elephant’ in a sentence
• Student clicks for help on it
• Student reads it
• ‘Elephant’ occurs twice in the next sentence
Questions:
• How many practice opportunities?
• Did the instruction have any benefit?
• Did seeing the word immediately afterwards help?

Procedure
• Determine (hopefully motivated) learning decompositions
• Find data that reflect learning
• Solve as a non-linear regression model
  – Fit the model to each student
• Interpret the model coefficient

Question: Does learner control result in more learning?
• In the Reading Tutor, students pick the story half the time; the tutor picks the other half
• The tutor selects stories much faster than students do
• Suspect a motivational benefit from learner control (students are willing to tolerate the system over a school year)
• Is there a cognitive benefit?
• Compare learning of words that occur in student- vs. tutor-selected stories

Find data that reflect learning
• Students perform many actions
• Only want those that indicate “real” learning to count
• Assumptions
  – The first opportunity each day is the purest marker
    • Albert, Ken, and Joe have all observed difficulties with closely spaced trials
  – Don’t count stories the student has already read
• Need an outcome measure
  – Fuse accuracy, speed, and help performance
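
A minimal sketch of the data-selection step above, assuming a hypothetical `Encounter` record (the slides do not describe the real log schema) and encounters sorted chronologically. It keeps the first encounter of each word per student per day and skips encounters in stories the student has already read. Whether a reread encounter should still use up the “first of the day” slot is not stated; here it does not. This only chooses which encounters serve as outcome data points; how prior practice is counted is a separate step. How accuracy, speed, and help are fused is also not specified, so `outcome` is taken as already computed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Encounter:
    """Hypothetical log record: one student encountering one word."""
    student: str
    word: str
    day: date
    story_id: str
    story_previously_read: bool  # True if the student already read this story
    outcome: float               # already-fused outcome measure (assumed given)


def first_daily_encounters(encounters):
    """Keep the first encounter of each (student, word, day), skipping reread stories.

    Assumes `encounters` is in chronological order.
    """
    seen = set()
    kept = []
    for enc in encounters:
        if enc.story_previously_read:
            continue  # reread stories are not used as learning evidence
        key = (enc.student, enc.word, enc.day)
        if key in seen:
            continue  # later encounters on the same day are treated as less "pure"
        seen.add(key)
        kept.append(enc)
    return kept
```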

Approach
Columns: Day | Asked for help? | Reading time (sec) | Prior opportunities: student selected | Prior opportunities: tutor selected | Outcome
Values: 1 1 1 Yes No No 0.4 0.5 0 1 2 0 0 0 3.0 2 No - 3 0 3.0 2 3 No No 0.4 0.3 3 3 1 2 0.3

Procedure
• Determine (hopefully motivated) learning decompositions
• Find data that reflect learning
• Solve as a non-linear regression model
  – Fit the model to each student
• Interpret the model coefficient (B)

Learning curves
[Figure: learning curve; vertical axis runs from Worse to Better performance]
$\text{Performance} = A e^{-bt}$
• Input: number of prior trials ($t$)
• Output: expected performance

What if all trials aren’t equal?
• Normal model: $A e^{-bt}$
• Think about student- vs. tutor-chosen stories
  – $t_1$ = trials where the student chose the story
  – $t_2$ = trials where the tutor chose the story
• Learning decomposition model: $A e^{-b(t_1 + B t_2)}$
  – $B$ determines the relative efficacy of a $t_2$ trial compared with a $t_1$ trial
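
A minimal sketch of fitting this model for one student. The slides solve it as a non-linear regression; this sketch swaps in a simpler technique that exploits the model’s shape: for a fixed B, log performance = log A − b(t1 + B·t2) is linear, so it grid-searches B and fits log A and b by ordinary least squares in log space. That minimizes error in log space rather than on raw performance, so estimates can differ from a true non-linear fit, and it assumes every performance value is positive. The function names and the grid (0 to 3 in steps of 0.01) are illustrative choices, not the authors’ code.

```python
import math


def predict(A, b, B, t1, t2):
    """Learning decomposition model: A * exp(-b * (t1 + B * t2))."""
    return A * math.exp(-b * (t1 + B * t2))


def _ols(xs, ys):
    """Least-squares fit of y = c0 + c1 * x; returns (c0, c1, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx if sxx > 0 else 0.0
    c0 = my - c1 * mx
    sse = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))
    return c0, c1, sse


def fit_decomposition(t1s, t2s, perf, grid=None):
    """Estimate (A, b, B): grid search over B, OLS for log(A) and b at each B."""
    if grid is None:
        grid = [i / 100 for i in range(301)]  # candidate B values 0.00 .. 3.00
    logs = [math.log(p) for p in perf]  # raises ValueError if any perf <= 0
    best = None
    for B in grid:
        xs = [t1 + B * t2 for t1, t2 in zip(t1s, t2s)]  # weighted practice count
        c0, c1, sse = _ols(xs, logs)
        if best is None or sse < best[3]:
            best = (math.exp(c0), -c1, B, sse)
    return best[:3]
```

Run it once per student, as the procedure slide describes, to get one B per student. Both t1 and t2 must vary in a student’s data; otherwise B is not identifiable.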

Use regression to find the relative weight of tutor-selected prior opportunities
Columns: Day | Asked for help? | Reading time (sec) | Prior opportunities: student selected | Prior opportunities: tutor selected | Outcome
Values: 1 1 1 Yes No No 0.4 0.5 0 1 2 0 0 0 3.0 2 No - 2 0 3.0 2 3 No No 0.4 0.3 2 2 1 2 0.3

Fit the model to each student’s data
Student | B
Chris Smith | 0.3
Pat Johnson | 1.2
Sam Jackson | 0.5
Jessie Stevens | 0.9
Reagan Ronald | 0.7

Interpret the B parameter
• B is a scaling parameter
  – B > 1: students benefit from tutor control
  – B ≈ 1: no benefit either way
  – B < 1: student control is better
• B = 0.8 for tutor-chosen stories (median)
  – Students learn more from student-chosen stories (not my H0)
• What could be other causes of this result?
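
A small sketch of summarizing per-student B values and reading them as the slide does. The `tol` band that counts as “B ≈ 1” is a hypothetical choice; the slides give no cut-off. The five values are the example rows from the per-student table; the slide’s median of 0.8 comes from the full data set, not from these rows.

```python
from statistics import median


def interpret_b(B, tol=0.05):
    """Read a fitted B as on the slide; `tol` is a hypothetical band for B ≈ 1."""
    if B > 1 + tol:
        return "tutor control better"
    if B < 1 - tol:
        return "student control better"
    return "no benefit either way"


# The five example rows from the per-student table.
per_student_B = {
    "Chris Smith": 0.3,
    "Pat Johnson": 1.2,
    "Sam Jackson": 0.5,
    "Jessie Stevens": 0.9,
    "Reagan Ronald": 0.7,
}

summary = median(per_student_B.values())
print(summary, interpret_b(summary))
```

The median (as on the slide) is less sensitive than the mean to one student whose B is extreme, for example because of a poorly constrained fit.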

Which students benefit?
• Top-down approach:
  – Think of plausible subgroups
  – See how/if B varies among them
    • E.g., 1st graders had B = 0.98, 2nd graders 0.89, and 3rd graders 0.49
  – Suggests older kids benefit more from learner control (getting pickier?)
• Many possibilities; want to avoid a fishing expedition
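
A sketch of the top-down comparison: group per-student B values by one attribute chosen in advance (grade, say) and compare each group’s median. The input format of `(group, B)` pairs is an assumption for illustration. Fixing the grouping before looking is what keeps this from becoming a fishing expedition.

```python
from collections import defaultdict
from statistics import median


def median_b_by_group(records):
    """Median per-student B within each group.

    `records` is an iterable of (group_label, B) pairs, e.g. (grade, B).
    """
    groups = defaultdict(list)
    for group, B in records:
        groups[group].append(B)
    return {group: median(bs) for group, bs in groups.items()}
```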

Which students benefit? Bottom-up approach
• Use regression results as training labels for a classifier
• Predictors:
  – Gender
  – Grade
  – Test score (grade normed)
  – Disability status
• Result: boys benefit from learner control

Student | B | Benefits from tutor control?
Chris Smith | 0.3 | No
Pat Johnson | 1.2 | Yes
Sam Jackson | 0.5 | No
Jessie Stevens | 0.9 | ?
Reagan Ronald | 0.7 | No
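
A sketch of turning per-student B values into classifier training labels. The ambiguity band (0.85 to 1.15) is a hypothetical choice that happens to reproduce the labels in the example table; the slides give no cut-off. Dropping “?” students from training, and the dictionary keys used for the predictors, are also assumptions. The slides do not say which classifier was used, so the rows are left for any learner.

```python
def tutor_control_label(B, band=(0.85, 1.15)):
    """Map a per-student B to 'Yes', 'No', or '?' (hypothetical band edges)."""
    low, high = band
    if B > high:
        return "Yes"
    if B < low:
        return "No"
    return "?"


def training_rows(students):
    """Build (features, label) pairs, dropping ambiguous students.

    Each student is a dict with hypothetical keys:
    "B", "gender", "grade", "test_score", "disability".
    """
    rows = []
    for s in students:
        label = tutor_control_label(s["B"])
        if label == "?":
            continue  # assumption: ambiguous students are left out of training
        features = (s["gender"], s["grade"], s["test_score"], s["disability"])
        rows.append((features, label))
    return rows
```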

Other learning decompositions: practice effects
• Open debate whether students learn more from rereading stories or from reading new stories
• Spaced practice is generally believed to be better for long-term retention (but not short-term)
• Results
  – Reading new material is better than rereading old stories (B = 0.5)
  – Later practice opportunities on the same day are ineffective (B = 0.2)
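
A sketch of how the massed-vs-distributed decomposition splits a word’s prior encounters: the first encounter on each day counts toward t1, and every later encounter on an already-seen day counts toward t2. Representing days as plain integers is an assumption for illustration. With the slide’s B = 0.2, the weighted count replaces t in the learning curve.

```python
def split_by_day(encounter_days):
    """Split prior encounters into first-of-day (t1) and later-same-day (t2) counts."""
    t1 = t2 = 0
    seen_days = set()
    for day in encounter_days:
        if day in seen_days:
            t2 += 1  # another encounter on a day already counted (massed practice)
        else:
            seen_days.add(day)
            t1 += 1  # first encounter of that day (distributed practice)
    return t1, t2


SAME_DAY_B = 0.2  # weight reported on the slide for later same-day opportunities

t1, t2 = split_by_day([1, 1, 1, 2, 3, 3])
effective = t1 + SAME_DAY_B * t2  # about 3.6 weighted practice opportunities
```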

Other learning decompositions: impact of instruction
• The Reading Tutor has many random bits of instruction
• Do they do anything?
  – Solution: model instruction as an encounter and give it a weight
• Impact of instruction (in progress)
  – Spelling intervention worth 0.75 exposures
  – Word ID intervention worth 0.36 exposures
  – Neither is particularly effective
  – (But this is the first analytic approach to find any effect)
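
A sketch of “model instruction as an encounter and give it a weight”: each encounter type contributes its count times its weight, and the total plays the role of t in $A e^{-bt}$. The type names (`"reading"`, `"spelling"`, `"word_id"`) are hypothetical; the two weights are the in-progress values from the slide. In an actual analysis the weights are what the regression estimates, not fixed inputs.

```python
def effective_exposures(counts, weights):
    """Weighted count of prior encounters.

    `counts` maps an encounter type to how often it occurred;
    types absent from `weights` count as full exposures (weight 1.0).
    """
    return sum(n * weights.get(kind, 1.0) for kind, n in counts.items())


# Weights reported on the slide (analysis in progress); the keys are hypothetical.
INSTRUCTION_WEIGHTS = {"spelling": 0.75, "word_id": 0.36}

x = effective_exposures({"reading": 3, "spelling": 1, "word_id": 2}, INSTRUCTION_WEIGHTS)
# 3 readings + 0.75 + 2 * 0.36, i.e. about 4.47 exposures
```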

Using learning decomposition to model transfer (Xiaonan Zhang)
• How do students represent words?
  – Naïve model: words are independent
  – What about “cat” vs. “cats”?
• Alternate models:
  – Word roots (cats, cat → CAT)
  – Rimes (bat, cat → AT)
• $T_1$ = # prior times the student has read the word
• $T_2$ = # prior times the student has read the root
• $T_3$ = # prior times the student has read the rime
• Substantial transfer at the level of the word root
  – 55% as good as seeing the word itself
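
A sketch of computing the three transfer counts from a reading history. `toy_root` and `toy_rime` are hypothetical stand-ins (strip a plural “s”; take everything from the first vowel) for whatever morphological and phonological analysis the real study used. Counting each prior read in only one bucket (exact word first, then same root, then same rime) is also an assumption, made so no read is counted twice.

```python
def toy_root(word):
    """Hypothetical root: strip a plural 's' (not a real morphological analyzer)."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def toy_rime(word):
    """Hypothetical rime: from the first vowel to the end (one-syllable words only)."""
    for i, ch in enumerate(word):
        if ch in "aeiou":
            return word[i:]
    return word


def transfer_counts(history, target):
    """Return (T1, T2, T3) for `target` given earlier reads in `history`.

    T1: reads of the exact word; T2: reads of other words with the same root;
    T3: reads of other words sharing only the rime.
    """
    t1 = t2 = t3 = 0
    for w in history:
        if w == target:
            t1 += 1
        elif toy_root(w) == toy_root(target):
            t2 += 1
        elif toy_rime(w) == toy_rime(target):
            t3 += 1
    return t1, t2, t3
```

These counts would enter the model as $A e^{-b(T_1 + B_2 T_2 + B_3 T_3)}$, where the slide reports a root-level weight of about 0.55.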

Hopefully
• Understand the approach
  – Think of two types of learning that may have unequal impact
  – Divide up the trials
  – Perform curve fitting
• See that it applies to a variety of problems
• But…

Concerns
• We say things like “rereading is not as effective as reading different stories”
  – Is it safe to make causal inferences from observational data?
• Wide reading vs. rereading: troublesome
  – What if lower proficiency is the true cause?
• Massed vs. distributed practice: ok (?)
• Student vs. tutor control: ok
• Interventions: ok
• What about student-initiated help?

Interesting view (Jack Mostow)
• Each student has a B parameter
• E.g., Chris Smith has B = 0.3 for rereading
  – Chris Smith learns 30% as much from rereading as from wide reading
  – Impossible for traits of Chris Smith to be a confound (proficiency, disability, etc.), since B compares two kinds of practice within the same student
  – But states could still be a problem
    • E.g., Chris only rereads after sleeping poorly

Compare Learning Factors Analysis (LFA) and Learning Decomposition
• Similar:
  – Use learning curves and performance data
  – Insight: a model that better predicts student performance is a better model of the student’s mental processes (modulo complexity)
• Different:
  – Bottom-up vs. top-down
  – Each manipulates a different aspect of the representation

Bottom-up vs. Top-down
• Learning decomposition
  – Start with a theory-driven idea
  – Estimate its effect (if any)
  – No search
• LFA
  – Start with a variety of factors
  – Perform search
  – Might not correspond to a higher-level construct
    • Not necessarily a bad thing

Consider transfer at the level of word roots
• Learning decomposition:
  – Student exposure to words of the same root is 55% as good as seeing the word
    • i.e., cats, cat, cat
    • i.e., accepts, accept, accept
• LFA
  – Cats and cat are the same skill (perfect transfer)
    • i.e., cats, cat > cat, cat
  – Accepts and accept are different skills
    • i.e., accepts, accept < accept, accept

Consider transfer at the level of word roots
• Learning decomposition:
  – Student exposure to words of the same root is 55% as good as seeing the word
    • i.e., cats, cat = cat, cat
    • i.e., accepts, accept = accept, accept
• LFA
  – Cats and cat are the same skill (perfect transfer)
    • i.e., cats, cat > cat, cat
  – Achieve and achieving are different skills
    • i.e., accepts, accept < accept, accept
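
A sketch contrasting how the two approaches credit a prior read of a related word. Learning decomposition gives every same-root word the same partial credit (0.55 here, the slide’s estimate); an LFA-style skill model gives full credit if the words map to one skill and none otherwise. The `ROOTS` and `SKILLS` mappings are hypothetical illustrations, not taken from either analysis.

```python
ROOT_B = 0.55  # root-level transfer estimate reported on the slides

# Hypothetical mappings for illustration.
ROOTS = {"cats": "cat", "accepts": "accept"}
SKILLS = {"cats": "cat"}  # LFA-style: cats/cat merged, accepts/accept kept apart


def root_of(word):
    return ROOTS.get(word, word)


def skill_of(word):
    return SKILLS.get(word, word)


def decomp_prior(history, target):
    """Learning decomposition: same word = 1, same root = ROOT_B, otherwise 0."""
    total = 0.0
    for w in history:
        if w == target:
            total += 1.0
        elif root_of(w) == root_of(target):
            total += ROOT_B
    return total


def lfa_prior(history, target):
    """LFA-style count: each prior read is 1 if it maps to the same skill, else 0."""
    return sum(1 for w in history if skill_of(w) == skill_of(target))
```

For a student who read “cats” and is now reading “cat”, the decomposition credits 0.55 prior exposures while the skill model credits 1; for “accepts” then “accept”, the decomposition still credits 0.55 while the skill model credits 0.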

Student learning history
Skill | Prior practice opportunities
Skill 1 | 0
Skill 1 | 1
Skill 1 | 2
Skill 2 | 0
Skill 3 | 0
Skill 1 | 3
Skill 2 | 1
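
The prior-opportunity column in this table is a running count per skill. A minimal sketch that derives it from the sequence of skills, counting every earlier use of the same skill as one full opportunity (the 1 + 1 + 1 counting that both LFA and learning decomposition then revise in different ways):

```python
from collections import Counter


def prior_opportunities(skill_sequence):
    """For each step, the number of earlier steps that used the same skill."""
    seen = Counter()
    counts = []
    for skill in skill_sequence:
        counts.append(seen[skill])  # opportunities before this one
        seen[skill] += 1
    return counts


history = ["Skill 1", "Skill 1", "Skill 1", "Skill 2", "Skill 3", "Skill 1", "Skill 2"]
print(prior_opportunities(history))  # [0, 1, 2, 0, 0, 3, 1]
```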

Learning factors analysis
Skill | Prior practice opportunities
Skill 1 | 0
Skill 1 | 1
Skill 1 | 2
Skill 2 | 0
Skill 3 | 0
Skill 1 | 3
Skill 2 | 1
• Did the student use skill 1 here?
• Is it better to think of it as skill 1’?

Learning decomposition
Skill | Prior practice opportunities
Skill 1 | 0
Skill 1 | 1
Skill 1 | 2
Skill 2 | 0
Skill 3 | 0
Skill 1 | 3
Skill 2 | 1
• Did the student really have 3 prior practice opportunities?
• 1 + 1 + 1 = 3, but is there a better way of counting?

Wrapup
• Why model individual points
• Scope of learning decomposition
• How learning decomposition differs from LFA