
A MACHINE LEARNING APPROACH FOR AUTOMATIC STUDENT MODEL DISCOVERY
Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger
Computer Science Department, Carnegie Mellon University


STUDENT MODEL
• A set of knowledge components (KCs) encoded in intelligent tutors to model how students solve problems
  – Example: what to do next on problems like 3x = 12
• A key factor behind instructional decisions in automated tutoring systems


STUDENT MODEL CONSTRUCTION
• Traditional methods: structured interviews, think-aloud protocols, rational analysis
  – Require expert input; highly subjective
• Previous automated methods: learning factor analysis (LFA)
  – Still confined to the search space of human-provided factors
• Proposed approach: use a machine-learning agent, SimStudent, to acquire knowledge
  – 1 production rule acquired => 1 KC in the student model (Q matrix)
  – Independent of human-provided factors


A BRIEF REVIEW OF SIMSTUDENT
• A machine-learning agent that acquires production rules from examples and problem-solving experience, given a set of feature predicates and functions


PRODUCTION RULES
Skill divide (e.g., -3x = 6)
• What: left side (-3x), right side (6)
• When: left side (-3x) does not have a constant term
• How: get the coefficient (-3) of the left side (-3x); divide both sides by the coefficient
Each production rule is associated with one KC. Each step (e.g., -3x = 6) is labeled with one KC, determined by the production rule applied to that step.
The original model required strong domain-specific operators, like get-coefficient, and does not differentiate important distinctions in learning (e.g., -x = 3 vs. -3x = 6).
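To make the What/When/How anatomy concrete, here is a minimal sketch of a divide-style rule in Python. The ProductionRule class, the helper names, and the string-based equation handling are illustrative assumptions for exposition, not SimStudent's actual representation.

```python
# A toy representation of a "divide" production rule; names and string
# handling are hypothetical, chosen only to mirror the What/When/How parts.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProductionRule:
    name: str                      # also serves as the KC label for a step
    what: Callable[[str], tuple]   # retrieves the relevant parts of the step
    when: Callable[[str], bool]    # precondition for the rule to fire
    how: Callable[[str], str]      # action that transforms the step

def divide_when(left: str) -> bool:
    # Fire only when the left side has no added constant term,
    # e.g., "-3x" qualifies but "-3x+5" does not.
    return ("+" not in left) and ("-" not in left.lstrip("-"))

def divide_how(equation: str) -> str:
    left, right = (s.strip() for s in equation.split("="))
    coeff = left.replace("x", "") or "1"          # get-coefficient, e.g., "-3"
    coeff = "-1" if coeff == "-" else coeff
    return f"x = {int(right) // int(coeff)}"      # toy integer-only division

divide = ProductionRule(
    name="divide",
    what=lambda eq: tuple(s.strip() for s in eq.split("=")),
    when=lambda eq: divide_when(eq.split("=")[0].strip()),
    how=divide_how,
)

print(divide.how("-3x = 6"))   # -> "x = -2"
```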


DEEP FEATURE LEARNING
• Expert vs. novice (Chi et al., 1981)
  – Example: what is the coefficient of -3x?
  – An expert uses deep functional features to reply -3
  – A novice may use shallow perceptual features to reply 3
• Model deep feature learning using machine learning techniques
• Integrate the acquired knowledge into SimStudent learning
  – Removes the dependence on strong operators and splits KCs into finer grain sizes


FEATURE RECOGNITION AS PCFG INDUCTION
• Underlying structure in the problem <-> grammar
• Feature <-> non-terminal symbol in a grammar rule
• Feature learning task <-> grammar induction
• Student errors <-> incorrect parsing
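As a concrete illustration of the mapping, here is a small hand-written PCFG for expressions like "-3x". The rules and probabilities are made up for exposition; they are not the grammar SimStudent actually induces.

```python
# An illustrative PCFG; each left-hand side maps to a list of
# (probability, right-hand side) alternatives. Terminals are quoted.
pcfg = {
    "Expression":   [(1.0, ["SignedNumber", "Variable"])],
    "SignedNumber": [(0.5, ["Sign", "Number"]),   # e.g., "-3"
                     (0.5, ["Number"])],          # e.g., "3"
    "Sign":         [(1.0, ["'-'"])],
    "Number":       [(1.0, ["'3'"])],
    "Variable":     [(1.0, ["'x'"])],
}
# Recognizing the coefficient of "-3x" means finding the subtree rooted
# at SignedNumber in the most probable parse: "-3", not just "3".
```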


LEARNING PROBLEM
• Input: a set of feature recognition records, each consisting of
  – an original problem (e.g., -3x)
  – the feature to be recognized (e.g., -3 in -3x)
• Output:
  – a probabilistic context-free grammar (PCFG)
  – a non-terminal symbol in a grammar rule that represents the target feature
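A minimal sketch of what such training records might look like as a data structure; the field names are illustrative assumptions.

```python
# Each record pairs a problem string with the labeled feature substring.
from dataclasses import dataclass

@dataclass
class FeatureRecord:
    problem: str   # the original problem, e.g., "-3x"
    feature: str   # the substring to be recognized, e.g., "-3"

records = [
    FeatureRecord("-3x", "-3"),
    FeatureRecord("5x", "5"),
]
# The learner must induce a PCFG from the problems and pick the
# non-terminal whose yield matches the labeled feature in most records.
```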


A TWO-STEP PCFG LEARNING ALGORITHM
• Greedy structure hypothesizer (GSH):
  – Hypothesizes grammar rules in a bottom-up fashion
  – Creates non-terminal symbols for frequently occurring sequences, e.g., "-" and "3", SignedNumber and Variable
• Viterbi training phase:
  – Refines rule probabilities: sequences that occur more frequently get higher probabilities
  – Generalizes the inside-outside algorithm (Lari & Young, 1990)
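Below is a toy sketch of the Viterbi training loop, assuming a grammar in Chomsky normal form with start symbol "S" and rules stored as {(lhs, rhs_tuple): probability}. It illustrates the parse / count / re-estimate cycle only; the authors' algorithm generalizes inside-outside and is not reproduced here.

```python
# Viterbi training: repeatedly find each string's most probable parse
# (CYK), count rule uses in those parses, and re-estimate probabilities
# as relative frequencies per left-hand side.
from collections import defaultdict

def viterbi_parse(tokens, rules):
    """Return the rules used in the most probable parse of tokens."""
    n = len(tokens)
    best = defaultdict(float)          # (i, j, sym) -> best subtree prob
    back = {}                          # (i, j, sym) -> (rule, split point)
    for i, tok in enumerate(tokens):   # terminal rules, span 1
        for (lhs, rhs), p in rules.items():
            if rhs == (tok,) and p > best[(i, i + 1, lhs)]:
                best[(i, i + 1, lhs)] = p
                back[(i, i + 1, lhs)] = ((lhs, rhs), None)
    for span in range(2, n + 1):       # binary rules, longer spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, rhs), p in rules.items():
                    if len(rhs) != 2:
                        continue
                    score = p * best[(i, k, rhs[0])] * best[(k, j, rhs[1])]
                    if score > best[(i, j, lhs)]:
                        best[(i, j, lhs)] = score
                        back[(i, j, lhs)] = ((lhs, rhs), k)
    used = []
    def collect(i, j, sym):
        (lhs, rhs), split = back[(i, j, sym)]
        used.append((lhs, rhs))
        if split is not None:
            collect(i, split, rhs[0])
            collect(split, j, rhs[1])
    if best[(0, n, "S")] > 0:
        collect(0, n, "S")
    return used

def viterbi_training(corpus, rules, iterations=10):
    for _ in range(iterations):
        counts = defaultdict(float)
        for tokens in corpus:
            for rule in viterbi_parse(tokens, rules):
                counts[rule] += 1
        totals = defaultdict(float)    # normalize per left-hand side
        for (lhs, rhs), c in counts.items():
            totals[lhs] += c
        for rule in rules:
            if totals[rule[0]] > 0:
                rules[rule] = counts[rule] / totals[rule[0]]
    return rules
```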


EXAMPLE OF PRODUCTION RULES BEFORE AND AFTER INTEGRATION
Extend the "What" part in the production rule.
Original: skill divide (e.g., -3x = 6)
• What: left side (-3x), right side (6)
• When: left side (-3x) does not have a constant term
• How: get the coefficient (-3) of the left side (-3x); divide both sides by the coefficient (-3)
Extended: skill divide (e.g., -3x = 6)
• What: left side (-3, -3x), right side (6)
• When: left side (-3x) does not have a constant term
• How: divide both sides by the coefficient (-3)
Fewer operators: the extended rule eliminates the need for domain-specific operators such as get-coefficient.



EXPERIMENT METHOD
• SimStudent vs. human-generated model
• Code real student data
  – 71 students used a Carnegie Learning Algebra I Tutor on equation solving
• SimStudent: tutored by a Carnegie Learning Algebra I Tutor
  – Coded each step by the applicable production rule
  – Used the human-generated coding when no production rule applied
• Human-generated model: coded manually based on expertise


HUMAN-GENERATED VS SIMSTUDENT KCS

                                       Human-generated model  SimStudent  Comment
Total # of KCs                         12                     21
# of basic arithmetic operation KCs    4                      13          Split into finer grain sizes based on different problem forms
# of type-in KCs                       4                      4           Approximately the same
# of other transformation operation
KCs (e.g., combine like terms)         4                      4           Approximately the same


HOW WELL THE TWO MODELS FIT REAL STUDENT DATA
• Used the Additive Factor Model (AFM)
  – An instance of logistic regression that uses each student, each KC, and each KC-by-opportunity interaction as independent variables
  – Predicts the probability of a student making an error on a specific step
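For reference, a standard statement of AFM (general background on the model, not text from the slides): p_ij is the probability that student i gets step j right, theta_i the student's proficiency, q_jk the Q-matrix entry linking step j to KC k, beta_k the easiness of KC k, gamma_k its learning rate, and T_ik the number of practice opportunities student i has already had on KC k.

```latex
\ln\frac{p_{ij}}{1-p_{ij}}
  = \theta_i + \sum_{k} q_{jk}\,\beta_k + \sum_{k} q_{jk}\,\gamma_k\,T_{ik}
```

The slide's "probability of making an error" is simply 1 - p_ij under this formulation.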


AN EXAMPLE OF A SPLIT IN DIVISION
• Human-generated model
  – divide: Ax=B and -x=A
• SimStudent
  – simSt-divide: Ax=B
  – simSt-divide-1: -x=A

Q-matrix rows (first four steps of form Ax=B, last three of form -x=A):
divide           1 1 1 1 1 1 1
simSt-divide     1 1 1 1 0 0 0
simSt-divide-1   0 0 0 0 1 1 1
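A sketch of how the split shows up in the Q matrix: each KC is a row, each observed step a column, with 1 meaning the KC applies to that step. The step strings and predicate functions are illustrative only.

```python
# Hypothetical steps: four of form Ax=B, three of form -x=A.
steps = ["3x=6", "4x=8", "-2x=4", "5x=10", "-x=3", "-x=7", "-x=2"]

def human_divide(step):      # one coarse KC for every division step
    return True

def simst_divide(step):      # Ax=B: explicit coefficient
    return not step.startswith("-x")

def simst_divide_1(step):    # -x=A: implicit coefficient of -1
    return step.startswith("-x")

q_matrix = {
    "divide":         [int(human_divide(s)) for s in steps],
    "simSt-divide":   [int(simst_divide(s)) for s in steps],
    "simSt-divide-1": [int(simst_divide_1(s)) for s in steps],
}
for kc, row in q_matrix.items():
    print(f"{kc:15s}", *row)   # reproduces the three rows above
```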


PRODUCTION RULES FOR DIVISION
Skill simSt-divide (e.g., -3x = 6)
• What: left side (-3, -3x), right side (6)
• When: left side (-3x) does not have a constant term
• How: divide both sides by the coefficient (-3)
Skill simSt-divide-1 (e.g., -x = 3)
• What: left side (-x), right side (3)
• When: left side (-x) is of the form -v
• How: generate one (1); divide both sides by -1


AN EXAMPLE WITHOUT A SPLIT IN DIVIDE-TYPEIN
• Human-generated model
  – divide-typein
• SimStudent
  – simSt-divide-typein

Q-matrix rows (both KCs label exactly the same steps):
divide-typein        1 1 1 1 1
simSt-divide-typein  1 1 1 1 1


SIMSTUDENT VS SIMSTUDENT + FEATURE LEARNING
• SimStudent
  – Needs strong operators
  – Constructs student models similar to the human-generated model
• Extended SimStudent
  – Only requires weak operators
  – Splits KCs into finer grain sizes based on different parse trees
Does Extended SimStudent produce a KC model that better fits student learning data?


RESULTS

                               Human-generated model  SimStudent
AIC                            6529                   6448
3-fold cross-validation RMSE   0.4034                 0.3997

Significance tests:
• SimStudent outperforms the human-generated model on 4260 out of 6494 steps (p < 0.001)
• SimStudent outperforms the human-generated model across 20 runs of cross-validation (p < 0.001)
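For reference, the two metrics on this slide can be computed as below. These are generic sketches: the log-likelihoods and parameter counts would come from fitting AFM, not from this snippet.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    # AIC = 2k - 2 ln L; lower is better. The 2k penalty means a model
    # with more KCs (more parameters) must improve fit enough to pay
    # for its extra complexity.
    return 2 * n_params - 2 * log_likelihood

def rmse(predicted, actual) -> float:
    # Root-mean-squared error between predicted probabilities and
    # observed outcomes, as in the cross-validation row above.
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))
```

Because AIC penalizes parameters, the SimStudent model's lower AIC (6448 vs. 6529) indicates that its finer-grained KCs improve fit by more than their added complexity costs.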


SUMMARY
• Presented an innovative application of a machine-learning agent, SimStudent, for automatic discovery of student models.
• Showed that a SimStudent-generated student model was a better predictor of real student learning behavior than a human-generated model.


FUTURE STUDIES
• Test generality on other datasets in DataShop
• Apply the proposed approach in other domains
  – Stoichiometry
  – Fraction addition


AN EXAMPLE IN ALGEBRA



A COMPUTATIONAL MODEL OF DEEP FEATURE LEARNING
• Extended a PCFG learning algorithm (Li et al., 2009) with:
  – Feature learning
  – Stronger prior knowledge: transfer learning using prior knowledge



FEATURE LEARNING
• Build the most probable parse trees for all observation sequences
• Select the non-terminal symbol that matches the most training records as the target feature
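A sketch of the selection step: given the most probable parse tree for each record, pick the non-terminal whose yield matches the labeled feature in the most records. The nested-tuple tree format (symbol, children...) is an assumption made for illustration.

```python
from collections import Counter

def leaf_text(tree):
    """Concatenate the terminal symbols under a subtree."""
    symbol, *children = tree
    if not children:                 # terminal leaf
        return symbol
    return "".join(leaf_text(c) for c in children)

def spans(tree):
    """Yield (non_terminal, yielded_string) for every subtree."""
    symbol, *children = tree
    if not children:
        return
    yield symbol, "".join(leaf_text(c) for c in children)
    for child in children:
        yield from spans(child)

def select_feature_symbol(parsed_records):
    """parsed_records: [(parse_tree, labeled_feature), ...]"""
    votes = Counter()
    for tree, feature in parsed_records:
        for symbol, text in spans(tree):
            if text == feature:
                votes[symbol] += 1
    return votes.most_common(1)[0][0]

# "-3x" parsed as Expression -> SignedNumber Variable, feature "-3":
tree = ("Expression",
        ("SignedNumber", ("Sign", ("-",)), ("Number", ("3",))),
        ("Variable", ("x",)))
print(select_feature_symbol([(tree, "-3")]))   # -> "SignedNumber"
```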


TRANSFER LEARNING USING PRIOR KNOWLEDGE
• GSH phase:
  – Build parse trees based on previously acquired grammar rules, then call the original GSH
• Viterbi training:
  – Add rule frequencies from the previous task to the counts for the current task
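A minimal sketch of that count-transfer idea: rule-use counts carried over from a previous task are added to the counts observed in the current task before probabilities are re-estimated. The rule tuples and frequencies below are illustrative, and the re-estimation shown is simple relative-frequency normalization.

```python
from collections import defaultdict

def reestimate_with_transfer(current_counts, prior_counts):
    counts = defaultdict(float, prior_counts)   # start from the old task
    for rule, c in current_counts.items():
        counts[rule] += c                       # add current-task usage
    totals = defaultdict(float)                 # normalize per left-hand side
    for (lhs, rhs), c in counts.items():
        totals[lhs] += c
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}

prior = {("SignedNumber", ("Sign", "Number")): 4.0,
         ("SignedNumber", ("Number",)): 4.0}
current = {("SignedNumber", ("Sign", "Number")): 8.0,
           ("SignedNumber", ("Number",)): 2.0}
probs = reestimate_with_transfer(current, prior)
# Without transfer the new probabilities would be 0.8 / 0.2; with the
# prior counts folded in they become 12/18 ≈ 0.67 and 6/18 ≈ 0.33.
```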