Task-Independent Features for Automated Essay Grading
Torsten Zesch, Michael Wojatzki
Language Technology Lab, University of Duisburg-Essen

Essay Writing
Widely used to test (high-level) language proficiency
- correct word usage
- adequate style
- structuring capabilities
- …
Problems
- High costs of manual grading
- How to individualize?
Department of Computer Science and Applied Cognitive Science | Language Technology Lab | 1st INDUS Meeting

Manual Grading
[Diagram: Task A -> Essays -> Graded Essays (example grade: B+)]

Automated Grading – Training
[Diagram: Task A, Graded Essays (B+) -> Machine Learning -> Grading Model]

Automated Grading – Application
[Diagram: Task A essays -> Grading Model -> Graded Essays (example grade: A-)]
Limitations of current approaches
- High costs of building training set (manual grading)

Automated Grading – Transfer
[Diagram: the Grading Model grades Task A essays (example grade: A-) and is also applied to Task B essays (example grade: C)]
Limitations of current approaches
- Models are usually not transferable between tasks

Task-dependence of Features
Strongly task-dependent
- related to a certain task
- For example
  - The essay contains the words ‘George Washington’
  - The essay quotes specific passages from a source
Weakly task-dependent
- general properties of good essays
- For example
  - The essay contains connectives like ‘therefore’ or ‘accordingly’
  - The essay is free of spelling errors
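The weakly task-dependent examples above can be made concrete with a small feature extractor. This is a minimal sketch, not the authors' implementation: the function name `weak_features`, the feature names, and the connective list (beyond 'therefore' and 'accordingly') are assumptions made for illustration.

```python
import re

# Hypothetical connective list for illustration; the slides name only
# 'therefore' and 'accordingly'.
CONNECTIVES = {"therefore", "accordingly", "however", "moreover"}


def weak_features(essay: str) -> dict:
    """Compute a few surface features that do not depend on the essay task."""
    tokens = re.findall(r"[A-Za-z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        # share of tokens that are connectives
        "connective_ratio": sum(t in CONNECTIVES for t in tokens) / n_tokens,
        # mean number of characters per token
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        # mean number of tokens per sentence
        "avg_sentence_length": len(tokens) / max(len(sentences), 1),
    }
```

None of these values refers to the essay prompt, which is what makes such features candidates for transfer between tasks. A strongly task-dependent feature, by contrast, would check for prompt-specific words such as 'George Washington'.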

Task-dependence of Features
Weakly task-dependent: coherence, cohesion, errors, readability, specificity, style, syntactic variation, word/sentence length
Strongly task-dependent: ngrams / topics, length, comparison to the training set, similarity to a given source, formal referencing
Assumption: Using only weakly task-dependent features improves model transfer

Feature Sets
weakly dependent + strongly dependent = full feature set
weakly dependent only = reduced feature set
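The relation between the two feature sets can be sketched in a few lines. This is an illustration under stated assumptions: the group names, their assignment to the weak or strong side, and the helper names `build_feature_sets` and `project` are hypothetical and follow one reading of the slides.

```python
# Feature groups; the weak/strong assignment here is an assumption based on
# one reading of the slides, not a confirmed list.
WEAK = ("coherence", "cohesion", "errors", "readability", "style")
STRONG = ("ngrams_topics", "length", "source_similarity")


def build_feature_sets(weak=WEAK, strong=STRONG):
    """Return (full, reduced): full = weak + strong, reduced = weak only."""
    full = list(weak) + list(strong)
    reduced = list(weak)
    return full, reduced


def project(vector: dict, feature_set) -> list:
    """Order a feature dict into a list restricted to the chosen feature set.

    Features missing from the dict default to 0.0.
    """
    return [vector.get(name, 0.0) for name in feature_set]
```

A model trained on `project(v, reduced)` never sees the strongly task-dependent values, so it cannot rely on them when applied to a different task.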

Automated Grading – Transfer
[Diagram: a Full Model and a Reduced Model grade Task A essays (example grade: A-) and are also applied to Task B essays (example grade: C)]

Evaluation
                English                          German
tasks           8                                2
set size        ~1800 essays                     ~200 essays
characteristic  4 opinion, 4 source-based        both source-based
participants    7th, 8th, 10th grade students    first-year university students
Evaluation Metric: Quadratic Weighted Kappa
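Quadratic Weighted Kappa measures agreement between two sets of integer grades, penalising each disagreement by the squared distance between the two grades. The following is a minimal pure-Python sketch of the standard definition; the function name and the optional `min_rating`/`max_rating` parameters are choices made here, not taken from the slides.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """Agreement between two lists of integer ratings.

    1.0 means perfect agreement, 0.0 chance-level agreement; negative
    values mean agreement below chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    lo = min(min(rater_a), min(rater_b)) if min_rating is None else min_rating
    hi = max(max(rater_a), max(rater_b)) if max_rating is None else max_rating
    n = hi - lo + 1
    if n == 1:
        return 1.0  # only one possible rating, agreement is trivial

    # Observed confusion matrix: observed[i][j] counts (a = lo+i, b = lo+j).
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - lo][b - lo] += 1

    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    total = len(rater_a)

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_a[i] * hist_b[j] / total  # count under independence
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0.0:
        return 1.0  # both raters always gave the same single rating
    return 1.0 - numerator / denominator
```

Because the weights grow quadratically, grading an A essay as C costs four times as much as grading it as B, which suits ordinal grade scales.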

Task Types
[Diagram: Full Model and Reduced Model; Task A and Task B are each either a source-based or an opinion task; example grades A- (Task A) and C (Task B)]

Process Overview
[Diagram: Full Model and Reduced Model applied across source-based and opinion tasks, from Task A essays to Task B essays; example grades A- (Task A) and C (Task B)]

Baseline Results – English
[Figure: baseline results on the English tasks]

Baseline Results – German
[Figure: baseline results on the German tasks]

Transfer Loss – Full feature set
[Figure: transfer loss between tasks with the full feature set]
Legend: small losses (> -0.3) are marked
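The slides do not define transfer loss. One plausible reading, assumed here, is the QWK a model reaches on a target task after training on a different task, minus the QWK of a baseline trained on the target task itself. The sketch below shows that arithmetic together with the -0.3 threshold; the function names are hypothetical.

```python
def transfer_loss(kappa_transfer: float, kappa_baseline: float) -> float:
    """Assumed definition: transfer QWK minus in-task baseline QWK.

    A negative value means the transferred model performs worse than a
    model trained on the target task.
    """
    return kappa_transfer - kappa_baseline


def is_small_loss(loss: float, threshold: float = -0.3) -> bool:
    """True if the loss is above the threshold, i.e. closer to zero."""
    return loss > threshold
```

With invented kappa values (not results from the talk), `transfer_loss(0.40, 0.65)` is about -0.25, which `is_small_loss` classifies as small, while a loss of -0.46 falls below the threshold.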

Transfer Loss – Full feature set
Values shown: -0.46, -0.39, -0.53, -0.29
High losses, except in source/source case

Transfer Loss – Reduced feature set
[Figure: transfer loss between tasks with the reduced feature set]

Transfer Loss – Reduced feature set
Values shown: -0.22, -0.46, -0.47, -0.23
Much smaller losses within task types

Transfer Loss – German Dataset
Same picture on German tasks

Conclusions & Future Work
Model transfer is difficult
- Loss is quite dramatic
Using only weakly task-dependent features improves transfer (but still significant loss)
Future Work
- Explore faceted models
  - Should transfer better than holistic grading
  - Provides better feedback