Task-Independent Features for Automated Essay Grading
Torsten Zesch, Michael Wojatzki
Language Technology Lab, University of Duisburg-Essen
Essay Writing
Widely used to test (high-level) language proficiency
§ correct word usage
§ adequate style
§ structuring capabilities
§ …
Problems
- High costs of manual grading
- How to individualize?
Department of Computer Science and Applied Cognitive Science | Language Technology Lab | 1st INDUS Meeting
Manual Grading
[Diagram: Task A → Essays → Graded Essays (e.g. B+)]
Automated Grading – Training
[Diagram: Task A graded essays (e.g. B+) → Machine Learning → Grading Model]
Automated Grading – Application
[Diagram: Task A essays → Grading Model → Graded Essays (e.g. A-)]
Limitations of current approaches
• High costs of building a training set (manual grading)
Automated Grading – Transfer
[Diagram: the grading model built for Task A grades Task A essays (e.g. A-) and is also applied to Task B essays (e.g. C)]
Limitations of current approaches
• Models are usually not transferable between tasks
Task-dependence of Features
Strongly task-dependent
§ related to a certain task
§ For example
  § The essay contains the words ‘George Washington’
  § The essay quotes specific passages from a source
Weakly task-dependent
§ general properties of good essays
§ For example
  § The essay contains connectives like ‘therefore’ or ‘accordingly’
  § The essay is free of spelling errors
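The contrast between the two kinds of features can be illustrated with a minimal Python sketch. It is not the authors' feature extractor: the connective list, the tokenizer, and the function names `connective_ratio` and `mentions_phrase` are assumptions made for illustration only.

```python
import re

# Hypothetical, deliberately tiny connective list; a real system would use a
# much larger, language-specific lexicon.
CONNECTIVES = {"therefore", "accordingly", "however", "moreover", "thus"}


def connective_ratio(essay: str) -> float:
    """Weakly task-dependent: share of tokens that are connectives."""
    tokens = re.findall(r"[a-z']+", essay.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in CONNECTIVES)
    return hits / len(tokens)


def mentions_phrase(essay: str, phrase: str) -> bool:
    """Strongly task-dependent: does the essay mention a task-specific phrase?"""
    return phrase.lower() in essay.lower()
```

The first function can be computed for any essay on any prompt, whereas the second only makes sense for a task where the phrase (for example ‘George Washington’) is expected.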
Task-dependence of Features
Weakly task-dependent: coherence, cohesion, errors, readability, specificity, style, syntactic variation, word/sentence length
Strongly task-dependent: ngrams / topics, length, comparison to the training set, similarity to a given source, formal referencing
Assumption: Using only weakly task-dependent features improves model transfer
Feature Sets
weakly task-dependent + strongly task-dependent = full feature set
weakly task-dependent only = reduced feature set
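The two feature sets can be written down as a small Python sketch. The group names follow the task-dependence slide as read from its two-column layout; the helpers `select_feature_groups` and `keep_groups` and the shape of the feature dictionary are hypothetical.

```python
# Feature groups as listed on the task-dependence slide.
WEAKLY_DEPENDENT = [
    "coherence", "cohesion", "errors", "readability",
    "specificity", "style", "syntactic variation", "word/sentence length",
]
STRONGLY_DEPENDENT = [
    "ngrams / topics", "length", "comparison to the training set",
    "similarity to a given source", "formal referencing",
]


def select_feature_groups(reduced: bool) -> list:
    """Return the group names of the reduced or the full feature set."""
    if reduced:
        return list(WEAKLY_DEPENDENT)
    return WEAKLY_DEPENDENT + STRONGLY_DEPENDENT


def keep_groups(features: dict, groups: list) -> dict:
    """Keep only the entries of `features` whose group name is in `groups`.

    `features` is assumed to map a group name to that group's feature values.
    """
    return {name: values for name, values in features.items() if name in groups}
```

A reduced model would then be trained on `keep_groups(features, select_feature_groups(reduced=True))` for every essay, and a full model on the unfiltered features.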
Automated Grading – Transfer
[Diagram: a full model and a reduced model built for Task A grade Task A essays (e.g. A-) and are also applied to Task B essays (e.g. C)]
Evaluation
                 English                           German
tasks            8                                 2
set size         ~1800 essays                      ~200 essays
characteristic   4 opinion, 4 source-based         both source-based
participants     7th, 8th, 10th grade students     first-year university students
Evaluation Metric: Quadratic Weighted Kappa
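Quadratic Weighted Kappa measures agreement between two sets of ordinal grades and penalizes each disagreement by the squared distance between the grades. The slides do not show how it was computed, so the following is a minimal pure-Python sketch of the standard definition, $\kappa = 1 - \frac{\sum_{i,j} w_{ij} O_{ij}}{\sum_{i,j} w_{ij} E_{ij}}$ with $w_{ij} = \frac{(i-j)^2}{(N-1)^2}$; the function name and argument layout are assumptions.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa for two equally long lists of integer grades."""
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two possible ratings")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    for rating in list(rater_a) + list(rater_b):
        if not min_rating <= rating <= max_rating:
            raise ValueError("rating outside the allowed range")

    # Observed agreement matrix O: rows are grades of rater A, columns of rater B.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_a[i] * hist_b[j] / total  # chance agreement E
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters gave one and the same grade to every essay.
        return 1.0
    return 1.0 - numerator / denominator
```

A value of 1 means perfect agreement, 0 means agreement at chance level, and negative values mean agreement below chance. With human grades as `rater_a` and model predictions as `rater_b`, this gives the score of one grading model on one task.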
Task Types
[Diagram: Task A essays and Task B essays are each either source-based or opinion; the full and reduced models built for Task A grade Task A essays (e.g. A-) and Task B essays (e.g. C)]
Process Overview
[Diagram: same setup as on the Task Types slide]
Baseline Results – English
[Chart not preserved]
Baseline Results – German
[Chart not preserved]
Transfer Loss – Full feature set
[Chart not preserved. Legend: small losses (> -0.3) are marked]
Values shown: -0.46, -0.39, -0.53, -0.29
High losses, except in source/source case
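The slides do not define transfer loss explicitly. A plausible reading, sketched here as an assumption, is the difference between the kappa a model reaches on a new task and the baseline kappa of a model trained on that task itself, so that negative values are losses. The function names and the example kappa values are hypothetical; only the -0.3 threshold comes from the slide legend.

```python
def transfer_loss(baseline_kappa: float, transfer_kappa: float) -> float:
    """Assumed definition: kappa after transfer minus the within-task baseline."""
    return transfer_kappa - baseline_kappa


def is_small_loss(loss: float, threshold: float = -0.3) -> bool:
    """Slide legend: losses greater than -0.3 count as small."""
    return loss > threshold


# Hypothetical numbers, not results from the talk.
example_loss = transfer_loss(baseline_kappa=0.70, transfer_kappa=0.45)
```

Under this reading, three of the four full-feature-set values on the slide (-0.46, -0.39, -0.53) fall below the threshold and only -0.29 counts as small.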
Transfer Loss – Reduced feature set
[Chart not preserved]
Values shown: -0.22, -0.46, -0.47, -0.23
Much smaller losses within task types
Transfer Loss – German Dataset
Same picture on German tasks
Conclusions & Future Work
Model transfer is difficult
§ Loss is quite dramatic
Using only weakly task-dependent features improves transfer (but the loss is still significant)
Future Work
§ Explore faceted models
  § Should transfer better than holistic grading
  § Provides better feedback