Automatic Scoring of Children's Read-Aloud Text Passages

Automatic Scoring of Children's Read-Aloud Text Passages and Word Lists
Klaus Zechner, John Sabatini and Lei Chen
Educational Testing Service
Confidential and Proprietary. Copyright © 2008 Educational Testing Service.

Motivation (1)
• Reading evaluations of middle school population gaining importance
• Traditional evaluation of reading: off-line, post-hoc answering of questions on passage
• Oral reading assessment: on-line, can capture additional dimensions such as fluency, pronunciation etc.
• High correlation between oral reading performance measures and traditional reading proficiency measures

Motivation (2)
• Goal: read-aloud assessment with fully automatic means
• Using automatic speech recognition (ASR)
• Corpus of text passages and word lists
• Correlations between automatic and manual performance measure CWPM (“correct words per minute”)

CWPM
• Main read-aloud proficiency measure in this paper: “correctly read words per minute”
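
The formula shown on this slide did not survive in the transcript. A minimal sketch of one plausible reading is given here, assuming that correctly read words are the words of the text minus deletions and substitutions (another slide notes that insertions do not enter the CWPM formula) and that reading time is measured in seconds. The function and parameter names are hypothetical.

```python
def cwpm(words_in_text, deletions, substitutions, reading_time_seconds):
    """Correctly read words per minute (assumed form, not the authors' exact formula).

    Assumption: correct words = words in the text - deletions - substitutions.
    Insertions are ignored, as the slides state they are not part of CWPM.
    """
    if reading_time_seconds <= 0:
        raise ValueError("reading time must be positive")
    correct_words = words_in_text - deletions - substitutions
    # Scale from "per second" to "per minute".
    return correct_words * 60.0 / reading_time_seconds


# Made-up example: a 120-word passage read in 60 seconds
# with 3 deleted and 7 substituted words.
example_score = cwpm(120, 3, 7, 60)
```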

Related work
• Project LISTEN (Mostow et al., 1994 ff.): a reading tutor for children that listens
• Project TBALL (Alwan, 2007): assessment of children’s language skills

Challenges in Recognizing Children's Speech
• Variations in acoustics
  – Sentence duration decreases almost linearly between age 7 and 14
  – Higher fundamental frequency for children
• Variations in syntax
  – Children tend to ignore sentence boundaries or pause at positions in the text where no pause is warranted
Consequently:
  – Training of specific acoustic and language models

Data sets
• Passages:
  – Training: 600+ passages (3 different texts)
  – Evaluation: 101 passages
• Word lists:
  – Training: 500+ word lists
  – Evaluation: 42 word lists

Annotation of reading errors
• Annotators listen to read-aloud recordings
• Enter reading errors into spreadsheet (1 word per line)
• Main annotation types: deletions and substitutions of words
• Further: insertions of words (not relevant in CWPM formula)

ASR training and word accuracy
1. Acoustic Model (AM):
  – combined ETS data with OGI and CMU Kids data
2. Language Model (LM):
  – Passages: interpolated LM, strongly biased to transcribed passages
  – Word lists: uniform LM due to difficulty of automatically locating words in signal (noises)
Word Accuracy (on unseen test data):
  – Passages: 72%; Word lists: 50%
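
The slides do not say how word accuracy was computed. One common definition, used here purely as an assumption, is one minus the word error rate, i.e. the number of reference words minus substitutions, deletions and insertions, divided by the number of reference words. The function name and the example counts are hypothetical.

```python
def word_accuracy(reference_words, substitutions, deletions, insertions):
    """Word accuracy as 1 - WER (a common definition; an assumption here).

    The slides report only the resulting percentages, not the formula,
    so this may differ from what the authors actually used.
    """
    if reference_words <= 0:
        raise ValueError("reference must contain at least one word")
    errors = substitutions + deletions + insertions
    return (reference_words - errors) / reference_words


# Made-up counts: 50 reference words, 5 substitutions,
# 3 deletions and 2 insertions in the recognizer output.
example_accuracy = word_accuracy(50, 5, 3, 2)
```

Under this definition insertions are penalized, so the value can be lower than the plain fraction of correctly recognized words.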

Computing CWPM
• We don’t know the “true” number of deletions and substitutions, so we estimate them by comparing the recognizer’s output with the true reference passage (using NIST’s sclite package)
• Pearson correlations: r=0.86 (passages), r=0.80 (word lists; we do not use the reading time here, i.e., “cw” instead of “cwpm”)
• Spearman rank correlations: r>=0.7
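
The comparison step can be illustrated with a small word-level alignment. The sketch below is a stand-in for sclite, not a reproduction of it: it uses a plain minimum-edit-distance alignment with unit costs, lowercases and splits on whitespace, and returns the error counts of one optimal alignment. All names and the example sentences are made up for illustration.

```python
def align_counts(reference, hypothesis):
    """Count (substitutions, deletions, insertions) in a minimum-edit-distance
    word alignment of the recognizer output against the reference text.

    Simplified stand-in for sclite: unit costs, whitespace tokenization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    n, m = len(ref), len(hyp)
    # cell = (total errors, subs, dels, ins) for ref[:i] versus hyp[:j]
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        table[i][0] = (i, 0, i, 0)   # every reference word deleted
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, j)   # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, s, d, k)          # match, no cost
            else:
                diagonal = (t + 1, s + 1, d, k)  # substitution
            t, s, d, k = table[i - 1][j]
            deletion = (t + 1, s, d + 1, k)
            t, s, d, k = table[i][j - 1]
            insertion = (t + 1, s, d, k + 1)
            table[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])
    _, subs, dels, ins = table[n][m]
    return subs, dels, ins


def estimated_correct_words(reference, hypothesis):
    """Estimated "cw": reference words minus estimated substitutions and deletions."""
    subs, dels, _ = align_counts(reference, hypothesis)
    return len(reference.split()) - subs - dels


# Made-up example: one substitution (asks -> ask), one deletion (for),
# one insertion (home).
counts = align_counts("the boy asks his mother for a ride",
                      "the boy ask his mother a ride home")
```

Dividing the estimated correct-word count by the reading time in minutes would give the automatic CWPM for passages; for word lists the slides use the count alone.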

Cohort prediction experiment
• Selected all 27 speakers from passage evaluation set who read all 3 passages
• Placed them into 3 equal-sized cohorts based on manually determined CWPM measure
• Predicted rank of all speakers by ASR and automatic CWPM computation
• Result: All 27 speakers placed in correct cohort (within-cohort rankings differ from human rankings)
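
The cohort comparison can be sketched as follows, assuming each speaker has a single manual and a single automatic CWPM score (for example an average over the three passages, which the slides do not specify). Speaker names, scores and function names are invented for illustration.

```python
def assign_cohorts(scores, n_cohorts=3):
    """Rank speakers by score (highest first) and split into equal-sized cohorts.

    Returns a dict mapping speaker -> cohort index (0 = highest-scoring cohort).
    """
    ranked = sorted(scores, key=lambda speaker: scores[speaker], reverse=True)
    if len(ranked) % n_cohorts != 0:
        raise ValueError("speakers cannot be split into equal-sized cohorts")
    size = len(ranked) // n_cohorts
    return {speaker: rank // size for rank, speaker in enumerate(ranked)}


def cohort_agreement(human_scores, asr_scores, n_cohorts=3):
    """Fraction of speakers whose ASR-based cohort equals the human-based cohort."""
    human = assign_cohorts(human_scores, n_cohorts)
    automatic = assign_cohorts(asr_scores, n_cohorts)
    matches = sum(1 for speaker in human if human[speaker] == automatic[speaker])
    return matches / len(human)


# Made-up scores for six hypothetical speakers.
human_cwpm = {"s1": 150, "s2": 140, "s3": 110, "s4": 100, "s5": 70, "s6": 60}
asr_cwpm = {"s1": 130, "s2": 135, "s3": 95, "s4": 100, "s5": 50, "s6": 65}
agreement = cohort_agreement(human_cwpm, asr_cwpm)
```

In this toy data the within-cohort order differs between the two score sets while every speaker still lands in the same cohort, which is the pattern the slide describes.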

Children’s typical reading errors
• Passages (substitutions):
  – mostly morphological variants, e.g., asks ==> ask
  – wrong determiners or prepositions
• Word lists (substitutions):
  – many part-of-speech errors (e.g., nature ==> natural, equality ==> equally)

Typical speech recognition errors
• Passages (substitutions):
  – Also some morphological variants, but more closed-class word errors (e.g., determiners, conjunctions, prepositions)
• Word lists (substitutions):
  – Mix of morphological variants, POS-cognates, and sound-related (e.g., simple ==> example)

Matched errors (S+D)
                Recall of students’ errors by ASR system
Passages        47.7%
Word lists      16.8%
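
The slides do not spell out how an error counts as matched. A minimal sketch, assuming an error is matched when the ASR-based alignment flags a substitution or deletion at the same reference word position that the human annotator marked, could look like this. The function name and the positions are hypothetical.

```python
def error_recall(human_error_positions, asr_error_positions):
    """Share of human-annotated reading errors (S+D) that the ASR system also flags.

    Assumption: an error is "matched" when both sources mark the same
    word position in the reference text.
    """
    human = set(human_error_positions)
    if not human:
        raise ValueError("no human-annotated errors to recall")
    matched = human & set(asr_error_positions)
    return len(matched) / len(human)


# Made-up positions: the annotator marked words 3, 7, 12 and 20,
# the ASR-based alignment flagged words 3, 12 and 15.
example_recall = error_recall([3, 7, 12, 20], [3, 12, 15])
```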

Summary
• Showed feasibility of automatically scoring children’s read-aloud speech; word lists harder than passages
• High correlation of predicted CWPM with human CWPM (Pearson r>=0.8, Spearman r>=0.7)
• Very successful cohort assignment experiment
• Major ASR problems with word lists due to audio quality issues

Future work
• Improve accuracy of ASR system (e.g., more data for AM and LM training)
• Add new passages and word lists to corpus
• Better recording conditions needed, particularly for word lists (e.g., pre-recording sound calibration, on-line monitoring, storing of time stamps when words are presented on screen)
• Investigate using fluency-related and other features from ASR output for improved CWPM prediction