Evaluating a Test: Test Usefulness (Bachman and Palmer, 1996)
Anne Mullen
anne.mullen@elul.ulaval.ca
Université Laval
1 October 2014
Test Validity
The Progressive Matrix of Validity (Messick, 1989) was conceived
• to control the quality of the evaluation
• to guarantee that the results of the evaluation are precise
• to ensure that the interpretations of the results are fair
Plan
1. Qualities of test usefulness
• definitions
• questions
2. Creating a valid test
3. Discussion and follow-up questions
Six Qualities of Test Usefulness
• Reliability
• Construct Validity
• Authenticity
• Interactiveness
• Impact
• Practicality
Reliability
• seeks to ascertain that the results of an evaluation are consistent
• measures the consistency of results from one evaluation to another
• checks how much results vary between different evaluations
• the minimum acceptable level of reliability is determined by the context
Is this evaluation reliable?
• does the evaluation allow for comparison between test-takers?
• does the evaluation allow for comparison with other groups of test-takers in the same session or in different sessions?
Construct Validity
• the extent to which the results of an evaluation can be interpreted as an indicator of the ability that the evaluation is meant to measure
• is said to exist if the results of the evaluation are valid in a specific context and can be generalized (that is, they remain valid in a similar but different context)
Does this evaluation measure the correct construct?
• does the evaluation actually evaluate the desired ability?
• what other abilities are measured?
Authenticity
• the correspondence between the characteristics of tasks in the target context and those of the tasks in the evaluation
• helps in generalizing the results
Is the evaluation authentic?
• will the test-takers need to do similar activities in their present or future academic or working lives?
Interactiveness
• the extent to which, and the way in which, the test-taker's individual characteristics are engaged when completing the tasks of the evaluation
• includes
a) the goal
b) the specific group being evaluated
c) the specific context of the evaluation
Is the evaluation interactive?
• does the evaluation reflect the classroom activities?
• does the evaluation lead the test-taker to use what has been taught and learned?
Impact
• the effects of the evaluation on
a) society (employers)
b) educational systems (administrators, teachers)
c) other stakeholders (parents and test-takers)
• the consequences of the evaluation must be assessed for each stakeholder
What is the impact of the evaluation?
• how are the results of the test used?
• is anyone affected negatively by the evaluation?
• who benefits from the evaluation?
Practicality
• the assessment of the resources required:
a) human (test scorers, reviewers of the evaluation)
b) material (space and equipment)
c) time (test creation, scoring, analysis)
Is the evaluation practical?
• can it be completed in the allotted time?
• can it be scored easily and fairly for all test-takers?
• what resources are needed, and are they readily available?
Determining Test Usefulness
Three principles to follow:
1. find a middle ground between the six qualities
2. have the six qualities combined and balanced
3. evaluate for the context
Creation of an evaluation
• You need to design an evaluation for the following list of words: to devour, to dirty, to imbibe, to purchase, to relish, to swallow, to savour, to scorch, to slip, to taste.
Context
• The class is an intermediate four-skills ESL class with 23 students.
• While listening to a text that included these ten words, test-takers were asked to answer comprehension questions.
• The ten words were listed and defined because of their presumed level of difficulty.
• The teacher also explained the meaning of these words orally and answered any questions.
Is the test useful?
• does the evaluation allow for comparison between test-takers and groups over time? (Reliability)
• does the evaluation actually evaluate the desired ability? Do other abilities intervene? (Construct validity)
Is the test useful?
• does the evaluation reflect the test-taker's present-day or future reality? (Authenticity)
• does the evaluation lead the test-taker to use what has been taught and learned? (Interactiveness)
Is the test useful?
• what is the effect of the evaluation? (Impact)
• is the evaluation easy to administer? (Practicality)
Thank you
anne.mullen@elul.ulaval.ca