Test Characteristics Is there a good test?

Test characteristics
1. Reliability
2. Validity
3. Practicality
4. Authenticity
5. Washback

1. Reliability
A. Student-related reliability
B. Rater reliability
C. Test administration reliability
D. Test reliability

Definition
Reliability refers to the consistency and credibility of measurement: a reliable test gives a consistent and stable ‘reading of a person’s ability from one occasion to the next, assuming the person’s ability remains the same’. In other words, reliability can be defined as ‘the extent to which the test produces consistent scores at different administrations to the same group of examinees’.
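One common way to put a number on "consistent scores at different administrations" is to correlate the scores from two sittings of the same test. The sketch below is only an illustration under stated assumptions: the slides do not name a statistic, the Pearson correlation is one possible choice, and the function name `pearson_r` and the score lists are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary in both administrations")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: the same five examinees on two occasions
first_administration = [62, 75, 81, 90, 58]
second_administration = [65, 73, 84, 88, 60]

r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 would suggest that examinees keep roughly the same relative standing across the two occasions; a low value would point to one of the sources of unreliability listed on the following slides.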

Student-related reliability
For example, a temporary illness, fatigue, a bad day, or anxiety (physical or psychological factors) may make an observed score deviate from one’s ‘true’ score.

Rater reliability
Inter-marker reliability: this is related to objectivity, which is low for testing speaking and writing (the individual subjective tastes, norms and criteria of assessment of two examiners may differ considerably). The examiner’s assessment may also be affected by the speaker’s physical appearance, personal preferences, growing fatigue, irritation, etc.

Rater reliability
Intra-marker reliability: the same marker may be inconsistent, for instance because of the place of a copy in the pile; an average composition marked directly after a good one will probably be given a lower mark than if it came after a poor one.
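Agreement between two sets of marks can be checked numerically. The sketch below uses Cohen's kappa, which is not mentioned in the slides and is brought in here only as one widely used agreement statistic; the function name `cohen_kappa` and the band data are hypothetical. The same function could compare two markers (inter-marker) or one marker's first and second marking of the same scripts (intra-marker).

```python
from collections import Counter


def cohen_kappa(marks_a, marks_b):
    """Cohen's kappa: agreement between two lists of categorical marks,
    corrected for the agreement expected by chance."""
    if len(marks_a) != len(marks_b) or not marks_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(marks_a)
    # Proportion of scripts on which the two sets of marks agree
    observed = sum(a == b for a, b in zip(marks_a, marks_b)) / n
    freq_a = Counter(marks_a)
    freq_b = Counter(marks_b)
    # Chance agreement from each set's marginal band frequencies
    expected = sum(freq_a[band] * freq_b[band] for band in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when only one band is ever used")
    return (observed - expected) / (1 - expected)


# Hypothetical bands given to the same six compositions by two markers
marker_1 = ["A", "B", "B", "C", "A", "C"]
marker_2 = ["A", "B", "C", "C", "A", "B"]

kappa = cohen_kappa(marker_1, marker_2)
print(f"kappa: {kappa:.2f}")
```

Kappa is 1 when the two sets of marks agree on every script and falls toward 0 as agreement approaches what chance alone would produce.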

Test Administration Reliability
Unreliability may result from the conditions in which the test is administered (e.g., an aural comprehension test in which an audio player was used to deliver items for comprehension, but because of street noise outside, students sitting next to open windows could not hear the stimuli accurately).

Test Administration Reliability
Other sources: photocopying variations, the amount of light in different parts of the room, variations in temperature, and the condition of desks and chairs.

Test Reliability
The nature of the test itself can cause measurement errors. Sources include subjective tests (with rater bias); open-ended responses (e.g., essay responses) that require the teacher’s judgment; ambiguous items that have more than one answer; and a test that contains more items than needed, which may cause test takers to become fatigued and respond incorrectly to the last items.

2. Validity
1. Face validity
2. Content validity

Validity
It is the most important characteristic of a test. If it is not valid, even a reliable test is not worth much. A valid test is a test that:
1. Measures exactly what it proposes to measure.
2. Does not measure irrelevant variables.
3. Offers useful, meaningful information about a test taker’s ability.

Validity
In other words, if a test is designed to measure examinees’ language ability, it should measure their language ability. E.g., a good test of grammar may be valid for measuring the grammatical ability of the examinees but

Validity
There are many types of validity.
Face validity: it refers to the extent to which the physical appearance of the test corresponds to what it is claimed to measure. E.g., a test of grammar must contain grammatical items and not vocabulary items, and vice versa.

Face Validity
Teachers can increase students’ perception of fair tests by using:
1. A well-constructed, expected format with familiar tasks.
2. Tasks that can be accomplished within an allotted time.
3. Items that are clear and uncomplicated.
4. Tasks that have been rehearsed in their previous course work.
5. Tasks that relate to their course.

Content Validity
It refers to the correspondence between the content of the test and the content of the materials to be tested. A test can’t include all the elements of the content to be tested. Nevertheless, the content of the test should be a reasonable and representative sample of the total content to be tested.

What is the importance of content validity?
1. The greater a test’s content validity, the more likely it is to be an accurate measure of what it’s supposed to measure.
2. A test that lacks content validity is likely to have a harmful backwash effect. Areas which are not tested are likely to become ignored in teaching and learning. Too often the content of tests is determined by what is easy to test rather than what is important to test. The safeguard against this is to write full test specifications and to ensure that the test is a fair reflection of these.

3. Practicality
Brown (2010) considers the following attributes for practicality. A practical test:
1. Stays within budgetary limits
2. Can be completed by the test-taker within appropriate time constraints
3. Has clear directions for administration
4. Appropriately uses available human resources

4. Authenticity
Bachman and Palmer (1996: 23) define it as ‘the degree of correspondence of the characteristics of a given language test task to the features of a target language task’. It may be present in the following ways:

Authenticity
An authentic test (Brown, 2010):
1. Contains language that is as natural as possible
2. Has items that are contextualized rather than isolated
3. Includes meaningful, relevant, interesting topics
4. Offers tasks that replicate real-world tasks

5. Washback
Washback is ‘the effect of testing on teaching and learning’ (Hughes, 2003: 1). A test that provides beneficial washback:

Washback
1. Positively influences what and how teachers teach and learners learn
2. Offers learners a chance to adequately prepare
3. Gives learners feedback that enhances their language development
4. Is more formative in nature than summative
5. Provides conditions for peak performance by the learner

6. Scorability
‘Can the test be scored with ease so that users may be able to handle it?’

7. Economy
Does the test measure what we want to test in a reasonable time, considering the test situation?

8. Administrability
Can the test be given under the conditions that prevail, by the personnel that are available? E.g., if we do not have a record player, a test that depends on one can’t be administered.