Introductory Comments on Test Validity
Using statistics in small-scale language education research
Jean Turner
© Taylor & Francis 2014

What is test validity?
Tests and other data collection tools must measure accurately and appropriately, given the nature of the construct. Test validity is associated with the extent to which:
◦ a tool measures the intended construct
◦ the tool's scores/outcomes mean what they are intended to mean
◦ the tool's scores/outcomes are useful for their intended purpose(s)

How is test validity relevant to research study validity?
Test validity affects both the internal and the external validity of a research study.

Perspectives on validity
There are different perspectives and techniques associated with investigations of test validity. Historically, these different perspectives and techniques were referred to as different types of validity (though they aren't really different types).

Validity perspectives
• Construct validity
• Content validity
• Criterion-related validities
  ◦ Concurrent validity
  ◦ Predictive validity
• Face validity

Considering construct validation
Construct validity: the extent to which the constructs measured by a test or other data collection tool are clearly and appropriately defined and measured.
◦ (1) Are the definitions of the constructs clear and useful?
◦ (2) Does the data collection tool really tap these constructs?
◦ (3) Is there convincing evidence supporting points 1 and 2?

Considering content validation
Content validity: the extent to which the items or tasks measure the constructs completely, without measuring other, unrelated knowledge, skills, or abilities.
◦ Does the test measure all aspects of the construct?
◦ Does the test measure very little that is unrelated to the construct?
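The two content questions can be framed as simple bookkeeping once reviewers have decided which components of the construct each item targets. The sketch below is a hypothetical illustration: the function name, the component labels, and the item-to-component mapping are all invented for the example. It does not judge content validity by itself; it only reports components that no item measures and items that target nothing in the construct.

```python
def content_coverage(construct_components, item_map):
    """Compare test items against the components that define a construct.

    construct_components: set of component names (hypothetical labels).
    item_map: dict mapping each item ID to the set of components it targets,
    as judged by reviewers.
    Returns (components no item measures, items with no construct-relevant target).
    """
    covered = set()
    unrelated_items = []
    for item, targets in item_map.items():
        relevant = targets & construct_components
        covered |= relevant
        # Flag items none of whose targets belong to the construct
        if not relevant:
            unrelated_items.append(item)
    missing = construct_components - covered
    return missing, unrelated_items


# Hypothetical usage: a Business English construct with three components
components = {"email writing", "telephoning", "negotiating"}
items = {
    "q1": {"email writing"},
    "q2": {"telephoning"},
    "q3": {"literary analysis"},  # targets something outside the construct
}
missing, unrelated = content_coverage(components, items)
print("Not measured:", missing)
print("Unrelated items:", unrelated)
```

Note that an item targeting both a relevant and an unrelated skill is not flagged here; that kind of partial construct-irrelevance still needs expert judgment.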

Criterion-related approaches to investigating validity
There are two criterion-related approaches to investigating validity. Both involve investigating the relationship between the data collection tool in question and another measure (the criterion).
◦ Concurrent validity
◦ Predictive validity

The concurrent validity approach…
The new tool is administered to a group of people who also complete a well-established tool tapping the same construct. If the new tool taps what it's designed to measure, the correlation between the two sets of scores should be high. A high correlation indicates good concurrent validity, which is evidence that the new tool measures the intended construct.
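The correlation step can be sketched in Python. This is a minimal illustration, assuming the two sets of scores are stored as equal-length lists in the same examinee order; the function name `pearson_r` is our own, and the scores in the usage lines are hypothetical, not data from the slides. It computes the Pearson product-moment correlation, a common choice for this purpose, although the slides do not name a specific coefficient. (Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.)

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each set of scores
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a set of scores has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same six examinees on both tools
new_tool = [52, 61, 70, 45, 80, 66]
established_tool = [48, 59, 73, 50, 78, 62]
print(round(pearson_r(new_tool, established_tool), 2))
```

Pairing matters: each position in the two lists must belong to the same person, or the coefficient is meaningless.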

Concurrent validity example…
Does a new test of Business English ability really measure that construct?
◦ Give the new test to a large number of examinees; also give the same examinees the English BULATS test (a recognized measure of Business English ability).
◦ Calculate the correlation between scores on the two tests.
A high correlation serves as evidence that the new test measures Business English, because it relates well to the recognized measure of Business English. The approach is called concurrent validity because the two tests are taken at about the same time. It is only as useful as the comparison measure is sound!

Predictive validity approach…
Admissions tests must have good predictive validity. One way to collect evidence of predictive validity:
◦ Give the test to a number of people starting a program of study.
◦ At the end of the term, collect information on their final exam scores or final GPA.
◦ Find the correlation between the initial scores and the later measure of success.
◦ A high correlation is evidence of good predictive validity.

Predictive validity example…
In the past, all students in a particular MATESOL/TFL program had to take the GRE (though it wasn't used for admission). The correlation between GRE performance and students' scores on their comprehensive examination at the end of their studies was found to be very low. The GRE doesn't seem to have good predictive validity for students in this program.

Face validity is the extent to which research study participants and other users of a data collection tool's outcomes believe the tool is useful and the outcomes are good indicators of the intended construct. Because a tool's face validity varies with individuals' backgrounds and experiences, it is impressionistic. Even so, it is important, because face validity may affect participants' performance!

Collecting validity evidence
Correlational evidence
◦ Two tests (concurrent validity)
◦ A test and a future measure (predictive validity)

Collecting validity evidence, continued…
Experimental evidence
◦ Intervention study
◦ Differential group study
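In a differential group study, the test is given to groups expected to differ on the construct (for example, learners at two different proficiency levels); if the test measures the construct, the groups' scores should differ in the expected direction. The slides do not say how to quantify that difference, so the sketch below swaps in one common option, Welch's t statistic for two independent groups. The function name and the group scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two independent groups."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two scores")
    # Squared standard error of each group mean (sample variance / n)
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    if va + vb == 0:
        raise ValueError("t is undefined when neither group's scores vary")
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


# Hypothetical scores: a group expected to score higher vs. one expected to score lower
advanced = [78, 85, 81, 90, 74, 88]
intermediate = [62, 70, 58, 66, 71, 60]
t, df = welch_t(advanced, intermediate)
print(f"t = {t:.2f}, df = {df:.1f}")
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and also returns a p-value.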

Collecting validity evidence, continued…
Expert review of content, format, and processes. Reviewers may include:
◦ Language testing experts
◦ Teachers
◦ Employers
◦ Learners