Experimental Research Methods in Language Learning Chapter 12 Reliability and Reliability Analysis

Leading Questions • What is reliability? • How do we know that a research instrument is reliable? • Why do you think reliability is important for experimental research?

The Reliability and Validity of a Measure • Researchers should not assume that their instruments are reliable merely because they have already piloted them, or because they adopted them from trusted researchers in the field who originally reported high reliability estimates for the instruments. • A reliability estimate largely depends on the participants taking the tests, the context in which they take them, and the test items or tasks used.

The Reliability and Validity of a Measure • Test validity was originally defined as the degree to which a measure captures what it claims to measure. • Test validity is related to theory and how a construct is defined. • Reliability is a necessary but insufficient condition for measurement validity. • It is essential to present evidence of a high reliability coefficient, which implies a good level of precision and consistency of the instruments used.

Estimating a Reliability Coefficient • Reliability is a complex issue because there are different aspects we need to take into account. • Reliability needs to be understood together with validity. • Reliability is typically described as the consistency of scoring (e.g., language tests or productive tasks), coding (e.g., coding think-aloud or interview data), or rating (e.g., Likert-scale questionnaires and quantitative observations).

What Does a Reliability Estimate Tell Us? • Unreliable and invalid: Archery results would be both unreliable and invalid if our arrows missed the circular target altogether or hit the target at random without once landing in the goal (the centre). • Reliable but invalid: In this case, arrows hit at or around the same spot on the target, but never hit the goal. • Reliable and valid: This is when arrows hit the goal consistently.

A Reliability Estimate • A reliability estimate ranges between 0 and 1. • A reliability estimate of 0.90 for a language test indicates that students who score 60 out of 100 are 90% likely to obtain a similar score when they take a similar test. • If the reliability estimate is 0.50, these same students are only 50% likely to obtain a similar score on a similar test. • An estimate of 0.50 therefore suggests less certainty about the result than one of 0.90.

A Reliability Estimate • A reliability coefficient of 0.70 or above (70% or more of the items consistently collect information about the target construct) is acceptable, but one of 0.90 or above is desirable for research (Dörnyei 2007). • A reliability estimate tells us the extent to which a research instrument, an observation, or a coding system is free from measurement error.

Classical True Score Theory • Theoretically speaking, an observed score (e.g., 5 out of 10, 70 out of 100) is composed of a true score, which is due to a learner’s true level of ability, and an error score, which is due to factors other than a learner’s level of ability. • Observed score = true score + error score

Standard Error of Measurement (SEM) • A reliability coefficient tells us about score consistency for a group of students. • But it does not directly tell us whether an individual learner’s score is within a reasonable range. • A standard error of measurement (SEM) score tells us a range of possible true scores for a learner. • SEM is related to the reliability coefficient of a research instrument in a specific use.

Standard Error of Measurement (SEM) • If the reliability coefficient of a test is 1.0, we know for sure that there is no error score in this test because it has perfect reliability. In that case, the standard error of measurement is zero. • However, a reliability estimate of 1 is very rare and most unlikely. • SEMs are computed using the reliability estimate and the standard deviation of the test scores.

Standard Error of Measurement (SEM) • SEM = SD × √(1 − reliability coefficient), where SD = the standard deviation of the test scores. • For example, if a test has a reliability coefficient of 0.82 and an SD of 5.29, we can compute the SEM as follows: • SEM = 5.29 × √(1 − 0.82) = 5.29 × √0.18 ≈ 5.29 × 0.424 ≈ 2.24 • If a participant's score was 28 out of 40, we simply add the SEM to, and subtract it from, the test score (i.e., 28 ± 2.24). His/her true score would be within a range of 25.76 to 30.24.
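The SEM calculation above can be reproduced in a few lines of Python. This is only a sketch: the function name `standard_error_of_measurement` is made up for illustration, and the inputs are the example values from the slide (SD = 5.29, reliability = 0.82, observed score = 28).

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Return SEM = SD * sqrt(1 - reliability).

    `sd` is the standard deviation of the test scores and `reliability`
    is a reliability coefficient between 0 and 1.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Example values taken from the slide.
sem = standard_error_of_measurement(sd=5.29, reliability=0.82)

# Band of one SEM around an observed score of 28 out of 40.
observed = 28
lower, upper = observed - sem, observed + sem

print(f"SEM = {sem:.2f}")                  # SEM = 2.24
print(f"Range: {lower:.2f} to {upper:.2f}")  # Range: 25.76 to 30.24
```

With a reliability of 1.0 the function returns 0, matching the point that a perfectly reliable test has no error score.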

Factors Influencing a Reliability Coefficient There are interrelated factors that influence the reliability coefficient of a test or a measure, including: • Objective scoring • Nature of the construct of interest • Number of participants or the sample size • Test length or measure length • Heterogeneity of participants’ abilities and attributes

Types and Methods of Calculation of Reliability Coefficients • Split-half reliability coefficient • Spearman-Brown prophecy coefficient • Cronbach’s Alpha coefficient • Rater Agreements in Percentages • Cohen’s Kappa Coefficient
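The slide only names these coefficients, so the sketch below uses their standard textbook formulas rather than anything stated in the deck. All function names and the small data sets are hypothetical, chosen so the results are easy to check by hand. The Spearman-Brown function takes the correlation between the two test halves as given.

```python
from collections import Counter
from statistics import variance


def spearman_brown(r_half: float) -> float:
    """Step up a split-half correlation to a full-length reliability estimate."""
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; rows are participants, columns are items."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def percent_agreement(rater1: list, rater2: list) -> float:
    """Proportion of cases on which two raters give the same code."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohen_kappa(rater1: list, rater2: list) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) if both raters use a single category.
    """
    n = len(rater1)
    p_o = percent_agreement(rater1, rater2)
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[c] * c2[c] for c in set(rater1) | set(rater2)) / n**2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data, for illustration only.
items = [[1, 1], [2, 2], [3, 3]]   # two perfectly consistent items
rater_a = ["a", "a", "b", "b"]
rater_b = ["a", "a", "b", "a"]

print(round(spearman_brown(0.6), 2))         # 0.75
print(round(cronbach_alpha(items), 2))       # 1.0
print(percent_agreement(rater_a, rater_b))   # 0.75
print(cohen_kappa(rater_a, rater_b))         # 0.5
```

In the rater example the raw agreement is 0.75 but kappa is only 0.5, because half of that agreement would be expected by chance given how often each rater used each code.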

Discussion • What is the difference between a correlation coefficient and a reliability coefficient? • What does a reliability coefficient mean to you? • What problems do you think would arise in an experimental study if the researchers did not analyze their research instruments before conducting inferential statistics?