- Slides: 14
Reliability and Validity
Reliability & Validity Reliability refers to the degree to which a test is consistent and stable in measuring what it is intended to measure. Validity refers to how well a test measures what it is purported to measure in a particular context. Test validity is also the extent to which inferences, conclusions, and decisions made on the basis of test scores are appropriate and meaningful.
Types of Reliability
• Inter-Rater or Inter-Observer Reliability: used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon.
• Test-Retest Reliability: used to assess the consistency of a measure from one time to another.
• Parallel-Forms Reliability: used to assess the consistency of the results of two tests constructed in the same way from the same content domain.
• Internal Consistency Reliability: used to assess the consistency of results across items within a test.
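Inter-rater reliability is commonly quantified with an agreement statistic such as Cohen's kappa, which corrects raw percent agreement for agreement expected by chance. A minimal Python sketch, using made-up ratings from two hypothetical observers (the data and category labels are illustrative only):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (nominal data)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two observers classifying 10 behaviors.
a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
print(round(cohens_kappa(a, b), 3))  # 8/10 raw agreement, chance-corrected to ~0.583
```

Kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why it is preferred over raw percent agreement for this purpose.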
Types of Internal Consistency Reliability
• Average inter-item correlation
• Average item-total correlation
• Split-half reliability
• Cronbach's alpha test of reliability
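Cronbach's alpha, the most widely reported internal consistency coefficient, can be computed directly from the item variances and the variance of respondents' total scores. A minimal Python sketch with hypothetical item scores (three items, five respondents; the data are invented for illustration):

```python
def cronbach_alpha(items):
    """Cronbach's alpha; items[i] is the list of all respondents' scores on item i."""
    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(items)                                   # number of items
    sum_item_vars = sum(variance(it) for it in items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Hypothetical 3-item scale answered by 5 respondents.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [3, 3, 4, 2, 5],
]
print(round(cronbach_alpha(items), 3))
```

Higher alpha means the items covary strongly, i.e. they appear to measure the same underlying construct; values around 0.7 or above are conventionally taken as acceptable.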
Types of Validity
1. Experimental validity
   – Internal validity
   – External validity
2. Content validity
   – Face validity
3. Criterion validity
   – Concurrent validity
   – Predictive validity
4. Construct validity
   – Convergent validity
   – Discriminant validity
   – Nomological validity
Experimental Validity External validity is the extent to which the results of a study can be generalized to other situations and to other people. There are two kinds of generalizability at issue:
1. The extent to which we can generalize from the situation constructed by an experimenter to real-life situations (generalizability across situations)
2. The extent to which we can generalize from the people who participated in the experiment to people in general (generalizability across people)
Experimental Validity Internal validity is a property of scientific studies which reflects the extent to which a causal conclusion based on a study is warranted. Such warrant is constituted by the extent to which a study minimizes systematic error (or "bias"). A causal inference may be based on a relation when three criteria are satisfied:
• the "cause" precedes the "effect" in time (temporal precedence),
• the "cause" and the "effect" are related (covariation), and
• there are no plausible alternative explanations for the observed covariation (nonspuriousness).
Content Validity Content validity is a non-statistical type of validity that involves "the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured." Content validity evidence involves the degree to which the content of the test matches a content domain associated with the construct. For example, does an IQ questionnaire have items covering all areas of intelligence discussed in the scientific literature?
Content Validity Face validity is an estimate of whether a test appears to measure a certain criterion; it does not guarantee that the test actually measures phenomena in that domain. A measure may have high validity overall, but if the test does not appear to be measuring what it claims to measure, it has low face validity. While content validity depends on a theoretical basis for assuming that a test assesses all domains of a certain criterion, face validity relates only to whether the test appears, on its surface, to be a good measure.
Criterion Validity Criterion validity evidence involves the correlation between the test and a criterion variable (or variables) taken as representative of the construct. In other words, it compares the test with other measures or outcomes (the criteria) already held to be valid. For example, employee selection tests are often validated against measures of job performance (the criterion), and IQ tests are often validated against measures of academic performance (the criterion).
Criterion Validity Concurrent validity is a measure of how well a particular test correlates with a previously validated measure: the operationalization is correlated with other measures of the same construct collected at the same time. Predictive validity refers to the degree to which the operationalization can predict (or correlate with) measures of the same construct collected at some time in the future. In short, if the test data and criterion data are collected at the same time, this is concurrent validity evidence; if the test data are collected first in order to predict criterion data collected at a later point in time, this is predictive validity evidence.
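Both forms of criterion validity evidence come down to the same computation, a correlation between test scores and criterion scores; only the timing of criterion collection differs. A minimal Python sketch using Pearson's r on hypothetical selection-test and job-performance data (both the data and the variable names are invented for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between a test and a criterion measure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: selection-test scores and later job-performance ratings.
# Collected at the same time -> concurrent evidence; later -> predictive evidence.
test_scores = [55, 62, 70, 48, 80, 66]
performance = [3.1, 3.4, 4.0, 2.8, 4.5, 3.6]
print(round(pearson_r(test_scores, performance), 3))
```

The correlation itself is called the validity coefficient; the nearer it is to 1, the stronger the criterion validity evidence.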
Construct Validity Construct validity refers to the extent to which operationalizations of a construct (i.e., practical tests developed from a theory) do actually measure what theory says they do. For example, to what extent is an IQ questionnaire actually measuring "intelligence"? Construct validity evidence involves the empirical and theoretical support for the interpretation of the construct.
Construct Validity Convergent validity refers to the degree to which a measure correlates with other measures that it is theoretically predicted to correlate with. Discriminant validity tests whether concepts or measurements that are supposed to be unrelated are, in fact, unrelated. Nomological validity is based on investigating constructs and measures in terms of formal hypotheses derived from theory.
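Convergent and discriminant validity are typically examined by comparing correlations: a measure should correlate strongly with theoretically related measures and weakly with unrelated ones. A minimal Python sketch with hypothetical scores (two anxiety scales that should converge, plus an unrelated measure; all data invented for illustration):

```python
import math

def r(x, y):
    """Pearson correlation (helper for the validity checks below)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical scores: two anxiety scales (theoretically related) and
# shoe size (theoretically unrelated to either).
anxiety_a = [10, 14, 9, 16, 12, 18]
anxiety_b = [11, 13, 10, 17, 11, 19]
shoe_size = [42, 40, 38, 43, 41, 40]

convergent = r(anxiety_a, anxiety_b)    # expected high
discriminant = r(anxiety_a, shoe_size)  # expected near zero
print(convergent > 0.8, abs(discriminant) < 0.5)
```

Evidence for construct validity requires both patterns at once: high convergent correlations alone mean little if the measure correlates just as strongly with constructs it should have nothing to do with.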