Reliability or Validity

  • Slides: 34

Reliability gets more attention:
  • Easier to understand
  • Easier to measure
  • More formulas (like stats!)
  • Base for validity

Need for validity
  • Does the test measure what it claims?
  • Can the test be used to make decisions?

Validity
Reliability is a necessary, but not a sufficient, condition for validity.

Validity: a definition
“A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.”
Standards for Educational and Psychological Testing, 1999

“Face Validity”
“Looks good to me”!

Trinitarian view of Validity
  • Content (meaning)
  • Construct (meaning)
  • Criterion (use)

1) Content Validity
“How adequately a test samples behaviors representative of the universe of behaviors the test was designed to measure.”

Determining Content Validity
  • Describe the domain
  • Specify areas to be measured
  • Compare test to domain

Content Validity Ratio (CVR)
Agreement among raters on whether each item is:
  • Essential
  • Useful but not essential
  • Not necessary
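
The slide names the ratio without giving a formula. A minimal sketch, assuming the commonly cited Lawshe form CVR = (n_e - N/2) / (N/2), where n_e is the number of raters who mark an item "essential" and N is the panel size. The item names and ratings are hypothetical and exist only for illustration.

```python
def content_validity_ratio(n_essential: int, n_raters: int) -> float:
    """Lawshe-style CVR for one item: (n_e - N/2) / (N/2)."""
    if n_raters <= 0:
        raise ValueError("n_raters must be positive")
    if not 0 <= n_essential <= n_raters:
        raise ValueError("n_essential must be between 0 and n_raters")
    half = n_raters / 2
    return (n_essential - half) / half


# Hypothetical ratings from a 10-person panel (illustrative only).
panel_ratings = {
    "item_1": ["essential"] * 9 + ["useful"] * 1,
    "item_2": ["essential"] * 5 + ["useful"] * 3 + ["not necessary"] * 2,
    "item_3": ["essential"] * 2 + ["not necessary"] * 8,
}

# Only the "essential" votes enter the ratio; the other two categories
# count against the item equally.
cvr_by_item = {
    item: content_validity_ratio(ratings.count("essential"), len(ratings))
    for item, ratings in panel_ratings.items()
}

for item, cvr in cvr_by_item.items():
    print(f"{item}: CVR = {cvr:+.2f}")
```

Under this form the ratio runs from -1 (no rater says "essential") to +1 (every rater does), and equals 0 when exactly half the panel rates the item essential.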

2) Construct validity
A construct: “A theoretical intangible”, “An informed, scientific idea”
Construct validity: how well the test measures that construct

Determining Construct validity
  • Behaviors related to constructs
  • Related/unrelated constructs
  • Identify relationships
  • Multitrait/multimethod

Multitrait-Multimethod Matrix
  • Correlate scores from 2 (or more) tests
  • Correlate scores obtained from 2 (or more) methods
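
One way to picture the matrix is to correlate every pair of trait-by-method measures and label each pair by what it is evidence for. The sketch below assumes two traits and two methods. All trait names, method names, and scores are hypothetical, and the labels follow the usual convergent/discriminant reading rather than anything stated on the slide.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# (trait, method) -> scores for the same six people (made-up numbers).
scores = {
    ("anxiety", "self_report"): [12, 18, 9, 22, 15, 7],
    ("anxiety", "clinician"): [11, 20, 8, 21, 13, 9],
    ("depression", "self_report"): [14, 10, 16, 12, 8, 11],
    ("depression", "clinician"): [15, 9, 17, 13, 7, 12],
}


def classify(a, b):
    """Label a pair of (trait, method) measures."""
    same_trait = a[0] == b[0]
    same_method = a[1] == b[1]
    if same_trait and not same_method:
        return "convergent (same trait, different method)"
    if not same_trait and same_method:
        return "discriminant (different trait, same method)"
    return "discriminant (different trait, different method)"


def mtmm_pairs(measures):
    """Return (measure_a, measure_b, label, r) for every distinct pair."""
    return [
        (a, b, classify(a, b), pearson(measures[a], measures[b]))
        for a, b in combinations(measures, 2)
    ]


for a, b, label, r in mtmm_pairs(scores):
    print(f"{a} vs {b}: r = {r:+.2f}  [{label}]")
```

Construct validity is supported when the convergent correlations are clearly larger than the discriminant ones.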

Evidence of Construct Validity
  • Upholds theoretical predictions
      • Changes (?) over time, gender, training
  • Homogeneity of questions
      • (internal consistency, factor or item analysis)
  • Convergent/discriminant
      • Multitrait-multimethod matrix

Decision Making
How well the test can be used to help in decision making about a particular criterion.

Decision Theory
  • Base rate
  • Hit rate
  • Miss rate
  • False positive
  • False negative
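
The five terms above can be computed from a 2x2 table of test decisions against the criterion. The sketch below assumes one common textbook convention: hit rate is the share of all decisions that are correct, and miss rate is the share that are wrong. Other sources define these rates over different denominators, and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DecisionCounts:
    true_pos: int   # test says "has trait", criterion agrees
    false_pos: int  # test says "has trait", criterion disagrees
    true_neg: int   # test says "no trait", criterion agrees
    false_neg: int  # test says "no trait", criterion disagrees

    @property
    def total(self) -> int:
        return self.true_pos + self.false_pos + self.true_neg + self.false_neg


def decision_rates(c: DecisionCounts) -> dict:
    """Express each decision-theory quantity as a proportion of all cases."""
    n = c.total
    if n == 0:
        raise ValueError("no cases to summarize")
    return {
        "base_rate": (c.true_pos + c.false_neg) / n,  # who truly has the trait
        "hit_rate": (c.true_pos + c.true_neg) / n,    # correct decisions
        "miss_rate": (c.false_pos + c.false_neg) / n, # incorrect decisions
        "false_positive_share": c.false_pos / n,
        "false_negative_share": c.false_neg / n,
    }


# Hypothetical screening results for 100 people.
counts = DecisionCounts(true_pos=30, false_pos=10, true_neg=50, false_neg=10)
rates = decision_rates(counts)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

With this convention the hit rate and miss rate sum to 1, and the miss rate splits into the false positive and false negative shares.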

3) Criterion Validity
“The relationship between performance on the test and on some other criterion.”

Validity coefficient
Correlation between test score and score on criterion measure.
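
A minimal sketch of that correlation, assuming a Pearson coefficient is the intended statistic (the slide does not say which correlation is meant). The admission-test scores and grade-point averages are hypothetical.

```python
from math import sqrt


def validity_coefficient(test_scores, criterion_scores):
    """Pearson r between test scores and scores on the criterion measure."""
    if len(test_scores) != len(criterion_scores) or len(test_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 cases")
    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    var_x = sum((x - mean_x) ** 2 for x in test_scores)
    var_y = sum((y - mean_y) ** 2 for y in criterion_scores)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to exist")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: admission test score and later first-year GPA.
admission_test = [52, 61, 47, 70, 58, 66, 43, 75]
first_year_gpa = [2.6, 3.0, 2.4, 3.5, 2.7, 3.4, 2.5, 3.6]

r = validity_coefficient(admission_test, first_year_gpa)
# r squared is the share of criterion variance the test accounts for.
print(f"validity coefficient r = {r:.2f}, r squared = {r * r:.2f}")
```

The same function serves both concurrent and predictive designs; what differs is whether the criterion is measured at the same time as the test or later.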

Two ways to establish Criterion Validity
A) Concurrent validity
B) Predictive validity

Determining Concurrent validity
  • Assess individuals on the construct
  • Administer the test to those low/high on the construct
  • Correlate test scores with the prior identification
      • Use the test later to make decisions

Determining Predictive validity
  • Give the test to a group of people
  • Follow up the group
  • Assess later
  • Review test scores
      • If scores correlate with later behavior, the test can be used later to make decisions

Incremental validity
  • Value of including more than one predictor
  • Based on multiple regression
  • What is added to prediction not present with previous measures?
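
For the two-predictor case, the gain from a second measure can be sketched from three correlations alone, using the standard identity for the squared multiple correlation. The function names and the correlation values are hypothetical. A full analysis would fit the regression on raw data.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of a criterion on two predictors.

    r_y1, r_y2: each predictor's correlation with the criterion.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors are perfectly collinear")
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def incremental_validity(r_y1: float, r_y2: float, r_12: float) -> float:
    """Gain in R squared from adding predictor 2 to predictor 1."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1**2


# Hypothetical correlations: the second predictor is equally valid in both
# cases, but only adds something when it does not overlap with the first.
independent = incremental_validity(r_y1=0.5, r_y2=0.4, r_12=0.0)
overlapping = incremental_validity(r_y1=0.5, r_y2=0.4, r_12=0.8)

print(f"uncorrelated predictors: gain in R squared = {independent:.3f}")
print(f"highly overlapping predictors: gain in R squared = {abs(overlapping):.3f}")
```

With uncorrelated predictors the gain is about 0.16, the second predictor's own squared validity. With the overlapping pair the gain is essentially zero, even though the second predictor correlates 0.4 with the criterion by itself.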

Expectancy data
  • Taylor-Russell Table
  • Naylor-Shine Tables
  • Too vague, outdated, biased
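
The published tables named above are look-up tables and are not reproduced here. As a simpler illustration of expectancy data, the sketch below builds an empirical expectancy table: the proportion of people in each test-score band who later succeeded on the criterion. The scores, outcomes, and band edges are hypothetical.

```python
def expectancy_table(scores, succeeded, band_edges):
    """Return (low, high, n, proportion_successful) for each [low, high) band."""
    rows = []
    for low, high in zip(band_edges, band_edges[1:]):
        outcomes = [ok for s, ok in zip(scores, succeeded) if low <= s < high]
        n = len(outcomes)
        # True counts as 1, so sum() is the number of successes in the band.
        proportion = sum(outcomes) / n if n else None
        rows.append((low, high, n, proportion))
    return rows


# Hypothetical selection-test scores and later success on the job.
test_scores = [35, 42, 48, 55, 58, 63, 67, 72, 78, 85]
was_successful = [False, False, True, False, True,
                  True, False, True, True, True]

table = expectancy_table(test_scores, was_successful, band_edges=[30, 50, 70, 90])
for low, high, n, proportion in table:
    shown = "n/a" if proportion is None else f"{proportion:.0%}"
    print(f"scores {low}-{high - 1}: n = {n}, successful = {shown}")
```

With so few cases per band the proportions are unstable, which is one reason expectancy data of this kind should be read cautiously.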

Unified Validity - Messick
“Validity is not a property of the test, but rather the meaning of the scores.”
  • Value implications
  • Relevance and utility

Unitarian considerations
  • Content
  • Construct
  • Criterion
  • Consequences

Threats to validity
  • Construct underrepresentation (too narrow)
  • Construct-irrelevant variance (too broad)
      • Construct-irrelevant difficulty
      • Construct-irrelevant easiness

Example 1
Dr. Heidi considers using the Scranton Depression Inventory to help identify severity of depression and especially to distinguish depression from anxiety. What evidence should Dr. Heidi use to determine if the test does what she hopes it will do?

Example 2
The newly published Diagnostic Wonder Test promises to identify children with a mathematics learning disability. How will we know whether the test does so or is simply a slickly packaged general ability test?

Example 3
Ivy College uses the Western Admissions Test (WAT) to select applicants who should be successful in their studies. What type of evidence should we seek to determine if the WAT satisfies its purpose?

Example 4
Mike is reviewing a narrative report of his scores on the Nifty Personality Questionnaire (NPQ). The report says he is exceptionally introverted and unusually curious about the world around him. Can Mike have any confidence in these statements, or should they be dismissed as equivalent to palm readings at the county fair?

Example 5
A school system wants to use an achievement battery that will measure the extent to which students are learning the curriculum specified by the school. How should the school system proceed in reviewing the available achievement tests?

Example 6
Super Sun Computers needs to hire three new employees. They have decided to administer the Computer Skills Assessment (CSA) to their applicants and use the results as the basis of their decision. How can they determine if that measure is a good fit for their hiring practice?

Project homework question
  • What content or construct is your measure assessing? (Explain your answer.)
  • What do you think the convergent and discriminant constructs would be for the one in your measure?
  • How would you determine the content or construct validity of your measure?
  • How would you determine the criterion validity of your measure?
  • Why would you use those approaches?

Project homework question
  • Select a standardized instrument from MMY to use as a comparison for your measure.
  • Copy the relevant data.
  • Why did you select that instrument?
  • How would you use it to help standardize your measure?