Cognitive and Academic Assessment, Dr. K. A. Korb

Cognitive and Academic Assessment Dr. K. A. Korb University of Jos

Outline
• Classroom Assessment
• Achievement Testing
• Intelligence Testing
• General Issues in Psychological Testing
  – Reliability
  – Validity

Classroom Assessment
• Purpose of classroom assessment
  – Provide information to students, parents, teachers, and others
  – Provide information to adapt instructional practices
  – Increase learning
  – Enhance motivation
• Grades should measure achievement in the specific class
  – Grades should NOT measure:
    • Effort
    • Ability
    • Interests
    • Attitude
    • Degree of corruption

Classroom Assessment
• A grading system should be:
  – Clear and understandable
  – Designed to support learning and provide frequent feedback
  – Based on hard data
  – Fair to all students
  – Defensible to students, parents, and administrators

Classroom Assessment
• Components of a grading system
  – Formative evaluation: Evaluation before or during instruction to provide feedback to the teacher/student
  – Summative evaluation: Evaluation after instruction for grading purposes

Tips for Writing Exams
• The purpose of an exam is to assess how well the student understands the course content
  – Critically read questions from the students' perspective to determine whether each question is understandable.
  – Rotate exam questions between terms so students cannot get access to the questions prior to the exam.
• Ask questions that test Bloom's higher levels of thinking
  – Students have to study more of the material and think more deeply about the content
  – Assess understanding, not memorization
  – Cheating becomes more difficult

Multiple Choice Items
• Strengths
  – Able to assess different levels of thinking
  – Reliable scoring
  – Easy to grade
• Weaknesses
  – Unable to give credit for partial knowledge
  – Decontextualized

Multiple Choice Items
• Example: Martha talks to the person sitting next to her while the teacher is giving instruction. To make sure that Martha does not talk in class next time, the teacher makes Martha write an essay on why paying attention to the teacher is important. This is an example of:
  A. Positive Reinforcement
  B. Negative Reinforcement
  C. Positive Punishment
  D. Negative Punishment

Multiple Choice Items
• Components
  – Stem: The question/problem
    • Determines the level of knowledge assessed
  – Correct answer
  – Distracters: Wrong answers
    • Contain likely misconceptions

Multiple Choice Items
• Guidelines for preparing multiple choice items
  – Present one clear problem in the stem
  – Make all distracters plausible
  – Avoid similar wording in the stem and correct choice
  – Keep the correct answer and distracters similar in length
  – Avoid absolute terms (always, never) in incorrect choices
  – Keep the stem and distracters grammatically consistent
  – Avoid using two distracters with the same meaning
  – Emphasize negative wording (e.g., NOT) when it must be used in the stem
  – Use "None of the above" with care and avoid "All of the above"

Essay Items
• Strengths
  – Assess creative and critical thinking
  – Students are more likely to meaningfully organize information when studying
• Weaknesses
  – Scoring takes time
  – Scoring can be unreliable

Essay Items
• Rubric: A scoring scale that describes the criteria for grading
  – Establish criteria for credit based on the critical elements of the essay
• Example: According to the Theory of Planned Behavior, what are the four major factors that influence the relationship between attitudes and behavior?
  – 2 points apiece: 1 point for the name and 1 point for the explanation
  – Behavioral intentions: Cognitive representation of the readiness to perform a behavior
  – Specific attitude toward the behavior: Whether one likes or dislikes the behavior
  – Subjective norms: Beliefs about how significant others view the behavior
  – Perceived behavioral control: Perception of one's ability to perform the behavior

Essay Items
• Scoring essay items
  – Require students to answer each item
  – Prepare the rubric in advance
  – Write a model answer for each item and compare a few students' responses to the model to determine if score adjustments are needed
  – Score all students' answers to one essay question before moving to the next question
  – Score all responses to a single item in one sitting
  – Score answers without knowing the identity of the student

Intelligence Testing
• Achievement Test: An instrument created to assess developed skills or knowledge in a specific domain
  – The purpose of standardized achievement testing is to place students in the appropriate educational environment
• Intelligence Test: An instrument that assesses the ability to perform cognitive tasks
  – Samples performance on a variety of cognitive tasks and then compares performance to others of a similar developmental level
  – Purposes of intelligence testing:
    • Diagnose students with special needs
      – Learning disabilities
      – Talented and gifted
    • Place students in appropriate educational environments
    • Educational research

Intelligence Testing
• Example: Stanford-Binet Intelligence Scale, Fourth Edition
  – Verbal Reasoning
  – Abstract/Visual Reasoning
  – Quantitative Reasoning
  – Short-Term Memory
• Yields subtest scores and a general (composite) score

Reliability: Consistency of results
[Figure: three target diagrams, labeled Reliable, Reliable, and Unreliable]

Reliability Theory
• Actual score on test = True score + Error
  – True score: Hypothetical actual score on the test
• The reliability coefficient is the ratio of the true score variance to the total variance of test scores
  – In other words, as the error in testing decreases, the reliability increases
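In symbols, with X the observed score, T the true score, and E the error, the relationship above can be written as:

```latex
X = T + E, \qquad
r_{XX} = \frac{\sigma_T^2}{\sigma_X^2}
       = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

As the error variance shrinks toward zero, the reliability coefficient approaches 1, which is the sense in which less error means higher reliability.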

Reliability: Sources of Error
• Error in test construction
  – Error in item sampling: Results from items that measure more than one construct in the same test
• Error in test administration
  – Test environment: Room temperature, amount of light, noise
  – Test-taker variables: Illness, amount of sleep, test anxiety, exam malpractice
  – Examiner-related variables: Absence of the examiner, examiner's demeanor
• Error in test scoring
  – Scorer: With subjectively marked assessments, different scorers may give different scores to the same responses

Reliability: Error due to Test Construction
• Split-half reliability: Determines how consistently the measure assesses the construct of interest.
  – A low split-half reliability indicates poor test construction.
    • An instrument with a low split-half reliability is probably measuring more constructs than it was designed to measure.
  – Calculate split-half reliability with coefficient alpha.
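As a sketch of the computation just described, coefficient (Cronbach's) alpha can be calculated from a students-by-items matrix of item scores; the scores below are hypothetical.

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-items matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical item scores for 5 students on a 4-item scale
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 1],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(scores), 2))  # ≈ 0.95 for these data
```

A high value indicates the items hang together as a measure of a single construct; a low value suggests the instrument mixes constructs, the construction problem described above.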

Reliability: Error due to Test Administration
• Test-retest reliability: Determines how much error in a test score is due to problems with test administration.
  – Administer the same test to the same participants on two different occasions.
  – Correlate the test scores of the two administrations using Pearson's Product Moment Correlation.

Reliability: Error due to Test Construction with Two Forms of the Same Measure
• Parallel forms reliability: Determines the similarity of two different versions of the same measure.
  – Administer the two tests to the same participants within a short period of time.
  – Correlate the test scores of the two tests using Pearson's Product Moment Correlation.

Reliability: Error due to Test Scoring
• Inter-rater reliability: Determines how closely two different raters mark the assessment.
  – Give the exact same test results from one test administration to two different raters.
  – Correlate the two markings from the different raters using Pearson's Product Moment Correlation.

Validity: Measuring what is supposed to be measured
[Figure: target diagrams, labeled Valid and Invalid]

Validity
• Three types of validity:
  – Construct validity: Measures the appropriate psychological construct
  – Criterion validity: Predicts appropriate outcomes
  – Content validity: Adequately samples the content domain
• Each type of validity should be established for all psychological tests.

Construct Validity
• Construct validity: The appropriateness of inferences drawn from test scores regarding an individual's status on the psychological construct of interest
• Two considerations:
  – Construct underrepresentation
  – Construct-irrelevant variance

Construct Validity
• Construct underrepresentation: The test does not measure all of the important aspects of the construct.
  – Addressed by content validity
• Construct-irrelevant variance: Test scores are affected by other, unrelated processes.

Sources of Construct Validity Evidence
• Homogeneity: The test measures a single construct
  – Evidence: High internal consistency
• Convergence: The test is related to other measures of the same construct and related constructs
  – Evidence: Criterion validity
• Theory: The test behaves according to theoretical propositions about the construct
  – Evidence from changes in test scores with age: Scores on the measure should change with age as predicted by theory.
  – Evidence from treatments: Scores on the measure change between pretest and posttest as predicted by theory.

Criterion Validity
• Criterion validity: The correlation between the measure and a criterion.
  – Criterion: Other accepted measures of the construct, or measures of other constructs similar in nature.
• A criterion can consist of any standard with which your test should be related.
  – Examples:
    • Behavior
    • Other test scores
    • Ratings
    • Psychiatric diagnosis

Criterion Validity
• Three types:
  – Convergent validity: High correlations with measures of similar constructs taken at the same time.
  – Divergent validity: Low correlations with measures of different constructs taken at the same time.
  – Predictive validity: High correlation with a criterion measured in the future.

Criterion Validity
• Example: An essay test of science reasoning was developed to admit students into the science program at the university.
  – Convergent validity: High correlations with other science tests, particularly well-established science tests.
  – Divergent validity: Low correlations with measures of writing ability, because the test should measure only science reasoning, not writing ability.
  – Predictive validity: High correlations with future grades in science courses, because the purpose of the test is to determine who will do well in the science program at the university.

Criterion Validity Example
Criterion Validity Evidence for the New Science Reasoning Test: Correlations between Science Reasoning and Other Measures

  Measure                                        Correlation with New Science Reasoning Test
  WAEC Science Scores                            .83
  School Science Marks                           .75
  WAEC Writing Scores                            .34
  WAEC Reading Scores                            .24
  Future marks in university science courses     .65

• High correlations with other measures of science ability indicate good (convergent) criterion validity.
• Low correlations with measures unrelated to science ability indicate good (divergent) criterion validity.
• A high correlation with future measures of science ability indicates good (predictive) criterion validity.

Content Validity
• Content validity: The degree to which a test samples the entire domain of the construct it was designed to measure

Content Validity
[Figure: image-only slide illustrating sampling from the content domain]

Content Validity
• To assess:
  – Gather a panel of judges
  – Give the judges a table of specifications of the amount of content covered in the domain
  – Give the judges the measure
  – The judges draw a conclusion as to whether the proportion of content covered on the test matches the proportion of content in the domain.
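The judges' comparison can be sketched as a simple proportion check; the topics, domain proportions, and item counts below are hypothetical.

```python
# Table of specifications: proportion of the domain devoted to each topic
specifications = {"Reliability": 0.40, "Validity": 0.40, "Classroom Assessment": 0.20}

# Number of items on the test that cover each topic
items_per_topic = {"Reliability": 8, "Validity": 7, "Classroom Assessment": 5}

total_items = sum(items_per_topic.values())
for topic, domain_proportion in specifications.items():
    test_proportion = items_per_topic[topic] / total_items
    # A large gap between the two proportions signals weak content validity
    print(f"{topic}: test {test_proportion:.2f} vs. domain {domain_proportion:.2f}")
```

Here the test slightly over-samples Classroom Assessment (.25 vs. .20) and under-samples Validity (.35 vs. .40); the judges would decide whether such gaps are acceptable.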

Face Validity
• Face validity: Addresses whether the test appears to measure what it purports to measure.
  – To assess: Ask test users and test takers to evaluate whether the test appears to measure the construct of interest.
• Face validity is rarely of interest to test developers and test users.
  – The only instance where face validity is of interest is to instill confidence in test takers that the test is worthwhile.
  – Face validity CANNOT be used to determine the actual interpretive validity of a test.

Concluding Advice
• The best way to ensure that the measures you use are both reliable and valid is to use a measure that another researcher has developed and validated.
  – This will assist you in three ways:
    1. You can confidently report that you have accurately measured the variables in the study.
    2. By using a measure that has been used before, your study is intimately tied to previous research in your field, an important consideration in determining the importance of your study.
    3. It saves you the time and energy of developing your own measure.

Revision
• What are the purposes of classroom assessment, achievement testing, and intelligence testing?
• Compare and contrast the strengths and weaknesses of multiple choice and essay tests.
• Describe three sources of error that contribute to lowering the reliability of an instrument. How can the reliability coefficient be calculated for each source of error?
• What are three types of validity evidence required for psychological measurement? How can each type be assessed?