UCSF Educational Skills Workshop
Barbara.Start@ucsf.edu
Test Development
Instructors: Sam Brondfield, MD, MA Ed; Pat O’Sullivan, EdD
http://tinyurl.com/testdevelopment
Faculty Developer Team: Christy Boscardin, PhD; Pat O’Sullivan, EdD; Mark Dellinges, DDS, MA
Objectives
This session provides participants with the knowledge and skills to create useful and psychometrically sound tests. At the end of the workshop, participants will be able to:
● Given a set of objectives, develop a blueprint.
● Given a set of objectives, draft an assessment (e.g., multiple-choice items, essay).
Test Development 12/13/2021
Test Development Steps
1) Specify the purpose of the test.
2) Develop frameworks describing the knowledge and skills to be tested.
3) Build the test blueprint (specifications).
4) Create potential test items and scoring rubrics.
5) Review and pilot test items.
6) Evaluate the quality of items.
Topic 1: Test Blueprint
§ What
• Roadmap
• Documentation
§ Why
• Evaluate/ensure exam quality
• Exam development without a blueprint = no clear direction
Elements of Test Blueprint
§ Learning objectives
§ Proportion of the test covering each domain (learning objectives)
• Amount of time spent on each objective
• Total # of hours allocated for the test
§ Bloom’s Taxonomy (targeted cognitive level)
Domain & Cognitive Level Assessed
Bloom’s Example from Cell Biology
Does the MCAT hinder efforts to introduce critical thinking in biology courses?
§ Three faculty members ranked 700 randomly selected items on a scale from 1 (knowledge) to 6 (evaluation).
Small Group Exercise 1
§ Review the list of test items and designate the appropriate Bloom’s Taxonomy level for each.
https://ucsf.co1.qualtrics.com/jfe/form/SV_bkIf5Yr7isFsqfH
Let’s look at a template example:
Objectives | Content Area 1: % / # of Items | Content Area 2: % / # of Items | Content Area 3: % / # of Items
1. __% | | |
2. __% | | |
3. __% | | |
Total # of Items | | |
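The arithmetic behind filling in such a template can be sketched in a few lines of Python. This is a minimal illustration. It assumes that each objective's share of the test is proportional to the instructional hours spent on it (one of the blueprint elements listed earlier) and that whole-number item counts come from largest-remainder rounding. The function name `allocate_items` and the sample hours are hypothetical.

```python
from math import floor


def allocate_items(hours_by_objective: dict[str, float], total_items: int) -> dict[str, int]:
    """Split total_items across objectives in proportion to instructional hours.

    Uses largest-remainder rounding so the counts always sum to total_items.
    """
    total_hours = sum(hours_by_objective.values())
    if total_hours <= 0:
        raise ValueError("total hours must be positive")
    # Exact (fractional) number of items each objective would receive.
    exact = {obj: total_items * h / total_hours for obj, h in hours_by_objective.items()}
    # Start from the whole-number part of each share.
    counts = {obj: floor(x) for obj, x in exact.items()}
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the objectives with the largest fractional parts.
    for obj in sorted(exact, key=lambda o: exact[o] - counts[o], reverse=True)[:leftover]:
        counts[obj] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical course: hours of instruction per objective, 40-item exam.
    hours = {"Objective 1": 6, "Objective 2": 3, "Objective 3": 1}
    for obj, n in allocate_items(hours, total_items=40).items():
        share = 100 * hours[obj] / sum(hours.values())
        print(f"{obj}: {share:.0f}% -> {n} items")
```

Largest-remainder rounding is only one option. It guarantees that the per-objective counts add up to the planned total, which rounding each share independently does not.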
Topic 2: Constructed-Response Test Item Development
§ Purpose:
• To assess trainees’ understanding of the subject-matter content
• To assess trainees’ ability to communicate
§ An essay test should meet the following criteria:
• Requires trainees to compose rather than select a response.
• Elicits trainee responses that must consist of one or more sentences.
• There is no single correct response.
• The accuracy and quality of the responses must be judged by a content expert.
Advantages & Limitations
§ Advantages:
• Assess higher-order/critical thinking skills.
• Evaluate learner thinking and reasoning.
• More authentic to real-life experiences.
§ Limitations:
• Assess only a limited sample of the content domain.
• Time-consuming to grade/score.
Misconceptions
1. Essay questions assess higher-order or critical thinking skills regardless of how the items are written.
• Example A: What are the major advantages and limitations of essay questions?
• Example B: Given their advantages and limitations, should an essay question be used to assess students’ abilities to create a solution to a problem? In answering this question, provide brief explanations of the major advantages and limitations of essay questions. Clearly state whether or not you think an essay question should be used, and explain the reasoning for your judgment.
2. The use of essay questions eliminates the problem of guessing.
3. Essay questions benefit all students by placing emphasis on the importance of written communication skills.
How to Construct Essay Questions
1. Clearly define the intended learning outcome to be assessed by the item.
2. Avoid using essay questions for intended learning outcomes that are better assessed with other formats.
3. Define the task and shape the problem situation.
4. Specify the relative point value and the approximate time limit in clear directions.
5. State the criteria for grading.
6. Use several relatively short essay questions rather than one long one.
7. Improve the essay question through preview and review.
2. Determining When to Use Essay vs. MCQs Using Directive Verbs
§ Recall – MCQ
§ Evaluate – Essay
Consider the following example:
§ Learning objective: Analyze the impact of America at war on the American economy.
§ Less effective essay question: Describe the impact of America at war on the American economy.
§ More effective essay question: Analyze the impact of America at war on the American economy by describing how different effects of the war work together to influence the economy.
Group Exercise: Essay or MCQ?
• analyze • justify • apply • explain • classify • evaluate
• predict • propose • compose • infer • defend • interpret
3. Task: What the Student Is Asked to Do
§ Effective essay items should have:
• Tasks (specifying the performance students should exhibit when responding) that are structured as much as possible
• A scope of the context (problem) that is focused
§ Tasks are made up of a directive verb and the object of the verb:
• Task = Justify (directive verb) + the view you prefer (object of the verb)
• Task = Defend (directive verb) + theory as the most suitable for the situation (object of the verb)
3. Scope of Problem: Background Information
§ Provides the context within which students can demonstrate the performance to be assessed.
§ The problem is usually stated as background information.
§ Designed to elicit application or transfer of knowledge to a novel situation.
Problem Example
Intended learning outcome: Create a hypothesis concerning how a particular program may affect the quality of education for students.
Problem: A national service entitled “Pick a Prof” makes teacher evaluation data public. “Pick a Prof” is a service that gives students the ability to take control of their education by using grade histories, student reviews, and course schedules to choose a course and professor of their liking. Currently, professionals are uncertain about the effects of this service on the quality of education for students.
Task: Create a hypothesis about the effect the service may have on the quality of education for students using the service. Support your hypothesis with reasons and examples.
Less to More Focused
1. Less focused question: Evaluate the impact of the Industrial Revolution on England.
2. A little more focused question: Evaluate the impact of the Industrial Revolution on the role of fathers in poor communities of England.
3. More focused question: Evaluate the impact of the Industrial Revolution on the role of fathers in poor communities of England based on whether or not the Industrial Revolution improved fathers’ abilities to provide the material necessities of life and education and training for their children.
4. Most focused question: Evaluate the impact of the Industrial Revolution on the role of fathers in poor communities of England based on whether or not the Industrial Revolution improved fathers’ abilities to provide the material necessities of life and education and training for their children. Explain how the role of a father as provider changed with the Industrial Revolution and whether or not the changes were an improvement for fathers striving to provide for their children.
4. Specify the relative point value and the approximate time limit in clear directions.
§ Why: to provide structure and guidelines for learners (unlike MCQs).
§ When specifying relative point values, consider:
• Complexity of the task
• The test blueprint (proportion of the test)
§ Tip: Write out a model answer to gauge the length and time allocation.
5. State the criteria for grading.
§ Criteria typically address the quality of both content and writing, for example:
• “The content of all of your responses to essay questions will be graded in terms of the accuracy, completeness, and relevance of the ideas expressed. The form of your answer will be evaluated in terms of clarity, organization, correct mechanics (spelling, punctuation, grammar, capitalization), and legibility.”
§ Student understanding is:
‒ Well-developed (4)
‒ Adequate (3)
‒ Low (2)
‒ Very low to none (1)
Score Point | Criteria for Scoring
4 | The response demonstrates in-depth understanding of the relevant and important ideas. The response effectively communicates an explanation of the biological and/or scientific process. The organization enhances the central ideas. The key concepts are logically organized. Most or all of the essay’s organizational components are strong.
3 | The response includes some of the important ideas related to the topic. The content’s organization is clear and coherent. The response’s order and structure are apparent. The sequence or cause/effect of processes and facts is logical. Paragraphing is evident.
2 | The response may include an important idea, part of an idea, or a few facts but does not develop the ideas or deal with the relationships among the ideas. The response contains misconceptions or inaccurate or irrelevant information. The content’s organization is skeletal. The response’s order and structure are loosely planned. The sequence or cause/effect of processes or facts is not consistently logical. Paragraphing is minimally evident.
1 | The response shows little or no knowledge or understanding of the topic. The writing is haphazard and disjointed. The response lacks organization and coherence. No or minimal plan is evident. The facts may be randomly presented.
Relative Point Value: Example from AP Biology
§ Explain TWO unique properties of human embryonic stem cells that distinguish them from other human cell types. Describe a current medical application of human stem cell research. (3 points maximum)
§ Description of a current medical application (1 point maximum). Acceptable responses include, but are not limited to, the following:
- Repair of brain and spinal tissues.
- Treatment of diseases such as leukemia, stroke, Alzheimer’s, Parkinson’s, diabetes, cystic fibrosis.
- Therapeutic cloning of human cells, tissues, and certain organs (e.g., bone, cartilage, muscle).
- Reprogramming of diseased cells.
- Testing of new drugs.
- Storage of umbilical cord stem cells.
6. Use several relatively short essay questions rather than one long one.
§ Why:
• To increase domain coverage
• To reduce the grading burden
7. Improve the essay question through preview and review.
§ Preview:
• Anticipate (predict) learner responses
• Write a model response (useful for creating anchors)
• Review the test blueprint with colleagues or learners
• Pilot items whenever possible
§ Review:
• Carefully review the range of responses
• Evaluate the distribution of scores
• Evaluate rater reliability through a sampling check
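The slide does not name a statistic for the rater-reliability sampling check. For two raters scoring the same sample of essays, a common choice is percent agreement alongside Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The sketch below assumes two raters, rubric scores on the 1–4 scale, and hypothetical function names. For ordinal rubric scores, a weighted kappa may be a better fit.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of sampled essays on which the two raters gave exactly the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    observed = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over score categories of P(A gives c) * P(B gives c).
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical score")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical rubric scores (1-4) from two raters on the same 8 sampled essays.
    rater_a = [4, 3, 3, 2, 1, 4, 3, 2]
    rater_b = [4, 3, 2, 2, 1, 4, 3, 3]
    print(f"agreement = {percent_agreement(rater_a, rater_b):.2f}")  # agreement = 0.75
    print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")  # kappa = 0.65
```

Kappa is lower than raw agreement here because, with these score distributions, the raters would be expected to agree on about 28% of essays by chance alone.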
Small Group Exercise 2
Learning objective: The student is able to use visual representations to analyze situations or solve problems qualitatively to illustrate how interactions among living systems and with their environment result in the movement of matter and energy.
Question:
§ Populations of a plant species have been found growing in the mountains at altitudes above 2,500 meters. Populations of a plant that appears similar, with slight differences, have been found in the same mountains at altitudes below 2,300 meters.
§ Describe TWO kinds of data that could be collected to provide a direct answer to the question: Do the populations growing above 2,500 meters and the populations growing below 2,300 meters represent a single species?
1. Is the question aligned with the learning objective? (Y/N)
2. What are the problem and the task?
3. Can you improve the question?
Topic 3: MCQ Exams
§ How good a test-taker are you?
§ Take the quiz.
Question 1
§ The primary purpose of the stam is to remove the:
A. Carm
B. Denton
C. Entace
D. Menace
E. Stam bar
Word repeat: A word or phrase is included in both the stem and the correct answer.
Question 2
§ Which of the following have won the greatest number of “Abby” awards?
A. Jones and Smith
B. Smith and Taylor
C. Smith and White
D. White and Allen
Convergence strategy: The correct answer includes the most elements in common with the other options.
Question 3
§ How many pounds of pressure are exerted by a callum?
A. 2.6
B. 150
C. 260
D. 2600
Convergence strategy: The correct answer includes the most elements in common with the other options.
Question 4
§ The stannon is aided by a:
A. Anstel
B. Immon
C. Octal
D. Port
Grammatical cues: One or more distractors don’t follow grammatically from the stem.
Question 5
§ Stamation normally occurs when the:
A. Anstels rupture
B. Immon falls and the denton is in place
C. Octal rotates easily
D. Ports pass over the carm
Long correct answer: The correct answer is longer, more specific, or more complete than the other options.
Question 6
§ The stanon frequently overheats because:
A. All anastoles are belious
B. No immon is directly fectitious
C. Ports are always actial
D. The octal is usually casable
Absolute terms: Terms such as “always” or “never” are used in options.
Principles for Writing Answer Options
1. Options should be as homogeneous as possible.
2. The wording of the correct response should not “sound alike” or provide direct clues linked to the stem.
3. Options should be grammatically consistent with the stem. (The correct answer should not provide irrelevant grammatical clues.)
4. Indefinite terms such as “generally,” “frequently,” and “for the most part” tend to appear in true options, while absolute terms such as “always” and “never” tend to appear in false options; options should not provide these clues.
5. All options should be plausible.
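Some of these cues can be screened mechanically before human review. The sketch below is a hypothetical heuristic, not a validated tool. It flags three things: options that repeat a content word from the stem, options that contain absolute terms, and an option noticeably longer than the rest. The word lists, the 1.5× length threshold, and the function names are arbitrary assumptions. Judgments such as plausibility and homogeneity still require a content expert.

```python
import re

# Hypothetical word lists; adjust them for your own content area.
ABSOLUTE_TERMS = {"always", "never", "all", "none", "no", "only"}
STOPWORDS = {"the", "a", "an", "of", "to", "is", "are", "and", "or", "in", "by", "which", "what"}


def words(text):
    """Set of lower-case words in text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def flag_option_cues(stem, options, length_ratio=1.5):
    """Return (option letter, message) warnings for common testwise cues.

    stem: the item stem and lead-in.
    options: dict mapping option letter -> option text.
    length_ratio: the longest option is flagged when it exceeds this multiple
        of the mean length of the other options (hypothetical threshold).
    """
    warnings = []
    stem_words = words(stem) - STOPWORDS
    for letter, text in options.items():
        option_words = words(text)
        # Word repeat: a content word shared by the stem and the option.
        repeated = (option_words - STOPWORDS) & stem_words
        if repeated:
            warnings.append((letter, f"repeats stem word(s): {sorted(repeated)}"))
        # Absolute terms tend to mark false options.
        absolute = option_words & ABSOLUTE_TERMS
        if absolute:
            warnings.append((letter, f"absolute term(s): {sorted(absolute)}"))
    # Long correct answer: one option much longer than the others.
    lengths = {letter: len(text) for letter, text in options.items()}
    longest = max(lengths, key=lengths.get)
    others = [n for letter, n in lengths.items() if letter != longest]
    if others and lengths[longest] > length_ratio * sum(others) / len(others):
        warnings.append((longest, "noticeably longer than the other options"))
    return warnings


if __name__ == "__main__":
    stem = "The primary purpose of the stam is to remove the:"
    options = {"A": "Carm", "B": "Denton", "C": "Entace", "D": "Menace", "E": "Stam bar"}
    for letter, message in flag_option_cues(stem, options):
        print(letter, message)  # E repeats stem word(s): ['stam']
```

Run on the quiz items, the sketch flags the word repeat in Question 1, the long option in Question 5, and the three options with absolute terms in Question 6. It cannot detect convergence or grammatical cues.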
Writing MCQ Items
§ Most slides are used with permission from Susan M. Case, author of the first edition of this resource: Paniagua, M. and Swygert, K. (eds). Constructing Written Test Questions for the Basic and Clinical Sciences, 2016.
§ This can be downloaded from: http://www.nbme.org/publications/item-writing-manual.html
Most Widely Used MCQ Format
§ There is only one best answer and several choices that include “distractors.”
§ The “incorrect” options are not totally “wrong.”
§ All options could be laid out on a single continuum, from “least” to “most” likely.
§ The question typically includes the words “MOST” or “BEST.”
Rank-Ordered Options
Item options on a single continuum, from least correct answer to most correct answer: D, C, A, E, B
• Note: Options must be homogeneous (e.g., all diagnoses, all muscles). You must be able to rank-order the options on the same dimension.
A-type Question Components
Vignette (scenario or stem): A 35-year-old man has had a stomach ache all afternoon. He ate the following lunch: two big McDonald’s hamburgers, an ice cream shake, and large fries.
Lead-in: Which is the most likely diagnosis?
Options:
A. Abdominal aneurysm
B. Appendicitis
C. Bowel obstruction
D. Cholecystitis
E. Colon cancer
F. Pancreatitis
G. Too much lunch
Options A, B, C, D, E, and F are distractors; option G is the key.
“Cover the Options” Rule
A 32-year-old man has a 4-day history of progressive weakness in his extremities. He has been healthy except for an upper respiratory tract infection 10 days ago. His temperature is 100°F, BP 130/80, pulse 94, respirations 42 and shallow. He has symmetric weakness of both sides of the face and the proximal and distal muscles of the extremities. Sensation is intact. No deep tendon reflexes can be elicited; the plantar responses are flexor. Which of the following is the most likely diagnosis?
A. Acute disseminated encephalomyelitis
B. Guillain-Barré syndrome
C. Myasthenia gravis
D. Poliomyelitis
E. Polymyositis
Flaw – Cover the Options
§ Which of the following is true about pseudogout?
A. It occurs frequently in women.
B. It is seldom associated with acute pain in a joint.
C. It may be associated with a finding of chondrocalcinosis.
D. It is clearly hereditary in most cases.
E. It responds well to treatment with allopurinol.
Flaw – Inconsistent Styles
Following a second episode of salpingitis, what is the likelihood that a woman is infertile?
A. Less than 20%
B. 20 to 30%
C. Greater than 50%
D. 90%
E. 75%
Flaw – Vague Frequency Terms
Severe obesity in early adolescence
A. usually responds dramatically to dietary regimens
B. often is related to endocrine disorders
C. has a 75% chance of clearing spontaneously
D. rarely shows a good prognosis
E. frequently responds to pharmacotherapy and intensive psychotherapy
Flaw – None of the Above
A 62-year-old man with alcohol dependence is admitted to the hospital for transurethral resection of the prostate. The following morning, while being transported to the operating room, he has two generalized seizures within five minutes. Neurological examination shows no focal abnormalities. Intravenous administration of which of the following drugs is appropriate?
A. Diazepam
B. Haloperidol
C. Phenobarbital
D. Phenytoin
E. None of the above
Basic Guidelines
§ Overall:
• Is the item consistent with the intended learning objectives/Bloom’s level?
§ Stem:
• Is the context given (e.g., clinical vignette/scenario)?
• Does the item test beyond just recall of knowledge?
§ Lead-in:
• Does the lead-in focus on one aspect of a concept/disease/condition?
• Can the question be answered without looking at the options (the “cover the options” rule)?
§ Options:
• Are all options uniform (e.g., in length) and homogeneous (e.g., all diagnoses)?
• Are ambiguous terms used (e.g., almost, never, frequent)?
• Are there “all of the above” or “none of the above” options?
Test Item Analysis
§ Item difficulty
§ Item discrimination
§ Reliability
§ Validity
Item Difficulty
§ Definition: The percentage of students who answered the item correctly (also called the p value in item statistics).
Difficulty | p value (% correct)
High (difficult) | ≤ 30
Medium (moderate) | > 30 and < 80
Low (easy) | ≥ 80
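Item difficulty is a simple proportion, so it takes only a few lines to compute. The minimal sketch below assumes responses are coded 1 (correct) and 0 (incorrect), and it applies the cutoffs from the Item Difficulty table. Note that a low p value means a difficult item. The function names are hypothetical.

```python
def item_difficulty(responses):
    """Item p value: percentage of examinees who answered the item correctly.

    responses: iterable of 1 (correct) / 0 (incorrect), one per examinee.
    """
    responses = list(responses)
    if not responses:
        raise ValueError("no responses")
    return 100 * sum(responses) / len(responses)


def classify_difficulty(p):
    """Label a p value (in %) using the cutoffs from the Item Difficulty table."""
    if p <= 30:
        return "High (difficult)"
    if p < 80:
        return "Medium (moderate)"
    return "Low (easy)"


if __name__ == "__main__":
    # Hypothetical item: 74 of 100 examinees answered correctly.
    p = item_difficulty([1] * 74 + [0] * 26)
    print(p, classify_difficulty(p))  # 74.0 Medium (moderate)
```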
Item Discrimination
§ The discrimination index indicates how well a test item distinguishes between students who did well on the exam and students who did poorly.
§ In general, students who did well on the exam should select the correct answer to any given item.
§ The index ranges from -1 to +1.
§ For exams with a normal distribution, a discrimination of 0.3 and above is good; 0.6 and above is very good.
§ Values close to 0 mean that higher- and lower-performing students performed about the same on the item.
§ Negative values mean that the lower-performing students performed better on the item.
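One widely used way to compute a discrimination index is the upper-lower method. You rank examinees by total score, form equal-sized high and low groups, and subtract the low group's proportion correct from the high group's. The slides do not say which method produced their indices, so treat this as a sketch. The 27% group size is a common convention assumed here, and the function name is hypothetical.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = p_high - p_low (ranges -1 to +1).

    item_scores: 1 (correct) / 0 (incorrect) on this item, one per examinee.
    total_scores: each examinee's total exam score, in the same order.
    fraction: share of examinees placed in each of the high and low groups
        (27% is a common convention, assumed here; keep it at or below 0.5).
    """
    if len(item_scores) != len(total_scores) or not item_scores:
        raise ValueError("need non-empty score lists of equal length")
    n_group = max(1, round(fraction * len(total_scores)))
    # Examinee indices sorted by total score, lowest first (ties keep input order).
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low_group, high_group = ranked[:n_group], ranked[-n_group:]
    p_high = sum(item_scores[i] for i in high_group) / n_group
    p_low = sum(item_scores[i] for i in low_group) / n_group
    return p_high - p_low


if __name__ == "__main__":
    # Hypothetical data: 10 examinees with total scores 1..10.
    totals = list(range(1, 11))
    item = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
    print(f"D = {discrimination_index(item, totals):.2f}")  # D = 0.67
```

Applied to the Hi/Lo percentages in Item Analysis 1 (90% vs. 60% choosing the correct option), the upper-lower formula gives 0.30 rather than the reported 0.33. The slides' values may therefore come from a different statistic, such as a point-biserial correlation.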
Item Analysis 1 (* marks the correct option)
Group | A | B | C* | D | E | F
Hi | 0 | 1 | 90 | 3 | 3 | 3
Lo | 0 | 1 | 60 | 25 | 8 | 6
Total | 0 | 1 | 74 | 12 | 7 | 6
p-value: 74; discrimination index: 0.33
Item Analysis 2 (* marks the correct option)
Group | A | B | C* | D | E | F
Hi | 44 | 1 | 50 | 2 | 1 | 2
Lo | 20 | 15 | 21 | 22 | 20 | 2
Total | 32 | 7 | 34 | 14 | 11 | 2
p-value: 34; discrimination index: 0.30
Item Analysis 3 (* marks the correct option)
Group | A | B* | C | D | E | F
Hi | 1 | 1 | 94 | 4 | 1 | 2
Lo | 21 | 6 | 53 | 15 | 6 | 3
Total | 9 | 2 | 77 | 8 | 2 | 2
p-value: 2; discrimination index: -0.21
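The option-level figures in these tables also support a distractor analysis, which the slides do not describe explicitly. A common heuristic flags two kinds of distractor. The first is one that almost nobody chooses, since it adds little to the item. The second is one chosen more often by high scorers than by low scorers, since it may be ambiguous or the item may be miskeyed. The sketch below applies that heuristic to the three items above. The 5% cutoff and the function names are assumptions.

```python
OPTIONS = "ABCDEF"

# Figures transcribed from the Item Analysis tables: (key, Hi row, Lo row, Total row).
ITEMS = {
    "Item 1": ("C", [0, 1, 90, 3, 3, 3], [0, 1, 60, 25, 8, 6], [0, 1, 74, 12, 7, 6]),
    "Item 2": ("C", [44, 1, 50, 2, 1, 2], [20, 15, 21, 22, 20, 2], [32, 7, 34, 14, 11, 2]),
    "Item 3": ("B", [1, 1, 94, 4, 1, 2], [21, 6, 53, 15, 6, 3], [9, 2, 77, 8, 2, 2]),
}


def by_option(row):
    """Map option letters to the values in one table row."""
    return dict(zip(OPTIONS, row))


def review_distractors(key, hi, lo, total, min_total=5.0):
    """Flag distractors (non-keyed options) that may need revision.

    key: letter of the correct option.
    hi, lo, total: dicts mapping option letter -> value for the high group,
        low group, and all examinees.
    min_total: hypothetical cutoff below which a distractor counts as rarely chosen.
    """
    flags = {}
    for option in total:
        if option == key:
            continue
        notes = []
        if total[option] < min_total:
            notes.append("rarely chosen (possibly non-functional)")
        if hi[option] > lo[option]:
            notes.append("chosen more by high scorers than low scorers")
        if notes:
            flags[option] = notes
    return flags


if __name__ == "__main__":
    for name, (key, hi, lo, total) in ITEMS.items():
        print(name, review_distractors(key, by_option(hi), by_option(lo), by_option(total)))
```

In Item Analysis 3, option C attracts 94 of the high group versus 53 of the low group, while the keyed answer B draws a total of only 2. A pattern like that may indicate a miskeyed item and is worth re-checking.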
Practice
§ Divide into small groups and complete the assessment for your assigned items.
Complete the skills assessment at http://tinyurl.com/testdevelopment
Evaluation and Action Plan
Link to workshop dashboard: http://tinyurl.com/testdevelopment
Creative Commons License
Attribution-NonCommercial-ShareAlike 3.0 Unported
You are free:
• to Share: to copy, distribute, and transmit the work
• to Remix: to adapt the work
Under the following conditions:
• Attribution. You must give the original authors credit (but not in any way that suggests that they endorse you or your use of the work).
• Noncommercial. You may not use this work for commercial purposes.
• Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
See http://creativecommons.org/licenses/by-nc-sa/3.0/ for the full license.