Methods of Standard Setting
Natalia Gaponova

  • Slides: 32

Introduction
• All standard setting methods involve expert judgemental decision making at some level... (Jaeger, 1979)
• There is no such thing as a true standard, but there is a theoretical cut-score that would be set by a judge if he or she totally understood the process, the test, the content, and the policy and had a true score on the test in mind as the standard. The question is whether the standard setting method can recover the theoretical cut-score, assuming a judge performed every task consistently and without error (Reckase, 2000)
• Many different terms are used in the measurement literature to refer to performance standards: “passing scores”, “cutoff score”, “performance levels”, “achievement levels”, “mastery levels”, “proficiency levels”, “thresholds” and “standards” (Hambleton, 2001)

The importance of standard-setting
• The cut-score is crucial for all participants of testing
• It must be reasoned and fair
• It is therefore necessary to use methods that make it possible to set the cut-score with mathematical precision

Interpretation of the mass-testing results
Common solution: setting cut-scores and dividing examinees into groups in accordance with their ability level
Participants of testing need
• to compare themselves with other examinees
• to estimate correctly and adequately their level of mastery of the material
Policy-makers are interested in the overall level of educational achievement, which could reflect the real situation in the schools and classes of a region

Why is it important to establish reasonable and fair cut-scores?
1. The people who conduct testing bear professional and ethical responsibility for the results they provide
2. The interpretation of the results should be understandable to any audience and should not provoke obvious disagreement from it
3. The interpretation of the results should reflect the real situation and be informative for policy-makers
4. The interpretation of the results should not be ambiguous: the examinees in one group should have genuinely different levels of ability from the examinees in another group

[Diagram: Classification of Standard-Setting Methods. Labels: Criterion-referenced, Norm-referenced, Test-centered, Examinee-centered]

The most commonly used classification scheme nowadays is the one suggested by Jaeger (1989), who splits the standard setting methods into two large groups:
Test-centered
• Angoff
• Ebel
• Nedelsky
• Jaeger
• Objective Standard Setting
• Bookmark
• Etc.
Examinee-centered
• Method of Contrasting Groups
• Method of Borderline Group
• Etc.

Test-centered method: ANGOFF

The Angoff method is one of the most widely and frequently used methods. It exists in two variants:
• Traditional
• Modified

Procedure of standard setting (traditional Angoff method)
• Experts rate the probability that a barely or minimally satisfactory or qualified person would answer each test item correctly
• These probabilities are averaged across judges or raters for each item; the sum of the item averages is the cutoff score on the raw-score scale (their mean is the cutoff expressed as a proportion correct)
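The averaging step can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own computation: the function name `angoff_cut_score` and the three-judge, four-item ratings are hypothetical, and the cutoff is reported on the raw-score scale (sum of the per-item averages).

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return the raw-score cutoff from Angoff ratings.

    ratings[j][i] is the probability (0..1) that judge j assigns to a
    minimally qualified person answering item i correctly.
    """
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate every item")
    # Average the judges' probabilities item by item ...
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    # ... then sum the item averages to get the expected raw score.
    return sum(item_means)


# Hypothetical ratings: 3 judges x 4 items.
ratings = [
    [0.60, 0.70, 0.50, 0.80],
    [0.50, 0.80, 0.40, 0.90],
    [0.70, 0.60, 0.60, 0.70],
]
cut = angoff_cut_score(ratings)
print(round(cut, 2))  # expected raw-score cutoff on a 4-item test
```

Dividing the result by the number of items gives the same cutoff as a proportion correct.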

Advantages and disadvantages
Advantages (+)
• Transparency and clarity
• Simplicity
• Flexibility
Disadvantages (-)
• Questionable objectivity of the decisions about the probability of a correct answer by a minimally competent examinee
• One round of rating; variable values (fluctuating rated probabilities)

Test-centered method: EBEL

Procedure of Standard Setting
• 2 rounds
• Experts independently classify test items by:
  I. Level of difficulty: easy, medium, hard
  II. Level of relevance: essential, important, acceptable, questionable

For each judge, all items can then be classified into the 12 cells of a 3*4 grid defined by the three difficulty and four relevance categories. As in the example (A = number of items in a category, B = % correctly performed items):

Relevance category: Essential
              Expert № 3         Expert № 4         Expert № 5
Difficulty    A    B    A*B      A    B    A*B      A    B    A*B
Easy          11   60   660      10   70   700      13   75   975
Medium        1    25   25       3    25   75       1    0    0
Hard          0    10   0        1    0    0        0    0    0
…
Relevance category: Questionable (Easy, Medium, Hard): 0, 0, 0

Summary rows: Cut-score; Mean for all experts (values shown: 25.1, 26.7, 35, 28, 12)

How to calculate a cut-score
1. Judges indicate the percentage of items within each of the 12 cells that a student should answer correctly in order to be judged minimally competent
2. Each item is assigned to one of the 12 cells based on the expert’s ratings
3. The percent-passing judgment for a cell is multiplied by the number of items in that cell
4. These products are summed over all 12 cells to get an overall passing score for a judge
5. These passing scores are averaged over judges to get the composite passing score
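The cell-by-cell arithmetic can be sketched as follows. This is an illustrative sketch only: the function names and the two judges' cell counts and percentages are hypothetical, not taken from the example table, and the percentage is divided by 100 so that each judge's score is expressed as a number of items.

```python
def ebel_judge_score(cells):
    """Passing score for one judge.

    cells maps (relevance, difficulty) -> (n_items, pct_correct), where
    pct_correct is the percentage of items in that cell a minimally
    competent student should answer correctly.
    """
    # Multiply the percent-passing judgment by the item count, cell by cell,
    # and sum the products over all cells.
    return sum(n * pct / 100.0 for n, pct in cells.values())


def ebel_cut_score(judges):
    """Composite passing score: the mean of the judges' passing scores."""
    scores = [ebel_judge_score(cells) for cells in judges]
    return sum(scores) / len(scores)


# Hypothetical judgments for a 20-item test (empty cells are simply omitted).
judge_a = {
    ("essential", "easy"): (10, 80),
    ("essential", "medium"): (5, 60),
    ("important", "hard"): (5, 40),
}
judge_b = {
    ("essential", "easy"): (12, 75),
    ("important", "medium"): (8, 25),
}
cut = ebel_cut_score([judge_a, judge_b])
print(cut)
```

Keeping the cells in a dictionary keyed by (relevance, difficulty) means a judge who leaves a cell empty needs no special handling.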

Advantages and disadvantages
Advantages (+)
• Can be used with different types of items (not only multiple-choice)
Disadvantages (-)
• It may be challenging for standard setting participants to keep the two dimensions of difficulty and relevance distinct, because those dimensions may, in some situations, be highly correlated
• A validity concern arises from the judgments about item relevance: the inclusion of items judged to be of questionable relevance appears, on its face, to weaken the validity evidence supporting defensible interpretation of the total test scores

Test-centered method: NEDELSKY

General concept
Nedelsky proposed considering the characteristics and performance of a hypothetical borderline examinee that he referred to as the “F-D student”.
Responses (distractors) which the lowest D-student should be able to reject as incorrect, and which therefore should be attractive to [failing students], are called F-responses… Students who possess just enough knowledge to eliminate F-responses and must choose among the remaining responses at random are called F-D students.

Procedure of Standard Setting
• The experts independently determine the F-responses that minimally competent examinees would be able to eliminate as incorrect
• The number of remaining options determines the probability with which the candidate will answer the question correctly: 1 plausible answer = 100%, 2 = 50%, 3 = 33%, 4 = 25%, and 5 = 20% probability of a correct answer

An example
• Participants judged that, for a certain five-option item, borderline examinees would be expected to rule out two of the options as incorrect, leaving them to choose from the remaining three options. The Nedelsky rating for this item would be 1/3 = 0.33.
• Repeating the judgment process for each item would give a number of Nedelsky values equal to the number of items in the test (n). The sum of the n values can be directly used as a raw-score cut score.
• For example, a 50-item test consisting entirely of items with Nedelsky ratings of 0.33 would yield a recommended passing score of 16.5 (i.e., 50 × 0.33 = 16.5)
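The same calculation can be written as a short Python sketch. The function names are hypothetical and chosen only for illustration; each item is described by its number of options and the number of options the borderline examinee is judged able to eliminate.

```python
def nedelsky_item_value(n_options, n_eliminated):
    """Chance of a correct answer when guessing among the remaining options."""
    remaining = n_options - n_eliminated
    if remaining < 1:
        raise ValueError("at least the correct option must remain")
    return 1.0 / remaining


def nedelsky_cut_score(items):
    """Sum the item values; items is a list of (n_options, n_eliminated)."""
    return sum(nedelsky_item_value(n, e) for n, e in items)


# The 50-item case from the example: five options, two ruled out per item.
items = [(5, 2)] * 50
cut = nedelsky_cut_score(items)
print(round(cut, 2))
```

Because the code keeps the unrounded value 1/3 instead of 0.33, the 50-item case comes out at about 16.67 rather than 16.5.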

Advantages and disadvantages
Advantages (+)
• The Nedelsky method has been used for many years to establish thresholds. It has probably remained popular because the procedure is clear to experts: they can quickly decide which responses a minimally competent examinee would be able to eliminate as incorrect.
• It can be used without prior piloting of the test
Disadvantages (-)
• Can be used only with multiple-choice items
• Raters tend not to assign probabilities of 1.00 (i.e., to judge that a borderline examinee could rule out all incorrect response options). This tends to create a downward bias in item ratings (i.e., a rating of .50 is assigned to an item instead of 1.00), with the overall result being a somewhat lower passing score than the participants may have intended to recommend, and somewhat lower passing scores compared to other methods

Test-centered method (based on Item Response Theory): BOOKMARK

Essential materials
• Directions to Bookmark participants
• Ordered item booklet
• Booklet guideline
• Student exemplar papers
• Scoring guide

Basic steps of the standard setting procedure
• Round I: Experts are informed about the required number of cut-scores to establish. Experts work in small groups; all the essential material is introduced to them.
• Round II: Overview of the cut-scores established by every expert; the same procedure as in the first step is repeated.
• Round III: Presentation of the percentage of students falling into each performance level and of each median cut-score from Round 2. After discussion, experts make individual judgments.

Round 1
• The main goals are to get panelists familiar with the ordered item booklet, set initial bookmarks, and then discuss the placements.
• Panelists are asked to discuss and determine the content that students should master for placement into a given performance level.
• Their independent judgments of cut-scores are expressed by simply placing a bookmark between the items judged to represent a cut-point. One bookmark is placed for each of the required cut-points.
• Items preceding the participant's bookmark reflect content that all students at the given performance level are expected to know and be able to perform successfully with a probability of at least 0.67 or 0.50.
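As a hedged illustration of how the response probability (0.67 or 0.50) links a bookmark to an ability level: the slides do not say which IRT model is used, so the sketch below assumes a Rasch (one-parameter logistic) model, under which an item of difficulty b is answered correctly with probability RP at ability b + ln(RP / (1 - RP)). The function names, the item difficulties, and the convention of taking the last item before the bookmark are all assumptions made for this example.

```python
import math


def rp_location(difficulty, rp=0.67):
    """Ability at which a Rasch item is answered correctly with probability rp."""
    if not 0.0 < rp < 1.0:
        raise ValueError("rp must lie strictly between 0 and 1")
    return difficulty + math.log(rp / (1.0 - rp))


def bookmark_cut(ordered_difficulties, items_before_bookmark, rp=0.67):
    """Ability cut implied by a bookmark (assumed convention).

    ordered_difficulties: Rasch difficulties in ordered-item-booklet order.
    items_before_bookmark: how many items precede the bookmark (>= 1).
    The cut is taken at the rp location of the last item before the bookmark.
    """
    last_item = ordered_difficulties[items_before_bookmark - 1]
    return rp_location(last_item, rp)


# Hypothetical booklet of five items, bookmark placed after the third item.
booklet = [-1.5, -0.4, 0.2, 0.9, 1.7]
theta_67 = bookmark_cut(booklet, 3, rp=0.67)
theta_50 = bookmark_cut(booklet, 3, rp=0.50)
print(round(theta_67, 2), round(theta_50, 2))
```

With RP = 0.50 the cut coincides with the item's difficulty; with RP = 0.67 it sits about 0.71 logits higher, which is why the choice of response probability matters.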

Round 2
• The first activity in Round 2 involves having each member place bookmarks in his/her ordered item booklet where each of the other panelists in their small group made their bookmark placement. For a group of 6 people, each panelist’s ordered booklet will have 6 bookmarks for each cut point.
• Discussions are then focused on the items between the first and last bookmarks for each performance level.
• Upon completion of this discussion, the panelists independently reset their bookmarks. The median of the Round 2 bookmarks for each cut point is taken as that group’s recommendation for that cut-point.

Round 3
• The percentage of students falling into each performance level is presented, given each group’s median cut-score from Round 2.
• With this information on how students actually performed, the panelists discuss the bookmarks in the large group and then make their independent Round 3 judgments of where to place the bookmarks.
• The median for the large group is considered to be the final cut-point for a given performance level.
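The two summary computations used in Rounds 2 and 3, the median of the panelists' cut-scores and the percentage of students falling into each performance level, can be sketched as below. The function names, the six panelists' cut-scores, and the ten student scores are hypothetical, and a student is assumed to reach a level when the score is at or above that level's cut-score.

```python
from statistics import median


def group_cut(panelist_cuts):
    """Group recommendation: the median of the panelists' cut-scores."""
    return median(panelist_cuts)


def percent_per_level(scores, cuts):
    """Percentage of students in each performance level.

    cuts holds one cut-score per boundary; a student's level is the number
    of cut-scores at or below the student's score (0 = lowest level).
    """
    cuts = sorted(cuts)
    counts = [0] * (len(cuts) + 1)
    for score in scores:
        level = sum(1 for c in cuts if score >= c)
        counts[level] += 1
    return [100.0 * n / len(scores) for n in counts]


# Hypothetical Round 2 cut-scores from a group of six panelists.
round2 = [22, 24, 25, 27, 28, 30]
cut = group_cut(round2)

# Hypothetical student scores used as impact data for Round 3.
students = [10, 15, 20, 26, 30, 35, 40, 26, 18, 33]
impact = percent_per_level(students, [cut])
print(cut, impact)
```

With an even number of panelists the median is the midpoint of the two middle values, so it can fall between two bookmark positions.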

Examinee-centered method: METHOD OF CONTRASTING GROUPS

Method of contrasting groups
• The procedure includes testing two groups of examinees: competent and non-competent
• The test score distributions of the two groups of classified examinees are compared
• The cut-score is set at the point where the two distributions intersect
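One simple way to locate the intersection numerically is sketched below. It is an assumption of this example, not a procedure stated on the slide: the cut-score is chosen as the observed score that minimizes the total number of misclassified examinees, which is where the two frequency distributions cross. The function name and both score lists are hypothetical.

```python
def contrasting_groups_cut(competent, non_competent):
    """Return (cut_score, misclassified) minimizing total misclassification.

    An examinee passes when the score is at or above the cut-score.
    Ties are resolved in favour of the lowest candidate cut-score.
    """
    candidates = sorted(set(competent) | set(non_competent))
    best_cut, best_errors = None, None
    for c in candidates:
        false_fail = sum(1 for s in competent if s < c)       # competent, but fails
        false_pass = sum(1 for s in non_competent if s >= c)  # non-competent, but passes
        errors = false_fail + false_pass
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = c, errors
    return best_cut, best_errors


# Hypothetical test scores for the two pre-classified groups.
competent = [14, 16, 17, 18, 19, 20, 21, 22]
non_competent = [8, 10, 11, 12, 13, 14, 15, 17]
cut, errors = contrasting_groups_cut(competent, non_competent)
print(cut, errors)
```

Other approaches, such as smoothing the two distributions before finding the crossing point, would give a similar but not necessarily identical cut-score.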

Advantages and disadvantages
Advantages (+)
• Can be used with any item type
Disadvantages (-)
• The objectivity of classifying students as competent or non-competent is doubtful

Your questions? THANK YOU FOR YOUR ATTENTION