RTI Measurement Overview: Measurement Concepts for RTI Decision Making
A module for pre-service and in-service professional development
MN RTI Center
Author: Lisa H. Stewart, Ph.D., Minnesota State University Moorhead
www.scred.k12.mn.us (click on RTI Center)

MN RTI Center Training Modules
o This module was developed with funding from the MN legislature
o It is part of a series of modules available from the MN RTI Center for use in pre-service and in-service training

Overview
o Purpose(s) of assessment
o Characteristics of effective measurement for RTI
o Critical features of measurement and RTI in the areas of screening, progress monitoring, and diagnostic instructional planning
o CBM/GOMs as a frequently used RTI measurement tool
o Multiple sources of information and convergence

Why Learn About Measurement?
“In God we trust… All others must have data.”
Dr. Stan Deno

Assessment: One of the Key Components in RTI
o Curriculum and Instruction
o Assessment
o School Wide Organization & Problem Solving Systems (Teams, Process, etc.)
DRAFT May 27, 2009
Adapted from Logan City School District, 2002

Measurement and Assessment
o Schools have to make many choices about measurement tools and the process of gathering information used to make decisions (assessment)
o We need different measurement tools for different purposes

Some Purposes of Assessment
o Screening
o Diagnostic - instructional planning
o Monitoring student progress (formative)
o Evaluation (summative)

Screening
o Standardized measures given to all students to:
  - Help identify students at risk in a PROACTIVE way
  - Give feedback to the system about how students progress throughout the year at a gross (e.g., 3x per year) level
      - If students are on track in the fall, are they still on track in the winter?
      - What is happening with students who started the year below target? Are they catching up?
  - Give feedback to the system about changes from year to year
      - Is our new reading curriculum having the impact we were expecting?

Diagnosis/Instructional Planning
o Measures given to understand a student’s skill level (strengths and weaknesses) help guide:
  - Instructional grouping
  - Where to place the student in the curriculum & curricular materials
  - What skills are missing or weak and may need to be retaught or practiced, and the level of support and explicitness needed
  - Development or selection of curriculum and targeted interventions

Monitoring Student Progress (Formative)
o Informally this happens all the time and helps teachers adjust their teaching on the spot
o More formalized progress monitoring involves standardized measures, tied to important educational outcomes, and given frequently (e.g., weekly) to:
  - Prompt you to change what you are doing with a student if it is not working (formative assessment) so you are effective and efficient with your time and instruction
  - Make decisions about instructional goals, materials, levels, and groups
  - Aid in communication with parents
  - Document progress for special education students as required for periodic and annual reviews

Evaluation (Summative)
o Measures used to provide a snapshot or summary of student skill at one particular point in time, often at the end of the instructional year or unit
  - E.g., state high-stakes tests
o "When the cook tastes the soup, that’s formative; when the guests taste the soup, that’s summative."

One Test Can Serve More Than One Purpose
o To the extent a test does more than one thing well, it is a more efficient use of student time and school resources
  - Example 1: Reading CBM measures of Oral Reading Fluency can be used for screening and progress monitoring
  - Example 2: The NWEA (MAP) test may be used for screening and instructional planning

Activity
o On the Measurement Overview Purposes of Assessment Worksheet
  - Make a list of all the tests you have learned about or have seen used in the school setting (or are currently in use in your school)
  - Try to decide what purpose(s) each test served

Assessment Tools and Purpose(s)
Name of Test | Purpose(s) (Screening, Instructional Planning, Progress Monitoring, Program Eval.)

Buyer Beware
o Although it is good if a test can serve more than one purpose, just because a test manual or advertisement SAYS it is useful for multiple purposes doesn’t mean the test actually IS useful for multiple purposes
  - Example: Many tests designed for diagnostic purposes or for summative evaluation state they are also useful for progress monitoring, but are too time consuming, too costly, too unreliable, or too insensitive to changes in student skills to be of practical use for progress monitoring

Establishing a Measurement System
o A core feature of RTI is identifying a measurement system
  - Screen large numbers of students
      - Identify students in need of additional intervention
  - Monitor students of concern more frequently
      - 1 to 4x per month
      - Typically weekly
  - Diagnostic testing used for instructional planning to help target interventions as needed

Characteristics of An Effective Measurement System for RTI
o valid
o reliable
o simple
o quick
o inexpensive
o easily understood
o can be given often
o sensitive to growth over short periods of time
Credit: K Gibbons, M Shinn

Technical Characteristics of Measurement Tools
o Reliability: the consistency of the measure
  - If tested again right away, or by a different person, or with an alternate equivalent form of the test, the score should be similar
  - Allows us to have confidence in the score and use the score to generalize what we see today to other times and situations
      - If a student knows how to decode simple words on a sheet of paper at 8 am this morning, we would expect him to be able to decode similar simple words at noon… and the next day…

Why is Reliability so Important?
o Assume you have a test that decides whether or not you need to take (and pay for) a remedial math class in college that does not count toward graduation.
  - The test average score is 50 points.
  - The test has a “cut off” score of 35, so students who score below 35 have to take the remedial class.

Why is Reliability so Important? (Cont’d)
o If the test is reliable, and you get a score of 30, then if you take another version of the test or take the test again a week later (without major studying or changing what you know!) you would likely get a score very close to 30…
o If the test is not reliable, and you get a score of 30… you might be able to take the test again or take another version of the test and get a score of 40… or a score of 20!
  - If the test is unreliable we can’t have much faith in the score and it becomes difficult to use the test to make decisions!

Validity
o But what if the test IS reliable and you get a score of 30 but your math skills are much better than the score implies?
o What if you get a score of 30 but you don’t really need a remedial math class?
o Then the test has an issue with VALIDITY
  - A test is valid only if the interpretation of the test scores is supported
  - A common definition of validity is that “the test measures what it says it measures”
  - Another definition is that a test is valid if it helps you make better decisions or leads to better outcomes than if you had never given the test

Types of Validity
o There are many ways to try to demonstrate validity:
  - Content validity
  - Criterion related validity: concurrent and predictive
  - Treatment validity
  - Construct validity

Types of Validity (Cont’d)
o Content validity
  - The test content is reasonable
o Criterion related validity: two types
  - Concurrent: the scores from this test are similar to scores from other tests that measure the same/similar thing
  - Predictive: the test scores from this test do a pretty good job of letting us know what score a student will get on another test in the future

Types of Validity (Cont’d)
o Treatment Validity
  - If you use this test to decide about some treatment or intervention or instructional approach…
      - Do you make better decisions?
      - Do you have better goals? Planning? Student engagement?
      - Most importantly: Are the outcomes for your students better?

Types of Validity (Cont’d)
o Construct Validity
  - Does the test measure the theoretical trait or characteristic?
      - E.g., if theory says children need to have a base of solid decoding skills before they will be fast and fluent readers of new text, do the scores on the reading test of decoding and fluency support that?
  - All other ways to try to document validity are in some way also addressing construct validity (content, criterion, treatment, etc.)

The NOT Validity Kind of Validity
o Face validity is NOT really validity
  - Positive: It “looks” good
      - Just because a test looks good or you (or your colleague) like to give it does not mean it gives you good information or is the best to use
  - Negative: I just don’t like it
      - Just because a test isn’t set up exactly how you like it does not mean it does NOT give you good information
o Look for EVIDENCE of reliability and validity; don’t rely on your reaction, or the reactions and testimonials of colleagues, alone.

Reliability and Validity
o Just because a test is reliable does not mean it is valid
  - It may reliably give you an inaccurate score!
o If a test is not reliable, it cannot be valid
o No test or test score is perfectly reliable
o We use test scores to help make a variety of decisions, some “low stakes” and some “high stakes”
  - So how reliable is “reliable enough”?
  - It depends…

Measuring Reliability and Validity
o Typically reliability and validity evidence involves comparing the test to itself or to other tests or outcomes
o The statistic used to sum up that comparison is often a correlation (r)
o Reliability and validity correlations typically fall between r = 0.0 and 1.0 (in general a correlation can range from -1.0 to 1.0)
o The closer a correlation is to 1.0, the “stronger” the relationship, or the better you can predict one score or outcome if you know the other one
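
As an illustration of the correlation described on this slide, the sketch below computes a Pearson r between two administrations of the same test (a test-retest comparison). The function name `pearson_r` and the five student scores are made up for the example; they are not from the module.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five students tested twice, a week apart
first_testing = [30, 42, 55, 61, 48]
second_testing = [32, 40, 57, 60, 50]
test_retest_r = pearson_r(first_testing, second_testing)
print(round(test_retest_r, 2))
```

The same function could be used for criterion related validity by passing scores from two different tests instead of two administrations of one test.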

How Reliable is Reliable Enough?
o For important INDIVIDUAL decisions? r = .90
o For SCREENING decisions? r = .80
  (Salvia & Ysseldyke, 2006)
o “Reliability is like money, as long as you have it, it’s not a problem, but if you don’t, it’s a BIG problem!” ~ Fred Kerlinger
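
A minimal sketch of the rule of thumb on this slide, assuming the two thresholds are applied as simple minimums. The names `RELIABILITY_TARGETS` and `reliable_enough` are hypothetical.

```python
# Minimum reliability coefficients from the slide (Salvia & Ysseldyke, 2006)
RELIABILITY_TARGETS = {
    "individual": 0.90,  # important decisions about one student
    "screening": 0.80,   # screening decisions
}


def reliable_enough(r, decision):
    """Return True if reliability r meets the target for this kind of decision."""
    return r >= RELIABILITY_TARGETS[decision]


# A test with r = .85 would pass for screening but not for individual decisions
print(reliable_enough(0.85, "screening"), reliable_enough(0.85, "individual"))
```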

How Valid is Valid Enough?
Ranges          | Interpretation
.00 - .20       | Little/no validity
.21 - .40       | Below average validity
.41 - .55       | Average validity
.56 - .80       | Above average validity
above .80 - .99 | Exceptional validity
Source: Webb, M. W. (1983). Journal of Reading, 26(5), 414-424
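
The ranges on this slide can be read as a lookup from a validity coefficient to a label. The sketch below is one way to express that, under two assumptions that are not on the slide: each band is checked by its upper bound only (so a value such as .205 falls into the next band up), and anything above .80 is treated as "Exceptional validity". The function name is hypothetical.

```python
def interpret_validity(r):
    """Map a validity coefficient (0.0 to 1.0) to the slide's interpretation label."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("expected a validity coefficient between 0.0 and 1.0")
    # (upper bound of band, label), checked from lowest band to highest
    bands = [
        (0.20, "Little/no validity"),
        (0.40, "Below average validity"),
        (0.55, "Average validity"),
        (0.80, "Above average validity"),
    ]
    for upper, label in bands:
        if r <= upper:
            return label
    return "Exceptional validity"  # assumption: everything above .80


print(interpret_validity(0.45))
```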

Looking at Validity With a Purpose in Mind
o Predictive validity is really important if you are using the test as a screening tool to predict which students are at risk or not at risk of reading difficulty
o Treatment validity is really important if you are using the test in an effort to lead to some sort of improved outcome

Validity isn’t Just About the Test
o Validity has to do with the test use and interpretation, so even a “valid” test can be used for the wrong reasons or misinterpreted or misused
  - Example 1: A test score for an ELL student should reflect the student’s skills, not her ability to understand the directions and what is being asked
  - Example 2 on next slide

Validity isn’t Just About the Test (Cont’d)
o Example 2: Letter Naming Fluency (LNF)
  - LNF involves giving a student a page of randomized upper and lower case letters and having the student name as many letters as they can in one minute.
  - As a test of early literacy, LNF has good reliability and concurrent and predictive validity, especially predictive validity
  - However, it can be easily MISUSED:
      - If interpreted correctly, LNF can identify students at risk for early reading difficulty and get those students into well-rounded early literacy instruction well suited to them,
      - BUT, if it is interpreted to mean that a student low in LNF just needs a lot of instructional time spent only learning letter names (often taking time away from high quality well-rounded early literacy instruction), it can actually have a negative impact.

Test Utility
o Is it easy to use, time efficient, and cheap?
  - Even if a test is reliable and valid, if it is too difficult to use, too time consuming, or too expensive it just won’t get used
      - If a reliable and valid progress monitoring tool took 30 minutes per child and you wanted to monitor 10 students in your class every week, would you use it?
  - However, if a test is easy and short and cheap… but isn’t reliable or valid… it’s still a waste of time, no matter how short!

Test Utility (Cont’d)
o Is it sensitive enough for the decisions you want to make?
  - Can it detect the differences between groups of kids or within an individual that you need to help you make a decision?
      - If a progress monitoring tool can only show gains of 1 point per month, is it sensitive enough to help give you timely feedback on the student’s response to your instruction?

Activity
o On the “Characteristics of Assessment Tools for RTI” Worksheet
  - Make a list of tests you have learned about or have seen used in the school setting (or are currently in use in your school)
      - Can use all or some of the tools from the Purposes of Assessment Worksheet for your list
  - Is the test reliable and valid FOR THE PURPOSE IT IS BEING USED?
  - Is it quick and simple? Is it inexpensive? Can it be given often (has alternate forms, etc.)? Is it sensitive?

Characteristics of Assessment Tools for RTI
Name of tool | Reliable | Valid | Quick & simple | Cheap | Can be given often | Sensitive to growth over short time

Some Help in Looking for Evidence
o Measurement tools are reviewed at the following sites:
  - www.rti4success.org
  - www.studentprogress.org
o These sites only review tests that were submitted; if a test is not on the list it doesn’t mean it is bad, just that it wasn’t reviewed
o Be sure you know the purpose of assessment (screening, progress monitoring, etc.) to best interpret the information

Critical Features of Measurement and RTI
o Screening
o Progress Monitoring
o Diagnostic Instructional Planning

Measurement and RTI: Screening
o Reliability coefficients of at least r = .80. Higher is better, especially for screening specificity.
o Well documented predictive validity
o Evidence that the criterion (cut score) being used is reasonable and does not create too many false positives (students identified as at risk who aren’t) or false negatives (students who are at risk who aren’t identified as such)
o Brief, easy to use, affordable, and results/reports are accessible almost immediately
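
To make the false positive / false negative idea concrete, here is a rough sketch that counts both for a given cut score. It assumes a student is flagged as at risk when the screening score is below the cut score, and that you later learn which students actually struggled. The function name, the scores, and the cut score of 35 are all hypothetical.

```python
def screening_accuracy(screen_scores, later_struggled, cut_score):
    """Count screening outcomes when students scoring below cut_score are flagged."""
    tp = fp = tn = fn = 0
    for score, struggled in zip(screen_scores, later_struggled):
        flagged = score < cut_score
        if flagged and struggled:
            tp += 1  # correctly identified as at risk
        elif flagged and not struggled:
            fp += 1  # false positive: flagged but was not at risk
        elif not flagged and struggled:
            fn += 1  # false negative: at risk but missed
        else:
            tn += 1  # correctly identified as not at risk
    return {
        "false_positives": fp,
        "false_negatives": fn,
        # Share of truly at-risk students the screen caught
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Share of not-at-risk students the screen correctly left alone
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical fall screening scores and whether each student later struggled
scores = [12, 25, 31, 38, 44, 52, 60, 71]
struggled = [True, True, False, True, False, False, False, False]
result = screening_accuracy(scores, struggled, cut_score=35)
print(result)
```

Re-running the function with different cut scores shows the trade-off: raising the cut score tends to reduce false negatives at the cost of more false positives.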

Measurement and RTI: Progress Monitoring
o Reliability coefficients of r = .90+
  - Because you are looking at multiple data points over time, it is possible to use a test with a lower reliability (e.g., .80 - .90), but wait until you have several data points and use the combined data to increase confidence in your decisions
o Well documented treatment validity!

Measurement and RTI: Progress Monitoring (Cont’d)
o Test and scores are very sensitive to increases or decreases in student skills over time
  - Evidence of what slope of progress (how much growth in a day, week, or month) is typical under what conditions can greatly increase your ability to make decisions
o VERY brief, easy to use, affordable, alternate forms, and results/reports are accessible immediately
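
One common way to put a number on "slope of progress" is a least-squares trend line through the progress monitoring scores. The sketch below assumes one score per week, in order, and uses made-up weekly scores; the module itself does not prescribe a particular method for computing slope.

```python
def weekly_slope(scores):
    """Least-squares slope of scores against week number (0, 1, 2, ...).

    The result is the average gain per week along the trend line.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two data points to estimate a slope")
    weeks = range(n)
    mean_week = (n - 1) / 2
    mean_score = sum(scores) / n
    numerator = sum((w - mean_week) * (s - mean_score) for w, s in zip(weeks, scores))
    denominator = sum((w - mean_week) ** 2 for w in weeks)
    return numerator / denominator


# Hypothetical words-correct-per-minute scores from six weekly probes
orf_scores = [42, 45, 44, 48, 51, 53]
slope = weekly_slope(orf_scores)
print(round(slope, 2))
```

Comparing a student's slope with the growth that is typical under similar conditions is what turns the number into a decision about whether instruction is working.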

Measurement and RTI: Diagnostic Assessment for Instructional Planning
o Reliability coefficients of r = .80+
  - ASSUMING you are open to changing the instruction (formative assessment) if your planning didn’t work out as you thought it might
o Aligned with research on the development and teaching of reading
o Well documented treatment validity, utility for instructional planning!
o Time and cost efficient but specific enough to be useful for designing effective interventions
o Linked to standards and curriculum scope and sequence

Measurement and RTI: Diagnostic Assessment for Instructional Planning (Cont’d)
o Many instructional planning tools have limited information on reliability and validity. Look for tools that do have data.
o If creating your own tests, use best practices in test construction.
o Overall, be sure you are doing standardized frequent progress monitoring and looking at student engaged time as other sources of information to ensure instruction is well planned.

RTI, General Outcome Measures and Curriculum Based Measurement
o Many schools use Curriculum Based Measurement (CBM) general outcome measures for screening and progress monitoring
  - You don’t “have to” use CBM, but many schools do
o Most common CBM tool in Grades 1-8 is Oral Reading Fluency (ORF)
  - Measure of reading rate (# of words correct per minute on a grade-level passage) and a strong indicator of overall reading skill, including comprehension
o Early Literacy Measures are also available, such as Nonsense Word Fluency (NWF), Phoneme Segmentation Fluency (PSF), Letter Naming Fluency (LNF) and Letter Sound Fluency (LSF)

Why GOMs/CBM?
o Typically meet the criteria needed for RTI screening and progress monitoring
  - Reliable, valid, specific, sensitive, practical
  - Also, some utility for instructional planning (e.g., grouping)
o They are INDICATORS of whether there might be a problem, not diagnostic!
  - Like taking your temperature or sticking a toothpick into a cake
  - Oral reading fluency is a great INDICATOR of reading decoding, fluency and reading comprehension
  - Fluency based because automaticity helps discriminate between students at different points of learning a skill

GOM… CBM… DIBELS… AIMSweb…

CBM Oral Reading Fluency
o Give 3 grade-level passages using standardized administration and scoring; use median (middle) score
o 3-second rule (tell the student the word & point to next word)
o Discontinue rule (after 0 correct in first row; if <10 correct on 1st passage, do not give other passages)

Errors:
  - Hesitation for >3 seconds
  - Incorrect pronunciation for context
  - Omitted words
  - Words out of order
Not Errors:
  - Repeated sounds
  - Self-corrects
  - Skipped row
  - Insertions
  - Dialect/articulation
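
The scoring rules on this slide can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the scores are made up, and the sketch covers just the "fewer than 10 correct on the 1st passage" part of the discontinue rule and the median-of-three rule. Follow the published administration directions for the tool you actually use.

```python
from statistics import median


def should_give_remaining_passages(first_passage_correct):
    """Discontinue rule: skip passages 2 and 3 if fewer than 10 words correct on passage 1."""
    return first_passage_correct >= 10


def orf_median(passage_scores):
    """Return the median (middle) words-correct-per-minute score of three passages."""
    if len(passage_scores) != 3:
        raise ValueError("expected scores from exactly three passages")
    return median(passage_scores)


# Hypothetical student: 58, 64, and 61 words correct per minute on three passages
student_scores = [58, 64, 61]
if should_give_remaining_passages(student_scores[0]):
    print(orf_median(student_scores))
```

Using the median rather than the mean keeps one unusually easy or hard passage from pulling the student's score up or down.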

Fluency and Comprehension
The purpose of reading is comprehension.
A good measure of overall reading proficiency is reading fluency because of its strong correlation to measures of comprehension.

The Importance of Multiple Sources of Information
o No ONE test is going to serve all purposes or give you all the information you need.
o Use MULTIPLE sources of data to make the best decisions
  - Screening, progress monitoring, diagnostic, and evaluative data from multiple sources and/or across time
  - Teacher observation and more formal observations
  - Other pieces of relevant information such as behavior, attendance, health, the curriculum and instructional environment, etc.
o Look for CONVERGENCE of data: places where several sources of data point to the same decision or conclusion

Articles Available with this Module
o Shoemaker, J. (2006). Reliability and Validity Stats “crib sheet” from Heartland AEA (Iowa)
o Traditional and Modern Concepts of Validity. ERIC/AE Digest
o Also see articles specific to particular uses of measurement in benchmark and progress monitoring modules

Recommended Resources
o American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
o Educational measurement text, e.g., texts by Hogan, Marzano, or Salvia & Ysseldyke, or a good educational psychology text that covers reliability, validity and utility of measurement

Web Resource on Measurement
o Heartland (Iowa) website link with PowerPoints on common myths and confusions about assessment
  - http://www.aea11.k12.ia.us/assessment/mythbuster.html

RTI Related Resources
o National Center on RTI
  - http://www.rti4success.org/
o RTI Action Network: links for Assessment and Universal Screening
  - http://www.rtinetwork.org
o MN RTI Center
  - http://www.scred.k12.mn.us/ and click on link
o National Center on Student Progress Monitoring
  - http://www.studentprogress.org/
o Research Institute on Progress Monitoring
  - http://progressmonitoring.net/

RTI Related Resources (Cont’d)
o National Association of School Psychologists
  - www.nasponline.org
o National Association of State Directors of Special Education (NASDSE)
  - www.nasdse.org
o Council of Administrators of Special Education
  - www.casecec.org
o Office of Special Education Programs (OSEP) toolkit and RTI materials
  - http://www.osepideasthatwork.org/toolkit/ta_responsiveness_intervention.asp

Quiz
o 1. A purpose of assessment is what?
  - A.) Screening
  - B.) Diagnostic
  - C.) Progress Monitoring
  - D.) Evaluation
  - E.) All of the above
o 2. True or False? A test is useful for multiple purposes as long as its manual or advertisement says it is.

Quiz
o 3. The consistency of the measure is called its what?
  - A.) Validity
  - B.) Reliability
  - C.) Criterion
  - D.) Sensitivity
o 4. If the test measures the construct it says it measures, it has what?
  - A.) Validity
  - B.) Reliability
  - C.) Criterion
  - D.) Sensitivity

Quiz
o True or False for each statement?
  - 5.) Even if a test is not valid, it can still be reliable.
  - 6.) Even if a test is not reliable, it can still be valid.
  - 7.) Validity is not just about the test: it has to do with the test use and interpretation, so even a valid test can be used for the wrong reasons, misinterpreted, or misused.

The End
o Note: The MN RTI Center does not endorse any particular product. Examples used are for instructional purposes only.
o Special Thanks:
  - Thank you to Dr. Ann Casey, director of the MN RTI Center, for her leadership
  - Thank you to Aimee Hochstein, Kristen Bouwman, and Nathan Rowe, Minnesota State University Moorhead graduate students, for editing, writing quizzes, and enhancing the quality of these training materials