
Thinking About the Think-Aloud Method to Guide Development of Assessments of Modified Achievement Standards
Presentation at the OSEP GSEG Project Directors Conference
Steve Ferrara
January 16, 2008

Today
- No one right way to do cognitive labs (think-alouds)
- Background on cognitive labs
- Overview and selected issues in conducting cognitive labs
  - Not covered: data analysis, synthesis, interpretation, and use
- Adapting cognitive labs for our target students and for assessments of modified achievement standards
  - I don't know how you plan to use cognitive labs in developing and validating items, so I have planned some general comments
Think-Alouds and Modified Assessments, Steve Ferrara

Two general principles
- As with most research methodologies, there is no one right way to do cognitive labs
  - Think-alouds are used in reading comprehension research, survey item development, achievement item development and validation, human factors research and evaluation (e.g., usability studies), …
- There are principles and practices that enable us to produce verbal data that can be interpreted reliably and validly and that are practically useful

So what are cognitive labs?
- (a) Prompting a specified population of respondents to (b) think out loud to (c) illuminate how they think while they (d) perform specified tasks
- Simply put, we ask respondents to "think out loud" while they perform a task
- Think-alouds can be used for physical or cognitive tasks

Example: think-aloud prompt
- I want you to say out loud anything that you are thinking while you are reading and trying to answer these science questions. Some things you might be feeling include... (hungry, tired, bored; this is interesting, hard). You might think things like... (this is a biology question; I don't understand the question; I'm going to reread this part; we did this in class last year). Say anything you are thinking to yourself… What I'm most interested in is the stuff you are doing in your head – while you are answering the question – that helps you to understand the question and figure out the answer.
- (From the ICV project; see Ferrara, Duncan, et al., 2004)

Thinking aloud
- In verbal reporting, respondents:
  - Bring information into attention
  - (When necessary) convert the information into verbalizable code
  - Vocalize their thinking
  (Ericsson & Simon, 1993, p. 16)
- Crucial considerations:
  - Are respondents aware of the information they use during task completion?
  - Can they verbalize it?

Why do cognitive labs?
- In general, to illuminate respondents' cognitive processing while they perform a task
- Exploratory goals
  - e.g., What reading comprehension strategies do students use when…?
- Refinement goals
  - e.g., Improve clarity and fidelity of interpretation of survey items (Desimone & Le Floch, 2004)
- Validation goals
  - e.g., Ensure that achievement items elicit intended knowledge, skills, and processes (Ferrara, Duncan, et al., 2004; Leighton, 2004)

Rough history
- Early empiricists in psychology: introspection
- Behaviorism in the 1930s: introspection fell into ill repute
- Surveys and polls: errors (e.g., in the 1948 presidential election) and new interest in the wording of questions and tasks
- Great Society evaluations in the 1960s: studies of behavior rather than opinion
- Sudman and Bradburn in the 1970s: the largest effect on responses came from the task (e.g., wording), not from interviewers or respondents
- Emergence of cognitive psychological research in the 1970s and beyond: implications for survey development
- 1990s: NCES surveys and Voluntary National Test reading and mathematics items in 1997-2000, reading research, SEPT, etc.
- 2000s: Expanding application to educational achievement test items

Overview and selected issues in conducting cognitive labs
- Retrospective and concurrent
- Open-ended and moderately or highly focused (specificity of tasks, uses of probes)
- Respondent sampling and generalizability
- Task difficulty
- Respondent willingness and effectiveness in thinking aloud and verbalizing

Concurrent and retrospective think-alouds
- Describing thinking during task completion (concurrent) or after task completion (retrospective)
- Trade-offs
  - Thinking aloud may alter task performance
  - Recall and reconstruction may differ from the thinking that actually occurred

Open-ended and moderately or highly focused
- Open-ended and exploratory
  - "Please think out loud while you respond to the following items."
- Moderately focused
  - "…Remember to tell me what you think about when you respond. Tell me about how you understand the item, how you select a response, and how you know which response is correct."
- Probes
  - "How did you decide to select that response? What information in the item and from school did you use?"

Grade 6 science item [item image not included]

Illustrative verbal reports
- Thinking aloud
  - "A. The candy will reach Bill. No that can't be right…B. The candy will go behind Bill…The possibility is the candy wouldn't reach Bill and then it probably will drop."
- Response to a retrospective probe ("How did you get that answer?")
  - "Because I looked and if it's, well, how it's going around and he's going forward so…"

Respondent sampling and generalizability
- Number of respondents
  - Rule of thumb: 9 respondents
  - Internal studies in usability testing suggest that little new information is gained after about 9 respondents
- Representativeness of the population of inference
  - 9 for each key subgroup in the population, or 9 total?
  - Typical subpopulations (e.g., racial-ethnic, gender) or those more likely to be relevant (e.g., instructional program, gender)?
  - Often, we don't know enough about which subpopulations may process differently
- Cost affordability is a consideration
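
A back-of-the-envelope sketch of the "9 per subgroup, or 9 total?" trade-off, assuming every respondent gets one individual session of fixed length. Only the rule of thumb of 9 comes from the slide; the function name, subgroup labels, and 60-minute session length are hypothetical.

```python
# Rule of thumb from the slide: little new information after about 9 respondents.
RULE_OF_THUMB = 9


def plan_respondents(subgroups, per_subgroup, minutes_per_session):
    """Hypothetical planning helper: respondent count and total session hours.

    subgroups: labels for the key subgroups in the population of inference
    per_subgroup: True for 9 per subgroup, False for 9 in total
    minutes_per_session: assumed length of one individual think-aloud session
    """
    n = RULE_OF_THUMB * len(subgroups) if per_subgroup else RULE_OF_THUMB
    return {"respondents": n, "session_hours": n * minutes_per_session / 60}


# Hypothetical subgroups defined by instructional program.
groups = ["program A", "program B", "program C"]
print(plan_respondents(groups, per_subgroup=True, minutes_per_session=60))
# {'respondents': 27, 'session_hours': 27.0}
print(plan_respondents(groups, per_subgroup=False, minutes_per_session=60))
# {'respondents': 9, 'session_hours': 9.0}
```

Sampling 9 per subgroup multiplies respondents and session time by the number of subgroups, which is where cost affordability enters the decision.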

Task sampling and generalizability
- All items, a random sample of items, or exemplars from item subsets (e.g., item families)?
- We may not know enough about the tasks (e.g., item families) to sample effectively (and that's why we're conducting cog labs)
- Numbers of respondents, time per respondent, and cost affordability are considerations
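
One way to make the three task-sampling options concrete, assuming each item is already tagged with an item family. The (item_id, family) format, the function name, and the example items are hypothetical, and drawing one random exemplar per family is only one reasonable reading of "exemplars from item subsets."

```python
import random
from collections import defaultdict


def sample_items(items, strategy, k=None, seed=0):
    """Select items for a cognitive lab (hypothetical helper).

    items: list of (item_id, family) pairs (hypothetical format)
    strategy: "all", "random" (k items), or "exemplars" (one per family)
    seed: fixed so the selection can be documented and reproduced
    """
    rng = random.Random(seed)
    ids = [item_id for item_id, _ in items]
    if strategy == "all":
        return ids
    if strategy == "random":
        return rng.sample(ids, k)
    if strategy == "exemplars":
        by_family = defaultdict(list)
        for item_id, family in items:
            by_family[family].append(item_id)
        # One randomly chosen exemplar from each family, families in sorted order.
        return [rng.choice(members) for _, members in sorted(by_family.items())]
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical reading items tagged by family.
pool = [("R1", "main idea"), ("R2", "main idea"), ("R3", "vocabulary"),
        ("R4", "vocabulary"), ("R5", "inference")]
print(sample_items(pool, "exemplars"))  # one item from each of the 3 families
```

The exemplar strategy only helps if the family labels reflect real differences in how students process the items, which, as the slide notes, is often what the cog lab is meant to find out.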

Task difficulty
- Rule of thumb:
  - Select tasks that are moderately difficult for respondents
  - Alternatively, select respondents who are well matched to the tasks
- Consideration: Select tasks in a range of difficulties
  - Some respondents can verbalize about easy and routinized tasks
  - Some respondents can verbalize about their thinking, even for tasks that are too difficult or that they don't know about

Respondent willingness and effectiveness in thinking aloud and verbalizing
- Some respondents are reticent
  - Think of middle-schoolers
  - Think of lower achievers (e.g., Ferrara, Albert, et al., 1996)
- Some respondents are unaware of their thinking (i.e., what information they heed)
- Some respondents may be willing and aware but do not verbalize their processing in illuminating or useful ways…

Target students for assessments of modified achievement standards
- Pursuing grade-level content standards but may not progress at the same rate as their peers
- May not be comfortable verbalizing what they know and don't know, may not be particularly metacognitive (i.e., aware), and may not verbalize effectively

Example from a GSEG project (implications to follow)
- Ohio, Minnesota, Oregon, AIR
- Persistently low-performing students with disabilities (PLP/SWD)
  - Borrowed the PLP idea from earlier Georgia work
  - Students in the lowest achievement level for two or three years (depending on the student cohort)

Example (cont.)
- Reading items that function adequately for PLP/SWDs: psychometric definition
  - p-values between .4 and .6
  - Point-biserial and polyserial correlations ≥ .20
- And items that did not function well
- Try to determine what distinguishes adequately functioning items and how to make other items more psychometrically sound for PLP/SWDs
- (One of several research activities to identify item and test modifications that provide valid and accessible items for PLP/SWDs)
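
A minimal sketch of this psychometric screen for dichotomously scored (0/1) items only. It computes each item's p-value (proportion correct) and a point-biserial correlation, then applies the .4 to .6 and .20 cut-offs from the slide. Two choices here are assumptions, not the project's documented method: the correlation is taken with the total score excluding the item itself (an item-rest correlation), and polyserial correlations for polytomous items are not handled. The function names and demo data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def item_screen(scores, p_range=(0.4, 0.6), min_r=0.20):
    """Flag 0/1-scored items that function adequately for a target group.

    scores: one row per student, one 0/1 column per item (hypothetical data)
    A point-biserial is a Pearson correlation between a 0/1 item score and a
    total score; here the total excludes the item itself (an assumption).
    """
    totals = [sum(row) for row in scores]
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        p = sum(item) / len(item)
        rest = [t - x for t, x in zip(totals, item)]
        r_pb = pearson(item, rest)
        # NaN comparisons are False, so constant items are never flagged adequate.
        adequate = p_range[0] <= p <= p_range[1] and r_pb >= min_r
        results.append({"item": j, "p": p, "r_pb": r_pb, "adequate": adequate})
    return results


# Tiny hypothetical data set: 4 students x 3 items.
demo = [[1, 1, 1], [1, 1, 1], [0, 0, 1], [0, 0, 1]]
for row in item_screen(demo):
    print(row)
```

For the slide's purpose, the scores would come from PLP/SWD respondents specifically, since the definition concerns how items function for that group.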

Example (cont.)
- Initial findings for one cohort
- Adequately functioning items
  - Comprehension items that require little or no inference
  - Vocabulary items: definitions in the text
  - You can put your finger on the answer in the passage
- Other items
  - Inference and synthesis required

Further…
- The project will define eligibility, in part, by identifying students "at the bottom of" grade-level assessments and "at the top of" alternate assessments
- Think about trying to get verbal data from students who currently participate in alternate assessments

Implications for cog labs with target students
- They probably are fairly concrete thinkers
- They are not likely to be highly verbal (in the colloquial sense)
- They may be reticent
- They may not be highly metacognitive (i.e., aware)
- How much useful verbal data might we expect to get from students who are likely to be eligible for assessments of modified achievement standards?
- Some encouraging results regarding think-alouds with students with learning disabilities (Johnstone, Liu, Altman, & Thurlow, 2007)

This is not an argument against using cog labs in this situation
- But choose the items (and other assessment tasks), students, and think-aloud prompts and probes with the target students clearly in mind
- Also, maybe consider a new idea: group cognitive labs
  - As far as I know, this idea has not been proposed elsewhere
  - Think of it as focus groups where the focus is cognitive processing while responding to test items

Group cognitive labs idea
- 4-6 respondents per group
- Probably homogeneous in terms of achievement, verbalization, etc.
  - OTL (opportunity to learn) is an important consideration
  - Get diversity and generalizability across groups
  - Could matrix sample items so that respondents do individual think-alouds for 2-3 items
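
A minimal sketch of the matrix-sampling option: items are dealt out in rotation so that each respondent does individual think-alouds on a small block (2 or 3 items) while the groups together cover the whole pool. The group and item labels, the function name, and the simple rotation scheme are illustrative assumptions, not a procedure from the presentation.

```python
from collections import Counter
from itertools import cycle, islice


def matrix_sample(groups, items, items_per_respondent=2):
    """Deal items to respondents in rotation, continuing across groups.

    groups: dict of group label -> list of respondent ids (hypothetical)
    items: list of item ids (hypothetical)
    Each respondent gets consecutive items from the cycle, so no respondent
    sees the same item twice as long as items_per_respondent <= len(items).
    """
    if not 1 <= items_per_respondent <= len(items):
        raise ValueError("items_per_respondent must be between 1 and len(items)")
    dealer = cycle(items)
    return {
        group: {r: list(islice(dealer, items_per_respondent)) for r in respondents}
        for group, respondents in groups.items()
    }


# Two hypothetical groups of 4 respondents and a pool of 6 items.
groups = {"G1": ["A", "B", "C", "D"], "G2": ["E", "F", "G", "H"]}
items = ["i1", "i2", "i3", "i4", "i5", "i6"]
plan = matrix_sample(groups, items, items_per_respondent=3)
# How many respondents think aloud on each item across both groups:
print(Counter(i for g in plan.values() for block in g.values() for i in block))
```

Dealing continuously across groups spreads coverage evenly; a project that wants each group to see a distinct block of items would deal separate item lists per group instead.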

Group cognitive labs idea (cont.)
- Retrospective reports
- Training and practice:
  - Thinking aloud
  - Reporting similarities to and differences from the thinking of other respondents
  - Avoiding being unduly influenced by others' thinking
- Round-robin think-alouds
  - Respondent A thinks aloud for item 1
  - Other respondents report similarities and differences
  - Etc.
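
The round-robin format can be written down as a turn schedule: the speaker rotates through the group from item to item, and everyone else reports similarities and differences. The function name and labels are hypothetical, and simple rotation in list order is just one way to assign speakers.

```python
def round_robin(respondents, items):
    """Return (item, speaker, listeners) turns for a group cognitive lab.

    The speaker gives a retrospective think-aloud on the item; the listeners
    then report similarities to and differences from their own thinking.
    Assumes respondent labels are unique (hypothetical labels below).
    """
    turns = []
    for i, item in enumerate(items):
        speaker = respondents[i % len(respondents)]
        listeners = [r for r in respondents if r != speaker]
        turns.append((item, speaker, listeners))
    return turns


for item, speaker, listeners in round_robin(["A", "B", "C", "D"], ["item 1", "item 2", "item 3"]):
    print(f"{item}: {speaker} thinks aloud; {', '.join(listeners)} compare")
```

Rotating the speaker gives every group member comparable speaking time, which may help with the reticence concern raised for these students.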

Group cognitive labs idea (cont.)
- Possible advantages
  - Cost-efficiency
  - Broader sampling
  - Possible improvement in quantity and quality of verbal reports
- Possible drawbacks
  - Increase in reticence
  - Respondents influence each other, obscure individual processing, and pollute verbal reports

Good luck!
Steve Ferrara
CTB/McGraw-Hill
sferrara1951@gmail.com

References
Desimone, L. M., & Le Floch, K. C. (2004). Are we asking the right questions? Using cognitive interviews to improve surveys in education research. Educational Evaluation and Policy Analysis, 26(1), 1-22.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (Rev. ed.). Cambridge, MA: The MIT Press.
Ferrara, S., Albert, F., Gilmartin, D., Knott, T., Michaels, H., Pollack, J., Schuder, T., Vaeth, R., & Wise, S. (1996, April). A qualitative study of the information examinees consider during item review on a computer-adaptive test. In L. Wolf (Moderator), Item review in computerized adaptive testing. Symposium conducted at the annual meeting of the National Council on Measurement in Education, New York.
Ferrara, S., Duncan, T. G., Freed, R., Velez-Paschke, A., McGivern, J., Mushlin, S., Mattessich, A., Rogers, A., & Westphalen, K. (2004). Examining test score validity by examining item construct validity: Preliminary analysis of evidence of the alignment of targeted and observed content, skills, and cognitive processes in a middle school science assessment. Paper presented at the annual meeting of the American Educational Research Association, San Diego.
Johnstone, C., Liu, K., Altman, J., & Thurlow, M. (2007). Student think aloud reflections on comprehensible and readable assessment items: Perspectives on what does and does not make an item readable (Technical Report 48). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.
Leighton, J. P. (2004). Avoiding misconception, misuse, and missed opportunities: The collection of verbal reports in educational achievement testing. Educational Measurement: Issues and Practice, 23(4), 6-15.