Report on the CEFR ACTFL ILR STANAG Alignment



Report on the CEFR / ACTFL / ILR & STANAG “Alignment Conference”, June 30 - July 3, 2010, Leipzig, Germany. Julie J. Dubeau, M.A., BILC Secretary. Varna, Bulgaria, October 14, 2010

CEFR / ACTFL / ILR / STANAG “Alignment Conference” • Goal(s) of Conference • Some Perspectives Presented • Some Preliminary Questions • Some Preliminary Conclusions

About the Conference (taken from http://www.uni-leipzig.de/actflcefr2010) • The goal of the ACTFL / CEFR Alignment Conference 2010 is to bring together some 45 leaders in the field from both Europe and North America to explore a crosswalk between the ACTFL Proficiency Guidelines and the Common European Framework of Reference for Languages (CEFR) and to establish equivalencies on theoretical and empirical grounds.

About the Conference (cont’d) • Both sets of scales claim to measure the same construct: proficiency. The successful establishment of equivalencies would support the validity of both scales. • Problems in establishing equivalencies would point to the need for further research and development.

Further very ambitious goals included the following: • to present and discuss: – empirical studies on the validity and reliability of tests based on either framework; – theoretical studies of the construct validity of either framework; and – empirical studies comparing both frameworks and/or tests based on both frameworks; • to present and critically discuss: – standardized tests (test systems) based on either framework and for different target groups (age, education, professional purposes, etc.);

Further very ambitious goals included the following: • to develop guidelines for developing tests that can be rated according to both scales; and • to develop guidelines for developing proficiency tests, their administration, and evaluation. • These goals will be accomplished by combining general session presentations with break-out discussion groups. Oh My!!

The ‘Alignment Conference’ • 4 parallel workshops • Opening addresses • Opening plenary presentations: – It’s easier to malign tests than to align tests • Ray Clifford – The CEFR: An evolving framework of reference • Nick Saville See abstracts on website

The ‘Alignment Conference’ 12 papers organized by topics: • Topic 1: Conceptual issues in a crosswalk • Topic 2: Technical issues across domains • Topic 3: Research and empirical issues in a crosswalk • 3 breakout sessions, each focusing on purposes and benefits, on skills and domains, or on issues such as intercultural competence, implications for younger learners, and language policy and curricular issues. • http://www.uni-leipzig.de/actflcefr2010/abstracts.html

Test Equivalence, Equating, and Linking: The Issue of Validity. Prof. Olaf Bärenfänger, Universität Leipzig, baerenfaenger@uni-leipzig.de. Slides used with permission

Test Equating and Linking 1. Some Conceptual Clarifications 2. Validity as Core of Equivalence 3. Conclusions “Equating adjusts for differences in difficulty, not differences in content. <...> in most cases, (different) tests clearly measure very different content/constructs. We refer generically to a relationship between scores on such tests as linking. <...> it is virtually certain that score differences are attributable to construct differences as well as to errors of measurement, either or both of which could be quite large. With equal force, however, the adequacy of the linking may be highly suspect depending on the nature of the decisions made based on the linking.” (Kolen & Brennan 2004, 2nd ed.: 423 f.) Olaf Bärenfänger
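To make the equating/linking distinction on this slide concrete, here is a minimal sketch of linear equating, one common way to adjust scores for a difference in difficulty between two forms. The function name `linear_equate` and the score lists are hypothetical illustrations, not material from the presentation, and the sketch assumes both forms were taken by comparable groups.

```python
from statistics import mean, pstdev

def linear_equate(x, scores_x, scores_y):
    """Map a score x on form X to the scale of form Y by matching
    the means and standard deviations of the two score distributions."""
    mean_x, mean_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    return mean_y + (sd_y / sd_x) * (x - mean_x)

# Hypothetical score samples for two forms (illustrative data only).
form_x = [10, 12, 14, 16, 18]
form_y = [20, 24, 28, 32, 36]

equated_mid = linear_equate(14, form_x, form_y)
equated_top = linear_equate(18, form_x, form_y)
```

The arithmetic runs just as well when the two tests measure different constructs. The formula therefore cannot show that the resulting link is meaningful, which is the caution the quotation raises.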

Does Linking Really Only Mean to Correlate Test Scores? 1. Some Conceptual Clarifications 2. Validity as Core of Equivalence 3. Conclusions Linking = comparing the validity of two different tests and adjusting for difficulty. “Validity is an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.” (Messick 1989: 13) “even for purposes of applied decision making, reliance on criterion validity or content coverage is not enough.” (Messick 1989: 17) Olaf Bärenfänger

Suggestions for a Linking Argument 1. Some Conceptual Clarifications 2. Validity as Core of Equivalence 3. Conclusions Step 1: Gather and compare all kinds of evidential information about the two tests, e.g. § Constructs § Internal validity (e.g. difficulty, discrimination, estimation of reliability, SEM, factor analysis, qualitative analyses, G Theory, IRT) § External validity (e.g. correlation/regression studies, anchoring, test calibration with IRT, linking through experts' judgements) § Content relevance (quality of relation between test domain and real life domain) § Content representativeness (quantity of relation between test domain and real life domain) § Process analyses. Validity matrix (columns: Test Interpretation | Test Use): Evidential Basis: Construct validity | + Relevance/utility; Consequential Basis: Value implications | Social consequences. Olaf Bärenfänger
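Two of the internal-validity indicators listed on this slide, reliability and the standard error of measurement (SEM), can be sketched as follows. This is a minimal illustration under stated assumptions: reliability is estimated with Cronbach's alpha (the slide does not say which estimate is meant), and the item scores and function names are hypothetical.

```python
from math import sqrt
from statistics import pstdev, pvariance

def cronbach_alpha(items):
    """Estimate reliability from per-item score lists.
    `items` holds one list per item; position i in every list is test taker i."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    sum_item_variances = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_variances / pvariance(totals))

def standard_error_of_measurement(totals, reliability):
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    return pstdev(totals) * sqrt(1 - reliability)

# Hypothetical scores: 3 items, 4 test takers (illustrative data only).
items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]
totals = [sum(person) for person in zip(*items)]
alpha = cronbach_alpha(items)
sem = standard_error_of_measurement(totals, alpha)
```

Computing the same statistics for both tests gives only one strand of evidence. The slide places it alongside construct, content, and process evidence rather than treating it as sufficient for a link.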

Some Conclusions 1. Some Conceptual Clarifications § Things are far more complicated than assumed initially. 2. Validity as Core of Equivalence § Linking is more than mere concurrent validity. 3. Conclusions § When the goal is to link two tests, we need to be aware that linking is essentially an issue of validity. § An avenue might be to make use of an equivalence argument as suggested. § In order to pursue this goal, more collaboration between researchers and test institutions is needed, as well as an agreement on the details of an equivalence argument. § It is probably still a long way until we have safe crosswalks between different test systems. Olaf Bärenfänger

CAN ACTFL/ILR AND CEFR BE ALIGNED FOR SPEAKING ASSESSMENT? Pardee Lowe, Jr. U.S. Gov. Interagency Language Roundtable. Slides used with permission

POSSIBILITIES • COMPARE? • RELATE? • ALIGN? • EQUATE? • For the last two: Does one need the same “construct”? Pardee Lowe, Jr.

QUESTIONS? • THE FOLLOWING DISCUSSES SIMILARITIES AND DIFFERENCES BETWEEN ACTFL/ILR AND CEFR • PRESENTED ARE THOSE ASSESSMENT FEATURES WHICH ALLOW ACTFL/ILR TO BE USED FOR EXAMINATIONS IN SPEAKING • THE QUESTION TO CEFR ADEPTS: – HOW MANY OF THESE FEATURES OCCUR IN CEFR? – IN WHAT WAY(S)? Pardee Lowe, Jr.

SIMILARITIES • WORK IN PROGRESS • PROSE DESCRIPTIONS • HIERARCHICAL • CRITERION-REFERENCED • CAN-DO STATEMENTS • LEVELS VS BANDS • PLUS LEVELS VS PLUS BANDS Pardee Lowe, Jr.

CEFR BANDS • HOW MANY BANDS ARE THERE? • WHAT DOES “BAND” MEAN? – BASKET ANALOGY? – IS IT A RANGE? – DOES IT HAVE HEIGHT? DEPTH? • HOW ARE QUALITY AND QUANTITY ACCOUNTED FOR WITHIN A BAND? • HOW MANY TASKS IN EACH BAND? • HOW ARE “BANDS” ASSIGNED? – PARTICULARLY “PLUS BANDS”? Pardee Lowe, Jr.

MAJOR PARAMETERS FOR ACTFL/ILR & CEFR ALIGNMENT • RELATIONAL FRAME: ACTFL FIXED; CEFR FLEXIBLE • MODEL: ACTFL NATIVE SPEAKER; CEFR LEARNER? • FEATURES: ACTFL A CORE PER LEVEL; CEFR RICH NUMBER • BOUNDARIES: ACTFL DELINEATED; CEFR BASKETS Pardee Lowe, Jr.

QUESTIONS? • HOW MANY OF THESE ACTFL/ILR FEATURES OCCUR IN CEFR? • ARE THEY EMPLOYED IN THE SAME WAY? • IF NOT, HOW WOULD ACTFL/ILR AND/OR CEFR HAVE TO BE ALTERED TO ACHIEVE ALIGNMENT? Pardee Lowe, Jr.

Framing research to develop guidelines for developing tests that can be rated according to both scales: The case of writing Liz Hamp-Lyons University of Nottingham, UK/University of Hong Kong Slides used with permission

What the CEFR knows about assessing high level English writing Liz Hamp-Lyons

What ACTFL knows about assessing high level English writing “It must be noted that the Superior level encompasses levels 3, 4, and 5 of the ILR scale. However, the abilities at the Superior level described in these guidelines are baseline abilities for performance at that level rather than a complete description of the full range of Superior.” Liz Hamp-Lyons

ACTFL Superior Level writing (ILR Levels 3 and above) SUPERIOR Writers at the Superior level are able to produce most kinds of formal and informal correspondence, complex summaries, precis, reports, and research papers on a variety of practical, social, academic, or professional topics treated both abstractly and concretely. They use a variety of sentence structures, syntax, and vocabulary to direct their writing to specific audiences, and they demonstrate an ability to alter style, tone, and format according to the specific requirements of the discourse. These writers demonstrate a strong awareness of writing for the other and not for the self. See p. 24 of handout booklet for full description. Liz Hamp-Lyons

What the ILR knows about high level English writing Writing 5 (Functionally Native Proficiency): Has writing proficiency equal to that of a well educated native. Without non-native errors of structure, spelling, style or vocabulary can write and edit both formal and informal correspondence, official reports and documents, and professional/educational articles including writing for special purposes which might include legal, technical, educational, literary and colloquial writing. In addition to being clear, explicit and informative, the writing and the ideas are also imaginative. The writer employs a very wide range of stylistic devices. Liz Hamp-Lyons

Are the two frameworks intended to be, or indeed claimed to be, equivalent? • They are stylistically different • They are strikingly different in length • ACTFL descriptors are a mix of ‘can-do’s’, personal attributes (“they are able to…”), text characteristics… • CEFR descriptors are superficially ‘can-do’s’ (“Can express him/herself with clarity and precision relating to the addressee flexibly and effectively.”) but in fact are far too vague to be usable as they stand Liz Hamp-Lyons

“The successful establishment of equivalencies would support the validity of both scales.” OK… • We are not there yet • We may not be on the track to get there • What CAN we do? Liz Hamp-Lyons

Framing… to develop guidelines for assessing writing at the higher levels. Guidelines should: • Begin from a construct • The construct needs: – A theory of second language acquisition – A theory of learning – A theory of written language mastery trajectories – An empirical model of written language use at different levels • An argument: – (a) language use argument – (a) assessment use argument that will weave all these dimensions together appropriately for different audiences/clients Liz Hamp-Lyons

Framing… to develop guidelines for assessing writing at the higher levels. Guidelines should: • Continue by stipulating the need to obtain perceptions/judgements from a range of stakeholders: – Language testing specialists – Teachers of the area under study – Students of the area under study – Score users – Test developers/item writers Liz Hamp-Lyons

Framing… to develop guidelines for assessing writing at the higher levels. Guidelines should also: • Propose a specification study comparing/contrasting tasks, input texts, level descriptors across all components of each test (for which the “same construct” is being claimed) • Propose a text linguistic study such as corpus analysis of tasks for difficulty specification; or discourse analysis of what persons scoring specified levels on each test can do within the domain. Liz Hamp-Lyons

Framing… to develop guidelines for assessing writing at the higher levels. Guidelines also need to: • Stipulate the need for rigorous quantitative studies to test hypotheses deriving from the construct definition and qualitative elicitation stages • Propose consequential validity processes to check the impact of any emerging conclusions on teachers, learners and other stakeholders outside the testing and research enterprises Liz Hamp-Lyons

A safe crosswalk??? Where we are TODAY!

Considerations • Can BILC Study groups tackle these complex and complicated issues? • Is the issue about comparing/aligning STANAG/CEFR scales? • Can we reconcile scale orientations? • Each test would have to be linked to a test derived from the other scale! • Generalizations cannot be made today. Will they ever be reliable?

The Way Ahead? • Another conference is being planned for next year. • Setting a research agenda will be critical. • BILC will monitor and report back, and will continue to encourage dialogue & research among our constituents. • What can you do??

Communicative proficiency and linguistic development: intersections between SLA and language testing research • http://eurosla.org/monographs/EM01home.html • It is an edited volume by the SLATE network (Second Language Acquisition and Testing in Europe), a group of SLA researchers and language testers which aims to explicate the CEFR in various languages informed by SLA and language assessment research. http://www.slategroup.eu/ • The introductory chapter by Hulstijn et al., in particular, offers a useful summary of the origins of the CEFR and raises a number of issues also discussed in the “Leipzig” conference (e.g. the origins of the CEFR in the Threshold series as well as in some North American scales including ACTFL, FSI and ILR, see p. 14; the suitability of the CEFR for language assessment, etc.). • 2010 ACTFL Convention and World Languages Expo at the Hynes Convention Center in Boston, MA from November 19-21, 2010 (Preconvention workshops, Thursday, November 18). www.actfl.org