An Introduction to Language Testing, Evaluation and Assessment

An Introduction to Language Testing, Evaluation and Assessment in Language Education
Session 5
Dr Kia Karavas

Issues to be discussed
• Assessment for and of learning
• Why do we need to assess learners?
• Benefits and drawbacks of testing
• Qualities of a good test
• Classification of tests
• Developing your own test
• Designing test items (MC, T/F, short answer, matching, etc.)

Assessment and testing
• Assessment: any evaluation of a student’s work.
• Assessment involves collecting information or evidence about a learner’s progress and achievement over a period of time in order to improve teaching and learning.

Why do we need to assess students?
Administrators
• They need to know whether the programmes they have planned are working well. The only way to do this is to find out how well the students are doing on their courses.
Teachers
• Teachers put the administrators’ plans into practice. They need to know what has been done and what needs to be done next: what the students already know or can do, and what they do not yet know or cannot yet do.

Why do we need to assess students?
Parents
• They are anxious to know how their children are doing at school. Because they cannot watch their children in class, parents value feedback on their children’s performance from the teachers and the school.

Why do we need to assess students?
Students
• They need to know what they have accomplished, be aware of what they need to work on next, and build confidence and satisfaction from what they have achieved.

Assessment for learning / of learning: both are important, but they are not the same thing
Assessment for learning
• Its main purpose is not just to determine a mark at a particular point in time but to improve student learning.
• It is ongoing. It helps teachers determine how student learning is developing and identify what is working and where the problems are.
• It also assesses the effectiveness of instruction and helps teachers determine how instruction has to change to help struggling students.
Assessment of learning
• It focuses on evaluation and involves testing.
• It is the traditional approach, typically using examinations to test what students know and are able to do. Through formal testing, teachers report on what students have learned at the end of a unit or course.
• It is summative, and it is what teachers are most familiar with. The teacher’s role is that of a judge who assigns grades to report to students and their parents how they did.

Assessment for learning: methods
• Teacher’s observations: observing learners’ overall performance or achievement can be quite accurate and fair.
• Self-assessment and peer assessment
• Project work: students complete a set of tasks designed to explore a certain idea or concept.
• Portfolios: a portfolio is a purposeful collection of materials assembled by a learner over a period of time to provide evidence of skills, abilities and attitudes.

What is a test?
• All tests are assessments, but not all assessments are tests.
• A test often takes a ‘pencil and paper’ form and is usually given at the end of a learning period, e.g. a unit test, midterm test or semester test.
• A definition: a method of measuring a person’s ability, knowledge or performance in a given domain; an instrument for measuring language ability.

So…
• Assessment includes testing, but it is definitely not only testing.
• All tests are assessments, but not all assessments are tests.

Arguments against testing
• Some students become so nervous that they cannot perform and do not give a true account of their knowledge or ability.
• Other students can do well despite not having worked throughout the course.
• Once the test is over, students may simply forget everything they had learned.
• Students become focused on passing tests rather than on improving their language skills.

Benefits of testing
• Fairness: a test gives all students an equal opportunity to show us what they have learned and what they can do with the language.
• Tests provide students with the same instructions and the same input under the same conditions.
• They allow us to confirm our own assessments and help us make decisions with more confidence.

Benefits of testing
• Tests provide a standard by which to judge performance and progress. They allow us to compare students’ progress with each other and against criteria.
• A test can give the teacher valuable information about where the students are in their learning and can affect what the teacher covers next.
• Tests help a teacher decide whether her teaching has been effective and highlight what needs to be reviewed.

Benefits of testing
• Tests are also instruments of public policy. National examinations are used to ensure that learners across the country are held to the same standards.

Why else is testing important?
• Because of its backwash effect.
• What does this mean? Backwash is the effect that testing has on teaching. For better or worse, tests and exams exert control over what goes on in classrooms. If a test is regarded as important, preparation for it can dominate all teaching and learning activities, because many language classes are geared more or less directly to the tests or examinations the learners will eventually take. Teachers often have to 'teach to' the test.

Is the quality of tests important for teaching?
• Yes.
– If the test is a bad one (or its content and techniques are at variance with the objectives of the course), the result may be negative backwash: teaching suffers because of the test that comes at the end of the course.
– If the test is a good one, its content reflects the teaching and learning activities and the objectives of the course, and its nature is well understood by the teacher, the effect on teaching may be very positive. There will be positive backwash.

Ways of classifying tests
• By how they are scored (objective vs subjective; criterion-referenced vs norm-referenced)
• By purpose (placement, aptitude, progress, achievement and proficiency tests)
• By their focus (i.e. all language abilities together, or separate abilities)

Objective vs subjective tests
• Objective tests are scored by comparing student responses with an established set of correct responses on an answer key. They are ideal for computer scanning, and the scorer does not need particular training or knowledge of the area being examined. They use multiple-choice, true/false and matching questions.
• Subjective tests are scored by opinion or personal judgment, so the human element is very important. Examples include essay tests, comprehension questions and interviews.
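Objective scoring is simply a comparison with an answer key, so it can be automated. The sketch below is illustrative only. The answer key, item numbers and student responses are hypothetical, and it assumes one point per item with no penalty for wrong answers.

```python
def score_objective(responses: dict[int, str], answer_key: dict[int, str]) -> int:
    """Award one point for each response that matches the key exactly.

    Responses are normalised (whitespace stripped, upper-cased), so 'a' and
    ' A' both count as the key 'A'. Unanswered items score zero.
    """
    return sum(
        1
        for item, correct in answer_key.items()
        if responses.get(item, "").strip().upper() == correct
    )


# Hypothetical key for a five-item test mixing MC (A-D) and T/F items
ANSWER_KEY = {1: "B", 2: "D", 3: "T", 4: "A", 5: "F"}
student = {1: "B", 2: "C", 3: "T", 4: "a", 5: "F"}

print(score_objective(student, ANSWER_KEY))  # 4
```

No judgment enters the procedure at any point, so any scorer or scanner arrives at the same total. A subjective task such as an essay cannot be reduced to a lookup like this.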

Criterion-referenced vs norm-referenced tests
• Criterion-referenced tests measure mastery of well-defined instructional objectives specific to a particular course. Their purpose is to measure how much learning has occurred. Student achievement is measured in terms of the degree of learning or mastery of pre-specified content.
• Norm-referenced tests measure global language abilities, and each student's score is interpreted relative to all the other students who take the exam. Acceptable standards of achievement are set after the test has been administered. A student’s achievement is therefore interpreted with reference to the achievement of other students or groups of students, rather than to an agreed criterion.
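The difference lies in how the same raw score is interpreted. In this hypothetical sketch, a score of 15/20 meets an assumed 70% mastery criterion, yet only 10% of an invented group scored below it. The cut-off, the scores and the simple percentile-rank definition (the share of the group scoring strictly below) are all assumptions for illustration. Other percentile-rank conventions treat ties differently.

```python
from bisect import bisect_left


def criterion_referenced(score: int, max_score: int, cut_off: float = 0.7) -> str:
    """Compare the score with a pre-specified mastery criterion (assumed 70%)."""
    return "mastery" if score / max_score >= cut_off else "non-mastery"


def percentile_rank(score: int, group_scores: list[int]) -> float:
    """Percentage of the group scoring strictly below this score."""
    ranked = sorted(group_scores)
    return 100 * bisect_left(ranked, score) / len(ranked)


# Hypothetical scores (out of 20) for a class of ten
group = [12, 15, 15, 16, 17, 18, 18, 19, 20, 20]

print(criterion_referenced(15, 20))  # mastery (15/20 = 75%)
print(percentile_rank(15, group))    # 10.0
```

The criterion-referenced result would not change if the rest of the class did better or worse. The norm-referenced result depends entirely on who else took the test.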

Classifying tests by purpose
• Placement tests assess students’ level of language ability so that they can be placed in an appropriate course or class. The primary aim is to create groups of learners that are homogeneous in level.
• Diagnostic tests identify language areas in which the student needs further help. The information gained is crucial for providing students with remedial activities. Placement tests frequently serve the dual function of placement and diagnosis.

Classifying tests by purpose
• Achievement tests are used to determine whether or not students have mastered the course content or course objectives. The achievement test is similar to the progress test in that it measures how much the student has learned in the course of second language instruction. The content of achievement tests, which are commonly given in the middle or at the end of a course, is generally based on the course syllabus or the course textbook.
• Progress tests are used at various stages throughout a language course to determine learners’ progress up to that point and to see what they have learnt. Progress tests are usually produced by the teacher. They are narrower in focus than achievement tests because they cover less material and assess fewer objectives.

Classifying tests by purpose
• Proficiency tests are used to measure learners’ general linguistic knowledge, abilities or skills without reference to any specific course, e.g. TOEFL, IELTS.
– Some proficiency tests are intended to show whether students, or people outside the formal educational system, have reached a given level of general language ability.
– Others are designed to show whether candidates have sufficient ability to use a language in a specific area such as medicine or tourism. Such tests are often called Specific Purposes tests.

Classifying tests by focus
– Integrative tests include activities that assess skills and knowledge in an integrated manner (e.g., reading and writing, listening and speaking). Less attention is paid to specific lexicogrammatical points.
– Discrete-point tests contain items that ideally reveal the candidate's ability to handle one level of language and one element of receptive or productive skills.

Communicative tests
• Tests identified as ‘communicative’ are interaction-based, open-ended (that is, responses cannot be predicted, as in natural communicative environments), authentic, behavior-based and so on.
• Communicative tests are supposed to measure communicative competence, which includes:
– linguistic competence
– sociolinguistic competence
– strategic competence

Test of English as a Foreign Language (TOEFL)
• One million test takers per year
• 100% multiple choice
• It uses “generic, or neutral” language and does not specify a context
• Type: objective, discrete-point, proficiency

Test of English for International Communication (TOEIC)
• The TOEFL equivalent for the workplace setting
• Two sections, 200 questions: listening and reading
• Contexts: entertainment, manufacturing, health, travel, finance, etc.
• Described as “objective and cost-efficient”

International English Language Testing System (IELTS)
• Academic/General Training versions
• Results reported in band scores 1-9
• Listening, Reading, Writing, Speaking

The KPG exam system
• KPG (Kratiko Pistopiitiko Glossomathias) is a state certification system of language proficiency implemented in Greece on the basis of a 1999 law. The law was put into effect in 2002, and the first exams were run in April 2003.
• The Ministry of National Education and Religious Affairs is the legal copyright owner of all documents containing information about the KPG assessment system and is responsible for organizing and administering the exams.
• The system has been designed and developed by groups of foreign language assessment specialists at the Universities of Athens and Thessaloniki, appointed by the Ministry.

The KPG exam system
• Candidates can take exams in English, German, Spanish and Turkish (developed by groups of experts at the University of Athens) and in French and Italian (developed by groups of experts at the University of Thessaloniki).
• The KPG exams have so far been designed for levels A1/A2, B1/B2 and C1. The integrated graded C-level exam (C1/C2) will be offered in November 2013.
• Exams are administered twice a year (November and May) for all levels and languages at exam centres throughout Greece, which also serve as exam centres for the national university entrance exams.

Aims and defining characteristics
• The KPG exams represent a unified examination system in European languages, which are viewed as equal in social value. Examinations in all the languages currently offered are designed on the basis of common test specifications.
• The KPG is a “proficiency assessment” examination system (rather than a diagnostic or competence-measurement one). It aims to test candidates’ ability to make socially purposeful use of the target language (TL) in Greece and abroad.
• The KPG exams aim to measure candidates’ competence in comprehending and producing oral and written discourse. They also measure candidates' ability to act as mediators across languages and their awareness of how the target language works to create socially purposeful meanings.

Aims and defining characteristics
• The KPG exams adhere to a functional approach to language use.
• Their global scale descriptors and language use descriptors relate to those of the CEFR.
• As the system is being developed, detailed descriptions of all relevant procedures and strategies are published, making the system transparent at both national and international levels.
• The KPG has recently been accredited by the Greek state and is recognized as a work qualification.
• It has no commercial interests and is subsidized by the state.

Structure of the KPG exams
Each exam, regardless of level and language, consists of 4 modules:
• Module 1 tests reading comprehension and language awareness
• Module 2 tests free writing production and mediation skills
• Module 3 tests listening comprehension
• Module 4 tests free speaking production and mediation skills

Qualities of a good test
• Directly related to educational objectives
• Realistic & practical
• Concerned with important & useful matters
• Comprehensive but brief
• Precise & clear

Essential features of tests
• Validity is commonly defined as 'the extent to which [a test] measures what it is supposed to measure and nothing else'. If a test is valid, an outsider who looks at an individual's score knows that it is a true reflection of the individual's skill in the area the test claims to cover.

Essential features of tests
• Reliability refers to the consistency of a test: whether it produces the same outcome every time it is administered. Reliability does not depend on the content of the test alone. It also depends on marking, in two ways:
– ensuring that different raters give comparable marks to the same script (inter-rater reliability)
– ensuring that the same rater gives the same marks to the same script on two different occasions (intra-rater reliability)
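Marking consistency can be checked by comparing two sets of marks for the same scripts. These can come from two raters, or from one rater on two occasions. The minimal sketch below computes a Pearson correlation and the proportion of exact agreement. The marks are invented, and these are only two of several indices in use (Cohen's kappa, for example, is common for categorical bands).

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two sets of marks for the same scripts.

    Undefined (ZeroDivisionError) if either set gives every script the same mark.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def exact_agreement(x: list[float], y: list[float]) -> float:
    """Proportion of scripts that received exactly the same mark."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical marks (out of 20) for the same six scripts
rater_a = [14, 11, 17, 9, 15, 12]
rater_b = [13, 11, 18, 10, 15, 12]  # a second rater, or rater A re-marking later

print(round(pearson(rater_a, rater_b), 2))  # 0.97
print(exact_agreement(rater_a, rater_b))    # 0.5
```

A high correlation is not enough on its own. A rater who always gives two marks more than a colleague correlates perfectly with them (r = 1.0), yet never agrees with them exactly. This is why agreement is usually checked as well.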

Essential features of tests
• USABILITY (practicality): ease of administration, scoring, interpretation and application; low cost; proper mechanical make-up

Let’s consider a problem situation:
A fisherman catches one yellowfin tuna and weighs it: it weighs 100 kilograms. As he meets one friend after another, he tells each of them that the fish he caught weighed 130 kilograms. In a statistical sense, the story is reliable because it is consistent. However, it is not truthful, so it is reliable but not valid.
LESSON: A test can be reliable without being valid, but a valid test must also be reliable.
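The fisherman's story can be expressed as two separate numbers: how much his reports vary (consistency) and how far they are from the true value (accuracy). This is only an illustrative sketch using the figures from the example. Treating "reliable" as zero spread and "not valid" as a non-zero systematic error is a simplification for this anecdote, not a formal definition of either term.

```python
from statistics import mean, pstdev

TRUE_WEIGHT_KG = 100               # what the scale actually showed
reports_kg = [130, 130, 130, 130]  # the same story told to each friend

# Zero spread: every report is identical, so the story is perfectly consistent
spread = pstdev(reports_kg)

# 30 kg too heavy every time: consistent, but consistently wrong
systematic_error = mean(reports_kg) - TRUE_WEIGHT_KG

print(f"spread = {spread} kg, systematic error = {systematic_error} kg")
```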

Evaluating test usefulness
Usefulness = reliability + validity + impact + authenticity + interactiveness + practicality (Bachman and Palmer, 1996)
• Authenticity: how closely does the test resemble the actual language use situation?
• Interactiveness: to what extent is the test taker involved in active communication?
• Impact: what is the effect of the test on test takers, test users, teachers, etc.?

Steps in designing a test
• Outline the learning objectives or major concepts to be covered by the test. The test should be representative of the objectives and material covered.
• Decide on the type and purpose of the test.
• Decide on what abilities must be tested.
• Write specifications for the test.
• Write the items.
• Trial the items with a group similar to those for whom the test is designed.
• Analyse the results and make the necessary changes.
• Develop assessment scales.

What do we mean by “test specifications”?
• A specification is a detailed description of exactly what is being assessed and how. Test specifications should include:
– a general description of the assessment
– a list of the skills to be tested and the operations students should be able to perform
– techniques for assessing those skills (tasks to be used, types of prompt for each task, expected type of response and timing of each task)
– the expected level of performance and grading criteria

The most frequently used test formats
1. Questions & answers
2. True or false questions
3. Multiple-choice questions
4. Gap-filling or completion
5. Matching questions
6. Dictation
7. Transformation
8. Translation
9. Essay writing
10. Interview

Multiple choice items
– The most commonly used format.
– Used to assess learning at the recall and comprehension levels.
– They are reliable because only one answer is correct.
– Assessment is not affected by the test taker’s writing abilities.
– A wide range of content can be sampled by one test.
– They are cost-effective because they can be marked by computer.

Disadvantages of MC items
– It is difficult to construct plausible alternative responses.
– They do not lend themselves to the testing of productive skills.
– They encourage guessing.
– Developing plausible distractors is time-consuming.

What is a multiple choice item?
• It consists of a stem (usually written as a question or an incomplete statement) and response options (usually 3-4). One of these options is the key (the correct response), and the others are the distractors.
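The anatomy of an MC item maps naturally onto a small data structure: a stem, one key and a list of distractors. The sketch below is a hypothetical illustration, not an established item-banking format. Limiting items to 3-4 options follows the "usually 3-4" guideline as a design choice. The distinctness check and the shuffling of options (so that the key's position varies) are likewise assumptions.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCItem:
    """A multiple-choice item: a stem, one key and several distractors."""
    stem: str
    key: str
    distractors: list[str]

    def __post_init__(self) -> None:
        options = [self.key, *self.distractors]
        # Enforce the 'usually 3-4' guideline as a hard rule (a design choice)
        if not 3 <= len(options) <= 4:
            raise ValueError("an item should have 3-4 response options")
        # The key must not reappear as a distractor, and options must not repeat
        if len(set(options)) != len(options):
            raise ValueError("response options must all be different")

    def options(self, seed: Optional[int] = None) -> list[str]:
        """Return key and distractors in shuffled order so the key's position varies."""
        opts = [self.key, *self.distractors]
        random.Random(seed).shuffle(opts)
        return opts

    def is_correct(self, response: str) -> bool:
        return response == self.key


# Hypothetical grammar item
item = MCItem(
    stem="She ___ to school by bus every day.",
    key="goes",
    distractors=["go", "going", "gone"],
)
print(item.stem)
for label, option in zip("ABCD", item.options(seed=1)):
    print(f"{label}. {option}")
```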

Writing MC items
1. Write the stem first:
A. Be sure the stem asks a clear question.
B. Stems phrased as questions are usually easier to write.
C. Stems should not contain a lot of irrelevant information.
D. Be sure the stem is grammatically correct.
E. Avoid negatively stated stems.

Writing distractors
• Alternative responses/distractors should be plausible and as homogeneous as possible.
• Response alternatives should not overlap.
– Avoid double negatives.
• If negative wording is used, emphasize it (e.g. NOT).
• Each item should be independent of the other items in the test.
– Information in the stem of one item should NOT help answer another item.

True/False format
• T/F items are a specialised form of the MC format in which there are only two possible alternatives (true/false, correct/incorrect, right/wrong, fact/opinion).
• They are typically written as statements.
• They can test large amounts of content.
• They require less time for students to respond.
• Scoring is quick and reliable.

But…
• There is a 50% guessing factor.
• For them to be reliable, you need to include a sufficient number of them on a test.
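Why does the number of items matter? If each T/F answer is a blind guess with a 50% chance of being right, the number of correct guesses follows a binomial distribution. The chance of reaching a pass mark by luck alone therefore shrinks as items are added. The 60% pass mark is an assumption, and so is treating every guess as independent. With 10 items, pure guessing reaches 6 or more correct 386 times in 1024 (about 38%).

```python
from math import comb


def p_pass_by_guessing(n_items: int, pass_percent: int = 60, p_guess: float = 0.5) -> float:
    """Probability of reaching the pass mark on n_items by blind guessing.

    Correct guesses follow a binomial distribution; for T/F items each
    independent guess is right with probability 0.5.
    """
    needed = (pass_percent * n_items + 99) // 100  # ceiling, using integer arithmetic
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(needed, n_items + 1)
    )


for n in (10, 20, 40):
    print(n, round(p_pass_by_guessing(n), 3))
```

Setting `p_guess=0.25` models a four-option MC item, which makes a sharper contrast with the 50% guessing factor of T/F items.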

Tips for constructing True-False Items
• Avoid double negatives.
• Avoid long or complex sentences.
• Use specific determiners (always, never, only, etc.) with caution.
• Include only one central idea in each statement.
• Avoid emphasizing the trivial.
• Exact quantitative language (two, three, four) is better than qualitative language (some, few, many).
• Questions should be written at a lower level of difficulty than the text.
• Questions should appear in the same order as the answers appear in the text.

Short-Answer Items
• Two types: question and incomplete statement
• Advantages:
– Easy to construct
– An excellent format for measuring who, what, when and where information
– Guessing is minimized
– Students must know the material rather than simply recognize the answer
• Disadvantages:
– Grading can be time-consuming
– More than one answer can be correct
– Responding takes more time, which reduces the possible number of items

An example
• Teacup dogs are the latest must-have Hollywood fashion accessory. They get their name because these tiny dogs fit rather nicely into a teacup. The latest Hollywood trend weighs less than three pounds. You only need to flip through the pages of celebrity magazines to see these adorable dogs peeking out of designer handbags on the streets of New York and Hollywood.
• POOR T/F: Teacup dogs weigh less than three pounds. (It repeats the wording of the text, so it can be answered by matching words.)
• GOOD T/F: Teacup dogs are much smaller than normal dogs. (It requires understanding the text rather than matching words.)

Matching format
• Draws on students’ ability to make connections among ideas, vocabulary and structure.
• Two columns of information are used. Items in the left-hand column are called premises or stems, and those in the right-hand column are called options.
• The advantage of matching items over MC items is that there are more distractors per item. The matching format is also easier to design.
• When options and premises are equal in number, a learner who misses one match will automatically miss another one.
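The last point is easy to see when scoring a matching item. In this hypothetical sketch (the words and option letters are invented), the learner confuses just one pair of words. Because premises and options are equal in number, one wrong choice uses up the answer that belongs to another premise, so two marks are lost instead of one.

```python
def score_matching(response: dict[str, str], key: dict[str, str]) -> int:
    """One point for each premise matched to its correct option."""
    return sum(response.get(premise) == option for premise, option in key.items())


# Hypothetical item: four premises (words) and four options (definitions a-d)
key = {"borrow": "c", "lend": "a", "rent": "d", "hire": "b"}

# The learner mixes up only 'borrow' and 'lend', which swaps their two options
response = {"borrow": "a", "lend": "c", "rent": "d", "hire": "b"}

print(score_matching(response, key))  # 2
```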

Cloze items
• Cloze testing originated in the 1950s as a test of reading comprehension. Cloze tests remove words at regular intervals (usually every 6 to 8 words, but never more often than every 5th word). Students must fill each gap with an appropriate word. To do so, they refer to the text on either side of the gap and take into account both meaning and structure.
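Fixed-ratio deletion is mechanical, so a first draft of a cloze passage can be generated automatically. This is a simplified sketch. The following are all choices made for illustration, not a standard procedure: splitting on whitespace, the regular expression used to separate a word from its punctuation, the numbered-gap format, and the optional `skip` parameter, which leaves an opening stretch of text intact. The check that gaps are no closer than every 5th word follows the guideline above.

```python
import re

# A word with optional leading/trailing punctuation, e.g. 'teacup.' or 'it's,'
TOKEN = re.compile(r"^(\W*)(\w[\w'-]*)(\W*)$")


def make_cloze(text: str, n: int = 7, skip: int = 0) -> tuple[str, list[str]]:
    """Fixed-ratio cloze: delete every n-th word after the first `skip` words.

    Returns the gapped text and the answer key (the deleted words, in order).
    """
    if n < 5:
        raise ValueError("gaps should be no closer together than every 5th word")
    tokens = text.split()
    answers: list[str] = []
    for i in range(skip, len(tokens)):
        if (i - skip + 1) % n != 0:
            continue
        match = TOKEN.match(tokens[i])
        if match is None:  # pure punctuation: leave it and do not create a gap
            continue
        before, word, after = match.groups()
        answers.append(word)
        tokens[i] = f"{before}({len(answers)})______{after}"
    return " ".join(tokens), answers


passage = ("Teacup dogs are the latest must-have Hollywood fashion accessory. "
           "They get their name because these tiny dogs fit rather nicely into a teacup.")
gapped, answers = make_cloze(passage, n=7, skip=9)  # keep the first sentence intact
print(gapped)
print(answers)  # ['tiny', 'teacup']
```

A teacher would still need to review the output. Mechanical deletion can land on a word that the surrounding text cannot recover.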

Gap-fill items
• In gap-fill items, a word or phrase in a sentence is replaced by a blank. There are 2 types of gap fill: function gaps (prepositions, articles, conjunctions) and semantic gaps (nouns, adjectives, adverbs).

Essay type questions
Types of essay questions
• Extended response question
– Gives a great deal of latitude in how to respond to the question.
– Example: Discuss essay and multiple-choice type tests.
• Restricted response question
– More specific and easier to score, with improved reliability and validity.
– Example: Compare and contrast the relative advantages and disadvantages of essay and multiple-choice tests with respect to reliability, validity, objectivity and usability.

Essay type questions
Advantages:
• They measure higher learning levels (synthesis, evaluation) and are easier to construct than objective test items.
• Students are less likely to answer an essay question by guessing.
• They require superior study methods.
• They offer students an opportunity to demonstrate their abilities to:
– organize knowledge
– express opinions
– show creativity

Drawbacks of essay type questions
• They may limit the sampling of material covered, which tends to reduce the validity of the test.
• Scoring is subjective and can be unreliable. Marks may be influenced by:
– the “halo effect” (the student’s previous good or bad performance)
– written expression
– handwriting legibility
– grammatical and spelling errors
• They are time-consuming.

Pointers for designing a test
• Don’t test what you haven’t taught.
• Don’t test general knowledge.
• Don’t introduce new techniques in tests.
• Don’t just test accuracy.
• Don’t forget to test the test.

That’s all folks! Thanks for attending!
