Designing Large-Scale Speaking and Writing Assessments: TOEIC® Speaking as an Example
Jakub Novák, Educational Testing Service

TOEIC® Speaking and Writing
• Design process started in August 2005.
• First operational tests administered in December 2006.
• Design followed the principles of the Evidence-Centered Design (ECD) approach.

Evidence-Centered Design (ECD)
• Framework:
-- claims
-- evidence
-- tasks
• Advantage: a transparent and evidentially solid relation between tasks and claims.
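The claim-evidence-task chain can be made concrete with a minimal Python sketch, assuming a simple in-memory model in which each task type records the partial claim it provides evidence for. The class and function names (`Claim`, `Task`, `tasks_for`, `claims_without_evidence`) are hypothetical; the claims and task types are the ones presented later in this deck, and no claim-2 task is listed only because the slides do not show one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A statement about what a test taker can do (1 = highest partial claim)."""
    number: int
    text: str


@dataclass(frozen=True)
class Task:
    """A task type and the partial claim its scored performance gives evidence for."""
    name: str
    claim: int


# Partial claims and task types as presented in this deck.
CLAIMS = [
    Claim(1, "Create connected, sustained discourse appropriate to the typical workplace"),
    Claim(2, "Carry out routine social and occupational interactions"),
    Claim(3, "Produce some language that is intelligible to native and proficient "
             "non-native English speakers"),
]

# Hypothetical traceability table: task type -> claim number it supports.
TASKS = [
    Task("Read a Text Aloud", 3),
    Task("Propose a Solution", 1),
    Task("Make a Recommendation", 1),
]


def tasks_for(claim_number, tasks):
    """Names of the task types that provide evidence for one claim."""
    return [t.name for t in tasks if t.claim == claim_number]


def claims_without_evidence(claims, tasks):
    """Claims that no task in the list provides evidence for."""
    covered = {t.claim for t in tasks}
    return [c for c in claims if c.number not in covered]
```

With these lists, `claims_without_evidence(CLAIMS, TASKS)` returns only claim 2, which is the kind of gap the explicit task-to-claim link is meant to expose.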

Business Requirements for TOEIC® Speaking
• Test should discriminate across a wide range of abilities, starting with the bottom quintile of traditional TOEIC® test takers.
• Test should separate candidates into ~10 levels.
• Many unique forms of the test will be administered each year.

General claim
• Test taker can communicate in spoken English to function effectively in a global workplace context.

Partial, hierarchical claims
1. Test-taker can create connected, sustained discourse appropriate to the typical workplace.
2. Test-taker can carry out routine social and occupational interactions such as giving and receiving directions, asking for information, asking for clarification, and so forth.
3. Test-taker can produce some language that is intelligible to native and proficient non-native English speakers.

Test-taker can produce some language that is intelligible to native and proficient non-native English speakers.
• Task: Complete the sentence: "Whenever I have free time, …"
This task type can give the evidence, but cannot yield enough unique prompts.

Test-taker can produce some language that is intelligible to native and proficient non-native English speakers.
• Task: Read aloud the text on the screen. You will have 45 seconds to prepare. Then you will have 45 seconds to read the text aloud.
Whether you want office supplies for personal or for business use, Sun Office Products is the single source for all your needs. With over 50 years of experience, our professionals can help you find any type of supply for any project…
This task type can give the desired evidence, and can yield many prompts.

TOEIC® Speaking – Read a Text Aloud: Evaluation Criteria
• Pronunciation
High: Pronunciation is highly intelligible, though the response may include minor lapses and/or other language influence.
Medium: Pronunciation is generally intelligible, though it includes some lapses and/or other language influence.
Low: Pronunciation may be intelligible at times, but significant other language influence interferes with appropriate delivery of the text.

Ability Levels (idealized case)

From task to evidence to claim
• Performance on a task can be reliably scored, giving evidence for a partial claim.
• Partial claims can be combined into a general claim.
• General claims for all levels are supported by evidence.
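Combining hierarchical partial claims can be sketched in Python for the idealized case only, assuming each partial claim has already been judged supported or not for a test taker. The function name `highest_supported_claim` is hypothetical, and the deck does not say how supported claims map onto the 8 reported levels; the sketch only shows the hierarchy check (a higher claim should not be supported while a lower one is not).

```python
def highest_supported_claim(supported):
    """
    Combine partial-claim results for one test taker (idealized hierarchical case).

    supported: dict mapping partial-claim number (1 = highest, 3 = lowest) to
    True/False, i.e. whether the scored tasks gave enough evidence for that claim.
    Returns the highest claim supported, or None if none is.
    Raises ValueError if the pattern is not hierarchical (a higher claim is
    supported while a lower one is not).
    """
    highest = None
    gap_below = False
    # Walk from the lowest claim (largest number) up to the highest.
    for number in sorted(supported, reverse=True):
        if supported[number]:
            if gap_below:
                raise ValueError(f"claim {number} is supported but a lower claim is not")
            highest = number
        else:
            gap_below = True
    return highest
```

For example, a test taker with evidence for claims 3 and 2 but not claim 1 is summarized by claim 2; a pattern such as claim 1 supported with claim 2 unsupported would contradict the hierarchical assumption and is rejected.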

Test-taker can create connected, sustained discourse appropriate to the typical workplace.
• Propose a Solution (show that you recognize the problem, and propose a way of dealing with the problem.)
Hi, this is Marsha Syms. Um, I'm calling about my bank card. I went to the bank machine early this morning, you know - the ATM (upspeak)... because the bank was closed so only the machine was open. Anyway, I put my card in the machine and got my money out... but then my card didn't come out of the machine. I got my receipt and my money but then my bank card just didn't come out. And I'm leaving for my vacation tonight so I'm really going to need it... I had to get to work early this morning, and couldn't wait around for the bank to open... Could you call me here at work, and let me know how to get my bank card back? I'm really busy today, and really need you to call me soon. I can't go on vacation without my bank card. This is Marsha Syms at 555-1234. Thanks.
(30 seconds to prepare, 60 seconds to speak.)

Test-taker can create connected, sustained discourse appropriate to the typical workplace.
• Make a Recommendation
Imagine that your company is planning an international conference for all its clients. Your department is responsible for choosing the hotel for the conference. The chart below includes information about two different hotels. Please take 10 seconds to look at the chart. Prepare a voice-mail report for Mr. Collins, your supervisor, who has asked you to recommend one hotel for the conference.
(45 seconds to prepare, 60 seconds to speak.)

Scoring a high-level task
Level 5: Response is effective and consists of highly intelligible, sustained, coherent discourse. Characterized by all of the following:
– Response presents a clear progression of ideas and conveys the relevant information required by the task. It includes appropriate detail, though it may have minor omissions.
– Speech is clear with generally well-paced flow and fluid expression. Response may include minor lapses or minor difficulties with pronunciation or intonation patterns, which do not affect overall intelligibility.
– Response exhibits a fairly high degree of automaticity with good control of basic and complex structures (as appropriate). Some minor errors may be noticeable but do not obscure meaning.
– Use of vocabulary is accurate and precise.

Testing the Test: The Pilot Study
• Four test forms were created and administered to 2,700 subjects who represented the target range of abilities (Dec. 2005–Jan. 2006).
• Responses were scored through the Online Scoring Network (OSN) by trained raters. The response to each task was scored by a separate rater unfamiliar with the candidate's other responses.
• Raw scores were weighted: the highest-level tasks received the highest weight.
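The weighting rule can be sketched as a weighted sum in Python. The weights, task-to-claim table, and numeric score scales below are hypothetical: the slides state only that the highest-level tasks received the highest weight, so the helper `weights_follow_hierarchy` simply checks that a given set of weights respects that ordering.

```python
def weighted_raw_score(task_scores, weights):
    """
    Weighted raw score for one test taker.

    task_scores: list of (task_type, raw_score) pairs.
    weights: dict mapping task_type to its weight.
    """
    return sum(weights[task] * score for task, score in task_scores)


def weights_follow_hierarchy(weights, task_claim):
    """
    True if a task type tied to a higher claim (smaller claim number) never
    gets a smaller weight than a task type tied to a lower claim.
    """
    return all(
        weights[a] >= weights[b]
        for a in weights
        for b in weights
        if task_claim[a] < task_claim[b]
    )


# Hypothetical values for illustration only; not the weights used by ETS.
TASK_CLAIM = {"Read a Text Aloud": 3, "Propose a Solution": 1}
WEIGHTS = {"Read a Text Aloud": 1.0, "Propose a Solution": 2.0}
```

With these made-up weights, a test taker scoring 3 on Read a Text Aloud and 4 on Propose a Solution gets 3 × 1.0 + 4 × 2.0 = 11.0.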

Results of pilot study
• Test writers can create multiple versions of the same test task of equivalent difficulty.
• Test takers who took more than one version of the test scored the same on both versions.
• Different raters gave the same response the same score.
• Test takers who performed well on high-level tasks performed well on lower-level tasks as well, confirming the assumption that the tasks are hierarchical.
• The data supported 8 proficiency levels (not 10).
• The "Make a Recommendation" task does not provide good evidence.

TOEIC® Speaking Test Overview

Score report information: claims for 8 levels
Level 5 (scale score 110-120)
Typically, test takers at Level 5 have limited success at expressing an opinion or responding to a complicated request. Responses include problems such as:
• language that is inaccurate, vague, or repetitive;
• minimal or no awareness of audience;
• long pauses and frequent hesitations;
• limited expression of ideas and connections between ideas;
• limited vocabulary.
Most of the time, test takers at Level 5 can answer questions and give basic information. However, sometimes their responses are difficult to understand or interpret. When reading aloud, test takers at Level 5 are generally intelligible. However, when creating language, their pronunciation, intonation, and stress may be inconsistent.

Inquiries about TOEIC®: www.ets.org (under TOEIC)
