Lost in Specialised Translation: an Inexpensive and Under-Exploited Aid for Language Service Providers
Miriam Seghiri, Ph.D., University of Málaga (Spain), seghiri@uma.es

Index
1. Introduction
2. Corpora in Translation Training
3. Guidelines for Corpus Creation
3.1. Design Criteria
3.2. Compilation Protocol
4. Using Corpora to Translate
5. Analyzing the Corpus with Concordance
6. Practice

Introduction

The inclusion of documentation as a core subject in the curriculum of Translation and Interpretation degrees clearly underlines its importance to translators. Training in this discipline is considered essential for a translator given that only sufficient and conscientious work on documentation will allow an adequate translation of a specialised text.

The sources of information that may be utilised by the translator are extremely varied, ranging from an oral consultation with an expert to a search using specialised glossaries and dictionaries. However, in the field of translation perhaps the most relevant documentation activity today involves the use of the Internet and, closely related to this, the compilation and management of virtual corpora.

Here, we shall present a systematic methodology for corpus compilation based on electronic resources available on the Internet. The methodology will be illustrated through the example of the creation of a virtual corpus of travel insurance in English.

Corpora in Translation Training

What is a corpus?
Corpus (pl. corpora), from the Latin word corpus, i.e. “body”.
A collection of texts assumed to be representative of a given language, dialect, or other subset of a language, to be used for linguistic analysis (Francis, 1982)

Some more definitions...
• A collection of naturally-occurring language text, chosen to characterize a state or variety of a language (Sinclair, 1991)
• A collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language (Sinclair, 1996)

• A large collection of authentic texts that have been gathered in electronic form according to a specific set of criteria (Bowker & Pearson, 2002)
• A closed set of texts in machine-readable form established for general or specific purposes by previously defined criteria (Engwall, 1992)

Characteristics of corpora
• collections of text
• naturally-occurring / authentic text
• representative of a given language
• collected according to specific criteria
• stored in machine-readable format
• used for linguistic analysis

Different types of corpora
According to what could corpora be distinguished/classified?
• language
• size
• purpose
• …

Classification of corpora I
• medium: printed vs. electronic text (virtual), transcribed speech, audio files, video, multimodal
• design method: balanced, opportunistic, open, closed, complete, partial
• language variables: monolingual vs. multilingual; comparable (original) vs. parallel (translations)

Classification of corpora II
• language states: synchronic vs. diachronic (e.g. Brown vs. Helsinki Diachronic corpus)
• not documented vs. documented (author, Internet address, ...)
• plain vs. annotated (tags)

Uses/users of corpora
• linguistics: validating linguistic theories
• lexicography: dictionary creation
• translation studies
• terminology
• language
• NLP

The advantages of using corpora in translation have been shown by various studies (cf. Laviosa, 1998; Bowker, 2002; Bowker and Pearson, 2002; Zanettin et al., 2003). These advantages include their objectivity, their reusability and their multiple uses. They are user-friendly and allow access to and management of huge quantities of information in almost no time.

Virtual corpora are among the translator’s most important aids when faced with a specialised text (cf. Pearson, 1998; Corpas Pastor, 2001 and 2004; Zanettin, 2002). A virtual corpus is a corpus compiled exclusively from electronic sources in order to carry out a specific translation in any direction (direct or indirect). Its principal objective is to construct a reliable resource quickly and at minimal cost, based on texts mined from the Internet, to satisfy the translator’s documentation needs.

Virtual corpora or…
• ad hoc (Corpas, 2002)
• disposable (Zanettin, 2001)
• do-it-yourself (Maia, 1997; Zanettin, 2001)
• electronic (Corpas, 2001; Varantola, 2003)
• precision (Varantola, 1997)
• special purpose (Pearson, 1998)
• web (Fletcher, 2004)

Translators turn to the Internet in search of solutions to information and documentation problems because they are not only translating between languages but also between discourse communities and cultures. The compilation of corpora and the Internet appear to be two of the most important documentation resources in the practice and research of specialised translation. Corpora for a particular speciality are not available for consultation on the Internet. Translators have no alternative other than to compile their own virtual corpora for the specific translation that has been commissioned in each case.

Corpora vs. other types of text collections
Corpora have to be distinguished from other text collections:
• text archives (repositories of electronic texts) and collections of examples are NOT considered corpora
• the World Wide Web?

Web as corpus?
Pros
• authentic text
• large collection
• readily available
Cons
• not sampled according to specific criteria
• ‘dirty’ in terms of formats and language used
• Population represented?
• Quality of the documents?

In order for a collection of texts to be considered a corpus in the strict sense of the term, it must follow a set of clear design criteria and a specific compilation protocol, so that the collection may be deemed representative of the field of specialisation or the particular type of document that is being translated.

Guidelines for Corpus Creation

EN 15038:2006 is the first European standard to establish and define the requirements for the provision of quality services by translation service providers (TSPs).

EN 15038

Professional Competences [EN 15038]
- Translating
- Linguistic and textual
- Research, information acquisition & processing
- Cultural
- Technical
The knowledge of how to compile and use corpora is an essential part of modern translational competence (Varantola, 2003).

1) Design Criteria

Source text: a travel insurance policy written in German. Translation into English (British English).

The objective is to create a corpus of travel insurance policies in English compiled exclusively from resources available on the Internet. It is restricted to legislation in force and to insurance policies drawn up in the United Kingdom, and it will include only original texts (comparable corpus) that are complete and documented.

Corpus design
Text type: policy
Language: English
Diatopic restrictions: United Kingdom
Original or translations: comparable (original)
Complete text or partial: complete
Documented: yes
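
The design criteria above can also be written down as a small, checkable record, so that every candidate text is accepted or rejected against the same rules. The following is only a minimal sketch in Python: the class names, field names and the example URL are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentRecord:
    """Metadata noted for one candidate text (hypothetical field names)."""
    url: str            # source address; an empty string means "not documented"
    text_type: str      # e.g. "policy"
    language: str       # e.g. "en"
    country: str        # diatopic restriction, e.g. "UK"
    is_original: bool   # True if the text is not a translation
    is_complete: bool   # True if the whole text was available


@dataclass(frozen=True)
class DesignCriteria:
    """Design criteria for the travel insurance corpus described above."""
    text_type: str = "policy"
    language: str = "en"
    country: str = "UK"
    originals_only: bool = True
    complete_only: bool = True
    documented: bool = True

    def accepts(self, doc: DocumentRecord) -> bool:
        """Return True only if the document meets every criterion."""
        if self.documented and not doc.url:
            return False
        return (
            doc.text_type == self.text_type
            and doc.language == self.language
            and doc.country == self.country
            and (doc.is_original or not self.originals_only)
            and (doc.is_complete or not self.complete_only)
        )


criteria = DesignCriteria()
candidate = DocumentRecord("http://example.org/policy.pdf", "policy", "en", "UK", True, True)
print(criteria.accepts(candidate))
```

Recording the criteria once and reusing them for every document is one way of keeping the compilation consistent; the same record answers the "?" fields of the design template when a new corpus is planned.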

2) Compilation Protocol

The compilation protocol consists of four steps:
I. Locating and accessing resources
II. Downloading data
III. Text formatting
IV. Data storage

Step I: Locating and accessing resources

Source: Global Reach http://www.gobalreach.com/globstats

Evaluating Web Resources - Checklist
Fig. 1: Evaluation template.

E-RESOURCES
1. Telematic search
1.1. General search engines (Google, Yahoo, Altavista, etc.)
1.2. Metasearch engines (Vivisimo)
1.3. Specialised search engines (FindLaw: Internet Legal Resources)
1.4. Directories
1.5. Distribution lists
1.6. Thematic lists
1.7. Portals

Google: search engine

2. Institutional search
2.1. Directories
2.2. Web sites

Institutional search: ABI (Association of British Insurers)

3. Personal resources
3.1. Expert guides
3.2. Who is who?

Expert guide: RedIRIS

4. Normative resources
4.1. Organization for Standardization

ISO: International Organization for Standardization

5. Legal resources
5.1. Web sites
5.2. Databases

6. Linguistic resources
6.1. Dictionaries
6.2. Glossaries
6.3. Vocabularies
6.4. Databases
6.5. Thesauri
6.6. Corpora

Step II: Downloading Data

The main sources of information used to compile our corpus have been:
• institutional searches, carried out on the web sites of international organisations and institutions (WTO, EUR-Lex, ABI, ABTA, etc.)
• key word searches using a search engine (www.google.com, www.yahoo.co.uk, etc.)

Institutional search: ABI, Association of British Insurers

key word searches

Step III: Text formatting

III. Text formatting
The documents found show a noticeable predilection for HTML (.html) and PDF (.pdf). The downloading stage is therefore completed by what might be called normalisation: all the documents are converted to ASCII or plain text format. In other words, they are stripped of HTML or code of any other kind, in accordance with the clean-text policy described by Sinclair (1991: 21).
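
For HTML pages, the normalisation step described above can be approximated with a few lines of Python using only the standard library. This is a rough sketch rather than the procedure used in the study: the function name `html_to_plain_text` is hypothetical, the output is kept as Unicode text rather than strict ASCII, and PDF files would still need a separate converter.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, ignoring scripts and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks with spaces and collapse runs of whitespace.
        # Caveat: a word split by inline tags (e.g. <b>ex</b>ample) gains a space.
        return " ".join(" ".join(self._chunks).split())


def html_to_plain_text(html: str) -> str:
    """Strip tags and code from an HTML string and return plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = "<html><body><h1>Travel Insurance</h1><p>Cover &amp; exclusions</p></body></html>"
print(html_to_plain_text(page))
```

The result can then be saved as a .txt file, which is the format most concordancers expect.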

How to convert from PDF to TXT? http://www.pdf-to-html-word.com/pdf-to-text

Step IV: Data Storage

IV. Data storage

For this study we have compiled a monolingual (English), documented, comparable virtual corpus consisting of 150 documents (3,202,118 words).
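
Figures such as the number of documents and words depend on how a "word" is counted, so it can help to make the counting rule explicit. The sketch below is illustrative only: the tokenisation rule (runs of letters, optionally joined by an apostrophe or hyphen) is an assumption and is not necessarily the rule behind the figures quoted above.

```python
import re

# Assumed definition of a word: letters, optionally joined by ' ’ or -
TOKEN = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def corpus_size(documents):
    """Return (number_of_documents, number_of_word_tokens) for an iterable of texts."""
    n_docs = 0
    n_tokens = 0
    for text in documents:
        n_docs += 1
        n_tokens += len(TOKEN.findall(text))
    return n_docs, n_tokens


texts = [
    "The insured person's cover.",
    "Pre-existing medical conditions are excluded.",
]
print(corpus_size(texts))
```

With a different rule (for example, counting numbers or splitting hyphenated forms) the same corpus would yield a different word total, which is worth stating whenever corpus sizes are compared.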

Using the Corpus to Translate

1) Concordancers

CONCORDANCERS
Concordance (KWIC: Key Word In Context): a list of all occurrences of a key word with their contexts.
Concordancer (concordancing system or Corpus Query System, CQS): a software tool that looks up key words and displays KWIC lines.
Concordancers can be classified as:
i. Monolingual/multilingual
ii. Commercial/freeware
iii. Windows/Mac/cross-platform
iv. Simple/modular
v. Set-up/web
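
To make the idea of a KWIC concordance concrete, the sketch below builds one from a plain-text string. It is a toy illustration in Python, not the algorithm of any of the tools discussed here; the function names `kwic` and `format_kwic` and the sample sentence are hypothetical.

```python
import re


def kwic(text, keyword, width=30):
    """Return (left context, key word, right context) for each whole-word match.

    Matching is case-insensitive; contexts are cut to `width` characters.
    """
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    rows = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        rows.append((left, match.group(0), right))
    return rows


def format_kwic(rows, width=30):
    """Right-align the left context so that the key words line up in a column."""
    return [f"{left:>{width}}{kw}{right}" for left, kw, right in rows]


sample = "The policy covers medical expenses. This policy does not cover war risks."
for line in format_kwic(kwic(sample, "policy", width=20), width=20):
    print(line)
```

Aligning the key word in a central column is what lets the translator scan its typical left and right neighbours at a glance.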

ConcApp Concordancing Programs
The ConcApp Concordancing Programs are a non-commercial, monolingual concordancer suite for Windows operating systems (98, ME, NT/2000, XP) that can be freely downloaded as an executable or as a full set-up program. The suite provides basic functionalities (word frequency lists; lists of collocates; concordance searches for words, phrases and derivatives) to process most European languages (English, Spanish, Italian, etc.), as well as Chinese, Japanese, Thai and Russian in Unicode. http://www.edict.com.hk/pub/concapp/
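
Two of the basic functionalities mentioned above, word frequency lists and lists of collocates, can be illustrated in a few lines of Python. This is a simplified sketch and not ConcApp's implementation: the tokenisation rule, the function names and the window of `span` words on each side of the node word are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (assumed rule)."""
    return re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def collocates(tokens, node, span=2):
    """Count the words found within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts.most_common()


tokens = tokenize("Travel insurance policy. The insurance policy covers travel.")
print(frequency_list(tokens))
print(collocates(tokens, "policy", span=1))
```

Raw co-occurrence counts like these favour frequent function words; real tools usually offer sorting options or association measures to bring the more informative collocates to the top.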

ConcApp [Monolingual Non-Commercial Suite for Windows]

Another monolingual concordancer for Windows only is the Multilingual Corpus Toolkit, which supports many European and Asian languages. http://personalpages.manchester.ac.uk/staff/scott.piao/research/DownLoad/download.htm
Freeware concordancers for Mac only are Conc 1.7/1.8 and Concorder X 1.0.

MLCT: Multilingual Corpus Toolkit

AntConc 3.2 is a non-commercial, freely downloadable concordancer for Windows, Mac and Linux. This versatile software features several tools, which display lists of words and keywords (Word List, Keyword List); list, sort and search for lexical bundles (Collocates); generate lines in KWIC format (Concordance); indicate the position of the keyword within a given corpus (Concordance Plot); and give the user access to the whole source file or corpus (File View).

AntConc [Monolingual Freeware Multiplatform Concordancer]

Commercial monolingual concordancers usually offer trial periods or demos that merely restrict the number of hits or prevent results from being saved or printed. All in all, they are not superior to non-commercial software: both feature roughly the same suite of tools and support most European languages. For instance, AntConc and Oxford WordSmith Tools 3.1 and 4.0 are very similar. WordSmith Tools and Concordance are the concordancers preferred by translators.

WordSmith Tools (WS) includes several utilities: (a) WordList, which displays lists of words and clusters in alphabetical or frequency order, and calculates sentence and word length; (b) Concord, which generates KWIC lines that can be sorted by n-left or n-right, centre or tags, as well as patterns, clusters and collocates that can be re-sorted and displayed in dispersion plots; and (c) KeyWords, which extracts key words with respect to a given reference corpus. http://www.lexically.net/wordsmith/version5/index.html
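
The idea of extracting key words against a reference corpus can be sketched with a keyness statistic. The example below uses the two-term log-likelihood score that is common in corpus linguistics; this is an assumption chosen for illustration, since the presentation does not say which statistic any particular tool applies, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Keyness of a word seen `a` times in a study corpus of `c` tokens
    and `b` times in a reference corpus of `d` tokens."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score


def keywords(study_tokens, reference_tokens, top=10):
    """Rank words that are relatively more frequent in the study corpus."""
    if not study_tokens or not reference_tokens:
        raise ValueError("both corpora must contain at least one token")
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep only positively key words
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


study = ["policy"] * 10 + ["the"] * 10
reference = ["the"] * 10 + ["weather"] * 10
print(keywords(study, reference))
```

In this toy data "the" is equally frequent in both corpora and is therefore not key, while "policy" stands out; with a specialised corpus and a general reference corpus, the top of the list tends to be a first draft of the domain's terminology.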

In addition, WS can be run on parallel corpora, as it contains Viewer and Aligner, a basic utility for producing an aligned version of two or more texts, with alternate sentences or paragraphs from each of them. Another relevant difference is the WebGetter utility, which enables the user to compile a corpus from the Internet by selecting a search engine that locates the first 100 sources and sending a robot to download each page, provided it meets the user’s requirements as defined in the settings. This feature turns WS into a modular type of concordancing software, as it allows for both corpus exploitation and corpus compilation/management.

WordSmith Tools [Mono-/Bilingual Commercial Concordancer]

Concordance is a commercial monolingual concordancer for Windows only. It works with nearly all languages supported by Windows and includes lemmatisation, a user-definable alphabet, a reference system, contexts, and flexible selection, searching and sorting of words, phrases, regular expressions, etc. It saves and exports concordances as .txt and .html files or as a web concordance.

Concordance

Another well-known commercial program for Windows only is MonoConc Pro (MP 2.2), which supports all European languages plus Chinese, Japanese and Korean. Its distinctive features include Context Search, Regular Expression Search, Part-of-Speech Tag Search, Collocations and Corpus Comparison. http://www.athel.com/mono.html
This program also has a multilingual version: ParaConc.

A bilingual or multilingual concordancer is a program for parallel corpora, i.e. corpora of source texts and their translations into other languages. As a rule, this kind of software requires input aligned at sentence level. Most bi-/multilingual concordancers are commercial. A well-known example is ParaConc 0.9, the multilingual version of MonoConc Pro. It can analyse up to four languages in parallel (one source text corpus and up to three target corpora).
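
The reason sentence-level alignment matters can be seen in a small sketch: once each source sentence is paired with its translation, a search in the source language returns the corresponding target sentences directly. The Python example below is a toy illustration and not ParaConc's behaviour; the function name and the German-English sentence pairs are invented for the example.

```python
import re


def parallel_concordance(aligned_pairs, keyword):
    """Return the (source, target) pairs whose source sentence contains `keyword`.

    `aligned_pairs` is a list of sentence pairs that are already aligned,
    i.e. each target sentence is the translation of its source sentence.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [(source, target) for source, target in aligned_pairs if pattern.search(source)]


# Invented sample data: two aligned German-English sentence pairs.
pairs = [
    ("Die Demenz ist eine Erkrankung.", "Dementia is a disease."),
    ("Die Studie umfasst 40 Patienten.", "The study includes 40 patients."),
]
for source, target in parallel_concordance(pairs, "Demenz"):
    print(source, "=>", target)
```

If the two texts were not aligned, the program would have no way of knowing which target sentence to show next to each hit, which is why alignment comes before any bilingual query.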

ParaConc [Bilingual Commercial Suite for Windows. Alignment]

ParaConc [Concordancing]

Finally, there are web concordancing systems for specific corpus query: BNC Simple Search, Cobuild Direct Concordance and Collocation Sampler, LDC Online, Online Concordancer, Online KWIC Concordancer, RAE, …

CREA [Web Concordancer for Specific Corpus Query]

Other freeware monolingual systems exploit the Internet as a gigantic corpus, such as WebCorp, KWiCFinder or TAPoRware 2.0. http://www.webcorp.org.uk http://www.kwicfinder.com http://taporware.mcmaster.ca

WebCorp [Internet Concordancer]

Analyzing the corpus compiled with Concordance http://www.concordancesoftware.co.uk

Comparable corpora are particularly useful for meeting translators’ information needs. Representative Corpora: finding information on terminology, phraseology, concepts and text discourse for direct and inverse translation.

Recent research in translation studies has stressed the contribution which corpora of electronic texts can bring to translators … If a corpus is appropriately designed, it can provide reliable evidence of authentic linguistic behaviour and text-structuring conventions by highlighting recurrent patterns. Terminological and collocational information can be especially useful. (Zanettin, 2002)

EXERCISES

You have to carry out the translation of the following text, written in German, into English (British English): Ökophysiologie trophischer Beziehungen phytophager Insekten, by Gerhard Schäller

Exercise 1: Look for Internet resources in order to carry out the translation into English (British English)

Remember…. First of all, you have to design your corpus according to your needs

Remember… Corpus design
Text type: ?
Language: ?
Diatopic restrictions: ?
Original or translations: ?
Complete text or partial: ?
Documented: ?

Remember… The compilation protocol consists of four steps:
I. Locating and accessing resources
II. Downloading data
III. Text formatting
IV. Data storage

Remember… E-RESOURCES
1. Telematic search
1.1. General search engines (Google, Yahoo, Altavista, etc.)
1.2. Metasearch engines (Vivisimo)
1.3. Specialised search engines
1.4. Directories
1.5. Distribution lists
1.6. Thematic lists
1.7. Portals
2. Institutional search
2.1. Directories
2.2. Web sites
3. Personal resources
3.1. Expert guides
3.2. Who is who?
4. Normative resources
4.1. Organization for Standardization
5. Legal resources
5.1. Web sites
5.2. Databases
6. Linguistic resources
6.1. Dictionaries
6.2. Glossaries
6.3. Vocabularies
6.4. Databases
6.5. Thesauri
6.6. Corpora

Remember… key word searches

Remember… Evaluating Web Resources - Checklist
Fig. 1: Evaluation template.

Exercise 1: Look for Internet resources in order to carry out the translation into English

Exercise 2: Select, download and store the documents in order to compile your corpus

Remember… The compilation protocol consists of four steps:
I. Locating and accessing resources
II. Downloading data
III. Text formatting
IV. Data storage

Remember… How to convert from PDF to TXT? http://www.pdf-to-html-word.com/pdf-to-text

Remember …. IV. Data storage

Exercise 2: Select, download and store the documents in order to compile your corpus

Exercise 3: Translate the original text into English using Concordance to manage your corpus. Concordance can be downloaded at: http://www.concordancesoftware.co.uk

English Translation

Exercise 4: Compare the two main programs used by translators:
Concordance: http://www.concordancesoftware.co.uk
WordSmith Tools 5.0: http://www.lexically.net/wordsmith/

Exercise 5: Let’s use ParaConc. We are going to compile a bilingual (German and English) parallel (originals and translations) corpus of medical abstracts on dementia. http://www.athel.com/para.html

Corollary
This demonstration has shown the clear benefits of using such corpora over any type of dictionary, as they provide examples of how words or expressions are used and translated in context. All in all, we have seen how corpora: (a) provide instant access to real usage, (b) depict syntagmatic patterns and translation equivalents unavailable in existing lexicographic resources, and (c) offer guidance on style and text conventions in both SL and TL. The methodology presented here can be used to translate any text type, in any language or languages and in any direction.

Thanks! Lost in Specialised Translation: an Inexpensive and Under-Exploited Aid for Language Service Providers
Miriam Seghiri, Ph.D., University of Málaga (Spain), seghiri@uma.es