Lost in Specialised Translation: an Inexpensive and Under-Exploited Aid for Language Service Providers
Miriam Seghiri, Ph.D. University of Málaga (Spain) seghiri@uma.es
Index
1. Introduction
2. Corpora in Translation Training
3. Guidelines for Corpus Creation
3.1. Design Criteria
3.2. Compilation Protocol
4. Using Corpora to Translate
5. Analyzing the Corpus with Concordance
6. Practice
Introduction
The inclusion of documentation as a core subject in the curriculum of Translation and Interpretation degrees clearly underlines its importance to translators. Training in this discipline is considered essential for a translator given that only sufficient and conscientious work on documentation will allow an adequate translation of a specialised text.
The sources of information that may be utilised by the translator are extremely varied, ranging from an oral consultation with an expert to a search using specialised glossaries and dictionaries. However, in the field of translation perhaps the most relevant documentation activity today involves the use of the Internet and, closely related to this, the compilation and management of virtual corpora.
Here, we shall present a systematic methodology for corpus compilation based on electronic resources available on the Internet. The methodology will be illustrated through the example of the creation of a virtual corpus of travel insurance in English.
Corpora in Translation Training
What is a corpus?
corpus, pl. corpora, from the Latin word corpus, i.e. “body”
A collection of texts assumed to be representative of a given language, dialect, or other subset of a language, to be used for linguistic analysis (Francis, 1982)
Some more definitions...
A collection of naturally-occurring language text, chosen to characterize a state or variety of a language (Sinclair, 1991)
[A corpus is] a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language (Sinclair, 1996)
A large collection of authentic texts that have been gathered in electronic form according to a specific set of criteria (Bowker & Pearson, 2002)
A closed set of texts in machine-readable form established for general or specific purposes by previously defined criteria (Engwall, 1992)
Characteristics of corpora
• collections of text
• naturally-occurring / authentic text
• representative of a given language
• collected according to specific criteria
• stored in machine-readable format
• used for linguistic analysis
Different types of corpora
According to what could corpora be distinguished/classified?
• language
• size
• purpose
• …
Classification of corpora I
• medium: printed vs. electronic text (virtual); transcribed speech, audio files, video, multimodal
• design method: balanced, opportunistic, open, closed, complete, partial
• language variables: monolingual vs. multilingual; comparable (original) vs. parallel (translations)
Classification of corpora II
• language states: synchronic vs. diachronic (e.g. Brown vs. Helsinki Diachronic Corpus)
• not documented vs. documented (author, Internet address, ...)
• plain vs. annotated (tags)
Uses/users of corpora
• linguistics: validate linguistic theories
• lexicography: dictionary creation
• translation studies
• terminology
• language NLP
The advantages of using corpora in translation have been shown by various studies (cf. Laviosa, 1998; Bowker, 2002; Bowker and Pearson, 2002; Zanettin et al., 2003). Advantages: their objectivity, their reusability and their multiple uses. They are user-friendly and allow access to and management of huge quantities of information in almost no time.
Virtual corpora are among the translator’s most important aids when faced with a specialised text (cf. Pearson, 1998; Corpas Pastor, 2001 and 2004; Zanettin, 2002). A virtual corpus is a corpus compiled exclusively from electronic sources in order to carry out a specific translation in any direction (direct or inverse). Its principal objective is to construct a reliable resource quickly and at minimal cost, based on texts mined from the Internet, to satisfy the translator’s documentation needs.
Virtual corpora or…
• ad hoc (Corpas, 2002)
• disposable (Zanettin, 2001)
• do-it-yourself (Maia, 1997; Zanettin, 2001)
• electronic (Corpas, 2001; Varantola, 2003)
• precision (Varantola, 1997)
• special purpose (Pearson, 1998)
• web (Fletcher, 2004)
Translators turn to the Internet in search of solutions to information and documentation problems because they are not only translating between languages but also between discourse communities and cultures. The compilation of corpora and the Internet appear to be two of the most important documentation resources in the practice and research of specialised translation. Corpora for a particular speciality are not available for consultation on the Internet. Translators have no alternative other than to compile their own virtual corpora for the specific translation that has been commissioned in each case.
Corpora vs. other types of text collections
Corpora have to be distinguished from other text collections:
• text archives (repositories of electronic texts)
• collections of examples
Text archives and collections of examples are NOT considered corpora.
• the World Wide Web?
Web as corpus?
Pros
• authentic text
• large collection
• readily available
Cons
• not sampled according to specific criteria
• ‘dirty’ in terms of formats and language used
• Population represented?
• Quality of the documents?
In order for a collection of texts to be considered a corpus in the strict sense of the term, it must meet a set of clear design criteria and follow a specific compilation protocol, so that the collection may be deemed representative of the field of specialisation or the particular type of document that is being translated.
Guidelines for Corpus Creation
EN 15038:2006 is the first European standard to establish and define the requirements for the provision of quality services by translation service providers (TSPs).
EN 15038
Professional Competences [EN 15038]
- Translating
- Linguistic and textual
- Research, information acquisition & processing
- Cultural
- Technical
The knowledge of how to compile and use corpora is an essential part of modern translational competence (Varantola, 2003).
1) Design Criteria
Travel insurance policy written in German. Translation into English (British English).
The objective is to create a corpus of travel insurance policies in English compiled exclusively from resources available on the Internet. It is restricted to legislation in force and insurance policies drawn up in the United Kingdom. It will include original texts (comparable corpus) that are complete and documented.
CORPUS DESIGN
Text type: policy
Language: English
Diatopic restrictions: United Kingdom
Original or translations: comparable (original)
Complete text or partial: complete
Documented: yes
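The design criteria above can be written down as a small checklist that every candidate document is tested against before it enters the corpus. The following is only a sketch: the class names, the `Candidate` record and its fields are hypothetical and do not come from the slides; only the criterion values (policy, English, United Kingdom, original, complete, documented) are taken from the design table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusDesign:
    """Design criteria for the travel insurance corpus (values from the slide)."""
    text_type: str = "policy"
    language: str = "English"
    region: str = "United Kingdom"   # diatopic restriction
    originals_only: bool = True      # comparable corpus: no translations
    complete_texts: bool = True
    documented: bool = True


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record describing a document found on the Internet."""
    text_type: str
    language: str
    region: str
    is_translation: bool
    is_complete: bool
    source_url: str                  # an empty string means "not documented"


def meets_design(doc: Candidate, design: CorpusDesign) -> bool:
    """Return True only if the candidate satisfies every design criterion."""
    return (
        doc.text_type == design.text_type
        and doc.language == design.language
        and doc.region == design.region
        and not (design.originals_only and doc.is_translation)
        and (doc.is_complete or not design.complete_texts)
        and (bool(doc.source_url) or not design.documented)
    )
```

A document that fails any single criterion (for example, a translated policy or one with no recorded source) is left out of the corpus.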
2) Compilation Protocol
The compilation protocol consists of four steps:
I. Locating and accessing resources
II. Downloading data
III. Text formatting
IV. Data storage
Step I: Locating and accessing resources
Source: Global REACH http://www.gobalreach.com/globstats
Evaluating Web Resources - Checklist. Fig. 1: Evaluation template.
E-RESOURCES
1. TELEMATIC SEARCH
1.1. General search engines (Google, Yahoo, Altavista, etc.)
1.2. Metasearch engines (Vivisimo)
1.3. Specialised search engines (FindLaw: Internet Legal Resources)
1.4. Directories
1.5. Distribution lists
1.6. Thematic lists
1.7. Portals
Google: search engine
2. Institutional search
2.1. Directories
2.2. Web sites
Institutional search: ABI, Association of British Insurers
3. Personal resources
3.1. Expert guides
3.2. Who's who?
Expert guide: RedIRIS
4. Normative resources
4.1. Organizations for standardization
ISO: International Organization for Standardization
5. Legal resources
5.1. Web sites
5.2. Databases
6. Linguistic resources
6.1. Dictionaries
6.2. Glossaries
6.3. Vocabularies
6.4. Databases
6.5. Thesauri
6.6. Corpora
Step II: Downloading Data
The main sources of information used to compile our corpus have been:
• institutional searches, carried out on the web sites of international organisations and institutions (WTO, EUR-Lex, ABI, ABTA, etc.)
• key word searches using a search engine (www.google.com, www.yahoo.co.uk, etc.)
Institutional search: ABI, Association of British Insurers
key word searches
Step III: Text formatting
III. Text formatting
Documents found online show a noticeable predilection for the HTML (.html) and PDF (.pdf) formats. The downloading stage is therefore completed by what might be called normalisation: all the documents are converted to ASCII or plain text format. In other words, they are stripped of HTML or code of any other kind, in accordance with the clean-text policy described by Sinclair (1991: 21).
How to convert from PDF to TXT? http://www.pdf-to-html-word.com/pdf-to-text
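For HTML pages, the normalisation step can also be scripted. The sketch below is one possible approach, not the procedure used in the study: it relies only on Python's standard `html.parser` module, and the names `TextExtractor` and `html_to_text` are illustrative. It drops tags, scripts and style sheets and keeps the visible text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores tags, scripts and style sheets."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse the runs of whitespace left behind by the removed markup.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    """Return the plain text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Each downloaded page would be passed through `html_to_text` and the result saved as a .txt file. PDF files need a separate converter, such as the online service mentioned above.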
Step IV: Data Storage
IV. Data storage
In the present study we have compiled a monolingual (English), documented, comparable virtual corpus consisting of 150 documents (3,202,118 words).
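Figures such as the number of documents and words can be checked with a few lines of code once the texts are stored as plain text. This is a sketch under assumptions: the tokenisation rule (letters and digits, with internal hyphens or apostrophes) is ours, so the totals it gives may differ from those reported by a concordancer, and the function name `corpus_size` is illustrative.

```python
import re

# A word token: letters/digits, optionally joined by hyphens or apostrophes.
WORD = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*")


def corpus_size(documents):
    """Return (number of documents, total number of word tokens).

    `documents` maps a file name to the plain text of that document.
    """
    total = sum(len(WORD.findall(text)) for text in documents.values())
    return len(documents), total
```

For example, `corpus_size({"a.txt": "Cover starts on the departure date."})` counts one document and six words.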
Using the Corpus to Translate
1) Concordancers
CONCORDANCERS
Concordance (KWIC: Key Word In Context): list of all occurrences of a key word with their contexts
Concordancer (concordancing system or Corpus Query System, CQS): software tool that looks up key words and displays KWIC lines
i. Monolingual/multilingual
ii. Commercial/freeware
iii. Windows-/Mac-/Cross-Platform
iv. Simple/Modular
v. Set up/Web
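To make the idea of a KWIC line concrete, here is a minimal sketch of what a concordancer does at its core. It is not the code of any of the programs presented here; the function name `kwic` and the `width` parameter are illustrative.

```python
import re


def kwic(text, keyword, width=30):
    """Return KWIC lines: left context, key word in brackets, right context."""
    text = " ".join(text.split())   # put the whole text on one line
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both contexts so the key words line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines
```

Printing the lines returned by `kwic(corpus_text, "policy")` shows every occurrence of the key word aligned in the centre, which is the display all the concordancers below are built around.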
ConcApp Concordancing Programs
The ConcApp Concordancing Programs are a non-commercial, monolingual concordancer suite for Windows operating systems (98, ME, NT / 2000, XP) that can be freely downloaded as an executable or as a full set-up program. The suite provides basic functionalities (word frequency lists; lists of collocates; concordance searches for words, phrases and derivatives) to process most European languages (English, Spanish, Italian, etc.), as well as Chinese, Japanese, Thai and Russian in Unicode. http://www.edict.com.hk/pub/concapp/
ConcApp [Monolingual Non-Commercial Suite for Windows]
Another monolingual concordancer for Windows only is the Multilingual Corpus Toolkit, which supports many European and Asian languages. http://personalpages.manchester.ac.uk/staff/scott.piao/research/DownLoad/download.htm
Freeware concordancers for Mac only are Conc 1.7/1.8 and Concorder X 1.0.
MLCT: Multilingual Corpus Toolkit
AntConc 3.2 is a non-commercial, freely downloadable concordancer for Windows, Mac and Linux. This versatile software features several tools, which display lists of words and keywords (Word List, Keyword List); list, sort and search for lexical bundles (Collocates); generate lines in KWIC format (Concordance); indicate the position of the keyword within a given corpus (Concordance Plot); and give the user access to the whole source file or corpus (File View).
AntConc [Monolingual Freeware Multiplatform Concordancer]
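Two of the tools listed above, the word list and the collocate list, are simple counts and can be sketched in a few lines. This illustrates the underlying idea only and is not AntConc's implementation; the function names, the tokenisation rule and the `span` parameter are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_list(text, top=10):
    """Return the `top` most frequent words with their frequencies."""
    return Counter(tokenize(text)).most_common(top)


def collocates(text, node, span=2):
    """Count the words found within `span` tokens either side of `node`."""
    tokens = tokenize(text)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts
```

Calling `collocates(corpus_text, "insurance", span=1)` would show which words most often stand immediately before or after "insurance" in the corpus.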
Commercial monolingual concordancers usually offer trial periods or demos that merely restrict the number of hits or prevent results from being saved or printed. All in all, they are not superior to non-commercial software. In fact, both feature roughly the same suite of tools and support most European languages. For instance, AntConc and Oxford WordSmith Tools 3.1 and 4.0 are very similar. WordSmith Tools and Concordance are the concordancers preferred by translators.
WordSmith Tools (WS) includes several utilities: (a) WordList, which displays lists of words and clusters in alphabetical or frequency order, and calculates sentence and word length; (b) Concord, which generates KWIC lines that can be sorted by n-left or n-right, centre or tags, as well as patterns, clusters and collocates that can be re-sorted and displayed in dispersion plots; and (c) KeyWords, which extracts key words with respect to a given reference corpus. http://www.lexically.net/wordsmith/version5/index.html
In addition, WS can be run on parallel corpora, as it contains Viewer and Aligner, a basic utility for producing an aligned version of two or more texts, with alternate sentences or paragraphs from each of them. Another relevant difference is the WebGetter utility, which enables the user to compile a corpus from the Internet by selecting a search engine that locates the first 100 sources and sends a robot to download each page, provided it meets the user’s requirements as defined in the settings. This feature turns WS into a modular type of concordancing software, as it allows for both corpus exploitation and corpus compilation/management.
WordSmith Tools [Mono-/Bilingual Commercial Concordancer]
Concordance is a commercial monolingual concordancer for Windows only. It works with nearly all languages supported by Windows, and includes lemmatisation, a user-definable alphabet, a reference system, contexts, and flexible selection, search and sorting of words, phrases, regular expressions, etc. It saves and exports concordances as .txt and .html files or as web concordances.
Concordance
Another well-known commercial program for Windows only is MonoConc Pro (MP 2.2), which supports all European languages plus Chinese, Japanese and Korean. Its distinctive features include Context Search, Regular Expression Search, Part-of-Speech Tag Search, Collocations and Corpus Comparison. http://www.athel.com/mono.html This program also has a multilingual version: ParaConc.
A bilingual or multilingual concordancer is a program for parallel corpora, i.e. corpora of source texts and their translations into other languages. As a rule, this kind of software requires input aligned at sentence level. Most bi-/multilingual concordancers are commercial. A well-known example is ParaConc 0.9, the multilingual version of MonoConc Pro. It can analyse up to four languages in parallel (one source text corpus and up to three target corpora).
ParaConc [Bilingual Commercial Suite for Windows. Alignment]
ParaConc [Concordancing]
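The reason a parallel concordancer needs sentence-aligned input can be seen in a very small sketch: once each source sentence is paired with its translation, a search in the source language returns the corresponding target sentences as well. The function below is illustrative only and is not how ParaConc works internally; the German-English sentence pairs in the usage note are invented for the example.

```python
def parallel_concordance(aligned_pairs, term):
    """Return the aligned (source, target) pairs whose source contains `term`.

    `aligned_pairs` is a list of (source sentence, target sentence) tuples,
    i.e. a parallel corpus already aligned at sentence level.
    The match is a case-insensitive substring search.
    """
    term = term.lower()
    return [(src, tgt) for src, tgt in aligned_pairs if term in src.lower()]
```

With the invented pairs `[("Die Versicherung deckt Arztkosten.", "The insurance covers medical expenses."), ("Der Versicherungsschutz beginnt am Abreisetag.", "Cover starts on the departure date.")]`, a search for "Arztkosten" returns only the first pair, while a search for "versicherung" returns both, because substring matching also finds the term inside compounds.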
Finally, there are web concordancing systems for specific corpus query: BNC Simple Search, Cobuild Direct Concordance and Collocation Sampler, LDC Online, Online Concordancer, Online KWIC Concordancer, RAE, …
CREA [Web Concordancer for Specific Corpus Query]
Other sample freeware monolingual systems exploit the Internet as a gigantic corpus, such as WebCorp, KWICfinder or TAPoRware 2.0. http://www.webcorp.org.uk http://www.kwicfinder.com http://taporware.mcmaster.ca
WebCorp [Internet Concordancer]
Analyzing the corpus compiled with Concordance http://www.concordancesoftware.co.uk
Comparable corpora are particularly useful for meeting translators’ information needs. Representative Corpora: finding information on terminology, phraseology, concepts and text discourse for direct and inverse translation.
Recent research in translation studies has stressed the contribution which corpora of electronic texts can bring to translators … If a corpus is appropriately designed, it can provide reliable evidence of authentic linguistic behaviour and text-structuring conventions by highlighting recurrent patterns. Terminological and collocational information can be especially useful. (Zanettin, 2002)
EXERCISES
You have to translate the following German text into English (British English): Ökophysiologie trophischer Beziehungen phytophager Insekten, by Gerhard Schäller
Exercise 1 Look for Internet resources in order to carry out the Translation into English (British English)
Remember…. First of all, you have to design your corpus according to your needs
Remember… CORPUS DESIGN
Text type: ?
Language: ?
Diatopic restrictions: ?
Original or translations: ?
Complete text or partial: ?
Documented: ?
Remember… The compilation protocol consists of four steps:
I. Locating and accessing resources
II. Downloading data
III. Text formatting
IV. Data storage
Remember… E-RESOURCES
1. TELEMATIC SEARCH
1.1. General search engines (Google, Yahoo, Altavista, etc.)
1.2. Metasearch engines (Vivisimo)
1.3. Specialised search engines
1.4. Directories
1.5. Distribution lists
1.6. Thematic lists
1.7. Portals
2. Institutional search
2.1. Directories
2.2. Web sites
3. Personal resources
3.1. Expert guides
3.2. Who's who?
4. Normative resources
4.1. Organizations for standardization
5. Legal resources
5.1. Web sites
5.2. Databases
6. Linguistic resources
6.1. Dictionaries
6.2. Glossaries
6.3. Vocabularies
6.4. Databases
6.5. Thesauri
6.6. Corpora
Remember… key word searches
Evaluating Web Resources - Checklist. Fig. 1: Evaluation template.
Exercise 2 Select, download and store the documents in order to compile your corpus
Remember… The compilation protocol consists of four steps:
I. Locating and accessing resources
II. Downloading data
III. Text formatting
IV. Data storage
Remember… How to convert from PDF to TXT? http://www.pdf-to-html-word.com/pdf-to-text
Remember …. IV. Data storage
Exercise 3 Translate the original text into English using Concordance in order to manage your corpus. Concordance can be downloaded at: http://www.concordancesoftware.co.uk
English Translation
Exercise 4 Compare the two main programs used by translators:
Concordance: http://www.concordancesoftware.co.uk
WordSmith Tools 5.0: http://www.lexically.net/wordsmith/
Exercise 5 Let’s use ParaConc. We are going to compile a bilingual (German and English) parallel (originals and translations) corpus of medical abstracts on dementia. http://www.athel.com/para.html
Corollary
This demonstration has shown the clear benefits of using such corpora over any type of dictionary, as they provide examples of how words or expressions are used and translated in context. All in all, we have seen how corpora: (a) provide instant access to real usage, (b) depict syntagmatic patterns and translation equivalents unavailable in existing lexicographic resources, and (c) offer guidance on style and text conventions in both SL and TL. The methodology presented here can be used for the translation of any text type, language(s) and direction.
Thanks!
Lost in Specialised Translation: an Inexpensive and Under-Exploited Aid for Language Service Providers
Miriam Seghiri, Ph.D. University of Málaga (Spain) seghiri@uma.es