Collocational properties of translated language
Silvia Bernardini
School for Translators and Interpreters, University of Bologna at Forlì
30 July 07
silvia.bernardini@unibo.it
Overview
- Collocations
  - Brief overview
  - Frequency vs. phraseology
  - A note on statistics
- Translation studies
  - Theory
  - Methodology
  - Collocation
  - Limits
- Current study
  - Aim
  - Method
  - Results
  - Discussion
  - Limits
  - Ways forward
What is a collocation?
- “[…] I would like to put forward the concept of collocation which I have introduced in my own work. This is the study of key-words, pivotal words, leading words, by presenting them in the company they usually keep – that is to say, an element of their meaning is indicated when their habitual word accompaniments are shown.” (Firth 1956: 106-107)
- E.g. English: the English people, English literature, English reserve, English manners, English countryside, the English and all that can be said about them, the English public schools, English Universities (!)
Collocations
Frequency-oriented views
- “Significant” collocation is regular collocation between items, such that they occur more often than their respective frequencies and the length of the text in which they occur would predict (Jones and Sinclair 1974: 19)
- A collocation is a sequence of words that occurs more than once in identical form and is grammatically well-structured (Kjellmer 1987: 133)
Phraseology-oriented views
- Restricted collocations are fully institutionalised phrases, memorized as wholes and used as conventional form-meaning pairings (Howarth 1996: 37)
Frequency vs. Phraseology
Frequency
- Sum of many occurrences in texts
- Position important
- Number of words involved important
- Syntactic relationship can be important
- Frequency/statistics important
Phraseology
- An abstract entity with instantiations in texts (PERFORM + TASK)
- Position/number of constituents not central
- Different restrictions distinguished (DOG+BARK not a collocation)
- Main criterion: semantic unpredictability
2 ways of finding collocations
- Starting from a (set of) keyword(s) and looking left and right
  - Gledhill (2000): phraseology surrounding “keywords” in different sections of cancer research articles
- Selecting all sequences of N words that recur a certain number of times
  - Kjellmer (1994): all two-word sequences appearing more than two times in the Brown corpus
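The two search strategies on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the window size `span` and the toy sentence are hypothetical, and neither Gledhill (2000) nor Kjellmer (1994) is claimed to have worked this way.

```python
from collections import Counter

def collocates_in_window(tokens, keyword, span=4):
    """Approach 1: count words found within `span` tokens left or right of a keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts

def recurrent_ngrams(tokens, n=2, min_freq=3):
    """Approach 2: keep every contiguous n-word sequence seen at least `min_freq` times."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: freq for gram, freq in grams.items() if freq >= min_freq}

# Toy text, invented for illustration.
tokens = "the dog barked and the dog ran and the dog slept".split()

print(collocates_in_window(tokens, "dog", span=1))
print(recurrent_ngrams(tokens, n=2, min_freq=3))  # "more than two times" = at least 3
```

The first function needs a keyword chosen in advance; the second needs only a length and a frequency threshold, which is why it scales to whole corpora but also returns grammatically uninteresting sequences.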
A note on statistics
- Frequency (Danielsson 2001)
- Statistics: pointwise Mutual Information (MI)
  - Compares the probability of observing x and y together (the joint probability) with the probabilities of observing x and y independently (chance). (Church and Hanks 1990: 77)
  - Formula, where f(·) is a corpus frequency, N is the corpus size and p(·) = f(·)/N:
    $$\mathrm{MI}(x; y) = \log_2 \frac{p(xy)}{p(x)\,p(y)} = \log_2 \frac{f(xy) \cdot N}{f(x) \cdot f(y)}$$
- Limits of MI
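As a worked illustration, MI can be computed from raw counts as the base-2 logarithm of the observed joint frequency over the frequency expected by chance, f(x) · f(y) / N. The function name and the counts below are invented for the example. The second call illustrates one commonly cited limit of MI: a pair seen only once can outscore a well-attested pair.

```python
import math

def pointwise_mi(f_xy, f_x, f_y, n):
    """MI = log2(observed / expected), with expected = f_x * f_y / n."""
    expected = f_x * f_y / n
    return math.log2(f_xy / expected)

# Invented counts in a corpus of 1024 tokens.
frequent_pair = pointwise_mi(f_xy=8, f_x=16, f_y=32, n=1024)
hapax_pair = pointwise_mi(f_xy=1, f_x=1, f_y=1, n=1024)

print(frequent_pair)  # 4.0
print(hapax_pair)     # 10.0
```

This is one reason an MI threshold is often combined with a minimum frequency.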
Corpus-based TS
- Theoretical background
- Methodological background
- Studies of collocation within TS
- Limits
Theoretical background 1
Baker (1993: 243)
The most important task that awaits the application of corpus techniques in translation studies […] is the elucidation of the nature of translated text as a mediated communicative event.
Corpus-based Translation Studies
Theoretical background 2
Toury (1995)
Translation as norm-governed behaviour: ‘translatorship’ amounts first and foremost to being able to play a social role, i.e. to fulfil a function allotted by a community […] in a way which is deemed appropriate in its own terms of reference (ibid.: 53)
Operationalising it
- Studies should be carried out focusing on the nature of translational norms as compared to those governing non-translational kinds of text production (Toury 1995: 61).
- Corpus research in TS should focus on the identification of universal features of translation, that is features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems (Baker 1993: 243).
“Universals”
- Explicitation/explicitness
- Simplification
- Disambiguation
- Levelling out (homogeneity)
- Preference for conventional grammar
- Avoidance of repetition
- Exaggeration of features of the target language
- Normalisation/sanitisation
- Absence of TL-specific “unique items”
- “Shining-through”
Methodological background
- Monolingual comparable corpora
  - Originals in Language A and comparable translations into Language A
  - They should make visible “patterning which is specific to translated texts, irrespective of the source or target languages involved” (Baker 1995: 234).
- Parallel corpora
  - Originals in Language A and their translations into Language B, usually combined with reference corpora
TS: research on collocation
Olohan (2004): Collocation and moderation
- Quite, rather, pretty and fairly in translated vs. original English fiction
- Pretty and rather, and more marginally quite, “are used a lot less in [TEC-Fiction] but, when they are, there is usually more variation in usage than in [BNC-fiction] and less repetition of common collocates”
TS: research on collocation
Øverås (1998): Collocation and explicitation
- First 50 sentences of 40 novel extracts (English + Norwegian)
- Additions enriching the text with a common target language collocation
  ST: Det var en blanding av vill dristighet og en frøkenaktig, fornem finhet i hans slekt. (a mixture)
  TT: There was a strange mixture of wild boldness and dignified gentility in the family.
- A collocational clash in the ST is rendered with a conventional TL combination
  ST: the cook's fat son would play plump tunes on his accordion.
  TT: kokkens fete sønn spille trivelige melodier på trekkspillet sitt. (pleasant tunes)
TS: research on collocation
Kenny (2001): Collocation and sanitisation
- Three-way comparison: a parallel corpus (English/German) and reference corpora of SL/TL
- Treatment of lexical creativity in translation
- Starting points: collocation hapaxes and clusters that are repeated in the work of a single author but not attested in any other texts
  Augen ~ trinken: ich trinke mit gierigen Augen (literally: I drink with greedy eyes), translated as: “my avid eyes drank in…”
TS: research on collocation
Baroni and Bernardini (2003): Collocation in MCC
- Monolingual comparable corpus of Italian original and translated articles from a single geopolitics journal
- All bigrams from the translated sub-corpus and from the original sub-corpus
- Ranked according to their log-likelihood ratio value
- “Translated language is repetitive, possibly more repetitive than original language. Yet the two differ in what they tend to repeat: translations show a tendency to repeat structural patterns and strongly topic-dependent sequences, whereas originals show a higher incidence of topic-independent sequences, i.e. the more usual lexicalised collocations in the language”
TS: research on collocation
Danielsson (2001): Collocation: monolingual & translational
- Units of meaning in two large corpora of English and Swedish
  - Words occurring ≥ 200 times
  - Collocates (≥ 5)
    plugs sockets (6 occurrences)
    headphone sockets (7 occurrences)
    sunken sockets (6 occurrences)
    bulging their sockets (5 occurrences)
- Data-sparseness problem: only 2 units of meaning (of the 12,099 previously identified) occur five times or more in the ST component of the parallel fiction corpus (Swedish into English, ~400,000 words per component)
Limits
- General limits of MCC
  - Variables
  - Tools and methods: too crude?
  - Excessive downplay of the source text
  - Over-generalisation of translation universals
- Specific difficulty of collocational studies
  - Data-sparseness in relatively small corpora
Collocations: a new approach
- Aim and method
- Results (monolingual and parallel)
- Discussion, limits, ways forward
Research questions
- Are translated texts more/less collocational than original texts in the same language?
  - i.e., are their collocation types overall more/less frequently attested and significant?
- If so, is this a consequence of the translation process?
  - i.e., can we identify shifts that could account for the observed overall differences?
Aim and method
Intuition
- The point is not to look for collocations that repeat themselves frequently within small and hardly comparable “translation-driven” corpora, but to identify those collocations that are frequent and/or significant in the language as a whole.
2 sets of corpus resources
- Study corpora
  - Small monolingual comparable corpora of fiction texts (English => Italian; Italian => English)
- Reference corpora
  - The British National Corpus (100 million words from a variety of sources)
  - The Repubblica Corpus (340 million words from a single Italian newspaper)
  - The English and Italian Web via Google/Yahoo automatic API queries
Study corpora (fiction)
Translated Italian (author/translator, Italian title)
1. M. Atwood/C. Penati, Il racconto dell’ancella
2. M. Atwood/M. Papi, Occhio di gatto
3. M. Cruz Smith/P. F. Paolini, Gorky Park
4. C. Fowler/S. Bini, Nozze di sangue
5. N. Gordimer/F. Cavagnoli, Storia di mio figlio
6. G. Greene/B. Oddera, Il decimo uomo
7. D. Leavitt/A. Cossiga, Un luogo dove non sono mai stato
8. R. Rendell/H. Brinis, Oltre il cancello
Original Italian (author, title)
1. F. Camon, La malattia chiamata uomo
2. G. Celati, I narratori delle pianure
3. C. Comencini, Le pagine strappate
4. L. Blissett, Q
5. D. Maraini, Donna in guerra
6. G. Pontiggia, Il giocatore invisibile
7. G. Tomasi di Lampedusa, Il Gattopardo
Corpus preparation
- Scanning in
- Tokenisation, part-of-speech tagging, lemmatisation (treetagger)
- Metadata annotation
- Alignment (easyalign)
- Indexing (Corpus WorkBench, CWB)
Extraction of candidates 1
- Target sequences
  - Lexical collocations
  - Made of two words
  - Contiguous
- POS-based extraction from study corpora
  - Based on literature, e.g. JN, NN, V * N, N * * N
- Collection of frequencies from reference corpora
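A minimal sketch of POS-based candidate extraction follows, assuming the corpus is already available as (lemma, tag) pairs. The one-letter tag set and the function name are hypothetical and are not the tagger's actual output. "*" is read here as "any single token", so that V * N pairs a verb with a noun two positions to its right; that reading of the slide's patterns is also an assumption.

```python
def extract_candidates(tagged, pattern):
    """Return (first_lemma, last_lemma) for every contiguous span matching `pattern`.

    `tagged` is a list of (lemma, tag) pairs; `pattern` is a list of tags
    in which "*" matches any single token.
    """
    width = len(pattern)
    hits = []
    for i in range(len(tagged) - width + 1):
        window = tagged[i:i + width]
        if all(p == "*" or p == tag for p, (_, tag) in zip(pattern, window)):
            hits.append((window[0][0], window[-1][0]))
    return hits

# Toy lemmatised sentence with hypothetical tags (P pronoun, V verb, D determiner,
# N noun, I preposition, J adjective).
sentence = [
    ("she", "P"), ("perform", "V"), ("the", "D"), ("task", "N"),
    ("with", "I"), ("delicate", "J"), ("finger", "N"),
]

print(extract_candidates(sentence, ["J", "N"]))            # adjective + noun
print(extract_candidates(sentence, ["V", "*", "N"]))       # verb, one token, noun
print(extract_candidates(sentence, ["N", "*", "*", "N"]))  # noun, two tokens, noun
```

The extracted lemma pairs would then be looked up in the reference corpora to collect the frequencies needed for ranking.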
Extraction of candidates 2
- Calculate MI
  - UCS (Evert 2004-2006)
- Rank sequences
- Take top
  - Arbitrary cut-off point: MI>2 and fq≥2
- Calculate significance of difference between original and translated
  - Mann-Whitney significance tests
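The filtering and testing steps can be sketched as follows. Everything here is illustrative: the (MI, frequency) pairs are made up, the frequency threshold is read as "at least 2" (an assumption), and the test is a plain-Python Mann-Whitney using the normal approximation without tie or continuity correction, which is crude for samples this small. It does not reproduce the UCS toolkit or whatever statistics software the study used.

```python
import math

def keep_candidate(mi, fq, min_mi=2.0, min_fq=2):
    """Cut-off from the slide; 'fq at least 2' is an assumed reading."""
    return mi > min_mi and fq >= min_fq

def mann_whitney(xs, ys):
    """Return (U for xs, two-sided p) using the normal approximation."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values, then give them their average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(xs), len(ys)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs((u - mean) / sd) / math.sqrt(2))
    return u, p

# Made-up (MI, frequency) pairs for illustration only; not the study's data.
original = [(2.4, 5), (3.1, 2), (1.2, 9), (2.9, 3), (2.2, 1)]
translated = [(3.6, 4), (4.0, 2), (2.1, 1), (3.8, 7), (4.4, 3)]

orig_mi = [mi for mi, fq in original if keep_candidate(mi, fq)]
trans_mi = [mi for mi, fq in translated if keep_candidate(mi, fq)]
u, p = mann_whitney(orig_mi, trans_mi)
print(orig_mi, trans_mi, u, round(p, 3))
```

A rank-based test suits this comparison because MI scores are not normally distributed. In practice a statistics library would be used instead of the hand-rolled version, for instance `scipy.stats.mannwhitneyu`.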
Results (MCC, Mann-Whitney)
- J N lit eng (MI; higher in original, p=.08)
- N V lit ita (MI; p=.008)
- N V lit eng (FQ; p=.05)
- V N lit ita (MI; p=.01)
- J * J lit ita (MI; p=.06)
- N prep/conj N lit ita (MI; p=.007)
- N * N lit eng (FQ; p=.06)
- N * * N lit ita (FQ; p=.07)
Results
Results for N prep/conj N (lit ita), MI
          original   translated
min       2.001      2.000
q1        2.381      2.425
median    2.736      2.853
q3        3.392      3.590
max       5.757      6.059
mean      2.954      3.069
Results (MCC, quantitative)
                             Translated   Original
Types with MI>2 and fq≥2     855          691
Total number of types        3853         3971
Tokens (randomly-sampled)    4222
Results (parallel, summary)
Shifts leading to increased “collocativeness”
Shift type                                     Occurrences
Creative => collocational                      7
Collocational => collocational (≠ meaning)     7
Free => collocational                          11
More explicit                                  86
More formal/precise                            16
Marginal cases (additions, changes)            9
Total shifts observed                          136
Total concordance lines analysed               1,061
Creative => collocational (7)
- TT: Ricordo l’odore della terra smossa, il <senso di pienezza> che davano le forme tonde dei bulbi chiusi nella mano
  LIT: I remember the smell of the turned earth, the sense of fullness that gave the round shapes of the bulbs held in the hand
  ST: I can remember the smell of the turned earth, the plump shapes of bulbs held in the hands, fullness
  (The handmaid’s tale)
- TT: Il <rumore dei tacchi> risuonò sulle piastrelle del corridoio.
  LIT: the noise of the heels resounded on the tiles of the corridor
  ST: Her heels clicked on the hall tiles.
  (Red bride)
Different meaning (7)
TT: Fa collezione di <cartine di sigarette> con disegni di aeroplani, e ne conosce tutti i nomi.
ST: He collects cigarette cards with pictures of airplanes on them, and knows the names of all the planes.
(Cat’s eye)

Expression                            Occurrences                   Meaning
Cigarette cards                       BNC: 16; Google: 491,000      Collectible cards found in cigarette packs
Cartine da/per/di/delle sigarette     Repubblica: 3; Google: 726    Rolling papers, i.e. small sheets of paper which are sold for rolling one's own cigarettes
Figurine da/per/di/delle sigarette    Repubblica: 0; Google: 2      (by analogy with other products) collectible cards found in cigarette packs
Free => collocational (11)
TT: decorazioni di <spicchi d' aglio>, si rende conto che
ST: handpainted by Alex with purple garlic bulbs, she sees that
(A place I’ve never been)

Web data
English                              Hits                  Italian                             Hits
garlic                               34,300,000 (100%)     aglio                               2,580,000 (100%)
garlic bulbs + bulbs of garlic       109,600 (0.31%)       bulbi d’aglio + bulbi di aglio      1305 (0.05%)
garlic heads + heads of garlic       59,300 (0.17%)        teste d’aglio + teste di aglio      612 (0.02%)
garlic cloves + cloves of garlic     2,207,000 (6.43%)     spicchi d’aglio + spicchi di aglio  229,100 (8.87%)
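The percentages in the web data can be recomputed as the share of hits for each noun phrase relative to hits for the bare noun. The sketch below is illustrative: the helper name and data layout are invented, and truncation to two decimals is an assumption made because it reproduces the figures on the slide.

```python
def share(hits, base_hits):
    """Percentage of `base_hits`, truncated (not rounded) to two decimals."""
    return (hits * 10000 // base_hits) / 100

# Hit counts copied from the slide; "_all" is the count for the bare noun.
web_hits = {
    "garlic": {"bulbs": 109_600, "heads": 59_300, "cloves": 2_207_000, "_all": 34_300_000},
    "aglio": {"bulbi": 1_305, "teste": 612, "spicchi": 229_100, "_all": 2_580_000},
}

for noun, counts in web_hits.items():
    total = counts["_all"]
    for head, hits in counts.items():
        if head != "_all":
            print(noun, head, share(hits, total))
```

In both languages the "cloves" phrase is many times more frequent than the "bulbs" or "heads" phrase, which is the basis for treating the shift from garlic bulbs to spicchi d'aglio as a move from a freer combination to a more established one.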
Explicitation (86) - general
- TT: All'apertura nel basso <muro di cinta> l'autista esitò, poi accelerò
  LIT: At the opening in the low perimeter wall the driver hesitated, then accelerated
  ST: He hesitated at the gap in the low wall, then accelerated and went ahead
  (A place I’ve never been)
- TT: schiacciato sotto il <tacco della scarpa>, seppellito
  LIT: ground away under the heel of the shoe, buried
  ST: ground away under my heel, buried
  (My son’s story)
Explicitation (86) - partitives
- TT: Non riuscivo a prendere sonno, così sono sceso a bere un <sorso d'acqua>
  LIT: I couldn’t sleep, so I came down to drink a gulp of water
  ST: I couldn't sleep, so I came down for water
  (The tenth man)
- TT: i <raggi del sole> filtrano dalla lunetta sulla porta
  LIT: the rays of the sun filter through the fanlight
  ST: Sun comes through the fanlight
  (The handmaid’s tale)
Explicitation (86) - head nouns
- TT: manifesti di Bon Jovi e dei Guns' n Roses attaccati con le <puntine da disegno> sul grande mare della parete
  ST: Bon Jovi and Guns' n Roses posters thumbtacked into the great sea wall
  (A place I’ve never been)
- TT: Osserviamo il <cerume delle orecchie>, il muco del naso e lo sporco tra le dita dei piedi
  ST: We look at ear-wax, or snot, or dirt from our toes
  (Cat’s eye)
More formal/more exact (16)
- TT: Spostando col piede i <capi di vestiario> sul pavimento, non trovò traccia della prova incriminante.
  LIT: items of clothing
  ST: Kicking around among the clothes on the floor, he found no trace of the incriminating article.
  (Red bride)
- TT: Si stava frugando tra le <pieghe dell'abito>, per prendere il lasciapassare
  LIT: folds of the robe
  ST: She was fumbling in her robe, for her pass
  (The handmaid’s tale)
Other cases (9)
- Adverbs
  TT: Dal <punto di vista> domestico, si adattarono l' uno all' altra
  ST: Domestically they adjusted to one another
  (My son’s story)
- Domestication
  TT: Il cadavere era stato fatto a fettine da una lama larga e pesante, non trovata sul <luogo delitto>
  ST: The corpse had been slashed to ribbons by a large, heavy blade, not found on the premises.
  (Red bride)
- Gratuitous changes
  TT: del greco c'era anche qualche tavolino con sudici <vasetti di fiori> artificiali e bottiglie di ketchup
  ST: the Greek had a few tables set out with flyspotted artificial flowers and tomato sauce bottles
  (My son’s story)
Discussion - MCC
- Are Italian translated texts more/less collocational than originals?
  - Translated texts would seem to be more collocational than originals
- A single exception: JN into Eng
  - Translated less collocational than original, why?
    - Probable shining-through
    - Over-representation of collocations with common words
Discussion, limits, ways forward
JN in Eng: shining-through?
Delicate fingers
TT: I put some soft golden apricots as big as eggs on his plate, and watch him split them open, hardly moving his long, <delicate fingers>.
ST: Gli ho messo nel piatto delle albicocche grandi come uova, morbide, dorate, e l'ho osservato mentre le spaccava, muovendo appena le dita lunghe e delicate.
(Donna in guerra)

Collocation         fq1    fq2    fq1-2   MI       LL
delicate fingers    1646   5346   5       2.7545   53.4624
gentle fingers      2477   5346   12      2.9572   139.5338
slender fingers     701    5346   15      3.6023   219.2139
nimble fingers      101    5346   15      4.4437   279.3528
JN in Eng: common words
TRANSLATED ENGLISH: few days, few evenings, few feet, few followers, few friends, few hours, few jokes, few kilos, few minutes, few months, few paces, few pages, few passengers, few phrases, few rays, few scraps, few seconds, few sentences, few spots, few steps, few stones, few survivors, few weeks, few words, few yards, few years
ORIGINAL ENGLISH: few days, few feet, few hours, few inches, few miles, few minutes, few moments, few months, few seconds, few steps, few tables
Overall frequency of few: translated 133, original 39
Discussion - parallel
- Is higher collocativeness a consequence of the translation process?
  - Probably…
- NB: shifts towards higher collocativeness would appear to be
  - partly independent
    - free => collocational, different meaning (normalisation)
  - partly related to other strategies/procedures
    - explicitation, shining-through
Limits
- Just how certain are we that these shifts are the cause of the observed differences?
  - Shifts are no doubt observable also in non-significant rankings…
- (To what extent) could single author or translator preferences account for these differences?
  - The corpora are very small…
Further work
- Bottom-up search for regularities
  - Other genres?
- Source-oriented approach
  - Starting from ST collocations
- Role of reference corpora
  - BNC, WWW, ukwac / Repubblica, WWW, itwac
- Collocation extraction
  - Evaluation of method: no hands!
- Creative exploitation of collocations
  - Can it be automatised?
Thank you
silvia.bernardini@unibo.it
References
Baker, M. 1993. “Corpus linguistics and translation studies”. In Baker et al. (eds) Text and Technology. Benjamins.
Baker, M. 1995. “Corpora in translation studies: An overview and some suggestions for future research”. Target 7, 2.
Baroni, M. and S. Bernardini. 2003. “A preliminary analysis of collocational differences in monolingual comparable corpora”. In Archer et al. (eds) Proceedings of CL 2003. UCREL.
Danielsson, P. 2001. The automatic identification of meaningful units in language. PhD Thesis. Göteborg University.
Evert, S. 2004-2006. The UCS Toolkit [http://www.collocations.de/software.html]
Firth, J. R. 1956 (1968). “Descriptive linguistics and the study of English”. In Palmer (ed) Selected papers of J. R. Firth 1952-1959. Longman.
Gledhill, C. 2000. Collocations in science writing. Gunter Narr.
Howarth, P. 1996. Phraseology in English academic writing. Max Niemeyer.
Kenny, D. 2001. Lexis and creativity in translation. St. Jerome.
Kjellmer, G. 1987. “Aspects of English collocations”. In Meijs (ed) Corpus Linguistics and Beyond. Rodopi.
Kjellmer, G. 1994. A Dictionary of English collocations. Clarendon Press.
Olohan, M. 2004. Introducing corpora in translation studies. Routledge.
Øverås, L. 1998. “In search of the third code: An investigation of norms in literary translation”. Meta 43, 4.
Sinclair, J. McH. 1991. Corpus, concordance, collocation. Oxford University Press.
Sinclair, J. McH. and S. Jones. 1974. “English lexical collocations”. Cahiers de Lexicologie 24.
Toury, G. 1995. Descriptive translation studies and beyond. Benjamins.
Results of OSS significance testing
N-gram   Pattern   W          p value    MI/(LOG)FQ   Higher in
2        JN ita    122618     0.002261   MI           Original
2        JN eng    165680.5   0.05237    MI           Original
2        NJ ita    78109.5    0.001134   MI           Original
2        NN eng    19142.5    0.005172   (LOG)FQ      Translated
2        RJ eng    7609       0.06921    MI           Original
2        RV eng    10458      0.04767    MI           Original
2        VR eng    2907       0.01517    (LOG)FQ      Original
3        NN ita    21683      0.02607    MI           Original
3        NN ita    22066.5    0.01029    (LOG)FQ      Original
3        VN eng    11904      0.05694    MI           Original
3        NN eng    1910.5     0.0429     (LOG)FQ      Original
4        VN eng    1027       0.06974    (LOG)FQ      Original
POS patterns originally searched in the corpora, and rankings selected for significance testing
Column heads: 2-grams, 3-grams, 4-grams (patterns searched); 2-grams, 3-4-grams (rankings selected)
JJ JN JV NN NV RJ JJ NN NV VN VV RN NN VN VV JJ JN NN NV JJ NN VN
JJ JN NJ NN NV JJ JN NN NV VN VV JN NJ NV VN JJ NN VN RV VJ VN VV VR RR VJ VN VV RJ RV VN VR NN VN
Table labels: English, Italian