Concordances, collocations and connotation
Barnbrook G (1996) Language and Computers. Edinburgh: EUP. Chapters 3, 4, 5
Partington A (1998) Patterns and Meanings. Amsterdam: John Benjamins. Chapters 1, 2, 4
Lexical information in corpora
• Start looking at the kind of information (about individual words) that can be got from corpora
  – Simple frequency information
  – Distribution information
  – Collocation (co-occurrence information)
  – Connotation (semantic prosody)
• Introduce basic ideas
• Future topics
  – Statistics
  – Case studies
Frequency information
• Most banal information: counting how many times a word (“type”) appears in a text
• Most frequent words will be function words, so frequency (f) counts often exclude words listed in a “stop list”
• Should you count words or lemmas?
• Should you distinguish alternate meanings of ambiguous word forms (if you can)?
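A minimal sketch of frequency counting with a stop list, in Python. The tokenizer, the function names and the tiny stop list are illustrative assumptions, not part of the lecture material; a real stop list would be much longer, and this sketch counts word forms, not lemmas.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list (an assumption for illustration).
STOP_LIST = {"the", "a", "of", "and", "to", "in", "on", "was", "it"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, stop_list=frozenset()):
    """Count how often each word form occurs, skipping stop-listed words."""
    return Counter(tok for tok in tokenize(text) if tok not in stop_list)

sample = "The cat sat on the mat and the dog sat on the cat."
raw = word_frequencies(sample)
filtered = word_frequencies(sample, STOP_LIST)
print(raw.most_common(3))       # dominated by the function word "the"
print(filtered.most_common(3))  # content words only
```

Without the stop list, the top of the list is taken up by function words; with it, the content words surface.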
Frequency information
• Frequency information on its own is not particularly interesting
• Quite useful to compare f of related words
  – eg alternative readings of a given word form (already seen in probability calculations in tagging)
  – or comparing near synonyms, especially if we can take context into account (see later)
• f of a given word in a given context can be indicative, eg pronouns are more frequent as subject or first word of a sentence
Types and tokens
• Remember distinction between “tokens” (words) and “types” (different words)
• Type count gives a measure of how many DIFFERENT words are used
• Type-token ratio (TTR) gives a measure of “vocabulary richness”
  – If vocabulary is very varied, TTR will be higher
• TTR is very sensitive to overall text length, so it is not meaningful to compare TTRs for texts of different lengths
• Standardized TTR is the average of the TTR for each successive sequence of n words (typical default n=1000) in a text or corpus
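The two measures can be sketched as follows. The function names are hypothetical, and the handling of a trailing chunk shorter than n (ignored here) is an assumption; tools may differ on that detail.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms divided by number of running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def standardized_ttr(tokens, n=1000):
    """Mean TTR over successive, non-overlapping chunks of n tokens.

    Assumption: a trailing chunk shorter than n is ignored, and a text
    shorter than n falls back to its plain TTR.
    """
    chunks = [tokens[i:i + n] for i in range(0, len(tokens) - n + 1, n)]
    if not chunks:
        return type_token_ratio(tokens)
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)

tokens = "a b c d a b c d".split()
ttr = type_token_ratio(tokens)        # 4 types over 8 tokens
sttr = standardized_ttr(tokens, n=4)  # each chunk of 4 has 4 types
print(ttr, sttr)
```

The toy text shows the length effect: the plain TTR drops as words repeat over the longer stretch, while the standardized figure is computed over equal-sized chunks.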
Vocabulary growth curve
• Plotting types against tokens for a given text shows us how the vocabulary (number of types) grows as the text gets longer
• Typically, the curve starts steeply and then flattens sooner or later, reflecting homogeneity (or otherwise) of the text
VGC for Macbeth in Basic English
source: http://web.missouri.edu/~youmansc/vmp/help/Youmans-Type.Token.pdf
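The points of such a curve can be computed in a single pass over the text. This is a sketch under assumptions: the function name and the `step` sampling parameter are illustrative, and plotting is left to whatever tool is at hand.

```python
def vocabulary_growth(tokens, step=1):
    """Return (tokens_seen, types_seen) points, sampled every `step` tokens."""
    seen = set()
    points = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        # Always include the final point so the curve covers the whole text.
        if i % step == 0 or i == len(tokens):
            points.append((i, len(seen)))
    return points

tokens = "the cat sat on the mat the cat sat".split()
curve = vocabulary_growth(tokens, step=3)
for n_tokens, n_types in curve:
    print(n_tokens, n_types)
```

In the toy text the type count rises over the first six tokens and then stays flat, because the last three tokens are all repeats.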
Vocabulary growth curve
• Comparative VGC for four texts
• Simple measure used in some literary studies
Curves: (a) Longfellow, (b) Hemingway, (c) Basic English (Macbeth), (d) Bible (Genesis 2)
Vocabulary in context
• “Concordance”, also known as KWIC list (key word in context)
• Allows us to see the (immediate) environment in which a word appears
• Listings can be customised to show what you want more clearly, eg
  – sorted according to next or previous word
  – showing more or less context
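A minimal KWIC sketch, assuming a pre-tokenized text. The function names, the `width` parameter (words of context on each side) and the `sort_by` values are illustrative choices, not a description of any particular concordancer.

```python
def kwic(tokens, keyword, width=3, sort_by=None):
    """Return (left, node, right) context triples for each hit of keyword.

    sort_by may be "next" or "previous" to order lines by the adjacent word.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    if sort_by == "next":
        lines.sort(key=lambda l: l[2][0].lower() if l[2] else "")
    elif sort_by == "previous":
        lines.sort(key=lambda l: l[0][-1].lower() if l[0] else "")
    return lines

def format_kwic(lines):
    """Right-align the left context so the node words line up in a column."""
    lefts = [" ".join(left) for left, _, _ in lines]
    pad = max((len(s) for s in lefts), default=0)
    return [f"{left:>{pad}}  {node}  {' '.join(right)}"
            for left, (_, node, right) in zip(lefts, lines)]

text = "it was sheer luck that the sheer drop did not end in sheer panic"
hits = kwic(text.split(), "sheer", width=2, sort_by="next")
for line in format_kwic(hits):
    print(line)
```

Sorting by the next word groups identical right-hand collocates together, which is what makes recurrent patterns visible in a long listing.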
source: Partington A (1998) Patterns and Meanings. Amsterdam: John Benjamins
CIWK search
• inverted KWIC
• specify the context and look to see what words occur in it
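One way to read the idea is as a slot-filling search: fix the words on either side and count what appears between them. The sketch below makes that assumption; the function name and the single-word slot are illustrative simplifications.

```python
from collections import Counter

def ciwk(tokens, left=(), right=()):
    """Count the words that fill the slot between a left and right context."""
    left, right = list(left), list(right)
    fillers = Counter()
    for i in range(len(left), len(tokens) - len(right)):
        if (tokens[i - len(left):i] == left
                and tokens[i + 1:i + 1 + len(right)] == right):
            fillers[tokens[i]] += 1
    return fillers

tokens = "a waste of time and a lot of time and a waste of money".split()
fillers = ciwk(tokens, left=["a"], right=["of"])
print(fillers.most_common())
```

Here the frame "a _ of" is the specified context, and the output lists the words found in the slot with their counts.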
Collocation
• Term coined by J R Firth (1957) to characterise (part of) his theory of meaning
• “You shall know a word by the company it keeps”
• “The occurrence of two or more words within a short space of each other in a text” (Sinclair 1991)
• “The relationship a lexical item has with items that appear with greater than random probability in its (textual) context” (Hoey 1991; emphasis added)
Collocation, text type and style
• Distinguish between general and more usual collocations vs technical and more personal ones
• eg in a general corpus time collocates with save, spend, waste, fritter away, …
• but in a corpus of sports reports time collocates with half, full, extra, injury, first, second, third, …
Collocation and idiom
• Listing collocations will often reveal idioms and cliches
• Important to think of collocation as extending beyond neighbouring words (which can be captured by simple concordances)
Collecting collocations
• If we are to look beyond neighbouring words, what constraints might we impose?
• Collocation means co-occurrence within some defined context
  – possibly a “window” of n words to left and/or right
  – if corpus is tagged/parsed, we can look at collocations within structures
  – or we can define the window in terms of constituents rather than words
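The word-window option can be sketched as below. The function name and the default window of four words on each side are assumptions for illustration; the structure-based and constituent-based options would need a tagged or parsed corpus and are not covered.

```python
from collections import Counter

def window_collocates(tokens, node, left=4, right=4):
    """Count words occurring within `left`/`right` tokens of each node hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        context = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(context)
    return counts

tokens = "do not waste time you can save time if you spend time well".split()
collocates = window_collocates(tokens, "time", left=1, right=0)
print(collocates.most_common())
```

With a window of one word to the left, the toy text yields waste, save and spend as collocates of time; widening the window pulls in more distant words, and more noise.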
Measuring significance
• The significance of any co-occurrences needs to be established
  – Raw co-occurrence frequency counts mean nothing
  – Need to be compared to something else
• Need to compare a given co-occurrence with random chance, or with some other co-occurrence
• More detail next time
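As a rough illustration of "comparing with random chance", the sketch below estimates how many co-occurrences would be expected if the two words were independent, and sets the observed count against it. The independence model, the function name and all the figures are assumptions made up for illustration; it is not one of the statistical measures deferred to the next session.

```python
def expected_cooccurrence(f_node, f_collocate, corpus_size, span):
    """Co-occurrences expected by chance if the two words were independent.

    Assumption: each of the `span` window positions around every node hit
    is filled by the collocate with probability f_collocate / corpus_size.
    """
    return f_node * span * f_collocate / corpus_size

# Made-up illustrative figures, not taken from any real corpus.
observed = 20
expected = expected_cooccurrence(f_node=100, f_collocate=500,
                                 corpus_size=1_000_000, span=8)
ratio = observed / expected
print(expected)  # chance-level count under the independence assumption
print(ratio)     # a ratio well above 1 suggests the words attract each other
```

A raw count of 20 says little on its own; it becomes informative only once set against the much smaller number expected by chance.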
Collocation and synonymy
• Collocation is good evidence in discussing (near) synonymy
• Lots of studies take near synonyms and look to see if the nature of their relationship can be characterised by their distribution
• In other words: what words does each member of the synonym set collocate with?
• Especially useful for language learners
Example of sheer and synonyms
• (from Partington book)
• three senses (LDOCE)
  – pure, ‘nothing but’, eg sheer luck
  – steep, eg sheer drop
  – thin, eg sheer stockings
• (Cobuild) use sheer to emphasize completeness of state
• 92 occurrences of sheer (in meaning 1) in his corpus
collocations of sheer
• expression of magnitude of weight or volume to right (20%)
  – volume, weight, numbers, mass, scale, quantity, size
  – almost always with article the
• expression of force, strength or energy (22%)
  – energy, exertion, force, muscle, strength, power, pressure, fury, pace, intensity
  – usually with the, or a preposition but no article
• expression of persistence (14%)
  – persistence, irreversibility, obstinacy, indomitability, insistence, reliability, integrity, hard work
  – left context: through, because of, out of, expressing causation, but not the
collocations of sheer
• nouns expressing strong emotion (11%)
  – fun, joy, panic, inspiration, enjoyment, terror
• nouns expressing extreme personal qualities (11%)
  – beauty, glamour, brutality, thuggery, madness, folly
• nouns expressing extreme ability or lack of same (8%)
  – expertise, competence, virtuosity, gamesmanship
Synonyms of sheer - pure
• LDOCE definitions, 5 meanings of which two overlap:
  – not mixed with anything
  – complete, thorough
• Corpus has 135 examples
• Larger variety of syntactic environments (sheer was always modifying a noun) including predicative, which sheer does not occur in
  – *? The drop was sheer
  – * His fury was sheer
Synonyms of sheer - pure
• Religious-moral context; sense of unmixed
  – doctrine, faith, goodness; chemicals, gold
• But, many examples where it has an emphasizing function, like sheer
  – accident, chance, comedy, guesswork, honesty, idiocy, malice, nostalgia, pleasure, selfishness, talent, theatre, vulnerability, whim, wickedness
  – often with proper nouns (unlike sheer)
• No examples of pure collocating with items expressing magnitude, force or persistence
• Some overlap with sheer
  – personal qualities, emotion (though generally less extreme ones)
• Only a few examples of pure in a prepositional phrase expressing causation; causes can be sheer, but states are pure
Other synonyms of sheer
• Partington does similar analysis of complete and absolute
• Shows that each of the “synonyms” has more typical uses and patterns, though there is some overlap
• But there is also clear evidence of complementary usage
Connotation and semantic prosody
• Collocation can also be used to illustrate connotation
  – “secondary implications of a word” (Lyons 1977)
• Three distinct uses of the term
  – marker of a particular speech variety (eg lovely)
  – cultural implications (words used to describe women show what society thinks of them)
  – marker of speaker’s evaluation (firm ~ stubborn)
• “Semantic prosody” (Sinclair 1987)
  – use of a certain word spreads its connotation over the whole utterance
Some examples
• object of commit is often something bad (foul, deception, offence)
• if something is described as rife, it is not good (crime, disease, mistakes), and describing it as rife expresses a negative connotation (speculation is rife)
• both the above exemplify “unfavourable prosody”, but other prosodies are possible
• good example: claim vs admit responsibility for an atrocity
More power to your elbow
• Examples given in last few slides were largely subjective
• More interesting if we can back up observations with calculations of statistical significance
• Next time we will look at some simple statistical measures