Stylistics and Stylometry CSC 4598 Machine Translation Dr. Tom Way

What is “style”?
  • Term not much loved by linguists
      – Too vague
      – Has connotations in similar fields (“style” = good style, a value judgment)
  • Many books/articles make reference to etymology of the word (Lat. stilus = ‘pen’), so it follows that style is mainly about written language
  • Various definitions, some very close to things already seen (especially “register”)
  • Two main aspects widely supposed:
      – style is choice
      – style is described by reference to something else

Style as choice
  • For any intended meaning there is a range of alternative ways of expressing that meaning
  • Different choices express nuances
      – of meaning
      – of other things (style?), e.g., buy vs. purchase
  • Example:
      – Visitors are respectfully informed that the coin required for the meter is a quarter; no other coin is acceptable
      – Quarters only
      – Propositional meaning is the same; difference in expression conveys something else

Style as choice (2)
  • Style is a choice, but often the “choice” is somewhat predetermined
  • For example: a choice between appropriate and inappropriate style
  • So perhaps style does not connote “good” or “bad” but merely the way in which the author expresses or conveys things

Style and the norm
  • Some writers define style as
      – “individual characteristics of a text”
      – “total sum of deviations from a norm”
  • But what is the “norm”?
      – Is there some form of the language that is neutral as regards style?
      – Note also that the norm shifts: for example, many works are written in the vernacular of their time
  • Literary stylistics focuses on the exceptional

Style and the norm (2)
  • Even if there is no norm, we can describe style comparatively
      – Stylistics mainly involves comparing and contrasting texts
      – and associating linguistic variance with contextual explanation
  • Some authors see style as being what is added to the text

Stylistic analysis
  • Informally identify stylistic features felt to be significant
  • Devise a method of analysis which facilitates comparison between usages
  • Identify the stylistic function of the features so identified

Types of features
  • “Invariable” features due to the individual or the time
      – usually of little interest
  • Discourse features
      – medium: what features distinguish written language from spoken language
      – participation: e.g., monologue vs. dialogue
  • Province (= field): lexis and syntax
  • Status (= tenor): features relating to relative social standing of writer/speaker and reader/listener
  • Modality (= text type): e.g., message delivered as a letter, postcard, text message, email, etc.
  • Singularity: deliberate occasional idiosyncrasies

Method and function
  • Methods and features determine each other
      – you can only measure features that you can extract
      – simple counting features are easy to extract
      – more complex features can be extracted thanks to NLP techniques of corpus annotation (tagging, parsing, etc.)
  • Describing the function of observed differences
      – could be based on intuition
      – or using more advanced techniques (factor analysis)

What to count
  • Simple things may characterize different styles
      – average sentence length
      – average word length
      – type-token ratio (vocabulary richness)
          • number of types = number of different words
          • number of tokens = total number of words
      – vocabulary growth (homogeneity of text)
          • number of new types in 1st, 2nd, …, nth 1000 words
          • in rich varied text, number will climb steadily
  • Especially when used comparatively
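The simple counts listed above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the slides: the function names, the regex tokenizer, and the naive rule that every run of ".", "!" or "?" ends a sentence are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and return word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def simple_counts(text):
    """Average sentence length, average word length and type-token ratio.

    Assumes the text contains at least one word and one sentence.
    """
    tokens = tokenize(text)
    # Naive assumption: any run of '.', '!' or '?' ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    types = set(tokens)  # types = different words, tokens = all words
    return {
        "avg_sentence_length": len(tokens) / len(sentences),
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
        "type_token_ratio": len(types) / len(tokens),
    }

def vocabulary_growth(tokens, window=1000):
    """Number of previously unseen types in each successive window of tokens."""
    seen = set()
    growth = []
    for start in range(0, len(tokens), window):
        chunk = set(tokens[start:start + window])
        growth.append(len(chunk - seen))
        seen |= chunk
    return growth

sample = "The cat sat on the mat. The dog sat on the log."
counts = simple_counts(sample)
# A window of 6 tokens is used only because the sample is tiny.
growth = vocabulary_growth(tokenize(sample), window=6)
print(counts)
print(growth)
```

On real texts the window would stay at 1000 words, and the numbers only become informative when two or more texts are compared.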

What to count (2)
  • More complex analyses can give a more interesting picture
      – specific syntactic structures
      – degree of modification in Noun Phrases (NPs)
      – types of verbs (e.g., verbs of persuasion, speech verbs, action verbs, descriptive verbs)
      – distribution of pronouns (1st/2nd/3rd person)
      – etc. (anything you can think of)
  • Quite sophisticated mathematical techniques can give an overall picture
      – e.g., factor analysis: identifies from a (big) range of variables which ones best identify/characterize differences
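As one small illustration of the features above, the distribution of pronouns by person can be approximated with plain word lists. This is only a sketch under stated assumptions: the PRONOUNS lists are hypothetical and incomplete, and a real study would more likely rely on a part-of-speech tagger than on string matching.

```python
import re
from collections import Counter

# Hypothetical, deliberately small word lists chosen for this example.
PRONOUNS = {
    "first": {"i", "me", "my", "mine", "we", "us", "our", "ours"},
    "second": {"you", "your", "yours"},
    "third": {"he", "him", "his", "she", "her", "hers", "it", "its",
              "they", "them", "their", "theirs"},
}

def pronoun_distribution(text):
    """Share of 1st/2nd/3rd person pronouns among all pronouns found."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for tok in tokens:
        for person, words in PRONOUNS.items():
            if tok in words:
                counts[person] += 1
    total = sum(counts.values())
    # Return 0.0 for every person if the text has no listed pronouns.
    return {p: (counts[p] / total if total else 0.0) for p in PRONOUNS}

dist = pronoun_distribution("I told you that she gave me their book.")
print(dist)
```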

Normalization and significance
  • Always important to compare like with like
      – It is usual when counting things to “normalize” over the length of the text
      – If one text is longer than the other, of course you would expect higher frequencies of everything
  • Issue of statistical significance
      – Small differences may not really tell you anything
      – Various measures can confirm whether difference is statistically significant or due to random fluctuation
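Both points can be made concrete with a short sketch. The slides do not name a particular significance measure; the log-likelihood (G²) statistic is used here as an assumed example of one such measure, and the counts are invented for illustration. The threshold 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.

```python
import math

def per_thousand(count, total_tokens):
    """Normalize a raw count to a rate per 1000 tokens."""
    return 1000 * count / total_tokens

def log_likelihood(count_a, total_a, count_b, total_b):
    """G2 statistic for one feature counted in two texts of different length."""
    combined_rate = (count_a + count_b) / (total_a + total_b)
    # Expected counts if both texts used the feature at the same rate.
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Invented counts: a feature seen 30 times in 2000 words vs. 40 times in 5000 words.
rate_a = per_thousand(30, 2000)
rate_b = per_thousand(40, 5000)
g2 = log_likelihood(30, 2000, 40, 5000)
significant = g2 > 3.84
print(rate_a, rate_b, round(g2, 2), significant)
```

Note that the raw count is higher in the longer text (40 vs. 30), while the normalized rate is higher in the shorter one, which is why raw counts should not be compared directly.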

How to count
  • How to recognize paragraph breaks?
  • How to recognize sentence breaks?
      – Headlines don’t end in a full stop
      – Not all sentences end in a full stop
      – Not all full stops are sentence ending (abbreviations)
  • How to count words
      – Hyphenated words, contractions, e.g., don’t
  • How to measure word-length/complexity
      – length only roughly corresponds to complexity
      – number of characters vs. number of syllables
      – counting syllables implies either a dictionary or an algorithm
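Two of these problems, abbreviations in sentence splitting and syllable counting without a dictionary, can be sketched as follows. Everything here is an assumption made for illustration: the ABBREVIATIONS list is hypothetical and far too short for real text, and the syllable counter is a rough vowel-group heuristic that will miscount many English words.

```python
import re

# Hypothetical, tiny abbreviation list; real text needs a much longer one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

def split_sentences(text):
    """Split on tokens ending in '.', '!' or '?' unless they are known abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends = token.endswith((".", "!", "?"))
        if ends and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a full stop, e.g. a headline
        sentences.append(" ".join(current))
    return sentences

def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a final silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # Keep the final 'e' when it follows consonant + 'l' (as in "table").
    if word.endswith("e") and not re.search(r"[^aeiouy]le$", word) and n > 1:
        n -= 1
    return max(n, 1)

sents = split_sentences("Dr. Way teaches the course. Don't miss it! No full stop here")
print(sents)
print([count_syllables(w) for w in ("machine", "translation", "table", "style")])
```

Splitting on whitespace keeps contractions such as "Don't" and hyphenated words as single tokens; whether that is the right way to count words is exactly the kind of decision the slide asks about.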

More sophisticated counting
  • Tagging and parsing allow you to look at grammatical and lexical issues
      – Use of particular POSs (conjunctions, pronouns, auxiliaries, modals)
      – Use of particular features (tenses, …)
      – Use of particular constructions (passives, interrogatives)

Quantifying register differences
  • Much work based on corpora trying to quantify and characterize register differences
  • Work pioneered by Douglas Biber
  • Simple counts like the ones suggested
  • Also, more complex computations

Example
  • anaphoric noun: refers back to previous object
  • anaphoric pronoun: refers back to previous object
  • exophoric pronoun: refers to something outside text
From D. Biber, S. Conrad & R. Reppen, Corpus Linguistics: Investigating Language Structure and Use, Cambridge University Press, 1998. Ch. 5: The study of discourse characteristics

Features (1)

Features (2)
  • ~150 features in all