Stylistics and stylometry

What is “style”?
• Term not much loved by linguists
  – Too vague
  – Has connotations in neighbouring fields (“style” = good style, i.e. a value judgment)
• Many books/articles refer to the etymology of the word (Lat. stilus = ‘pen’), so it follows that style is mainly about written language
• Various definitions, some very close to things already seen (especially “register”)
• Two main aspects widely supposed:
  – style is choice
  – style is described by reference to something else

Style as choice
• For any intended meaning there is a range of alternative ways of expressing that meaning
• Different choices express nuances
  – of meaning
  – of other things (style?), e.g. buy vs purchase
• Example:
  – Visitors are respectfully informed that the coin required for the meter is 50p; no other coin is acceptable
  – 50p pieces only
  – Propositional meaning is the same; the difference in expression conveys something else (register etc.)

Style as choice
• Style is a choice, but often the “choice” is somewhat predetermined
• i.e. a choice between appropriate and inappropriate style
• So maybe “style” is just another word for register?

Style and the norm
• Some writers define style as
  – “individual characteristics of a text”
  – “total sum of deviations from a norm”
• But what is the “norm”?
  – Is there some form of the language that is neutral as regards style/register?
  – Note also that the norm shifts: e.g. the Bible (AV) was written in the vernacular of its time
• Literary stylistics focuses on the exceptional

• Even if there is no norm, we can describe style comparatively
  – Stylistics mainly involves comparing and contrasting texts
  – and associating linguistic variation with contextual explanation
• Some authors see style as being what is added to the text

Stylistic analysis
• Gulf between literary and linguistic stylistics
  – Lit crit focuses on the effect on the reader, intended or otherwise, so it is largely intuitive and subjective
  – Linguistic stylistics looks for characterisations of style (including literary style) in terms of linguistic phenomena at the various levels of linguistic description

Stylistic analysis
• Inventory of linguistic devices and their effect
  – usually in a contrastive way:
    – in contrast with other writers in a similar genre
    – in contrast with other genres
• Linguistic devices described in terms of the usual linguistic levels of description: phonology, morphology, lexis, grammar, etc.
• Effects can be directly expressive, or indirect, by association
  – example: onomatopoeia vs alliteration as a phonological device

Stylistic analysis
Crystal & Davy (1969) Investigating English Style
• Informally identify stylistic features felt to be significant
• Devise a method of analysis which facilitates comparison between usages
• Identify the stylistic function of the features so identified

Types of features
• “Invariable” features due to the individual or the time
  – usually of little interest
• Discourse features
  – medium (= Halliday’s mode): what features distinguish written language from spoken language
  – participation: e.g. monologue vs dialogue
• Province (= field): lexis and syntax
• Status (= tenor): features relating to relative social standing of writer/speaker and reader/listener
• Modality (= text type): e.g. message delivered as a letter, postcard, text message, email, etc.
• Singularity: deliberate occasional idiosyncrasies

Method and function
• Methods and features determine each other
  – you can only measure features that you can extract
  – simple counting features are easy to extract
  – more complex features can be extracted thanks to NLP techniques of corpus annotation (tagging, parsing, etc.)
• Describing the function of observed differences
  – could be based on intuition
  – or (see later) partially automated (factor analysis)

What to count
• Simple things may characterise different styles
  – average sentence length
  – average word length
  – type:token ratio (vocabulary richness)
    • number of types = number of different words
    • number of tokens = total number of words
  – vocabulary growth (homogeneity of text)
    • number of new types in 1st, 2nd, …, nth 1000 words
    • in rich, varied text the number will climb steadily
• Especially when used comparatively
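The word-level counts above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the tokenizer is a deliberately crude assumption (lowercased alphabetic strings, with an optional ASCII apostrophe part), and the function names are invented for this example. Average sentence length is left out because it depends on how sentence breaks are recognized.

```python
import re

def tokenize(text):
    """Crude tokenizer (an assumption): lowercase alphabetic words only."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def type_token_ratio(tokens):
    """Number of different words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def average_word_length(tokens):
    """Mean number of characters per token."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0

def vocabulary_growth(tokens, block_size=1000):
    """New types first seen in each successive block of block_size tokens."""
    seen = set()
    growth = []
    for start in range(0, len(tokens), block_size):
        new_types = set(tokens[start:start + block_size]) - seen
        growth.append(len(new_types))
        seen |= new_types
    return growth

sample = "the cat sat on the mat and the dog sat on the log"
tokens = tokenize(sample)
print(len(tokens), len(set(tokens)))   # 13 tokens, 8 types
print(type_token_ratio(tokens))        # 8 / 13
print(vocabulary_growth(tokens, 5))    # [4, 3, 1]
```

A running total of the values returned by `vocabulary_growth` gives the cumulative number of types. Note that the type:token ratio falls as a text gets longer, so it is only comparable between samples of similar size.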

What to count
• More complex analyses can give a more interesting picture
  – specific syntactic structures
  – degree of modification in NPs
  – types of verbs (e.g. verbs of persuasion, speech verbs, action verbs, descriptive verbs)
  – distribution of pronouns (1st/2nd/3rd person)
  – etc. (anything you can think of)
• Quite sophisticated mathematical techniques can give an overall picture
  – e.g. factor analysis: identifies, from a (big) range of variables, which ones best identify/characterize differences

Normalization and significance
• Always important to compare like with like
  – It is usual when counting things to “normalize” over the length of the text
  – If one text is longer than the other, of course you would expect higher frequencies of everything
• Issue of statistical significance
  – Small differences may not really tell you anything
  – Various measures can confirm whether a difference is statistically significant or due to random fluctuation
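As a rough sketch of both points: the code below normalizes a raw count to a rate per 1000 words, then applies a Pearson chi-square test to a 2×2 table (feature vs everything else, text A vs text B). The slides do not say which significance measure is meant, so chi-square is an assumption here, and the counts are invented purely for illustration.

```python
def per_thousand(count, total_tokens):
    """Normalize a raw count to a rate per 1000 tokens."""
    return 1000.0 * count / total_tokens

def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square (1 degree of freedom) for one feature in two texts.

    Assumes every expected cell value is non-zero.
    """
    observed = [[count_a, total_a - count_a],
                [count_b, total_b - count_b]]
    grand = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [count_a + count_b, grand - count_a - count_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Invented counts: the feature occurs 30 times in a 10,000-word text A
# and 90 times in a 40,000-word text B.
rate_a = per_thousand(30, 10_000)   # 3.0 per 1000 words
rate_b = per_thousand(90, 40_000)   # 2.25 per 1000 words
stat = chi_square_2x2(30, 10_000, 90, 40_000)
significant = stat > 3.841          # critical value for p = 0.05, 1 d.f.
print(rate_a, rate_b, round(stat, 2), significant)
```

Text B has three times the raw count, but after normalization text A has the higher rate, and with these numbers the difference does not reach significance at the 0.05 level.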

How to count
• How to recognize paragraph breaks?
• How to recognize sentence breaks?
  – Headlines don’t end in a full stop
  – Not all sentences end in a full stop
  – Not all full stops are sentence-ending (abbreviations)
• How to count words
  – Hyphenated words, contractions, e.g. don’t
• How to measure word length/complexity
  – length only roughly corresponds to complexity
  – number of characters vs number of syllables, cf. through vs idea
  – counting syllables implies either a dictionary or an algorithm
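Two of these problems can be illustrated with a short sketch. The abbreviation list and both functions are hypothetical, chosen only to show the idea: sentence splitting that skips known abbreviations, and an algorithmic syllable count based on vowel groups. The syllable heuristic is intentionally naive.

```python
import re

# Hypothetical, far from complete list of abbreviations ending in a full stop.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "cf.", "dr.", "mr.", "mrs."}

def split_sentences(text):
    """Split on . ! ? unless the token is a known abbreviation.

    Text without final punctuation (e.g. a headline) is kept as a sentence.
    """
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "!", "?")) and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

def count_syllables(word):
    """Heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

print(split_sentences("Dr. Smith arrived. He said e.g. nothing at all!"))
print(split_sentences("Man bites dog"))
print(len("through"), count_syllables("through"))  # 7 characters, 1 syllable
print(len("idea"), count_syllables("idea"))        # 4 characters, heuristic says 2
```

The heuristic gives 2 syllables for idea, although the word has 3. That kind of error is why a pronunciation dictionary is the more reliable option where one is available.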

More sophisticated counting
• Tagging and parsing allow you to look at grammatical and lexical issues
  – Use of particular POSs (conjunctions, pronouns, auxiliaries, modals)
  – Use of particular features (tenses, …)
  – Use of particular constructions (passives, interrogatives)
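Once a text has been tagged, counting POS usage is straightforward. The sketch below assumes the tagger's output is already available as (word, tag) pairs. The example sentence is hand-tagged with Penn Treebank-style labels purely for illustration, and `tag_rates` is a made-up helper name.

```python
from collections import Counter

def tag_rates(tagged_tokens, per=1000):
    """Rate of each POS tag per `per` tokens, from (word, tag) pairs."""
    total = len(tagged_tokens)
    if total == 0:
        return {}
    counts = Counter(tag for _, tag in tagged_tokens)
    return {tag: per * n / total for tag, n in counts.items()}

# Hand-tagged toy input (an assumption; a real study would use a tagger).
tagged = [("she", "PRP"), ("must", "MD"), ("go", "VB"), ("and", "CC"),
          ("he", "PRP"), ("can", "MD"), ("stay", "VB"), ("here", "RB")]

rates = tag_rates(tagged)
print(rates["PRP"], rates["MD"], rates["CC"])  # pronouns, modals, conjunctions
```

Expressing the result as a rate per 1000 tokens keeps texts of different lengths comparable.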

Quantifying register differences
• Much corpus-based work tries to quantify and characterize register differences
• Work pioneered by Douglas Biber
• Simple counts like the ones suggested
• Also, more complex computations

Example
From D. Biber, S. Conrad & R. Reppen, Corpus Linguistics: Investigating Language Structure and Use, Cambridge University Press, 1998. Ch 5: the study of discourse characteristics

Multidimensional analysis
• Collect a huge range of measures of a wide variety
  – some simple word counts
  – syntactic features
  – classes and subclasses of N, V, Adj, Adv
• Factor analysis

~150 features in all

Factor analysis
• Statistical method that takes a large number of apparently random variables and groups them together into “factors”
• Factors will be groups of (+ve and –ve) features
• Linguist might then try to characterize the factors in terms of some psycholinguistic feature
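Factor analysis itself needs a statistics package, but the grouping idea can be illustrated with the correlations it starts from: features that rise and fall together across texts (or in opposite directions) are candidates for the same factor, with positive and negative signs. The sketch below computes only pairwise Pearson correlations and is not a factor analysis. The feature names and all the rates per 1000 words are invented for five hypothetical texts.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented rates per 1000 words in five hypothetical texts.
features = {
    "first_person_pronouns": [52, 48, 10, 8, 30],
    "contractions":          [40, 35, 5, 4, 22],
    "nouns":                 [180, 190, 300, 310, 240],
}

names = list(features)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: {pearson(features[a], features[b]):+.2f}")
```

With these made-up numbers, pronouns and contractions correlate strongly and positively, while nouns correlate negatively with both. That is the kind of pattern a factor with +ve and –ve features summarizes.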

Example
• Biber took two Google classifications of text types: “Home” and “Science”
• Harvested ~1500 webpages in each category (3.74m words)
  – originally got ~2500 webpages, but some were not suitable
http://jan.ucc.nau.edu/biber/Web text types.ppt

Summary of analysis
