Corpus linguistics, gender & language change
Claire Dembry and Robbie Love

Corpus linguistics
• Capturing and analysing large amounts of language data
• Specialised computer software
• Writing, or transcripts of speech
• Usually from ‘real-life’
• Millions, billions of words!
• Explore language, society and underlying linguistic patterns
© Cambridge University Press

What can we do?
Observations largely based on frequency:
• How much does word X occur?
• Do people use word X relatively more/less now than before?
• Do women use word X relatively more/less than men?
• How many times is word X used as a noun vs. as a verb?
• How many times does word X refer to a positive meaning vs. a negative meaning?
• Which other words are most commonly associated with word X?
• Which of two or more variant forms is more common?
Often combined with other methods, e.g. manual analysis of a sample of the corpus

What can we do?
Examples of what these observations can address:
• Language teaching
• Language testing
• Dictionary production
• Character presentation in fiction
• Language change
• Sociolinguistic variation
• Dialect variation
• Newspaper reporting – bias, assumptions, (in)accuracy
• Social media reactions to major events, e.g. Twitter
• Public opinion – e.g. of politicians, social issues
• Detecting criminals online
• Automated language recognition
• Language in advertising
• Etc.!

The Spoken BNC 2014
• We asked hundreds of people around the UK to record themselves chatting with their friends and family
• 2012–2016
• Informal conversation
• British English
• Mixture of gender, age, region, social class, etc.

The Spoken BNC 2014
• 1,000 hours of recordings (1,300 recordings)
• More than 650 unique speakers
• More than 11 million words

Relative frequency
• Corpus size: 40 vs. 20
• Raw frequency: 5 vs. 5
• Relative frequency (raw frequency ÷ corpus size): 0.125 vs. 0.25

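The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes that 40 and 20 are the total sizes of two corpora and that 5 is the raw count of the same word in each, which is consistent with the 0.125 and 0.25 shown. The function name `relative_frequency` is made up for the example.

```python
def relative_frequency(raw_count: int, corpus_size: int) -> float:
    """Return the share of the corpus taken up by a word (raw count / corpus size)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count / corpus_size


# Same raw frequency (5) in two corpora of different sizes (assumed: 40 and 20).
rel_a = relative_frequency(5, 40)
rel_b = relative_frequency(5, 20)

print(rel_a, rel_b)
```

The same raw count gives a relative frequency twice as high in the smaller corpus, which is why comparisons between corpora of different sizes rely on relative rather than raw frequency. With corpora of millions of words, the result is often multiplied by 1,000,000 and reported as a frequency per million words.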
Keywords
• Comparing relative frequency between corpora
• For every word in either corpus individually
  • e.g. “the” (in corpus A) vs. “the” (in corpus B)
  • e.g. “dog” (in corpus A) vs. “dog” (in corpus B)
• Keywords = words which differ in relative frequency to the greatest extent
• Reveals differences in lexis between corpora
• Helps to characterise the language of one corpus compared to another

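A minimal sketch of the keyword idea, for illustration only. It ranks words by the ratio of their relative frequencies in two tiny made-up corpora. The slides do not say which statistic is used; the simple ratio with add-one smoothing here is an assumption chosen for readability, and corpus tools often use other keyness measures such as log-likelihood. The function name `keywords` and the toy data are hypothetical.

```python
from collections import Counter


def keywords(tokens_a, tokens_b, top_n=5):
    """Rank words by how much more frequent they are in corpus A than in corpus B.

    Assumption: keyness = ratio of relative frequencies, with add-one
    smoothing so that words missing from one corpus do not cause a
    division by zero.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    vocab = set(counts_a) | set(counts_b)

    scores = {}
    for word in vocab:
        # Smoothed relative frequency of the word in each corpus.
        rel_a = (counts_a[word] + 1) / (size_a + len(vocab))
        rel_b = (counts_b[word] + 1) / (size_b + len(vocab))
        scores[word] = rel_a / rel_b

    # Highest ratios first: the words most characteristic of corpus A.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Toy corpora (invented for the example).
corpus_a = "the dog chased the dog".split()
corpus_b = "the cat chased the cat".split()

for word, score in keywords(corpus_a, corpus_b, top_n=3):
    print(word, round(score, 2))
```

In this toy case “dog” comes out on top for corpus A, while “the” scores close to 1 because it is equally frequent in both corpora. Swapping the two arguments gives the keywords of corpus B instead.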
Practical
• Keyword analysis in the Spoken BNC 2014
• Option 1: Gender
  • Comparing the language of all the male speakers vs. all the female speakers
• Option 2: Age
  • Comparing the language of all the young (0–29) speakers vs. all the old (60+) speakers
• See handout for instructions!

Thank you
cdembry@cambridge.org
love.r@cambridgeenglish.org