Corpus analysis (2)
Corpus Linguistics
Richard Xiao, lancsxiaoz@googlemail.com

Outline of the session
• Lecture
  – Keyword
  – Reference corpus
  – Key keyword
• Practical
  – WST keyword
  – AntConc keyword
  – Wmatrix keyword / key concept
  – Extra: keyword analysis with CQPweb

What is a keyword?
• Keywords are words whose frequency is exceptionally high (positive keywords) or exceptionally low (negative keywords) in comparison with a reference corpus
  – "Keywords" usually refers to positive keywords
  – But negative keywords are equally interesting (see Xiao and McEnery 2005)
    • They appear at the very end of your listing, in a different colour, in WordSmith
    • They are omitted automatically from a keywords database for keyword analysis and a keyword plot
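
The comparison with a reference corpus can be sketched in a few lines. The example below uses log-likelihood, one statistic commonly used for keyness; the choice of statistic, the function name and the sign convention (positive for positive keywords, negative for negative keywords) are assumptions made for illustration, not a description of what any particular tool does internally.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Signed log-likelihood keyness for one word.

    freq_* are the word's frequencies, size_* the corpus sizes in tokens
    (both sizes must be greater than zero).
    A positive score means the word is relatively more frequent in the
    target corpus (positive keyword); a negative score means it is
    relatively less frequent (negative keyword).
    """
    total = freq_target + freq_ref
    # Expected frequencies if the word were spread evenly over both corpora.
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)

    ll = 0.0
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    ll *= 2

    overused = freq_target / size_target >= freq_ref / size_ref
    return ll if overused else -ll
```

A word with the same relative frequency in both corpora scores 0; the further the score is from 0 in either direction, the more "key" the word is.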

Why keyword analysis?
• Indicating the ‘aboutness’ (Scott 1999) of a particular text or corpus
  – Content analysis, discourse analysis
• Also revealing the salient features which are functionally related to a particular genre (Xiao and McEnery 2005)
  – Genre analysis, stylistic analysis

How to do keyword analysis
• Make a wordlist of the target corpus
• Locate or make a wordlist of a reference corpus
  – Scott (2005) “In search of a bad reference corpus”
    • http://www.methodsnetwork.ac.uk/redist/pdf/es1_05scott.pdf
  – The reference corpus is usually larger than the target corpus
  – The appropriateness of a reference corpus depends on your research questions!
• Compare the frequency of each item in the two wordlists to extract keywords (done automatically)
• Analyse and interpret the keywords (you will do it!)
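
The first three steps can be sketched end to end as follows. This is a minimal illustration, not what WordSmith or AntConc does: the tokeniser is deliberately crude, and the ranking uses a smoothed log ratio of relative frequencies as a simple stand-in for the tools' own statistics. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def make_wordlist(text):
    """Lower-case the text and count word tokens (a crude tokeniser)."""
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))

def log_ratio(freq_target, size_target, freq_ref, size_ref):
    """Binary log of the ratio of relative frequencies.

    0.5 is added to each count so that a word missing from one wordlist
    does not cause a division by zero. Both sizes must be non-zero.
    """
    return math.log2(((freq_target + 0.5) / size_target) /
                     ((freq_ref + 0.5) / size_ref))

def extract_keywords(target_text, reference_text, top=10):
    """Steps 1-3: two wordlists, then compare every item in the target list."""
    target = make_wordlist(target_text)
    reference = make_wordlist(reference_text)
    size_t, size_r = sum(target.values()), sum(reference.values())
    scores = {word: log_ratio(freq, size_t, reference[word], size_r)
              for word, freq in target.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]
```

Calling `extract_keywords(target_text, reference_text)` returns (word, score) pairs, highest score first. The fourth step, interpretation, stays with the analyst.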

Keywords in the party speeches
• Target corpus: just one text
  – David Cameron's speech at the Conservative conference (10 October 2012, Manchester)
    • http://www.bbc.co.uk/news/uk-politics-15189614
    • Local copy available (David_speech, Unicode text); download and unzip the file into a folder: www.fass.lancs.ac.uk/projects/corpus/data/workshop3texts.zip
• Reference corpus
  – The 100-million-word BNC: download and unzip (local copy available) www.lexically.net/downloads/version4/BNC_World.zip
• Tool
  – WST Keyword

Wordlist of David’s speech

Creating keyword list

Keyword extraction in progress
Warning: It can take time if you have loaded two large wordlists

Keywords in David’s speech
What do these keywords tell us?
Negative keyword

Keyword: Plot view
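
A plot view shows where in the text each keyword occurs. A rough text-only approximation is sketched below; the function names and the one-row ASCII rendering are assumptions for illustration and do not reproduce WordSmith's display.

```python
def keyword_positions(tokens, keyword):
    """Return the token indexes at which the keyword occurs."""
    return [i for i, tok in enumerate(tokens) if tok == keyword]

def plot_row(tokens, keyword, width=40):
    """Draw one row of a dispersion plot: '|' marks a hit, '-' marks no hit.

    The text is squeezed into `width` columns, so nearby hits may share
    a column.
    """
    row = ["-"] * width
    for i in keyword_positions(tokens, keyword):
        row[i * width // len(tokens)] = "|"
    return "".join(row)
```

For example, `plot_row(tokens, "growth")` gives a 40-character row in which bunched-up bars show that the word is concentrated in one part of the speech.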

What company do keywords keep?
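
The "company" a keyword keeps is its collocates: the words that occur near it. A minimal sketch, assuming a simple window of a few tokens on either side and raw co-occurrence counts (no association statistic); the function name and the default span are hypothetical.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count the words found within `span` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Left context, then right context; the node itself is excluded.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(window)
    return counts
```

`collocates(tokens, "marriage").most_common(10)` would list the ten most frequent neighbours of "marriage" in a tokenised text.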

Why “marriage”?

Key clusters
Similar to word clusters, but only keywords are used.
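
One way to read "only keywords are used" is as word sequences made up entirely of keywords. The sketch below follows that reading; it is an assumption for illustration, and WordSmith's own definition of a key cluster may differ. The function name is hypothetical.

```python
from collections import Counter

def key_clusters(tokens, keywords, size=2):
    """Count sequences of `size` adjacent tokens that are all keywords."""
    keywords = set(keywords)
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tokens[i:i + size]
        if all(tok in keywords for tok in gram):
            counts[" ".join(gram)] += 1
    return counts
```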

Key keywords
• A key keyword is one which is "key" in more than one of a number of related texts
  – The more texts it is "key" in, the more "key key" it is
  – Can avoid extracting keywords which are unusually frequent in only a small number of files
• Can be created automatically and are as simple to extract as keywords
• N.B. Negative keywords are omitted automatically from a key keyword list
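
The counting behind key keywords can be sketched as follows, assuming you already have one list of positive keywords per text. The function name, the input layout (a dictionary from file name to keyword list) and the `min_texts` parameter are hypothetical.

```python
from collections import Counter

def key_keywords(keyword_lists, min_texts=2):
    """Return {word: number of texts in which the word is key}.

    keyword_lists maps each file name to that text's positive keywords.
    Words that are key in fewer than `min_texts` texts are dropped.
    """
    counts = Counter()
    for keywords in keyword_lists.values():
        counts.update(set(keywords))  # count each word once per text
    return {word: n for word, n in counts.most_common() if n >= min_texts}
```

A word that is key in only one file never reaches the result, which is how the method filters out words that are unusually frequent in just a few texts.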

Making a batch wordlist
Specify a folder where you can write

Batch making keyword lists

Batch making keyword lists
Specify a folder where you can write

Making a KW database

Key keywords
key coverage of the corpus
An "associate" is a keyword that appears in the same text as a given key keyword
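
Associates can be sketched from the same per-text keyword lists used for key keywords: for a chosen word, count how often every other keyword is key in the same texts. The function name and input layout are hypothetical, and no association statistic is applied.

```python
from collections import Counter

def associates(keyword_lists, node):
    """Count the keywords that are key in the same texts as `node`.

    keyword_lists maps each file name to that text's keywords.
    """
    counts = Counter()
    for keywords in keyword_lists.values():
        keywords = set(keywords)
        if node in keywords:
            counts.update(keywords - {node})
    return counts
```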

Keyword in AntConc
Target corpus
Reference corpus

Keyword in AntConc
Keywords in David's speech (in relation to Ed's speech)

Wmatrix: Keywords and key concepts
• POS and semantic tagging
• Keyword / key concept analysis of Cameron’s speech in comparison with Miliband’s speech
• Copy and paste the speeches into two separate text files
  – http://www.bbc.co.uk/news/uk-politics-15189614
  – http://www.labour.org.uk/ed-milibands-speech-tolabour-party-conference
• Save the two texts as David_speech.txt and Ed_speech.txt
  www.fass.lancs.ac.uk/projects/corpus/data/workshop3texts.zip

Wmatrix: Keywords and key concepts
• Log in using your zhejiangxx account
  – http://ucrel.lancs.ac.uk/wmatrix3.html

Tagging Wizard

Tagging in progress

Tagging result

Labour frequency list

KWIC concordance
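
A KWIC (keyword in context) concordance lists each occurrence of a search word with a few words of context on either side. A minimal sketch, assuming an already tokenised text and a case-insensitive match; the function name and the tuple layout are hypothetical.

```python
def kwic(tokens, node, width=5):
    """Return (left context, hit, right context) for every match of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines
```

Printing each tuple with the left context right-aligned, for example `print(f"{left:>40}  {hit}  {right}")`, lines the hits up in a central column as in a concordancer.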

“My folders”
Upload and tag Ed’s speech …and click on “My folders”
Warning: Your folder view may look different!

Open the David_speech folder and select Ed_speech in the “Keyword compared to” dropdown box

Keyword list to download!

Keyword cloud: even more interesting!

David’s key concepts (“Key concepts compared to”)

Keyword analysis in online corpora
• Using Lancaster’s CQPweb to compare British English (LOB + FLOB) and American English (Brown + Frown)
• Log in to CQPweb
  – http://cqpweb.lancs.ac.uk
• Similar analysis can be done at BFSU’s CQPweb corpus hub (different corpora)
  – http://124.193.83.252/cqp/
  – Account: ID and password are both "test"

Creating subcorpora

Creating subcorpus BrE

Creating subcorpus AmE

Making wordlists

Wordlist available now

Computing keywords
You can adjust the statistical measure, cut-off point, and minimum frequency according to your research purposes.
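
The effect of the cut-off point and minimum frequency can be sketched as a simple filter over already computed scores. This assumes the scores are log-likelihood (or chi-square) values, so that the standard chi-square critical values for one degree of freedom apply; the function name, the row layout and the default settings are hypothetical and are not CQPweb's defaults.

```python
# Chi-square critical values for 1 degree of freedom, keyed by p-value.
CRITICAL_VALUES = {0.05: 3.84, 0.01: 6.63, 0.001: 10.83, 0.0001: 15.13}

def filter_keywords(rows, p=0.01, min_freq=3):
    """Keep rows that pass both the significance cut-off and minimum frequency.

    Each row is (word, frequency in target corpus, keyness score).
    Negative scores (negative keywords) are judged by their absolute value.
    """
    cutoff = CRITICAL_VALUES[p]
    return [row for row in rows
            if row[1] >= min_freq and abs(row[2]) >= cutoff]
```

A stricter p-value or a higher minimum frequency shortens the list; which setting is appropriate depends on the research question.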

Keywords in BrE and AmE