Using large online corpora for language teaching and learning
Mark Davies
Brigham Young University, Provo, Utah, USA
http://corpus.byu.edu
GDUFS / December 2014
Review of COCA
• Grammar: must V, end up V-ing, get V-ed, try and V
• Collocates: break, brooding, bodice, cause, visibly
• Concordance lines: budge, diametrically
• Synonyms: strong, utter / sheer
• Frequency: a lot of, attitudinal, somewhat ADJ
Finding the right word
• “potent” argument
• “tough” regulations
Synonym chains:
• precarious
www.WordAndPhrase.info
• Browse frequency lists (1 – 60,000)
• Input and analyze texts
• Does much of what COCA does, but all on one page
– Frequency (by genre)
– Definition
– Collocates
– Concordance lines
– Synonyms
www.WordAndPhrase.info
Overview
• Frequency (help pages)
• Enter texts (saved: fitness, vent cap, fiction)
• Academic: frequency (help pages)
• Academic: enter texts (saved: Sci, Med)
• Synonyms; e.g. range ~2600: incorporate, assign, valuable, classic, assumption, barrier
Sample text 1
China/US climate change accord
http://corpus.byu.edu/texts/climateChange.html
• Keywords
• mitigate (all)
• greenhouse (concordances)
• devastating (synonyms)
• ire (draw as collocate)
• emissions (collocates; click)
• a “considerable” amount (other options)
• somewhat [ADJ] (genres)
Sample text 2
http://www.cbsnews.com/news/comet-philae-lander-survive-we-need-to-be-very-lucky/
http://corpus.byu.edu/texts/cometLander.html
• Keywords
• forestall, rouse, jumble, illuminated (all)
• precisely, secondary (genres)
Sample text 3
• http://www.zdnet.com/android-lollipop-users-warn-of-unusable-devices-after-upgrading-7000035977/
• http://corpus.byu.edu/texts/androidDevices.html
Sample compositions
• #1
– various competition
– a lot of
– no matter what
• #4
– strongly advocate for (ok)
– advocate that (?)
– opportunities * people
Sample texts: Wikipedia
• fungus
• internal combustion engine
• Battle of Gettysburg
• Yao Ming
The Wikipedia Corpus
• Almost done: December 2014
• 1.9 billion words, 4.6 million articles
• Can quickly and easily create personalized, virtual corpora (e.g. electrical engineering, biology, automobiles, finance, Star Trek)
• Search within corpora
• Compare across corpora
• Create keywords
The Wikipedia Corpus
• Sample: [investment] (words)
• Sample: buddh* (words)
• Sample: biology (titles)
• Sample: audi porsche (titles)
• Search within corpus: biology (collocates cell)
• Compare frequency across corpora (studies)
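The idea of comparing frequency across corpora to create keywords can be sketched offline in a few lines. This is only a rough illustration, not the method the Wikipedia Corpus site itself uses: the function names (`tokenize`, `keywords`), the crude tokenizer, the add-one smoothing, and the two toy texts are all assumptions made for the example. It scores each word in a small "virtual corpus" by how much more frequent it is there than in a reference corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return alphabetic word tokens (deliberately crude)."""
    return re.findall(r"[a-z]+", text.lower())


def keywords(target_text, reference_text, min_count=2, top_n=5):
    """Rank words by relative frequency in the target versus the reference.

    Hypothetical helper for illustration: returns (word, ratio) pairs,
    highest ratio first.
    """
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    target_total = sum(target.values())
    reference_total = sum(reference.values())

    scored = []
    for word, count in target.items():
        if count < min_count:
            continue  # skip words too rare to say anything about
        target_rate = count / target_total
        # Add-one smoothing so words missing from the reference
        # do not cause a division by zero.
        reference_rate = (reference[word] + 1) / (reference_total + 1)
        scored.append((word, target_rate / reference_rate))

    # Highest ratio first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy stand-ins for a "biology" virtual corpus and a general reference corpus.
biology = "the cell divides and the cell membrane protects the cell nucleus"
general = "the game ended and the crowd left the stadium after the game"

for word, ratio in keywords(biology, general):
    print(f"{word}\t{ratio:.2f}")
```

With these toy texts, "cell" ranks above "the", because "the" is common in both texts while "cell" appears only in the biology sample. A real comparison would use far larger samples and usually a more robust statistic than a plain frequency ratio.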