
Given an annotated corpus… Using annotated corpora to study syntactic variation and change
Ann Taylor, University of York (UK)

Outline
• Introduction to our series of syntactically annotated corpora of earlier stages of English
• Illustration of the kind of research that can be done with these corpora that couldn’t be done without them

Syntactically annotated corpora of earlier stages of English
• The York-Toronto-Helsinki Parsed Corpus of Old English Prose (Taylor et al., 2003)
• The Penn-Helsinki Parsed Corpus of Middle English II (Kroch and Taylor, 2000)
• The Penn-Helsinki Parsed Corpus of Early Modern English (Kroch et al., 2005)
• The Parsed Corpus of Early English Correspondence (Taylor et al., 2006)
• The Penn Parsed Corpus of Modern British English (Kroch et al., in progress)

Corpus   Period        Word count
YCOE     c. 800-1100   1,452,086
PPCME2   1125-1500     1,155,965
PPCEME   1500-1710     1,657,058
PCEEC    1410-1700     2,162,134
Total                  6,427,243
PPCMBE   1710-1914     3,000
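The Total figure can be checked against the individual rows. The short sketch below sums the four completed corpora (the PPCMBE row, listed after the total, is left out) and reproduces the figure on the slide. The dictionary name is only illustrative.

```python
# Word counts as given on the slide for the four completed corpora.
word_counts = {
    "YCOE": 1_452_086,
    "PPCME2": 1_155_965,
    "PPCEME": 1_657_058,
    "PCEEC": 2_162_134,
}

total = sum(word_counts.values())
print(f"{total:,}")  # 6,427,243
```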

Audience
• The corpora are intended primarily to support quantitative work in language variation and change
• Goals
  – Easy access to structures, not just lexis or part of speech
  – Large enough to generate valid statistics
  – Sufficient coverage to be able to trace changes over time

The annotation system
• A modified Penn Treebank scheme
  – Cosmetic changes
    • Nodes are given labels more familiar to generative linguists
  – Major changes
    • No VP
    • Function is marked on a wider range of sentential and NP nodes, but not on PPs

( (IP-MAT (CONJ and)
          (NP-SBJ (PRO I))
          (BEP am)
          (ADJP (ADJ sure)
                (CP-THT (C 0)
                        (IP-SUB (NP-SBJ (PRO I))
                                (MD shall)
                                (VB desyre)
                                (NP-OB1 (PRO it)))))
          (PP (P+N because)
              (CP-ADV (C 0)
                      (IP-SUB (NP-SBJ (PRO you))
                              (BEP are)
                              (ADVP-LOC (ADV there)))))
          (. .))
  (ID OSBORNE,5.002.40))
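Because the annotation is plain labelled bracketing, a token like the one above can be read into a nested structure with very little code. The following is a minimal sketch, not the corpora's own tooling: the function names `parse` and `words` are made up for illustration, and the choice to skip the ID node, empty `0` elements and traces is an assumption about what counts as a "word".

```python
import re

TREE = """
( (IP-MAT (CONJ and) (NP-SBJ (PRO I)) (BEP am)
          (ADJP (ADJ sure)
                (CP-THT (C 0)
                        (IP-SUB (NP-SBJ (PRO I)) (MD shall) (VB desyre)
                                (NP-OB1 (PRO it)))))
          (PP (P+N because)
              (CP-ADV (C 0)
                      (IP-SUB (NP-SBJ (PRO you)) (BEP are)
                              (ADVP-LOC (ADV there)))))
          (. .))
  (ID OSBORNE,5.002.40))
"""

def parse(text):
    """Read one bracketed token into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())  # nested constituent
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return items

    return node()

def words(tree):
    """Collect the words in order, skipping the ID, empty elements and traces."""
    if len(tree) == 2 and all(isinstance(x, str) for x in tree):
        tag, word = tree              # a [tag, word] pair
        if tag == "ID" or word == "0" or word.startswith("*"):
            return []
        return [word]
    out = []
    for child in tree:
        if isinstance(child, list):
            out.extend(words(child))
    return out

tree = parse(TREE)
print(tree[0][0])             # IP-MAT
print(" ".join(words(tree)))  # and I am sure I shall desyre it because you are there .
```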

Old English

( (IP-MAT (CONJ ac)
          (NP-NOM (PRO^N he))
          (VBD bediglode)
          (ADVP (ADV swa) (ADV +teah))
          (NP-ACC (PRO$ his) (N^A d+ada))
          (NP-DAT (D^D +tam) (N^D casere)
                  (NP-DAT-PRN (NR^D Dioclitiane))
                  (CP-REL (WNP-NOM-1 (D^N se))
                          (C 0)
                          (IP-SUB (NP-NOM *T*-1)
                                  (BEDI w+as)
                                  (NP-NOM-PRD (NP-GEN (N^G deofles))
                                              (N^N biggencga)))))
          (. .))
  (ID coaelive,+ALS_[Sebastian]:8.1215))

Correspondence Corpus

( (METADATA (AUTHOR BRIAN_DUPPA:MALE:FRIEND:1589:61)
            (RECIPIENT JUSTINIAN_ISHAM:MALE:FRIEND:1611:39)
            (LETTER DUPPA_001:E3:1650:AUTOGRAPH:FRIEND))
  (IP-IMP (IP-MAT-PRN (NP-SBJ (PRO I))
                      (VBP pray))
          (VBI putt)
          (NP-OB1 (PRO it))
          (PP (P upon)
              (NP (PRO$ your) (N score)))
          (. ,))
  (ID DUPPA,4.001.13))
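The METADATA node packs its information into colon-separated fields, which makes it easy to turn into social variables for a quantitative study. The sketch below splits the AUTHOR and LETTER fields from the example above. The field names (`name`, `sex`, `relation`, `birth_year`, `age`) are assumptions inferred from the values, not taken from the corpus documentation. The reading of the last two AUTHOR fields is at least consistent with the letter date, since 1650 - 1589 = 61.

```python
from dataclasses import dataclass

@dataclass
class Person:
    # Field names are guesses based on the example values.
    name: str
    sex: str
    relation: str
    birth_year: int
    age: int

def parse_person(field: str) -> Person:
    """Split a colon-separated AUTHOR or RECIPIENT field."""
    name, sex, relation, birth, age = [p.strip() for p in field.split(":")]
    return Person(name, sex, relation, int(birth), int(age))

author = parse_person("BRIAN_DUPPA:MALE:FRIEND:1589:61")
recipient = parse_person("JUSTINIAN_ISHAM:MALE:FRIEND:1611:39")

# The third LETTER field looks like the year of the letter (an assumption).
letter_fields = [p.strip() for p in "DUPPA_001:E3:1650:AUTOGRAPH:FRIEND".split(":")]
letter_year = int(letter_fields[2])

print(letter_year - author.birth_year)     # 61, matches the author's last field
print(letter_year - recipient.birth_year)  # 39, matches the recipient's last field
```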

Searching the corpora with CorpusSearch
• Searches structures using dominance and precedence relations
• Generates statistics
• Can search its own output
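To make "dominance and precedence" concrete, here is a small Python sketch of the two relations over a hand-built copy of the IP-IMP clause from the correspondence example ("I pray putt it upon your score"). It is not CorpusSearch and does not use its query language. The `Node` class and the helper functions are hypothetical, written only to show the kind of structural question such a search asks: does a clause immediately dominate a verb and an object, and which of the two comes first?

```python
class Node:
    def __init__(self, label, *children, word=None):
        self.label = label
        self.children = list(children)
        self.word = word
        self.start = self.end = None  # word span, filled in by index()

def leaf(tag, word):
    return Node(tag, word=word)

def index(node, i=0):
    """Give every node the span [start, end) of word positions it covers."""
    node.start = i
    if not node.children:
        i += 1
    for child in node.children:
        i = index(child, i)
    node.end = i
    return i

def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)

def dominates(a, b):
    return any(d is b for d in descendants(a))

def precedes(a, b):
    # a ends before b starts, so neither can dominate the other
    return a.end <= b.start

def verb_object_orders(root):
    """For each IP that immediately dominates a verb and an object, report the order."""
    hits = []
    for clause in [root, *descendants(root)]:
        if not clause.label.startswith("IP"):
            continue
        verbs = [c for c in clause.children if c.label.startswith("VB")]
        objects = [c for c in clause.children if c.label.startswith("NP-OB")]
        for v in verbs:
            for o in objects:
                hits.append((clause.label, "VO" if precedes(v, o) else "OV"))
    return hits

score = leaf("N", "score")
tree = Node("IP-IMP",
            Node("IP-MAT-PRN", Node("NP-SBJ", leaf("PRO", "I")), leaf("VBP", "pray")),
            leaf("VBI", "putt"),
            Node("NP-OB1", leaf("PRO", "it")),
            Node("PP", leaf("P", "upon"), Node("NP", leaf("PRO$", "your"), score)))
index(tree)

print(dominates(tree, score))    # True
print(verb_object_orders(tree))  # [('IP-IMP', 'VO')]
```

The parenthetical IP-MAT-PRN contains a verb but no object, so only the imperative clause is reported.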

Variation in verb-object order in Old and Middle English (Pintzuk & Taylor 2006)

Verb-object order in Old English

Ac he sceal þa sacfullan gesibbian
But he must the quarrelsome reconcile
‘But he must reconcile the quarrelsome...’
(colwstan1,+ALet_2_[Wulfstan_1]:188.256)

Se wolde gelytlian þone lyfigendan hælend
He would diminish the living lord
‘He would diminish the living lord...’
(colwstan1,+ALet_2_[Wulfstan_1]:55.98)

Verb-object order in Middle English

ear he hefde his ranceun fulleliche ipaiȝet
before he had his ransom fully paid
‘Before he had fully paid his ransom...’
(CMANCRIW,II.101.1228)

ȝef þu wult habben bricht sichðe wið þine heorte echnen
if you will have bright sight with your heart’s eyes
‘If you will have bright sight with your heart’s eyes...’
(CMANCRIW,II.73.839)

• The question: what factors affect object position in OE and ME?
• The data: ~10,000 tokens containing a medial auxiliary, a non-finite verb and an object

Factors affecting object position
• Date of text
• Length of object
• Type of object
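As an illustration of how such factors might be coded per token, the sketch below takes the four verb-object examples shown earlier (using their word-for-word English glosses), records the period, the object length in words, and whether the object precedes or follows the non-finite verb. The role labels (S, AUX, V, O) are hand-assigned here for illustration only. In the actual study this information would come from the corpus annotation, and four examples are of course not data from which to draw conclusions.

```python
from collections import Counter

# (period, [(gloss word, role), ...]); "-" marks words that play no part in the coding.
EXAMPLES = [
    ("OE", [("But", "-"), ("he", "S"), ("must", "AUX"),
            ("the", "O"), ("quarrelsome", "O"), ("reconcile", "V")]),
    ("OE", [("He", "S"), ("would", "AUX"), ("diminish", "V"),
            ("the", "O"), ("living", "O"), ("lord", "O")]),
    ("ME", [("before", "-"), ("he", "S"), ("had", "AUX"),
            ("his", "O"), ("ransom", "O"), ("fully", "-"), ("paid", "V")]),
    ("ME", [("if", "-"), ("you", "S"), ("will", "AUX"), ("have", "V"),
            ("bright", "O"), ("sight", "O"),
            ("with", "-"), ("your", "-"), ("heart's", "-"), ("eyes", "-")]),
]

def code_token(period, tagged):
    """Return the factor values for one clause."""
    roles = [role for _, role in tagged]
    order = "OV" if roles.index("O") < roles.index("V") else "VO"
    return {"period": period, "order": order, "object_length": roles.count("O")}

coded = [code_token(period, tagged) for period, tagged in EXAMPLES]
for row in coded:
    print(row)

# Cross-tabulate order by period, the kind of count a variation study starts from.
counts = Counter((row["period"], row["order"]) for row in coded)
print(counts)
```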

Type of object
• Quantified
• Negative
• Positive (non-negative, non-quantified)

Quantified objects (Middle English)

ȝef ȝe habbeð ani god don
if you have any good done
‘...if you have done any good...’
(CMANCRIW,I.76.310)

forðon þe he scal aȝein ȝeuen awiht
for he shall again give something
‘...for he shall again give something.’
(CMLAMBX1,31.396)

Negative objects (Middle English)

þt he ne mai nan þing don us buten godes leaue
that he neg can no thing do us without God’s leave
‘...that he can do nothing to us without God’s leave.’
(CMANCRIW,II.169.2346)

swa þet ho ne scal of þere wunde habbe nan oðer uuel
so that she neg shall from her wound have no other evil
‘...so that she shall have no other evil from her wound.’
(CMLAMB1,83.195)

Syntactic variation in PDE
• Heavy-NP shift (Wasow & Arnold)
• Dative alternation (Wasow, Bresnan)
• Particle shift (Gries)
• Saxon vs. of-genitive (Szmrecsanyi)
• Complementizer omission in complement and relative clauses (Jaeger & Wasow)
• Topicalization, etc. (Cresswell)

Conclusions
• The study of syntactic variation is an up-and-coming topic in linguistics
• It can’t be studied using the usual methods (introspection, intuition) but requires naturally occurring data
• Unannotated text corpora are of limited use for this
• To study syntactic variation efficiently, you need annotated data, and the more the better