Given an annotated corpus…
Using annotated corpora to study syntactic variation and change
Ann Taylor
University of York (UK)
Outline
• Introduction to our series of syntactically annotated corpora of earlier stages of English
• Illustration of the kind of research that can be done with these corpora that couldn’t be done without them
Syntactically annotated corpora of earlier stages of English
• The York-Toronto-Helsinki Parsed Corpus of Old English Prose, YCOE (Taylor et al., 2003)
• The Penn-Helsinki Parsed Corpus of Middle English II, PPCME2 (Kroch and Taylor, 2000)
• The Penn-Helsinki Parsed Corpus of Early Modern English, PPCEME (Kroch et al., 2005)
• The Parsed Corpus of Early English Correspondence, PCEEC (Taylor et al., 2006)
• The Penn Parsed Corpus of Modern British English, PPCMBE (Kroch et al., in progress)
Corpus    Period         Word count
YCOE      c. 800–1100     1,452,086
PPCME2    1125–1500       1,155,965
PPCEME    1500–1710       1,657,058
PCEEC     1410–1700       2,162,134
Total                     6,427,243
PPCMBE    1710–1914           3,000
Audience
• The corpora are intended primarily to support quantitative work on language variation and change
• Goals
  – Easy access to structures, not just lexis or part of speech
  – Large enough to generate valid statistics
  – Sufficient coverage to trace changes over time
The annotation system
• A modified Penn Treebank scheme
  – Cosmetic changes
    • Nodes are given labels more familiar to generative linguists
  – Major changes
    • No VP
    • Function is marked on a wider range of sentential and NP nodes, but not on PPs
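Because function is marked on the node itself, a single label can pack several pieces of information. It holds a category (NP, IP, CP) and one or more extensions. The extensions can be function tags such as SBJ, OB1 and THT or, in the Old English corpus, case tags such as NOM and DAT. A label may also carry a numeric index that links a moved element to its trace (WNP-NOM-1 ... *T*-1). The Python sketch below splits labels into these parts. It assumes hyphen-separated parts and treats a final all-digit part as an index. That rule is inferred from the example trees, not taken from the annotation manual. The names `Label` and `split_label` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Label(NamedTuple):
    category: str          # syntactic category, e.g. "NP", "IP", "CP"
    extensions: List[str]  # function or case tags, e.g. ["SBJ"] or ["DAT", "PRN"]
    index: Optional[int]   # coindexing number, e.g. 1 in "WNP-NOM-1"


def split_label(label: str) -> Label:
    """Split a node label into category, extensions and optional index.

    Assumption (inferred from examples): parts are separated by "-", and a
    final all-digit part is an index, so "OB1" stays a tag but the "1" in
    "WNP-NOM-1" becomes an index.
    """
    parts = label.split("-")
    index = int(parts.pop()) if len(parts) > 1 and parts[-1].isdigit() else None
    return Label(parts[0], parts[1:], index)


for lab in ["NP-SBJ", "NP-OB1", "NP-DAT-PRN", "WNP-NOM-1", "IP-MAT"]:
    print(lab, "->", split_label(lab))
```

Keeping the index separate from the tags is what would let a search tool pair a fronted WNP-NOM-1 with the *T*-1 trace it binds.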
( (IP-MAT (CONJ and)
          (NP-SBJ (PRO I))
          (BEP am)
          (ADJP (ADJ sure)
                (CP-THT (C 0)
                        (IP-SUB (NP-SBJ (PRO I))
                                (MD shall)
                                (VB desyre)
                                (NP-OB1 (PRO it)))))
          (PP (P+N because)
              (CP-ADV (C 0)
                      (IP-SUB (NP-SBJ (PRO you))
                              (BEP are)
                              (ADVP-LOC (ADV there)))))
          (. .))
  (ID OSBORNE,5.002.40))
Old English
( (IP-MAT (CONJ ac)
          (NP-NOM (PRO^N he))
          (VBD bediglode)
          (ADVP (ADV swa) (ADV +teah))
          (NP-ACC (PRO$ his) (N^A d+ada))
          (NP-DAT (D^D +tam)
                  (N^D casere)
                  (NP-DAT-PRN (NR^D Dioclitiane))
                  (CP-REL (WNP-NOM-1 (D^N se))
                          (C 0)
                          (IP-SUB (NP-NOM *T*-1)
                                  (BEDI w+as)
                                  (NP-NOM-PRD (NP-GEN (N^G deofles))
                                              (N^N biggencga)))))
          (. .))
  (ID coaelive,+ALS_[Sebastian]:8.1215))
Correspondence Corpus
( (METADATA (AUTHOR BRIAN_DUPPA:MALE:FRIEND:1589:61)
            (RECIPIENT JUSTINIAN_ISHAM:MALE:FRIEND:1611:39)
            (LETTER DUPPA_001:E3:1650:AUTOGRAPH:FRIEND))
  (IP-IMP (IP-MAT-PRN (NP-SBJ (PRO I)) (VBP pray))
          (VBI putt)
          (NP-OB1 (PRO it))
          (PP (P upon) (NP (PRO$ your) (N score)))
          (. ,))
  (ID DUPPA,4.001.13))
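Each token in these corpora is a plain-text labeled bracketing, so a few lines of code are enough to read one. The sketch below is a minimal recursive-descent reader written for this illustration. The function names `parse`, `words` and `find_words` are hypothetical and are not part of any corpus distribution. It treats every bare item after a label as a word, including empty elements such as 0 and *T*-1. Here it is applied to the correspondence token above, to recover the token ID, the sentence words and the colon-separated METADATA fields.

```python
import re


def parse(text):
    """Parse one labeled-bracketing token into nested (label, children) tuples.

    Words are plain strings; the unlabeled outer wrapper gets the label "".
    """
    tokens = iter(re.findall(r"\(|\)|[^\s()]+", text))

    def node():
        # Called right after an opening "(" has been consumed.
        label, children = "", []
        for tok in tokens:
            if tok == "(":
                children.append(node())
            elif tok == ")":
                return (label, children)
            elif not label and not children:
                label = tok            # first bare item after "(" is the label
            else:
                children.append(tok)   # any later bare item is a word
        raise ValueError("unbalanced parentheses")

    if next(tokens, None) != "(":
        raise ValueError("a token must start with '('")
    return node()


def words(tree):
    """Return the terminal strings under a node, left to right."""
    _, children = tree
    out = []
    for child in children:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out


def find_words(tree, name):
    """Return the words of the first node labeled `name` (depth-first), or None."""
    label, children = tree
    if label == name:
        return words(tree)
    for child in children:
        if isinstance(child, tuple):
            found = find_words(child, name)
            if found is not None:
                return found
    return None


duppa = """( (METADATA (AUTHOR BRIAN_DUPPA:MALE:FRIEND:1589:61)
            (RECIPIENT JUSTINIAN_ISHAM:MALE:FRIEND:1611:39)
            (LETTER DUPPA_001:E3:1650:AUTOGRAPH:FRIEND))
  (IP-IMP (IP-MAT-PRN (NP-SBJ (PRO I)) (VBP pray))
          (VBI putt) (NP-OB1 (PRO it))
          (PP (P upon) (NP (PRO$ your) (N score)))
          (. ,))
  (ID DUPPA,4.001.13))"""

tree = parse(duppa)
print(find_words(tree, "ID"))                    # ['DUPPA,4.001.13']
print(" ".join(find_words(tree, "IP-IMP")))      # I pray putt it upon your score ,
print(find_words(tree, "AUTHOR")[0].split(":"))  # ['BRIAN_DUPPA', 'MALE', 'FRIEND', '1589', '61']
```

The reader does not interpret the METADATA fields. Splitting on ":" only separates them, and their meaning should be taken from the corpus documentation.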
Searching the corpora with CorpusSearch
• Searches structures using dominance and precedence relations
• Generates statistics
• Can search its own output
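CorpusSearch has its own query language, and the Python below does not reproduce it. It is only a sketch of the idea behind the two relations named on the slide. Dominance means one node contains another. Precedence means all of one node's words come before another node's words. Several details are assumptions made for this example: the helper names, the shell-style label patterns (via the standard `fnmatch` module) and the treatment of empty elements as ordinary words. Applied to the Osborne token, a search for IP nodes that immediately dominate a VB preceding an NP-OB* object finds the embedded clause "I shall desyre it", which has VO order. The reverse search finds no OV clause.

```python
import re
from fnmatch import fnmatchcase


def parse(text):
    """Parse labeled bracketing into (label, children) tuples; words are strings."""
    tokens = iter(re.findall(r"\(|\)|[^\s()]+", text))

    def node():
        label, children = "", []
        for tok in tokens:
            if tok == "(":
                children.append(node())
            elif tok == ")":
                return (label, children)
            elif not label and not children:
                label = tok
            else:
                children.append(tok)
        raise ValueError("unbalanced parentheses")

    if next(tokens, None) != "(":
        raise ValueError("a token must start with '('")
    return node()


def with_spans(tree, start=0):
    """Return (label, start, end, kids), where [start, end) is the node's word span."""
    label, children = tree
    kids, pos = [], start
    for child in children:
        if isinstance(child, tuple):
            kids.append(with_spans(child, pos))
            pos = kids[-1][2]
        else:
            pos += 1  # every word, including empty elements like 0 or *T*-1, takes one slot
    return (label, start, pos, kids)


def descendants(node):
    """Yield every node below `node`, depth-first and left to right."""
    for kid in node[3]:
        yield kid
        yield from descendants(kid)


def query(tree, top, first, second, immediate=True):
    """Nodes matching `top` that dominate a `first` node preceding a `second` node.

    With immediate=True only daughters of the `top` node are considered
    (immediate dominance); otherwise any descendant counts.
    """
    root = with_spans(tree)
    hits = []
    for node in [root, *descendants(root)]:
        if not fnmatchcase(node[0], top):
            continue
        pool = node[3] if immediate else list(descendants(node))
        xs = [n for n in pool if fnmatchcase(n[0], first)]
        ys = [n for n in pool if fnmatchcase(n[0], second)]
        # Precedence: the first node's span ends before the second node's span starts.
        if any(x[2] <= y[1] for x in xs for y in ys):
            hits.append(node)
    return hits


OSBORNE = """( (IP-MAT (CONJ and) (NP-SBJ (PRO I)) (BEP am)
  (ADJP (ADJ sure) (CP-THT (C 0) (IP-SUB (NP-SBJ (PRO I)) (MD shall)
  (VB desyre) (NP-OB1 (PRO it)))))
  (PP (P+N because) (CP-ADV (C 0) (IP-SUB (NP-SBJ (PRO you)) (BEP are)
  (ADVP-LOC (ADV there))))) (. .)) (ID OSBORNE,5.002.40))"""

tree = parse(OSBORNE)
vo = query(tree, "IP*", "VB", "NP-OB*")  # verb before object
ov = query(tree, "IP*", "NP-OB*", "VB")  # object before verb
print([n[0] for n in vo], [n[0] for n in ov])  # ['IP-SUB'] []
```

With immediate=False the matrix IP-MAT also matches, because it dominates the same VB and object at a greater depth. Choosing between immediate and general dominance is what keeps a search from counting the same verb-object pair more than once.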
Variation in verb-object order in Old and Middle English (Pintzuk & Taylor 2006)
Verb-object order in Old English
Ac he sceal þa sacfullan gesibbian
But he must the quarrelsome reconcile
‘But he must reconcile the quarrelsome…’ (colwstan1,+ALet_2_[Wulfstan_1]:188.256)
Se wolde gelytlian þone lyfigendan hælend
He would diminish the living lord
‘He would diminish the living lord…’ (colwstan1,+ALet_2_[Wulfstan_1]:55.98)
Verb-object order in Middle English
ear he hefde his ranceun fulleliche ipaiȝet
before he had his ransom fully paid
‘Before he had fully paid his ransom…’ (CMANCRIW,II.101.1228)
ȝef þu wult habben bricht sichde wid þine heorte echnen
if you will have bright sight with your heart’s eyes
‘If you will have bright sight with your heart’s eyes…’ (CMANCRIW,II.73.839)
• The question: what factors affect object position in OE and ME?
• The data: ~10,000 tokens containing a medial auxiliary, a non-finite verb and an object
Factors affecting object position
• Date of text
• Length of object
• Type of object
Type of object
• Quantified
• Negative
• Positive (non-negative, non-quantified)
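Once each token is classified as OV or VO and coded for these factors, a first look at the data is a simple cross-tabulation. The sketch below assumes a hypothetical coding in which every token is a Python dict. Each dict has an "order" key ("OV" or "VO") plus one key per factor. The key names, the function name `ov_rates` and the sample tokens are illustrative placeholders. They are not the study's coding scheme or data.

```python
from collections import Counter, defaultdict


def ov_rates(tokens, factor):
    """Cross-tabulate word order against one coded factor.

    tokens: iterable of dicts, each with an "order" key ("OV" or "VO")
            and one key per coded factor (key names are hypothetical).
    Returns {level: (n_OV, n_total, proportion_OV)}.
    """
    counts = defaultdict(Counter)
    for tok in tokens:
        if tok["order"] not in ("OV", "VO"):
            raise ValueError(f"unexpected order value: {tok['order']!r}")
        counts[tok[factor]][tok["order"]] += 1
    table = {}
    for level, c in counts.items():
        total = c["OV"] + c["VO"]
        table[level] = (c["OV"], total, c["OV"] / total)
    return table


# Placeholder tokens for illustration only; these are not corpus data.
sample = [
    {"order": "OV", "obj_type": "negative", "obj_words": 2},
    {"order": "VO", "obj_type": "negative", "obj_words": 3},
    {"order": "VO", "obj_type": "positive", "obj_words": 4},
]
print(ov_rates(sample, "obj_type"))
# {'negative': (1, 2, 0.5), 'positive': (0, 1, 0.0)}
```

A one-factor table like this does not separate the factors from one another. Date, length and type of object may be correlated, so weighing them against each other would typically call for a multivariate model such as logistic regression.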
Quantified objects (Middle English)
ȝef ȝe habbed ani god don
if you have any good done
‘…if you have done any good…’ (CMANCRIW,I.76.310)
fordon þe he scal aȝein ȝeuen awiht
for he shall again give something
‘…for he shall again give something.’ (CMLAMBX1,31.396)
Negative objects (Middle English)
þt he ne mai nan þing don us buten godes leaue
that he neg can no thing do us without God’s leave
‘…that he can do nothing to us without God’s leave.’ (CMANCRIW,II.169.2346)
swa þet ho ne scal of þere wunde habbe nan oder uuel
so that she neg shall from her wound have no other evil
‘…so that she shall have no other evil from her wound.’ (CMLAMB1,83.195)
Syntactic variation in Present-Day English (PDE)
• Heavy-NP shift (Wasow & Arnold)
• Dative alternation (Wasow; Bresnan)
• Particle shift (Gries)
• Saxon vs. of-genitive (Szmrecsanyi)
• Complementizer omission in complement and relative clauses (Jaeger & Wasow)
• Topicalization, etc. (Cresswell)
Conclusions
• The study of syntactic variation is an up-and-coming topic in linguistics
• It can’t be studied using the usual methods (introspection, intuition) but requires naturally occurring data
• Plain (unannotated) text corpora are of limited use for this
• To study syntactic variation efficiently, you really need annotated data, and the more the better