Annotation the scope context speech act Anaphoric reference
- Slides: 45
Annotation: the scope
Example utterance: er so Steven said it was not a property of um annotated corpora
Things one might annotate: context, speech act, anaphoric reference, named entity, verb phrase, participant, noun phrase, passing truck, intonation pattern, disfluency
Some XML annotations
<person ident="SB01" gender="M">
  <birthDate>12/03/1956</birthDate>
  ...
</person>
<name persKey="SB01"> steven <phon addr="1:10">st</phon> </name>
<phon addr="12:5">i</phon>
<phon addr="23:5">v</phon>
<phon addr="30:2">n</phon>
<w pos="NP1">Steven</w>
<u who="SB01" start="0:1">er so steven said it was <emph>not</emph> a property of annotated corpora</u>
Transcribing speech
- normalization issues
- ease of reading vs accuracy
- interpretation vs prosody
- analogous to problems of handling digitized images
The Spoken base tagset
- components: <u> <event> <kinesic> <vocal> <pause> <shift>
- contextual information in header: <settingDesc> <particDesc>
- facilities for synchronization and timing
Features of speech
Utterances
- Basic unit of discourse, corresponding to speaker turns
- Optionally grouped into higher-level divisions (<div>s), e.g. to mark discourse function
- Linked by the who attribute to a <person> description in the header
Vocals and events
- Empty elements are used to mark paralinguistic phenomena
Voice quality and prosody
- The <shift> element is used to mark changes in voice quality
- Other prosodic features may be marked using specific kinds of <seg> or entity refs
Another example
Participant Description
Setting Description, e.g. from P2
Timing
- Pausing: use the <pause> element
- Duration: use the dur attribute
- Overlap: use the trans attribute
Overlap
Linking, segmentation, alignment
- Provides generic segmentation elements
- Provides an extensive set of attributes for linkage, correspondence, synchronization, aggregation, alternation, etc.
- Documents a generic pointing mechanism
Generic segmentation elements
- <seg> for arbitrary (nesting) segmentation
- <s> for end-to-end segmentation
- use the type attribute to subcategorise
- <anchor> for points
Segmentation is the key to successful linking and analysis
Clustering
discontinuous segments
- fundamental problem
- first segment, then link, using stand-off
discontinuous segments
- can also use the PART attribute to indicate that segments are incomplete
discontinuous segments
Translation pairs
<s xml:id="s1" corresp="#s2" xml:lang="EN">For a long time I used to go to bed early</s>
<s xml:id="s2" corresp="#s1" xml:lang="FR">Longtemps je me couchais de bonne heure</s>
and/or ...
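A minimal Python sketch of how the corresp links above might be followed. It assumes the two <s> elements sit inside a wrapper element (the <div> here is only for illustration) and that a pair counts only when each side points at the other; the function name translation_pairs is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The <div> wrapper is an assumption so that the fragment has a single root.
SAMPLE = """<div>
  <s xml:id="s1" corresp="#s2" xml:lang="EN">For a long time I used to go to bed early</s>
  <s xml:id="s2" corresp="#s1" xml:lang="FR">Longtemps je me couchais de bonne heure</s>
</div>"""


def translation_pairs(xml_text):
    """Return (text, text) tuples for <s> elements that point at each other."""
    root = ET.fromstring(xml_text)
    by_id = {s.get(XML_ID): s for s in root.iter("s") if s.get(XML_ID)}
    pairs = []
    for sid, s in by_id.items():
        for pointer in s.get("corresp", "").split():
            tid = pointer.lstrip("#")
            other = by_id.get(tid)
            if other is None:
                continue  # dangling pointer: ignored in this sketch
            points_back = ("#" + sid) in other.get("corresp", "").split()
            # sid < tid keeps each symmetric pair only once.
            if points_back and sid < tid:
                pairs.append(((s.text or "").strip(), (other.text or "").strip()))
    return pairs


pairs = translation_pairs(SAMPLE)
```

Requiring the link in both directions is a design choice for this sketch; a one-directional corresp would also be a legitimate encoding.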
Synchronization of whole elements or of points in time
XML semantics are limited
<s id="S1" head="V1">
  <np id="N1">annotated corpora</np>
  <vp id="V1">rule</vp>
  <tq id="T1">okay</tq>
</s>
The containment relation is implicit, so we do not need to say
<vp id="V1" partOf="S1">rule</vp>
though we may wish to say
<vp id="V1" role="head">rule</vp>
Analytic mechanisms
- Specific kinds of segment for linguistic analyses
- Why is there no tag for noun?
- Specialized interpretive pointers (<span> and <spanGrp>)
- The ana attribute and its possible targets: <interp> and <interpGrp>
Arbitrary characterizations
- The <span> points into a stretch of a text and characterizes it in some way
- Target may be anything you can reach by an XPath
More detailed analysis
- the ana attribute is of type IDREFS
- what does VVD identify?
  - a prose description
  - an <interp> element
  - a feature structure
using interp ...
<w ana="#VVD">annotated</w>
<w ana="#NN2">corpora</w>
hierarchic grouping of interps
- nouns can be common or proper
- nouns can be singular or plural
for example ...
<interp xml:id="VVD">
  <desc>verb past tense</desc>
</interp>
<interp xml:id="NN2">
  <desc>plural common noun</desc>
</interp>
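A minimal Python sketch of resolving ana pointers on <w> elements to the <desc> of the <interp> they target. The <text> wrapper and the function name describe_words are assumptions made for illustration; the words and interps are the ones from the slides.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The <text> wrapper is an assumption so that the fragment has a single root.
SAMPLE = """<text>
  <interp xml:id="VVD"><desc>verb past tense</desc></interp>
  <interp xml:id="NN2"><desc>plural common noun</desc></interp>
  <w ana="#VVD">annotated</w>
  <w ana="#NN2">corpora</w>
</text>"""


def describe_words(xml_text):
    """Pair each word with the descriptions of the interps its ana points to."""
    root = ET.fromstring(xml_text)
    descriptions = {
        interp.get(XML_ID): interp.findtext("desc")
        for interp in root.iter("interp")
    }
    result = []
    for w in root.iter("w"):
        # ana may hold several whitespace-separated pointers.
        targets = [p.lstrip("#") for p in w.get("ana", "").split()]
        result.append((w.text, [descriptions.get(t) for t in targets]))
    return result


analysed = describe_words(SAMPLE)
```

An unresolved pointer yields None here rather than an error, which keeps the sketch tolerant of incomplete interp inventories.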
Encoding analyses
- Linguistic Annotation Frameworks and standards: the philosopher's stone
- Generic feature structure system
  - any analysis can be represented by bundles of named feature-value pairs
  - embedded within text or indirectly linked
- Ancillary feature system declaration
- Theoretically neutral (?) pragmatic solution to the real-world problem of intermachine communication
Feature structures
- a feature structure consists of a bundle of features
- a feature has a name and a value
- values may be binary switches, symbols, strings, feature structures, or operations on them
- bundling may be constrained in various (not necessarily hierarchic) ways
... or, in XML:
- The <fs> element represents a (typed) feature structure, which contains ...
- One or more <f> elements, each of which has a name and a value
- Feature values may be
  - atomic: <binary> <string> <numeric> <symbol>
  - complex: <fs> <coll>
  - expressions: <vNot> <vAlt> <vColl> ... or <var>
Using a feature structure ...
<fs xml:id="NN2">
  <f name="class"><symbol value="noun"/></f>
  <f name="number"><symbol value="plural"/></f>
  <f name="proper"><binary value="false"/></f>
</fs>
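A minimal Python sketch of reading such an <fs> into a plain dictionary of feature-value pairs. The function name fs_to_dict is hypothetical, and treating both "true" and "yes" as a positive binary value is an assumption made because the slides use both spellings.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<fs xml:id="NN2">
  <f name="class"><symbol value="noun"/></f>
  <f name="number"><symbol value="plural"/></f>
  <f name="proper"><binary value="false"/></f>
</fs>"""


def fs_to_dict(fs):
    """Convert an <fs> element into a dict keyed by feature name."""
    result = {}
    for f in fs.findall("f"):
        if len(f) == 0:
            continue  # empty <f>: nothing to record in this sketch
        value = f[0]
        name = f.get("name")
        if value.tag == "fs":
            # Complex value: a nested feature structure.
            result[name] = fs_to_dict(value)
        elif value.tag == "binary":
            # Assumption: "true" and "yes" both mean plus.
            result[name] = value.get("value") in ("true", "yes")
        elif value.tag == "numeric":
            result[name] = float(value.get("value"))
        else:
            # <symbol> and <string> are read from the value attribute here.
            result[name] = value.get("value")
    return result


noun = fs_to_dict(ET.fromstring(SAMPLE))
```

Collections, alternations and <var> are deliberately left out; they need more than a flat name-to-value mapping.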
Features: simple values
- binary, numeric, symbol or string
- constraints may be declared in the FSD
Features: plus or minus
<fs type="phonetic segment">
  <f name="segment"><binary value="yes"/></f>
  <f name="consonantal"><binary value="yes"/></f>
  <f name="vocalic"><binary value="no"/></f>
  <f name="nasal"><binary value="no"/></f>
  <!-- ... -->
  <f name="coronal"><binary value="yes"/></f>
  <f name="continuant"><binary value="yes"/></f>
  <f name="delayedRelease"><binary value="yes"/></f>
  <f name="strident"><binary value="yes"/></f>
</fs>
[segment +, consonantal +, vocalic -, nasal -, low -, high -, back -, round -, anterior +, coronal +, continuant +, delayed release +, strident +]
Alternate values
for example ... "mange"
<fs>
  <f name="cat"><symbol value="verb"/></f>
  <f name="aux"><string value="avoir"/></f>
  <f name="mode"><symbol value="indicatif"/></f>
  <f name="tense"><symbol value="present"/></f>
  <f name="pers">
    <vAlt>
      <symbol value="1"/>
      <symbol value="3"/>
    </vAlt>
  </f>
  <f name="num"><symbol value="sing"/></f>
</fs>
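A minimal Python sketch of what an alternation means in practice: the <fs> for "mange" stands for one reading per alternative under <vAlt>. The function name expand_alternatives is hypothetical, and the sketch handles only flat features whose values carry a value attribute.

```python
import xml.etree.ElementTree as ET
from itertools import product

SAMPLE = """<fs>
  <f name="cat"><symbol value="verb"/></f>
  <f name="aux"><string value="avoir"/></f>
  <f name="mode"><symbol value="indicatif"/></f>
  <f name="tense"><symbol value="present"/></f>
  <f name="pers"><vAlt><symbol value="1"/><symbol value="3"/></vAlt></f>
  <f name="num"><symbol value="sing"/></f>
</fs>"""


def expand_alternatives(fs):
    """Return one dict per concrete reading licensed by the <vAlt> values."""
    names, options = [], []
    for f in fs.findall("f"):
        value = f[0]
        # A <vAlt> contributes all of its children; anything else, itself.
        choices = list(value) if value.tag == "vAlt" else [value]
        names.append(f.get("name"))
        options.append([choice.get("value") for choice in choices])
    # Cartesian product over the per-feature options.
    return [dict(zip(names, combo)) for combo in product(*options)]


readings = expand_alternatives(ET.fromstring(SAMPLE))
```

Here the only alternation is on pers, so the product yields a first-person and a third-person reading that agree on every other feature.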
Value libraries
- Collections of re-usable feature-structure components, each with a unique key
- May be referenced from an <fs> (using the feats attribute) or an <f> (using the fVal attribute)
- NB the effect is to transclude (embed a copy of) the referenced item
- Not to be confused with ...
for example
<fLib type="agreement features">
  <f xml:id="p1" name="person"><symbol value="first"/></f>
  <f xml:id="p2" name="person"><symbol value="second"/></f>
  <!-- ... -->
  <f xml:id="ns" name="number"><symbol value="singular"/></f>
  <f xml:id="np" name="number"><symbol value="plural"/></f>
  <!-- ... -->
</fLib>
<fs feats="#p2 #ns"/>
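A minimal Python sketch of the transclusion effect: each pointer in feats is replaced by a copy of the library <f> it targets. The <doc> wrapper and the function name expand_feats are assumptions for illustration, and dropping xml:id from the copies is a choice made here to avoid duplicate identifiers.

```python
import copy
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The <doc> wrapper is an assumption so that the fragment has a single root.
SAMPLE = """<doc>
  <fLib type="agreement features">
    <f xml:id="p1" name="person"><symbol value="first"/></f>
    <f xml:id="p2" name="person"><symbol value="second"/></f>
    <f xml:id="ns" name="number"><symbol value="singular"/></f>
    <f xml:id="np" name="number"><symbol value="plural"/></f>
  </fLib>
  <fs feats="#p2 #ns"/>
</doc>"""


def expand_feats(xml_text):
    """Embed copies of referenced library features; return one dict per <fs>."""
    root = ET.fromstring(xml_text)
    library = {f.get(XML_ID): f for f in root.iter("f") if f.get(XML_ID)}
    expanded = []
    # list(...) so the tree can be modified safely while looping.
    for fs in list(root.iter("fs")):
        for pointer in fs.attrib.pop("feats", "").split():
            clone = copy.deepcopy(library[pointer.lstrip("#")])
            clone.attrib.pop(XML_ID, None)  # avoid duplicate ids in the copy
            fs.append(clone)
        expanded.append({f.get("name"): f[0].get("value") for f in fs.findall("f")})
    return expanded


agreement = expand_feats(SAMPLE)
```

A pointer with no matching library entry raises KeyError in this sketch, which seems preferable to silently dropping a feature.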
Structure sharing
- Some <fs> are not trees but DAGs: nodes may have multiple parents
- We represent this by labelling each reentrancy point, using a <var> element
- All <var>s with the same label are held to be the same node: any contents found are to be unified
for example
<fs>
  <f name="nominal">
    <fs>
      <f name="nm-num">
        <var label="L1"><symbol value="singular"/></var>
      </f>
      <!-- other nominal features -->
    </fs>
  </f>
  <f name="verbal">
    <fs>
      <f name="vb-num"><var label="L1"/></f>
      <!-- other verbal features -->
    </fs>
  </f>
</fs>
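A minimal Python sketch of resolving shared structure: every <var> with a given label is read as the same node. This is a simplification, not full unification: the first <var> that has content is taken as the value for its label. The function name resolve_shared is hypothetical, and the sample is a well-formed version of the slide's example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<fs>
  <f name="nominal"><fs>
    <f name="nm-num"><var label="L1"><symbol value="singular"/></var></f>
  </fs></f>
  <f name="verbal"><fs>
    <f name="vb-num"><var label="L1"/></f>
  </fs></f>
</fs>"""


def resolve_shared(xml_text):
    """Return nested dicts with every labelled <var> replaced by its value."""
    root = ET.fromstring(xml_text)
    bindings = {}
    for var in root.iter("var"):
        if len(var):
            # Simplification: the first non-empty <var> defines the label.
            bindings.setdefault(var.get("label"), var[0])

    def value_of(node):
        if node.tag == "var":
            return value_of(bindings[node.get("label")])
        if node.tag == "fs":
            return {f.get("name"): value_of(f[0]) for f in node.findall("f")}
        return node.get("value")

    return value_of(root)


agreement = resolve_shared(SAMPLE)
```

A label that is never given content raises KeyError here; a fuller treatment would unify all contents found under the same label instead of picking the first.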
Collections and other multiples
- The value of a feature may be an aggregate of atomic values organized as a set, list, or bag
- We represent this as a <coll> with a distinguishing org attribute
- The value of a feature may (more usually) be a feature structure
- ... or the value of a feature may be given by a feature expression
For example
<fs>
  <f name="lexicalForm"><symbol value="auxquels"/></f>
  <f name="analyses">
    <coll org="list">
      <fs>
        <f name="cat"><symbol value="prep"/></f>
      </fs>
      <fs>
        <f name="cat"><symbol value="pronoun"/></f>
        <f name="kind"><symbol value="rel"/></f>
        <f name="num"><symbol value="pl"/></f>
        <f name="gender"><symbol value="masc"/></f>
      </fs>
    </coll>
  </f>
</fs>
Feature expressions
- We provide the following operators
  - Negation <vNot>, i.e. complement
  - Alternation <vAlt>
  - "Flattening" collection <vColl>
- We also provide a <default> element
- ... but some of these are not very useful in the absence of a feature system declaration
Validation of Feature Structures
- Constraints can be applied at three levels
  - in the XML schema (e.g. empty <f> is not allowed)
  - by supplying additional rules in an established XML constraint language (e.g. Schematron)
  - by defining a complete FSD or equivalent
- Or, a given set of <fs> could be "de-abstracted" to form a structure for which a specific schema could be written
"de-abstractification"
- A generic XML representation can be automatically converted to a specific one ...
<fs type="ABC">
  <f name="xyz"><symbol value="zzz"/></f>
  <f name="foo"><numeric value="42"/></f>
</fs>
<!ELEMENT ABC (xyz, foo)>
<!ELEMENT xyz (#PCDATA)>
<!ELEMENT foo (#PCDATA)>
<ABC>
  <xyz>zzz</xyz>
  <foo>42</foo>
</ABC>
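A minimal Python sketch of the conversion from the generic form to the specific one: the type of the <fs> becomes the element name and each feature name becomes a child element. The function name deabstract is hypothetical, and the sketch assumes flat features whose values carry a value attribute, as in the ABC example.

```python
import xml.etree.ElementTree as ET

GENERIC = """<fs type="ABC">
  <f name="xyz"><symbol value="zzz"/></f>
  <f name="foo"><numeric value="42"/></f>
</fs>"""


def deabstract(fs):
    """Turn a generic <fs type="T"> into a specific <T> element."""
    specific = ET.Element(fs.get("type"))
    for f in fs.findall("f"):
        child = ET.SubElement(specific, f.get("name"))
        # The atomic value becomes the text content of the new element.
        child.text = f[0].get("value").strip()
    return specific


specific_xml = ET.tostring(deabstract(ET.fromstring(GENERIC)), encoding="unicode")
```

Note that the value type (symbol vs numeric) is lost in the specific form; it would have to be recovered from the schema written for the de-abstracted structure.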