Annotation the scope context speech act Anaphoric reference

  • Slides: 45
Annotation: the scope
  Example utterance: er so Steven said it was not a property of um annotated corpora
  Kinds of annotation shown around it: context, speech act, anaphoric reference, named entity, verb phrase, participant, noun phrase, passing truck, intonation pattern, disfluency

Some XML annotations
  <person ident="SB01" gender="M">
    <birthDate>12/03/1956</birthDate>
    ...
  </person>
  <name persKey="SB01">steven <phon addr="1:10">st</phon></name>
  <phon addr="12:5">i</phon>
  <phon addr="23:5">v</phon>
  <phon addr="30:2">n</phon>
  <w pos="NP1">Steven</w>
  <u who="SB01" start="0:1">er so steven said it was <emph>not</emph> a property of annotated corpora</u>

Transcribing speech
  • normalization issues
  • ease of reading vs accuracy
  • interpretation vs prosody
  • analogous to problems of handling digitized images

The Spoken base tagset
  • components: <u> <event> <kinesic> <vocal> <pause> <shift>
  • contextual information in header: <settingDesc> <particDesc>
  • facilities for synchronization and timing

Features of speech

Utterances
  • Basic unit of discourse, corresponding to speaker turns
  • Optionally grouped into higher-level divisions (<div>s), e.g. to mark discourse function
  • Linked by who attribute to <person> description in header
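
The link from an utterance to its speaker can be followed mechanically. The sketch below is only an illustration: the <doc> wrapper, the second speaker and the helper name speakers_by_utterance are assumptions, and the ident attribute follows the usage on these slides rather than any particular schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature transcript; the <doc> wrapper and the second
# speaker are invented for illustration only.
SAMPLE = """
<doc>
  <particDesc>
    <person ident="SB01" gender="M"/>
    <person ident="LB02" gender="F"/>
  </particDesc>
  <div type="exchange">
    <u who="SB01">er so steven said it was not a property of annotated corpora</u>
    <u who="LB02">okay</u>
  </div>
</doc>
"""


def speakers_by_utterance(xml_text):
    """Return (speaker gender, utterance text) pairs by following each who pointer."""
    root = ET.fromstring(xml_text)
    # Index the participant descriptions by identifier.
    people = {p.get("ident"): p for p in root.iter("person")}
    pairs = []
    for u in root.iter("u"):
        # Accept both bare identifiers ("SB01") and pointer syntax ("#SB01").
        key = u.get("who", "").lstrip("#")
        person = people.get(key)
        gender = person.get("gender") if person is not None else None
        pairs.append((gender, "".join(u.itertext()).strip()))
    return pairs


RESULT = speakers_by_utterance(SAMPLE)
```

Keeping speaker details in the header and pointing at them from each <u> means the description is stated once, however many turns the speaker takes.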

Vocals and events
  • Empty elements are used to mark paralinguistic phenomena

Voice quality and prosody
  • The <shift> element is used to mark changes in voice quality
  • Other prosodic features may be marked using specific kinds of <seg> or entity refs

Another example

Participant Description

Setting Description (e.g. from P2)

Timing
  • Pausing: use <pause> element
  • Duration: use dur attribute
  • Overlap: use trans attribute

Overlap

Linking, segmentation, alignment
  • Provides generic segmentation elements
  • Provides extensive set of attributes for linkage, correspondence, synchronization, aggregation, alternation, etc.
  • Documents generic pointing mechanism

Generic segmentation elements
  • <seg> for arbitrary (nesting) segmentation
  • <s> for end-to-end segmentation
  • use type attribute to subcategorise
  • <anchor> for points
  Segmentation is the key to successful linking and analysis

Clustering

discontinuous segments
  • fundamental problem
  • first segment, then link, using stand-off

discontinuous segments
  • can also use part attribute to indicate that segments are incomplete

discontinuous segments

Translation pairs
  <s xml:id="s1" corresp="#s2" xml:lang="EN">For a long time I used to go to bed early</s>
  <s xml:id="s2" corresp="#s1" xml:lang="FR">Longtemps je me couchais de bonne heure</s>
  and/or ...
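
A minimal sketch of how the corresp pointers might be followed to recover the pairs, assuming the two sentences sit inside a hypothetical <body> wrapper; the function name translation_pairs and the source_lang parameter are illustrative, not part of any standard tool.

```python
import xml.etree.ElementTree as ET

# Namespace that the predefined xml: prefix resolves to.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# The <body> wrapper is an assumption; the two <s> elements follow the slide.
SAMPLE = """
<body>
  <s xml:id="s1" corresp="#s2" xml:lang="EN">For a long time I used to go to bed early</s>
  <s xml:id="s2" corresp="#s1" xml:lang="FR">Longtemps je me couchais de bonne heure</s>
</body>
"""


def translation_pairs(xml_text, source_lang="EN"):
    """Pair each source-language <s> with the segment(s) its corresp points at."""
    root = ET.fromstring(xml_text)
    by_id = {s.get(XML_NS + "id"): s for s in root.iter("s")}
    pairs = []
    for s in root.iter("s"):
        if s.get(XML_NS + "lang") != source_lang:
            continue
        # corresp may hold several space-separated pointers.
        for pointer in s.get("corresp", "").split():
            target = by_id.get(pointer.lstrip("#"))
            if target is not None:
                pairs.append((s.text.strip(), target.text.strip()))
    return pairs


PAIRS = translation_pairs(SAMPLE)
```

Because the link is stated on both sides, filtering on one language avoids reporting every pair twice.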

Synchronization
  • of whole elements
  • of points in time

XML semantics are limited
  <s id="S1" head="V1">
    <np id="N1">annotated corpora</np>
    <vp id="V1">rule</vp>
    <tq id="T1">okay</tq>
  </s>
  The containment relation is implicit, so we do not need to say
    <vp id="V1" partOf="S1">rule</vp>
  though we may wish to say
    <vp id="V1" role="head">rule</vp>
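
The point about implicit containment can be checked with a few lines of code: the parent of every element is recoverable from the tree itself, while the head relation has to be stated. This is a rough sketch; the helper names containment and head_text are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<s id="S1" head="V1">'
    '<np id="N1">annotated corpora</np>'
    '<vp id="V1">rule</vp>'
    '<tq id="T1">okay</tq>'
    '</s>'
)


def containment(root):
    """Derive the part-of relation from nesting alone: child id -> parent id."""
    return {
        child.get("id"): parent.get("id")
        for parent in root.iter()
        for child in parent
    }


def head_text(root):
    """Follow the explicit head attribute, which nesting cannot supply."""
    target = root.get("head")
    for element in root.iter():
        if element.get("id") == target:
            return element.text
    return None


ROOT = ET.fromstring(SAMPLE)
PART_OF = containment(ROOT)
HEAD_TEXT = head_text(ROOT)
```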

Analytic mechanisms
  • Specific kinds of segment for linguistic analyses
  • Why is there no tag for noun?
  • Specialized interpretive pointers (<span> and <spanGrp>)
  • The ana attribute and its possible targets: <interp> and <interpGrp>

Arbitrary characterizations
  • The <span> points into a stretch of a text and characterizes it in some way
  • Target may be anything you can reach by an XPath

More detailed analysis
  • the ana attribute is of type IDREFS
  • what does VVD identify?
    – a prose description
    – an <interp> element
    – a feature structure

using interp...
  <w ana="#VVD">annotated</w>
  <w ana="#NN2">corpora</w>

hierarchic grouping of interps
  • nouns can be common or proper
  • nouns can be singular or plural

for example...
  <interp xml:id='VVD'>
    <desc>verb past tense</desc>
  </interp>
  <interp xml:id='NN2'>
    <desc>plural common noun</desc>
  </interp>
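
As a rough illustration of what the ana pointers buy us, the sketch below looks up the <interp> description for each word. The <text> and <interpGrp> wrappers around the slide's fragments and the function name describe_words are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The wrapper elements are assumed; <w> and <interp> follow the slides.
SAMPLE = """
<text>
  <interpGrp>
    <interp xml:id="VVD"><desc>verb past tense</desc></interp>
    <interp xml:id="NN2"><desc>plural common noun</desc></interp>
  </interpGrp>
  <s><w ana="#VVD">annotated</w> <w ana="#NN2">corpora</w></s>
</text>
"""


def describe_words(xml_text):
    """Return (word, [descriptions]) by resolving each ana pointer to an <interp>."""
    root = ET.fromstring(xml_text)
    descs = {i.get(XML_ID): i.findtext("desc") for i in root.iter("interp")}
    described = []
    for w in root.iter("w"):
        # ana may list several pointers, so split on whitespace.
        codes = [p.lstrip("#") for p in w.get("ana", "").split()]
        described.append((w.text, [descs.get(c, "unknown code") for c in codes]))
    return described


DESCRIBED = describe_words(SAMPLE)
```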

Encoding analyses
  • Linguistic Annotation Frameworks and standards: the philosopher's stone
  • Generic feature structure system
    – any analysis can be represented by bundles of named feature-value pairs
    – embedded within text or indirectly linked
  • Ancillary feature system declaration
  • Theoretically neutral (?) pragmatic solution to real world problem of intermachine communication

Feature structures
  • a feature structure consists of a bundle of features
  • a feature has a name and a value
  • values may be binary switches, symbols, strings, feature structures, or operations on them
  • bundling may be constrained in various (not necessarily hierarchic) ways

... or, in XML:
  • The <fs> element represents a (typed) feature structure, which contains...
  • One or more <f> elements, each of which has
    – a name
    – a value
  • Feature values may be
    – atomic: <binary> <string> <numeric> <symbol>
    – complex: <fs> <coll>
    – expressions: <vNot> <vAlt> <vColl> ... or <var>

Using a feature structure...
  <fs xml:id='NN2'>
    <f name='class'><symbol value='noun'/></f>
    <f name='number'><symbol value='plural'/></f>
    <f name='proper'><binary value="false"/></f>
  </fs>
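
One way to see that an <fs> really is a bundle of name-value pairs is to read it into a plain dictionary. This is a minimal sketch under stated assumptions: fs_to_dict is a hypothetical helper, values are taken from the value attribute as on these slides, and only the atomic value types plus nested <fs> are handled.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<fs xml:id="NN2">
  <f name="class"><symbol value="noun"/></f>
  <f name="number"><symbol value="plural"/></f>
  <f name="proper"><binary value="false"/></f>
</fs>
"""


def fs_to_dict(fs):
    """Convert an <fs> element into a dict of feature name -> Python value."""
    result = {}
    for f in fs.findall("f"):
        if len(f) == 0:
            continue  # an empty <f> carries no value to convert
        value = f[0]  # the first child element holds the value
        name = f.get("name")
        if value.tag == "fs":
            result[name] = fs_to_dict(value)  # complex value: recurse
        elif value.tag == "binary":
            result[name] = value.get("value") in ("true", "yes", "1")
        elif value.tag == "numeric":
            result[name] = float(value.get("value"))
        else:  # <symbol> or <string>
            result[name] = value.get("value")
    return result


FEATURES = fs_to_dict(ET.fromstring(SAMPLE))
```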

Features: simple values
  • binary, numeric, symbol or string
  • constraints may be declared in FSD

Features: plus or minus
  <fs type='phonetic segment'>
    <f name='segment'><binary value="yes"/></f>
    <f name='consonantal'><binary value="yes"/></f>
    <f name='vocalic'><binary value="no"/></f>
    <f name='nasal'><binary value="no"/></f>
    <!-- ... -->
    <f name='coronal'><binary value="yes"/></f>
    <f name='continuant'><binary value="yes"/></f>
    <f name='delayedRelease'><binary value="yes"/></f>
    <f name='strident'><binary value="yes"/></f>
  </fs>
  [segment +, consonantal +, vocalic -, nasal -, low -, high -, back -, round -, anterior +, coronal +, continuant +, delayed release +, strident +]

Alternate values

for example...
  <fs>
    <f name="cat"><symbol value="verb"/></f>
    <f name="aux"><string value="avoir"/></f>
    <f name="mode"><symbol value="indicatif"/></f>
    <f name="tense"><symbol value="present"/></f>
    <f name="pers">
      <vAlt>
        <symbol value="1"/>
        <symbol value="3"/>
      </vAlt>
    </f>
    <f name="num"><symbol value="sing"/></f>
  </fs>
  Word form analysed: "mange"
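
An alternation is a compact way of writing several readings at once. The sketch below expands each <vAlt> into the concrete readings it stands for; expand_alternatives is a hypothetical helper, the sample is shortened from the slide, and only atomic alternatives are assumed.

```python
import itertools
import xml.etree.ElementTree as ET

SAMPLE = """
<fs>
  <f name="cat"><symbol value="verb"/></f>
  <f name="tense"><symbol value="present"/></f>
  <f name="pers"><vAlt><symbol value="1"/><symbol value="3"/></vAlt></f>
  <f name="num"><symbol value="sing"/></f>
</fs>
"""


def expand_alternatives(fs):
    """Return one dict per reading, multiplying out every <vAlt> in the <fs>."""
    names, choices = [], []
    for f in fs.findall("f"):
        value = f[0]
        names.append(f.get("name"))
        if value.tag == "vAlt":
            # Each child of <vAlt> is one permitted value.
            choices.append([alt.get("value") for alt in value])
        else:
            choices.append([value.get("value")])
    # The Cartesian product enumerates every combination of choices.
    return [dict(zip(names, combo)) for combo in itertools.product(*choices)]


READINGS = expand_alternatives(ET.fromstring(SAMPLE))
```

With one two-way alternation the product yields two readings; each further <vAlt> multiplies the count, which is why the compact notation is worth having.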

Value libraries
  • Collections of re-usable feature-structure components, each with a unique key
  • May be referenced from an <fs> (using feats attribute) or an <f> (using fVal attribute)
  • NB effect is to transclude (embed a copy of) the referenced item
  • Not to be confused with ...

for example
  <fLib type="agreement features">
    <f xml:id="p1" name="person"><symbol value="first"/></f>
    <f xml:id="p2" name="person"><symbol value="second"/></f>
    <!-- ... -->
    <f xml:id="ns" name="number"><symbol value="singular"/></f>
    <f xml:id="np" name="number"><symbol value="plural"/></f>
    <!-- ... -->
  </fLib>
  <fs feats="#p2 #ns"/>
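
Transclusion can be pictured as copying the referenced features into the <fs> that points at them. The following is a sketch only: the <doc> wrapper and the function name transclude_feats are assumptions, and real processing would normally leave the source document untouched.

```python
import copy
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The <doc> wrapper is assumed; the library and the <fs> follow the slide.
SAMPLE = """
<doc>
  <fLib type="agreement features">
    <f xml:id="p1" name="person"><symbol value="first"/></f>
    <f xml:id="p2" name="person"><symbol value="second"/></f>
    <f xml:id="ns" name="number"><symbol value="singular"/></f>
    <f xml:id="np" name="number"><symbol value="plural"/></f>
  </fLib>
  <fs feats="#p2 #ns"/>
</doc>
"""


def transclude_feats(root):
    """Replace each feats attribute with copies of the library features it names."""
    library = {
        f.get(XML_ID): f
        for lib in root.iter("fLib")
        for f in lib.findall("f")
    }
    # Materialise the list first so the tree is not modified while iterating.
    for fs in list(root.iter("fs")):
        for pointer in fs.attrib.pop("feats", "").split():
            clone = copy.deepcopy(library[pointer.lstrip("#")])
            del clone.attrib[XML_ID]  # a copy must not duplicate the unique key
            clone.tail = None
            fs.append(clone)
    return root


ROOT = transclude_feats(ET.fromstring(SAMPLE))
EXPANDED = [(f.get("name"), f[0].get("value")) for f in ROOT.find("fs")]
```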

Structure sharing
  • Some <fs> are not trees but DAGs – nodes may have multiple parents
  • We represent this by labelling each reentrancy point, using a <var> element
  • All <var>s with the same label are held to be the same node: any contents found are to be unified

for example
  <fs>
    <f name="nominal">
      <fs>
        <f name="nm-num">
          <var label="L1"><symbol value="singular"/></var>
        </f>
        <!-- other nominal features -->
      </fs>
    </f>
    <f name="verbal">
      <fs>
        <f name="vb-num"><var label="L1"/></f>
      </fs>
      <!-- other verbal features -->
    </f>
  </fs>
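
A simplified sketch of how shared labels might be resolved: every <var> with the same label receives the one value found under any of them. Real unification of complex values is more involved; here shared_values and resolve are hypothetical helpers that only handle atomic values and treat conflicting values as an error.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<fs>
  <f name="nominal"><fs>
    <f name="nm-num"><var label="L1"><symbol value="singular"/></var></f>
  </fs></f>
  <f name="verbal"><fs>
    <f name="vb-num"><var label="L1"/></f>
  </fs></f>
</fs>
"""


def shared_values(fs):
    """Map each label to the single atomic value found under its <var>s."""
    values = {}
    for var in fs.iter("var"):
        label = var.get("label")
        for child in var:
            found = child.get("value")
            # Simplified unification: atomic values must be identical.
            if values.setdefault(label, found) != found:
                raise ValueError(f"cannot unify values for label {label}")
    return values


def resolve(fs):
    """Return feature name -> shared value for every feature holding a <var>."""
    values = shared_values(fs)
    resolved = {}
    for f in fs.iter("f"):
        var = f.find("var")  # direct child only
        if var is not None:
            resolved[f.get("name")] = values.get(var.get("label"))
    return resolved


RESOLVED = resolve(ET.fromstring(SAMPLE))
```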

Collections and other multiples
  • The value of a feature may be an aggregate of atomic values organized as a set, list, or bag
  • We represent this as a <coll> with a distinguishing org attribute
  • The value of a feature may (more usually) be a feature structure
  • ... or the value of a feature may be given by a feature expression

For example
  <fs>
    <f name="lexicalForm"><symbol value="auxquels"/></f>
    <f name="analyses">
      <coll org="list">
        <fs>
          <f name="cat"><symbol value="prep"/></f>
        </fs>
        <fs>
          <f name="cat"><symbol value="pronoun"/></f>
          <f name="kind"><symbol value="rel"/></f>
          <f name="num"><symbol value="pl"/></f>
          <f name="gender"><symbol value="masc"/></f>
        </fs>
      </coll>
    </f>
  </fs>

Feature expressions
  • We provide the following operators
    – Negation <vNot>, i.e. complement
    – Alternation <vAlt>
    – "Flattening" collection <vColl>
  • We also provide a <default> element
  • ... but some of these are not very useful in the absence of a feature system declaration

Validation of Feature Structures
  • Constraints can be applied at three levels
    – in the XML schema (e.g. empty <f> is not allowed)
    – by supplying additional rules in an established XML constraint language (e.g. Schematron)
    – by defining a complete FSD or equivalent
  • Or, a given set of <fs> could be "de-abstracted" to form a structure for which a specific schema could be written
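
The de-abstracting step can be sketched in a few lines: the type of the <fs> becomes an element name, each feature name becomes a child element, and each atomic value becomes text content. The function name deabstract and the sample input are illustrative assumptions, and only atomic values and nested <fs> are covered.

```python
import xml.etree.ElementTree as ET

# Illustrative generic feature structure with a type and two atomic features.
SAMPLE = """
<fs type="ABC">
  <f name="xyz"><symbol value="zzz"/></f>
  <f name="foo"><numeric value="42"/></f>
</fs>
"""


def deabstract(fs):
    """Turn a generic typed <fs> into a specific element named after its type."""
    specific = ET.Element(fs.get("type"))
    for f in fs.findall("f"):
        child = ET.SubElement(specific, f.get("name"))
        value = f[0]
        if value.tag == "fs":
            child.append(deabstract(value))  # nested structures recurse
        else:
            child.text = value.get("value")  # atomic value becomes text
    return specific


SPECIFIC = ET.tostring(deabstract(ET.fromstring(SAMPLE)), encoding="unicode")
```

Once the data is in this specific form, an ordinary schema or DTD for the resulting elements can do the validation that the generic <fs> markup leaves open.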

"de-abstractification"
  • A generic XML representation can be automatically converted to a specific one...

  <fs type="ABC">
    <f name="xyz"><symbol value="zzz"/></f>
    <f name="foo"><numeric value="42"/></f>
  </fs>

  <!ELEMENT ABC (xyz, foo)>
  <!ELEMENT xyz (#PCDATA)>
  <!ELEMENT foo (#PCDATA)>

  <ABC>
    <xyz>zzz</xyz>
    <foo>42</foo>