Solutions mentioned by the TEI
· CONCUR: an optional feature of SGML (not XML) that allows multiple hierarchies to be marked up concurrently in the same document
· milestone elements: empty elements that mark the boundaries between elements in a non-nesting structure
· fragmentation of an item: the division of a single element into two or more parts, each of which nests properly within its context
· virtual joins: the re-creation of a virtual element from fragments of text
· redundant encoding: information encoded in multiple forms
Sekimo http://www.text-technology.de/projects/sekimo.html
Problems with milestones
· milestones are empty elements → milestone elements have no content
· consequences:
  ➢ no content model restriction can be stated by a document grammar
  ➢ standard SGML/XML editors cannot annotate these regions
  ➢ SGML/XML parsers cannot ensure proper nesting of the milestone elements
  ➢ processing these regions by means of a style sheet is more difficult (XSLT) or impossible (CSS)
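Because a parser treats milestones as ordinary empty elements, pairing has to be checked by application code. A minimal sketch, assuming the `milestone` element with `type`, `id` and `coid` attributes used in the examples of this deck; the function name and the sample document are hypothetical:

```python
import xml.etree.ElementTree as ET


def unmatched_milestones(xml_text):
    """Return (ids of starts never closed, coids of ends never opened)."""
    root = ET.fromstring(xml_text)
    open_ids = set()
    stray_ends = []
    # iter() walks the tree in document order, so starts are seen before ends.
    for el in root.iter("milestone"):
        if el.get("type") == "start":
            open_ids.add(el.get("id"))
        elif el.get("type") == "end":
            ref = el.get("coid")
            if ref in open_ids:
                open_ids.remove(ref)
            else:
                stray_ends.append(ref)
    return sorted(open_ids), stray_ends


# Well-formed XML, so the parser accepts it, yet the milestones do not pair up.
doc = (
    "<text><milestone type='start' gi='q' id='foo'/>some words"
    "<milestone type='end' gi='q' coid='bar'/></text>"
)
print(unmatched_milestones(doc))
```

The parser raises no error for `doc`; only the extra pass reports the dangling `foo` and the stray `bar`.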
CLIX/Horse milestones
· Differing types of milestones:
    <milestone type='start' gi='q' id='foo'/> … <milestone type='end' gi='q' coid='foo'/>
    <start gi='q' id='foo'/> … <end gi='q' coid='foo'/>
· CLIX
  ➢ Non-XML:
      <B>s<I>xyz</B>t</I>
  ➢ would be:
      <B sID='1'/>s<I sID='2'/>xyz<B eID='1'/>t<I eID='2'/>
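The CLIX rewriting of the overlapping B/I example can be sketched as a small function that turns character ranges into `sID`/`eID` milestone pairs. This is an illustration only: the function name and the `(name, start, end)` range format are assumptions, and ranges are assumed to be non-empty.

```python
from xml.sax.saxutils import escape


def to_clix(text, ranges):
    """Serialize possibly overlapping ranges as CLIX-style milestones.

    ranges: list of (name, start, end) character offsets, end exclusive.
    """
    events = []
    for n, (name, start, end) in enumerate(ranges, start=1):
        # Second tuple field orders end tags (0) before start tags (1)
        # when both fall on the same offset.
        events.append((start, 1, n, f"<{name} sID='{n}'/>"))
        events.append((end, 0, n, f"<{name} eID='{n}'/>"))
    events.sort()

    out = []
    pos = 0
    for offset, _, _, tag in events:
        out.append(escape(text[pos:offset]))  # escape &, <, > in the text
        out.append(tag)
        pos = offset
    out.append(escape(text[pos:]))
    return "".join(out)


# B covers "sxyz", I covers "xyzt": the two ranges overlap on "xyz".
print(to_clix("sxyzt", [("B", 0, 4), ("I", 1, 5)]))
```

The result is well-formed XML content even though the two ranges cross, because every tag is an empty element.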
Problems with the other TEI solutions
· CONCUR:
  ➢ (de facto) not implemented (and not part of XML)
· fragmentation of an item:
  ➢ results in 'containers' that contain only part of the text, e.g. a fragmented sentence or paragraph element would not contain an entire sentence or paragraph, as its name implies
· virtual joins:
  ➢ require a separate interpretation of the SGML document
· redundant encoding:
  ➢ results in multiple files → the files are not integrated in a larger unit → there is no unit containing all the information
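The "separate interpretation" that fragmentation and virtual joins require can be illustrated with a short sketch. It assumes fragments are chained with `id`, `next` and `prev` attributes (modelled on the TEI linking attributes); the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def join_fragments(xml_text, tag):
    """Rebuild the full text of each fragmented element of type `tag`."""
    root = ET.fromstring(xml_text)
    by_id = {
        el.get("id"): el for el in root.iter(tag) if el.get("id") is not None
    }
    joined = []
    for el in root.iter(tag):
        if el.get("prev") is not None:
            continue  # a continuation; it is collected from its first fragment
        parts, seen, cur = [], set(), el
        while cur is not None and id(cur) not in seen:
            seen.add(id(cur))  # guard against cyclic next-chains
            parts.append("".join(cur.itertext()))
            nxt = cur.get("next")
            cur = by_id.get(nxt) if nxt is not None else None
        joined.append("".join(parts))
    return joined


# One sentence is split across two lines (<l>), so it is encoded as two <s> parts.
doc = (
    "<p><l><s id='s1' next='s2'>A sentence that starts here </s></l>"
    "<l><s id='s2' prev='s1'>and ends here.</s> <s id='s3'>Another one.</s></l></p>"
)
print(join_fragments(doc, "s"))
```

Neither `<s>` fragment alone holds the sentence; a generic parser sees two unrelated elements until this extra step is run.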
Stand-off annotation
· new layers of annotation are added by building a new tree whose nodes are SGML elements which do not contain textual content, but links to another layer
· in some respects a generalization of the virtual joins (although not mentioned by the TEI), because
  ➢ not only contents of elements are joined, but also ranges between points within the document
· link base:
  ➢ Distinction 1: markup already contained in an annotation layer vs. text content, addressed by character offsets
  ➢ Distinction 2: one (dedicated) layer as the link target vs. (free) interlinking of several layers
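A minimal sketch of the character-offset variant of stand-off annotation: the text stays untouched and each layer only stores ranges into it. The `Span` type, the layer names and the sample text are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


def resolve(text, span):
    """Look up the text a stand-off span points to."""
    return text[span.start:span.end]


def crossing(a, b):
    """True if the spans overlap without one containing the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


# The primary text is never modified; each layer is kept separately.
text = "The quick brown fox jumps"
syntax_layer = [Span("np", 0, 19)]
prosody_layer = [Span("phrase", 10, 25)]

print(resolve(text, syntax_layer[0]))
print(resolve(text, prosody_layer[0]))
print(crossing(syntax_layer[0], prosody_layer[0]))
```

The two spans cross, which a single XML tree over the text could not express, but as separate layers they coexist without conflict.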
Advantages of stand-off annotation
· Thompson & McKelvie (1997)
  ➢ the source document might be read-only
  ➢ annotation files can be distributed without distributing the source text
· Michael Glass & Barbara Di Eugenio (2002)
  ➢ discontinuous segments of text can be combined in a single annotation
  ➢ independent parallel coders can produce independent annotations
  ➢ different annotation files can contain different layers of information
· Pianta & Bentivogli (2004)
  ➢ elegance and clarity
  ➢ processing conceptually simple
Drawbacks of stand-off annotation
· new layers require a separate interpretation
· the layers, although separate, depend on each other
· the information, although included, is difficult to access using generic methods → standard parsing or editing software cannot be employed
· standard document grammars can only be used for the level that contains both markup and textual data
· linking at a sub-element range is difficult
· the primary layer should be a (primary) level
Non-SGML-based Markup Languages
· some non-SGML-based markup languages have been proposed, e.g. the Multi-Element Code System (MECS) or TexMECS
· their major extension with respect to SGML and XML is that overlapping ranges are admitted within documents
· in 2002 the Layered Markup and Annotation Language (LMNL) was proposed (Tennison and Piez 2002)
· LMNL is a markup language which not only allows overlapping elements to be annotated but also allows the element names to be connected to corresponding annotation levels
· → LMNL solves both problems, but (full) LMNL is not SGML-based