Linkable data, linked data and texts: What have the Digital Humanities to offer, based on the CIDOC-CRM and TEI
Christian-Emil Ore, University of Oslo
NTNU, 02.11.2018
Agenda
• Introduction
• Text encoding
• Conceptual Modelling
• An example – Mediaeval texts and Linkable Data
• Summing up
Virtual Research Environment – 1945
MemEx (Memory Extension) with microfilm storage (based on Vannevar Bush’s paper “As We May Think”, 1945)
https://www.youtube.com/watch?v=c539cK58ees
Linking data – 1997
“Norwegian farm names”: 139, Jaaberg. Pron: jåbber. References: - i Jabærghi RB. 31, 56. Jabergh DN II 657, 1471. Iaberg NRJ. IV 127. Jabere DN III 836, 1539. [...]
Diplomatarium Norvegicum, Vol. II, p. 657, No. 882. Date: 26 August 1471. Place: [Hyppestad] [...] Jtem swor oc Stenulff Leidulfson sinsz fadhurs ordh at han gek med sin fadhur aff Jabergh som ligger i Sanda Hered deghi effther sancte Johannes dagh [...]
Archaeological acquisition catalogue: 23447. Grave find from the Roman Iron Age from the stone circle at Jåberg (farm no. 139), Sandar parish, Vestfold county. A) Bronze fibula from the older Roman period of the main type [...]
Text and ontology
TEI XML
• Physical and logical structure
• Semantic content
RDF/OWL ontology
• Network of associations
• Additional statements and interpretative layers
Henry III Fine Rolls Project (Ciula, Vieira: “Complementing and extending TEI documents with an ontology”. TEI Members Meeting 2008)
<persName key="ashford_de_william">William de <placeName key="ashford1">Ashford</placeName> </persName>
<rs key="abjuration" type="subject">on the day he abjured the kingdom<persName key="rumberue_de_thomas">Thomas de <placeName key="rumberue">Rumberue</placeName></persName></rs>
Encoding for extraction
A fragment of an imaginary archaeological excavation report: “The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.”
Information extraction
• Actor: Dr. Diggey; Relation: performed; Event: E1 (Type: excavation; Place: Wasteland; Time-span: 2005)
• Actor: Dr. Diggey; Relation: performed; Event: E2 (Type: modification; Descr.: breaking the sword into 30 pieces; Relation: part of E1)
• Event: E2; Relation: in presence of; Object: sword
• Object: sword; Relation: identified by; Identifier: C50435
<TEI>
  <teiHeader> … </teiHeader>
  <text>…
    <p xml:id="p1"><rs xml:id="e1">The excavation in <name type="place" xml:id="n1">Wasteland</name> in <date xml:id="d1">2005</date></rs> was performed by <name type="person" xml:id="n2">Dr. Diggey</name>. He had the misfortune of <rs xml:id="e2">breaking <rs xml:id="o1">the beautiful sword <rs xml:id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.</p>
  …
  </text>
</TEI>
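A minimal Python sketch of how the marked-up spans might be pulled out of such a paragraph. It assumes a cleaned-up, well-formed copy of the fragment in which the nested rs elements close so that "o1" covers the sword with its identifier and "e2" covers the whole breaking event; the TEI namespace is left out for brevity. The relation labels in STATEMENTS are a hand-written interpretation loosely following the slide, not something read from the XML.

```python
import xml.etree.ElementTree as ET

# Clark-notation name under which ElementTree reports the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Cleaned-up copy of the example paragraph (assumption: rs elements nest properly).
FRAGMENT = (
    '<p xml:id="p1"><rs xml:id="e1">The excavation in '
    '<name type="place" xml:id="n1">Wasteland</name> in '
    '<date xml:id="d1">2005</date></rs> was performed by '
    '<name type="person" xml:id="n2">Dr. Diggey</name>. '
    'He had the misfortune of <rs xml:id="e2">breaking '
    '<rs xml:id="o1">the beautiful sword '
    '<rs xml:id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.</p>'
)


def text_of(elem):
    """Return the full text content of an element, whitespace normalised."""
    return " ".join("".join(elem.itertext()).split())


def index_by_id(xml_source):
    """Map every xml:id in the fragment to the text span it marks up."""
    root = ET.fromstring(xml_source)
    return {el.get(XML_ID): text_of(el) for el in root.iter() if el.get(XML_ID)}


# Hand-written interpretation: which span relates to which (labels are illustrative).
STATEMENTS = [
    ("n2", "performed", "e1"),
    ("e1", "took place at", "n1"),
    ("e1", "has time-span", "d1"),
    ("n2", "performed", "e2"),
    ("e2", "part of", "e1"),
    ("e2", "in presence of", "o1"),
    ("o1", "identified by", "o_id1"),
]

spans = index_by_id(FRAGMENT)
for subj, rel, obj in STATEMENTS:
    print(f"{spans[subj]!r} --{rel}--> {spans[obj]!r}")
```

The point of the design is that the encoding only delimits and names the spans; the statements connecting them remain an explicit interpretative layer kept outside the text.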
TEI integration routes (diagram)
1. TEI document – Header: <place>...</place>, <event>...</event>; Body: <name>...</name>, <rs>...</rs>, <name>...</name>
2. TEI document – Header: <rdf:...>...</rdf:...>; Body: <name>...</name>, <rs>...</rs>, <name>...</name>
3. TEI document – Header: <...>...</...>; Body: <name>...</name>, <rs>...</rs>, <name>...</name>
The principle of Entropy Fallacy
• Massive data aggregation:
– Increased amount of data = increase in the amount of information
– Increased interlinking = increase in information
– Popular view: everything is connected to everything
Ontology • An ontology is a conceptual model, that is, a formally defined model resulting from an analysis of a specific domain • not necessarily a data model in the computer science sense. • Core ontologies with universals • General ontologies with particulars (thesauri/authority systems) • a formal ontology can be expressed
Eight basic concepts for data integration
• Events
• Person
• Place
• Time/Date
• Physical Objects
• Conceptual Objects
• Names
• Types
Event-oriented analysis: eight basic concepts for data integration (diagram)
• Events are at the centre
• Actors participate in Events
• Objects and Abstracts are involved
• Places: where
• Time/Date: when
• Names identify
• Types characterize
CIDOC-CRM (diagram of top-level classes)
• E55 Types: refer to / refine
• E41 Appellations: refer to / identify
• E39 Actors (persons, inst.) participate in E2 Temporal Entities (Events)
• E28 Conceptual Objects and E18 Physical Things: affect or refer to
• E2 Temporal Entities (Events): within E52 Time-Spans
• Have location at: E53 Places
CIDOC-CRM (http://www.cidoc-crm.org)
CIDOC Conceptual Reference Model (CRM)
• CRM: few concepts, high recall (Event, Thing, Actor; “happened at”, “was present at”)
• Extensions: special concepts, high precision (CRMinf, CRMsci, CRMgeo, PRESSoo, FRBRoo, LRMoo, CRMarchaeo, CRMdig)
Acc. Martin Doerr
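As an illustration of the event-oriented pattern, the imaginary excavation report can be written down as CIDOC-CRM statements. The sketch below is one possible reading, not an authoritative mapping: the base URI http://example.org/report/ and all instance names are hypothetical, and the class and property local names follow the naming pattern of the published RDFS encodings, so they should be checked against the CRM version actually in use.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
EX = "http://example.org/report/"  # hypothetical base URI for the instances


def crm(local_name):
    """Full URI reference for a CIDOC-CRM class or property."""
    return f"<{CRM}{local_name}>"


def ex(local_name):
    """Full URI reference for a (hypothetical) instance in the report."""
    return f"<{EX}{local_name}>"


# One reading of the excavation report as subject-predicate-object statements.
TRIPLES = [
    (ex("e1"), RDF_TYPE, crm("E7_Activity")),
    (ex("e1"), crm("P2_has_type"), ex("type/excavation")),
    (ex("e1"), crm("P14_carried_out_by"), ex("diggey")),
    (ex("e1"), crm("P7_took_place_at"), ex("wasteland")),
    (ex("e1"), crm("P4_has_time-span"), ex("year2005")),
    (ex("e1"), crm("P9_consists_of"), ex("e2")),
    (ex("e2"), RDF_TYPE, crm("E11_Modification")),
    (ex("e2"), crm("P14_carried_out_by"), ex("diggey")),
    (ex("e2"), crm("P12_occurred_in_the_presence_of"), ex("sword")),
    (ex("diggey"), RDF_TYPE, crm("E21_Person")),
    (ex("wasteland"), RDF_TYPE, crm("E53_Place")),
    (ex("year2005"), RDF_TYPE, crm("E52_Time-Span")),
    (ex("sword"), RDF_TYPE, crm("E19_Physical_Object")),
    (ex("sword"), crm("P1_is_identified_by"), ex("C50435")),
    (ex("C50435"), RDF_TYPE, crm("E42_Identifier")),
]


def to_ntriples(triples):
    """Serialise URI-only triples in N-Triples line syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


NTRIPLES = to_ntriples(TRIPLES)
print(NTRIPLES)
```

Both events point to the same actor, and the sword is reached only through the event it was present at, which is the sense in which the model is event-centred.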
Charter by king Hákonsson 1225
Collection 1 – Diplomatarium Norvegicum (annotated page: summary, source info, text number, date, place, edited text)
Collection 2 – recent transcripts
Collection 3 – Regesta Norvegica
• Persons, places, subject, etc. are in the registries
• Text witnesses
• Where the charter text is published, e.g. in Diplomatarium Norvegicum
1 What is the original text? A more complex example
After Apographa Arn. Magn., presumably from a lost codex, Bergen (Barth. IV (E) 378 – 374) (Printed in Thork. Dipl. II 25)
2 What is the original text? …
3 What is the original text? (diagram of the transmission)
• 27 July 1228, Perugia, original
• Lost some time after 1311
• 9 May 1311, Bergen, vidimus (copy). To Copenhagen, ca. 1670–1690
• Transcribed in Bartholin’s Collectanea, 1690
• Lost in the great fire, Copenhagen, 1728?
• Printed Thork. Dipl. II. 25. Copenhagen/Leipzig 1786
• 27 July 1228 part printed DN II 1851
• 9 May 1311 frame (vidimus) printed DN IX 1876
Norwegian Charters
• Diplomatarium Norvegicum
– 23 volumes, covering 1100 to 1582
– Published 1846–2011
– Retro-digitized, TEI P5 encoding
• Newer transcripts
– Old Norwegian 1170–1405, 4000 transcripts
– TEI P5, no metadata, only identifier
• Regesta Norvegica
– 9 volumes, covering 1100 to 1408
– Very rich in metadata
– TEI P5 encoded
Tools & Methods
• Encoding the original texts as XML documents
– Text Encoding Initiative, tei-c.org
– Medieval Nordic Text Archive, menota.org
• Metadata expressed in compliance with ontologies
– Cultural heritage view: CIDOC-CRM (ISO 21127)
– Library/bibliographic view: FRBR/LRM (FRBRoo/LRMoo)
• Encoding of metadata
– TEI-XML for presentation and archival purposes
– RDF for linked data
Linked data – TEI-XML documents
Part 1, the proper text
Part 2, data for Linked Data (semantic web)
<TEI ...>
  <teiHeader>
    <fileDesc>
      <!-- All kinds of metadata -->
      <!-- Persons, places, bibl. ref., text witnesses etc. -->
    </fileDesc>
  </teiHeader>
  <text>
    <!-- The XML-encoded proper text goes here -->
    ...
  </text>
</TEI>
Additional structure with extracted assertions/metadata from the document, expressed in RDF/XML
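A rough Python sketch of the second part: reading persons and places from a TEI header and emitting them as a small RDF/XML structure. The header below is heavily abbreviated and not schema-valid, the xml:id values and the base URI http://example.org/charter/ are hypothetical, and rdfs:label is used only as a placeholder property; a real export would type the resources against an ontology such as the CIDOC-CRM.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE = "http://example.org/charter/"  # hypothetical base URI

# Abbreviated TEI document; names taken from the 1471 charter quoted earlier.
TEI_DOC = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader><fileDesc><sourceDesc>
    <listPerson>
      <person xml:id="pers1"><persName>Stenulff Leidulfson</persName></person>
    </listPerson>
    <listPlace>
      <place xml:id="place1"><placeName>Jabergh</placeName></place>
    </listPlace>
  </sourceDesc></fileDesc></teiHeader>
  <text><body><p>...</p></body></text>
</TEI>"""


def header_entities(tei_xml):
    """Return (xml:id, name) pairs for persons and places listed in the header."""
    ns = {"tei": TEI_NS}
    header = ET.fromstring(tei_xml).find("tei:teiHeader", ns)
    found = []
    for entry_tag, name_tag in (("person", "persName"), ("place", "placeName")):
        for entry in header.iter(f"{{{TEI_NS}}}{entry_tag}"):
            name = entry.findtext(f"tei:{name_tag}", default="", namespaces=ns)
            found.append((entry.get(XML_ID), name.strip()))
    return found


def to_rdf_xml(entities):
    """Serialise the entities as rdf:Description elements with an rdfs:label."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rdfs", RDFS_NS)
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    for ident, label in entities:
        desc = ET.SubElement(
            rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": BASE + ident}
        )
        ET.SubElement(desc, f"{{{RDFS_NS}}}label").text = label
    return ET.tostring(rdf, encoding="unicode")


entities = header_entities(TEI_DOC)
rdf_xml = to_rdf_xml(entities)
print(rdf_xml)
```

Keeping the xml:id as the last segment of the URI means the RDF statements can always be traced back to the header entry they were extracted from.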
Possible points for external links
• Regesta Norvegica/Diplomatarium Norvegicum
– Persons, places, subject, onomastic information
– Creation date, place
– Text witnesses, archival signature, provenance
– Cross references for copies (vidimus) etc.
– Published, mentioned, bibliographic references
• Transcripts
– Text witnesses, archival signature
– Linguistic information
The well-known 5 stars of Linked Data
1. Data is available on the Web, in whatever format.
2. Available as machine-readable structured data (i.e., not a scanned image).
3. Available in a non-proprietary format (e.g., CSV, not Microsoft Excel).
4. Published using open standards from the W3C (RDF and SPARQL).
5. All of the above and links to other Linked Open Data.
Two additional stars
6. The schemas (vocabularies/models) used in the dataset are explicitly described and published alongside the dataset, unless the schemas are already available somewhere on the Web.
7. The quality of the dataset against the RDF schemas used in it must be explicated, so that the user can evaluate whether the data quality matches her needs.
Hyvönen, E., Tuominen, J., Alonen, M. and Mäkelä, E. (2014)
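The star scheme can be read as a cumulative checklist. The sketch below is only an illustration: it assumes that the two additional stars are counted cumulatively in the same way as the first five, which is a simplifying assumption rather than something stated on the slide, and the example profile is invented.

```python
from dataclasses import dataclass, astuple


@dataclass
class DatasetProfile:
    """One yes/no answer per star, in star order (field names are illustrative)."""
    on_the_web: bool
    machine_readable: bool
    non_proprietary_format: bool
    w3c_open_standards: bool
    links_to_other_lod: bool
    schemas_published: bool
    quality_explicated: bool


def star_rating(profile):
    """Count stars cumulatively: stop at the first criterion that is not met."""
    stars = 0
    for satisfied in astuple(profile):
        if not satisfied:
            break
        stars += 1
    return stars


# Invented example: structured, openly formatted data that is not yet RDF.
example = DatasetProfile(
    on_the_web=True,
    machine_readable=True,
    non_proprietary_format=True,
    w3c_open_standards=False,
    links_to_other_lod=False,
    schemas_published=True,
    quality_explicated=False,
)
print(star_rating(example))
```

Under this reading a published schema does not raise the rating as long as an earlier criterion is unmet.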
Some conclusions I
• The design of conceptual models is in itself a scholarly activity. It must be based on a stringent analysis of the scientific practice and source material of a given field.
• Well-defined ontologies may also act as an intellectual guide in the scholarly analysis of source material.
• Without the use of common standard models like the CIDOC-CRM, data integration can only be done on a trivial level.
Some conclusions II
• Combining existing ontologies uncritically may result in unintended connections.
• Ad hoc bottom-up methods for data integration may be useful but must be complemented by top-down methods provided by well-founded conceptual models.
Thank you for your attention
• Contact details:
– Email: c.e.s.ore@iln.uio.no
• References:
– CIDOC-CRM: cidoc-crm.org
– TEI: tei-c.org
– Current (old) versions of Dipl. Norv., Norw. Farm Names, and Reg. Norv.: www.dokpro.uio.no