Managing Provenance & Versioning for an (Evolving) Dictionary in Linked Data Format
Frances Gillis-Webber, MPhil Student, Library and Information Studies Centre, University of Cape Town
LDL-2018 Workshop @ LREC 2018, Miyazaki, Japan, 12/05/2018
English-Xhosa Dictionary for Nurses: A Bilingual Dictionary of Medical Terms
About the dictionary
● Published in 1935
● Compiled by Neil MacVicar, in conjunction with isiXhosa-speaking nurses
● In the public domain, free from any copyright restriction
isiXhosa / English
● isiXhosa: Nguni language group [1]; spoken by 16% as L1 [2]
● English: spoken by 9.6% as L1 [2]; an ex-colonial language; a lingua franca with high status [3]
[1] Doke, 1954; “Subfamily: Nguni (S.40)”, n.d.
[2] 2011 Census, Statistics South Africa
[3] Ngcobo, 2010
Digitising the dictionary
Three requirements were identified when digitising:
1. It must be human- and machine-readable
2. It did not have to remain an exact replica of the printed artefact
3. It must be encoded in a way which would allow it to “evolve”
Managing change
1. Versioning becomes important, particularly if the language resource (LR) is integrated into another LR
2. Recording provenance information for each change becomes important
3. The URI strategy should allow for versioning
The URI Strategy: Use Cases, Fragment Identifiers, the URI Pattern, and Resource Identifiers
The URI use cases
U1: A URI which identifies the resource
U2: A URI which identifies a sub-resource in relation to the parent resource
U3: A URI which identifies a version of the resource
U4: A URI which identifies a version combined with a sub-resource
U5: A URI which identifies a document describing the resource in U1
U6: A URI which identifies a document describing the resource in U3
Fragment identifiers
● A fragment identifier is of the pattern: http://example.com/my-uri#something
● Widely used in vocabularies, where “the vocabulary is often served as a document and the fragment is used to address a particular term within that document” [1]
● Shows a hierarchical relationship with the parent resource
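As a small illustration of the parent/fragment relationship, the sketch below splits the example URI with Python's standard library. It only demonstrates the URI syntax; it is not part of the dictionary's tooling.

```python
from urllib.parse import urldefrag

# Split a URI into the parent resource and its fragment identifier.
parent, fragment = urldefrag("http://example.com/my-uri#something")

print(parent)    # http://example.com/my-uri
print(fragment)  # something
```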
The URI pattern
● The URI pattern recommended by Archer et al. (2012) and by Gracia and Vila-Suero was evaluated: http://{domain}/{type}/{concept}/{reference}
  E1: http://linguistic.linkeddata.es/id/apertium/lexiconEN/bench-n-en
● But ultimately a simplified version was adopted:
  E1 revised: http://linguistic.linkeddata.es/entry/bench-n-en
U1: A URI which identifies the resource
Form: {http(s):}//{Base URI}/{Resource Path}/{Resource ID}
Where:
● {http(s):} is the http or https scheme
● {Base URI} is the host
● {Resource Path} is the resource type, e.g. entry for a lexical entry, lexicon for a lexicon
● {Resource ID} is the resource identifier, e.g. en-n-abdomen
Example: https://londisizwe.org/entry/en-n-abdomen
U2: A URI which identifies a sub-resource
Form: {http(s):}//{Base URI}/{Resource Path}/{Resource ID}#{Fragment ID}
Where:
● {Fragment ID} is the fragment identifier, e.g. sense1
Example: https://londisizwe.org/entry/en-n-abdomen#sense1
U3: A URI which identifies a version of the resource
Form: {http(s):}//{Base URI}/{Resource Path}/{Resource ID}/{Version ID}
Where:
● {Version ID} is the version identifier, e.g. 2017-09-19
Example: https://londisizwe.org/entry/en-n-abdomen/2017-09-19
U4: A URI which identifies a version combined with a sub-resource
Form: {http(s):}//{Base URI}/{Resource Path}/{Resource ID}/{Version ID}#{Fragment ID}
Example: https://londisizwe.org/entry/en-n-abdomen/2017-09-19#sense1
U5 & U6: A URI which identifies a document describing the resource
U5: identifies a document describing the resource in U1
Form: {http(s):}//{Base URI}/{Document}/{Resource Path}/{Resource ID}
U6: identifies a document describing the resource in U3
Form: {http(s):}//{Base URI}/{Document}/{Resource Path}/{Resource ID}/{Version ID}
Where:
● {Document} refers to the HTML page, e.g. page, or to the RDF representation, e.g. rdf
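The six URI forms (U1 to U6) can be sketched as a single helper that adds the optional document, version, and fragment parts in the order the patterns prescribe. The function `build_uri` and its parameter names are hypothetical and used only for illustration; the host and the example values come from the slides.

```python
from typing import Optional

BASE_URI = "londisizwe.org"  # host taken from the slide examples


def build_uri(resource_path: str, resource_id: str,
              version_id: Optional[str] = None,
              fragment_id: Optional[str] = None,
              document: Optional[str] = None,
              scheme: str = "https") -> str:
    """Assemble a URI following the U1 to U6 forms (hypothetical helper)."""
    segments = [BASE_URI]
    if document:                      # U5/U6: e.g. "page" or "rdf"
        segments.append(document)
    segments += [resource_path, resource_id]
    if version_id:                    # U3/U4/U6: e.g. "2017-09-19"
        segments.append(version_id)
    uri = f"{scheme}://" + "/".join(segments)
    if fragment_id:                   # U2/U4: e.g. "sense1"
        uri += f"#{fragment_id}"
    return uri


u1 = build_uri("entry", "en-n-abdomen")
u2 = build_uri("entry", "en-n-abdomen", fragment_id="sense1")
u3 = build_uri("entry", "en-n-abdomen", version_id="2017-09-19")
u4 = build_uri("entry", "en-n-abdomen", version_id="2017-09-19",
               fragment_id="sense1")
u5 = build_uri("entry", "en-n-abdomen", document="page")
u6 = build_uri("entry", "en-n-abdomen", version_id="2017-09-19",
               document="rdf")

for uri in (u1, u2, u3, u4, u5, u6):
    print(uri)
```

Keeping the version as a path segment and the sub-resource as a fragment means a versioned sense (U4) stays hierarchically tied to its versioned entry (U3).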
Resource identifiers
Resource identifiers can take two forms, both used here:
● Descriptive: for modelling the lexical entries and lexicons. An adaptation of E1, the resource identifier is of the form: {Language Code}-{POS}-{Lemma}
● Opaque: for modelling the lexical concepts. Using a similar approach to BabelNet, e.g. 00001
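A minimal sketch of the two identifier forms follows. Both function names are hypothetical. Replacing spaces in the lemma with underscores is an assumption, suggested by the key xh-n-ulusu_lomntu in the lexicon example; the slides do not state a normalisation rule. The five-digit width of the opaque form is likewise inferred from the example 00001.

```python
def descriptive_id(language_code: str, pos: str, lemma: str) -> str:
    """Build a {Language Code}-{POS}-{Lemma} identifier.

    Assumption: spaces in the lemma are replaced with underscores.
    """
    return f"{language_code}-{pos}-{lemma.strip().replace(' ', '_')}"


def opaque_id(counter: int, width: int = 5) -> str:
    """Build a zero-padded opaque identifier such as 00001."""
    return str(counter).zfill(width)


print(descriptive_id("en", "n", "abdomen"))  # en-n-abdomen
print(descriptive_id("xh", "n", "isisu"))    # xh-n-isisu
print(opaque_id(1))                          # 00001
```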
Modelling Provenance & Versioning for Lexical Entries, Senses & Lexicons
Versioning
The following components have been identified for versioning:
● Versioned URIs for lexicons, lexical entries, and senses (and lexical concepts)
● Provenance metadata to describe the versions, with the latest version showing the previous versions
● The generation of files, one for each version of the lexical entries and lexicons
Provenance
The W3C has defined provenance as “a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing” [1]
[1] “PROV-O”, 2013
Modelling provenance for a lexical entry

:entry/xh-n-isisu
    a ontolex:LexicalEntry , ontolex:Word , prov:Entity ;
    lexinfo:partOfSpeech lexinfo:Noun ;
    dct:language <http://id.loc.gov/vocabulary/iso639-2/xho> , <http://lexvo.org/id/iso639-1/xh> ;
    dct:identifier :entry/xh-n-isisu ;
    rdfs:label "isisu"@xh ;
    ontolex:canonicalForm :entry/xh-n-isisu#lemma ;
    ontolex:sense :entry/xh-n-isisu#sense1 , :entry/xh-n-isisu#sense2 ;
    dct:subject mesh:D000005 ;
    ontolex:denotes dbr:Abdomen , dbr:Stomach ;
    ontolex:evokes :concept/00001 ;
    dct:isPartOf :lexicon/xh ;
    dct:license <http://creativecommons.org/publicdomain/mark/1.0/> ;
    prov:hadPrimarySource "The English-Xhosa Dictionary for Nurses"@en ;
    dct:creator <https://londisizwe.org> ;
    prov:generatedAtTime "2018-01-10T05:00Z|+02:00"^^xsd:dateTime ;
    dct:modified "2018-01-10"^^xsd:date ;
    owl:versionInfo "2018-01-10"^^xsd:string ;
    owl:sameAs :entry/xh-n-isisu/2018-01-10 ;
    owl:hasVersion :entry/xh-n-isisu/2017-09-19 , :entry/xh-n-isisu/2018-01-10 .

:entry/xh-n-isisu#sense2
    a ontolex:LexicalSense , prov:Entity ;
    ontolex:isLexicalizedSenseOf :concept/00007 ;
    dct:identifier :entry/xh-n-isisu#sense2 ;
    dct:isPartOf :entry/xh-n-isisu ;
    dct:creator <https://londisizwe.org> ;
    prov:generatedAtTime "2018-01-10T05:00Z|+02:00"^^xsd:dateTime ;
    owl:versionInfo "2018-01-10"^^xsd:string ;
    owl:sameAs :entry/xh-n-isisu/2018-01-10#sense2 ;
    owl:hasVersion :entry/xh-n-isisu/2018-01-10#sense2 .
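The versioning statements for a lexical entry follow a regular pattern: the unversioned URI carries the latest version string, is declared the same as the latest versioned URI, and lists every version. The sketch below renders that pattern as text in the slides' shorthand notation. The helper `version_triples` is hypothetical, and it assumes version identifiers are ISO dates so that string sorting matches chronological order.

```python
from typing import List


def version_triples(resource: str, versions: List[str]) -> str:
    """Render versioning statements for the latest version of a resource.

    `resource` uses the slides' shorthand, e.g. ":entry/xh-n-isisu".
    `versions` holds ISO dates (YYYY-MM-DD), which sort chronologically
    when compared as strings.
    """
    if not versions:
        raise ValueError("at least one version is required")
    ordered = sorted(versions)
    latest = ordered[-1]
    all_versions = " , ".join(f"{resource}/{v}" for v in ordered)
    return "\n".join([
        resource,
        f'    owl:versionInfo "{latest}"^^xsd:string ;',
        f"    owl:sameAs {resource}/{latest} ;",
        f"    owl:hasVersion {all_versions} .",
    ])


output = version_triples(":entry/xh-n-isisu", ["2018-01-10", "2017-09-19"])
print(output)
```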
Modelling provenance for a lexicon

:lexicon/xh
    a lime:Lexicon , void:Dataset , prov:Entity , prov:Dictionary , prov:Collection ;
    lime:language "xh" ;
    dct:language <http://id.loc.gov/vocabulary/iso639-2/xho> , <http://lexvo.org/id/iso639-1/xh> ;
    dct:identifier :lexicon/xh ;
    lime:lexicalEntries "1"^^xsd:integer ;
    lime:linguisticCatalog <http://www.lexinfo.net/ontologies/2.0/lexinfo> ;
    dct:description "Londisizwe.org - isiXhosa lexicon"@en ;
    dct:creator <https://londisizwe.org> ;
    prov:generatedAtTime "2018-01-15T06:00Z|+02:00"^^xsd:dateTime ;
    dct:modified "2018-01-15"^^xsd:date ;
    owl:versionInfo "2018-01-15"^^xsd:string ;
    owl:sameAs :lexicon/xh/2018-01-15 ;
    owl:hasVersion :lexicon/xh/2017-09-19 , :lexicon/xh/2018-01-15 ;
    dct:references :lexicon/en ;
    void:dataDump <https://londisizwe.org/data/xh-lexicon/2018-01-15> .

:lexicon/xh/2018-01-12
    a prov:Dictionary .

:lexicon/xh/2018-01-15
    a prov:Dictionary ;
    prov:derivedByRemovalFrom :lexicon/xh/2018-01-12 ;
    prov:qualifiedRemoval [
        a prov:Removal ;
        prov:dictionary :lexicon/xh/2018-01-12 ;
        prov:removedKey "xh-n-ulusu_lomntu"^^xsd:string ;
    ] .
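The removal of an entry between two lexicon versions can be sketched as a key comparison followed by a rendering of the prov:Removal description. Both helper functions are hypothetical, and the contents of the two versions below are illustrative only, since the slides do not list the entries held by each version.

```python
from typing import Dict, List


def removed_keys(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return the keys present in the previous version but not the current one."""
    return sorted(set(previous) - set(current))


def removal_statement(lexicon: str, old: str, new: str, keys: List[str]) -> str:
    """Render a prov:Removal description in the slides' shorthand notation."""
    lines = [
        f"{lexicon}/{new}",
        "    a prov:Dictionary ;",
        f"    prov:derivedByRemovalFrom {lexicon}/{old} ;",
        "    prov:qualifiedRemoval [",
        "        a prov:Removal ;",
        f"        prov:dictionary {lexicon}/{old} ;",
    ]
    lines += [f'        prov:removedKey "{key}"^^xsd:string ;' for key in keys]
    lines.append("    ] .")
    return "\n".join(lines)


# Illustrative contents only: the slides do not list each version's entries.
v_2018_01_12 = {"xh-n-isisu": ":entry/xh-n-isisu",
                "xh-n-ulusu_lomntu": ":entry/xh-n-ulusu_lomntu"}
v_2018_01_15 = {"xh-n-isisu": ":entry/xh-n-isisu"}

keys = removed_keys(v_2018_01_12, v_2018_01_15)
statement = removal_statement(":lexicon/xh", "2018-01-12", "2018-01-15", keys)
print(statement)
```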
Future Work
londisizwe.org
2018:
● To continue with the human-readable view
● To continue publishing the lexical entries derived from the original dataset
● To add SASL as another language, changing the resource from bilingual to multilingual
2019:
● To continue working with the lexical concepts, using machine translation, crowdsourcing, and gamification techniques to evolve the resource further
Thank you!