UNIMARC in RDF Representation of UNIMARC Bibliographic Format
UNIMARC in RDF: Representation of UNIMARC Bibliographic Format in Resource Description Framework for Linked Data
Gordon Dunsire, UK & Mirna Willer, Croatia
IFLA World Library and Information Congress, 81st IFLA General Conference and Assembly, Cape Town, 15–21 August 2015
Session 105: UNIMARC in RDF Workshop
Overview
• Introduction to linked data and UNIMARC
• UNIMARC vocabularies
• Future research and plans
14/09/2021 UNIMARC in RDF: Workshop, IFLA 2015, Cape Town
Introduction to linked data and UNIMARC
Background
• Representation of IFLA standards for use in the Semantic Web
• Work of the FRBR Namespaces project and IFLA Namespaces Task Group
• Work of the ISBD/XML Study Group
• Included a feasibility study of representation of UNIMARC
• Representations allow legacy catalogue records to be published as linked data using RDF
• Branding IFLA standards for authority & trust
• Semantic Web lets “Anyone say Anything about Any resource”
Linked data and RDF
• Resource Description Framework (RDF)
• Designed for machine-processing of metadata at global scale (Semantic Web)
• 24/7/365
• Trillions of operations per second
• Everything must be disambiguated
• Machines are dumb
• A simple approach helps!
• Machine-readable identifiers
RDF triple
• Metadata expressed as “atomic” statements
• A simple, single, irreducible statement
• The title of this book is “Cataloguing is fun!”
• Constructed in 3 parts: a “triple”
• Subject of the statement = Subject: This book
• Nature of the statement = Predicate: has title
• Value of the statement = Object: “Cataloguing is fun!”
• This book – has title – “Cataloguing is fun!”
• subject – predicate – object
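As a minimal sketch, the example statement can be modelled in Python as a three-part record. The `Triple` type, its field names, and the `as_text` helper are illustrative assumptions; they are not part of RDF or of any library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# The example statement from the slide, split into its three parts.
statement = Triple(
    subject="This book",
    predicate="has title",
    obj="Cataloguing is fun!",
)


def as_text(triple: Triple) -> str:
    """Render a triple in the 'subject - predicate - object' pattern."""
    return " - ".join(triple)


print(as_text(statement))
```

Plain strings stand in for identifiers here; in real RDF the subject and predicate would be URIs.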
Machine-readable identifiers
• Uniform Resource Identifier (URI)
• Can be any unique combination of numbers and letters
• No intrinsic meaning; it’s just an identifier
• RDF requires the subject and predicate of a triple to be URIs
• Object can be a URI, or a literal string (“Cataloguing is fun!”)
• URIs can be matched by machine to link triples together
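The linking step can be sketched in a few lines of Python: when the object of one triple is a URI that also appears as the subject of another, a machine can follow the match. Every URI and the author name below are invented placeholders under example.org, and `linked_statements` is a hypothetical helper, not an RDF library call.

```python
# All URIs below are invented placeholders for illustration.
BOOK = "http://example.org/resource/book1"
PERSON = "http://example.org/resource/person1"
HAS_TITLE = "http://example.org/element/hasTitle"
HAS_AUTHOR = "http://example.org/element/hasAuthor"
HAS_NAME = "http://example.org/element/hasName"

triples = [
    (BOOK, HAS_TITLE, "Cataloguing is fun!"),  # object is a literal string
    (BOOK, HAS_AUTHOR, PERSON),                # object is a URI
    (PERSON, HAS_NAME, "A. N. Author"),        # subject matches the URI above
]


def is_uri(value: str) -> bool:
    """Crude URI check, good enough for this sketch only."""
    return value.startswith(("http://", "https://"))


def linked_statements(all_triples, start):
    """Collect statements reachable from `start` by matching object URIs to subject URIs."""
    found, queue, seen = [], [start], set()
    while queue:
        node = queue.pop(0)
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in all_triples:
            if s == node:
                found.append((s, p, o))
                if is_uri(o):
                    queue.append(o)
    return found


for statement in linked_statements(triples, BOOK):
    print(statement)
```

Starting from the book, the author's name is reached only because the object URI of the second triple matches the subject URI of the third.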
Vocabularies, values and element sets
• Controlled terminology represented as RDF “value” vocabulary
• Entities, attributes, and relationships represented as RDF “element set” vocabulary
• Attributes and relationships represented as RDF properties (“predicates”)
• Entities represented in RDF as classes
• UNIMARC-B has only 1 entity: Resource
• ISBD already has an equivalent class for Resource
Element sets
• “Bibliographic” format has same focus as International Standard Bibliographic Description (ISBD)
• The entity [bibliographic] Resource ~ FRBR Manifestation
• Attributes => RDF properties
• RDF properties require URIs
• IFLA/UNIMARC URL domain + local unique UNIMARC part
• Lossless data requires finest level of granularity
• Important for UNIMARC qualified coded subfields
UNIMARC element and concept identifiers
• Element: Number (ISBN). Tag: 010; 1st ind.: b; 2nd ind.: b; Subfield: a. Unique in element set
• Coded Information Block: Target audience code. 100bba; Character position: 17-19. Unique in element set
• Target audience vocabulary: children, ages 9-14. Code: d. Unique in vocabulary
Tag 210: PUBLICATION, DISTRIBUTION, ETC.
Subfield a: Place of Publication, Distribution, etc.
Definition: The town or other locality where the item is published or distributed or, in the case of a manuscript, written.
Indicator combinations:
ind 1 | ind 1 caption | ind 2 | ind 2 caption
# | Not applicable / Earliest available publisher | # | Produced in multiple copies, usually published or publicly distributed
0 | Intervening publisher | # | Produced in multiple copies, usually published or publicly distributed
1 | Current or latest publisher | # | Produced in multiple copies, usually published or publicly distributed
# | Not applicable / Earliest available publisher | 1 | Not published or publicly distributed
0 | Intervening publisher | 1 | Not published or publicly distributed
1 | Current or latest publisher | 1 | Not published or publicly distributed
URI: U21011a
Label: Place of publication … in Publication, distribution, etc. (Current or latest publisher) (Not published …)
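The six indicator combinations for 210 $a each yield their own property identifier. A minimal sketch of that enumeration follows; the helper name `level0_local_part` is hypothetical, and the rule that a blank indicator is written as "_" is inferred from identifiers such as U2101_a elsewhere in the slides.

```python
from itertools import product

# Indicator values for tag 210 as listed in the table; "#" marks a blank indicator.
IND1 = ["#", "0", "1"]
IND2 = ["#", "1"]


def indicator_char(indicator: str) -> str:
    """Write a blank indicator as "_" (assumed from identifiers like U2101_a)."""
    return "_" if indicator in ("#", " ", "") else indicator


def level0_local_part(tag: str, ind1: str, ind2: str, subfield: str) -> str:
    """Join tag, both indicators and subfield into one local identifier."""
    return f"U{tag}{indicator_char(ind1)}{indicator_char(ind2)}{subfield}"


place_properties = [
    level0_local_part("210", ind1, ind2, "a") for ind1, ind2 in product(IND1, IND2)
]
print(place_properties)
```

One subfield therefore becomes six distinct properties, which is what keeps the indicator semantics from being lost.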
Exception! Semantic data embedded in content
200 1#$aBibliographica belgica$fCommission belge de bibliographie$f= Belgische Commissie voor bibliografie
• “= ”: Parallel
• U2001_f: First Statement of Responsibility
• ???: Parallel First Statement of Responsibility
Translations
• The same identifier is used for translated elements (captions, definitions, etc.) and vocabularies (preferred terms, definitions, etc.)
• E.g. Vocabulary of 116bba0 = Coded data for graphics: Specific material designation
Graphics SMD translation example
• Term identifier/URI: namespace/b
• Notation: b
• Preferred label (English): drawing
• Preferred label (Italian): disegno
• Preferred label (Portuguese): desenho
• Definition (English): An original visual representation (other than a print or painting) …
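One identifier carrying several language-tagged labels can be sketched as a small Python record. The dictionary layout and the `preferred_label` helper are assumptions made for illustration; the values are copied from the slide, and "namespace/b" is left exactly as the slide gives it.

```python
# One concept, one identifier, several language-tagged preferred labels.
drawing = {
    "uri": "namespace/b",  # placeholder namespace, as on the slide
    "notation": "b",
    "prefLabel": {"en": "drawing", "it": "disegno", "pt": "desenho"},
}


def preferred_label(concept: dict, lang: str, fallback: str = "en") -> str:
    """Return the label in `lang`, or the fallback language if no translation exists."""
    labels = concept["prefLabel"]
    return labels.get(lang, labels[fallback])


print(preferred_label(drawing, "it"))
print(preferred_label(drawing, "fr"))  # no French label, so English is used
```

Adding a translation only adds a label; the identifier and any data that points at it stay unchanged.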
UNIMARC vocabularies
Value vocabularies
• “thesauri, code lists, term lists, classification schemes, subject heading lists, …”
• W3C Library Linked Data Incubator Group
• Often represented in RDF using Simple Knowledge Organization System (SKOS)
Value vocabularies
• Coded information stored in tag block 1xx
• Code lists specify notation, term, description, and scope
• Represented as RDF/SKOS vocabularies
• Italian and Portuguese translations: multilingual environment
• Interoperability with vocabularies of other schemas
• 14 published so far
• For example: Target audience
http://metadataregistry.org/concept/list/vocabulary_id/322.html
URI design templates
Element set: granularity at subfield level with superstructure of fields (tags) and 2 qualifiers (indicators). Coded subfields refined by character position.
Tag | Ind 1 | Ind 2 | Subfield | Char. pos. | URI | Attribute
200 | 1 | _ [blank] | a | | 2001_a | Title proper
100 | _ | _ | a | 17 | 100__a17 | Target audience code 1
Value vocabulary: granularity at code level. Hash URIs used if code list is small, or self-referential (“other”, etc.)
Vocabulary token | Code | URI | Vocabulary: Term
tac | m | tac#m | Target audience: adult, general
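Both templates amount to simple string composition, sketched below. The function names are hypothetical, the `prefix` parameter is an assumption added because other slides show a leading "U", and only the local parts are built because the slides do not spell out the full namespace.

```python
from typing import Optional


def element_local_part(tag: str, ind1: str, ind2: str, subfield: str,
                       char_pos: Optional[int] = None, prefix: str = "") -> str:
    """Tag + indicator 1 + indicator 2 + subfield, refined by character position if given."""
    def indicator_char(indicator: str) -> str:
        # A blank indicator is written as "_" in the slide examples.
        return "_" if indicator in ("_", "#", " ", "") else indicator

    local = f"{prefix}{tag}{indicator_char(ind1)}{indicator_char(ind2)}{subfield}"
    if char_pos is not None:
        local += str(char_pos)
    return local


def concept_local_part(vocabulary_token: str, code: str) -> str:
    """Hash-style identifier for a code in a small value vocabulary."""
    return f"{vocabulary_token}#{code}"


print(element_local_part("200", "1", " ", "a"))               # Title proper
print(element_local_part("100", " ", " ", "a", char_pos=17))  # Target audience code 1
print(concept_local_part("tac", "m"))                         # adult, general
```

Keeping the two templates separate mirrors the split between structure (element set) and content (value vocabulary).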
Target audience code
“applicable to records of materials in any media”
• Subfield a, character positions 17-19, of tag 100 General processing data
• 3 instances of one-character code: 100__a17-19, 100__a18, 100__a19
• Order of position carries no significance in UNIMARC format
• But content rules may assign significance
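Reading the three one-character codes out of the subfield is a fixed-position slice. The sketch below assumes character positions are counted from zero and that unused positions hold a blank (a space or "#"); the sample string is a placeholder, not a real 100 $a value.

```python
def target_audience_codes(subfield_a: str) -> list[str]:
    """Return the non-blank one-character codes at positions 17-19 (counted from zero)."""
    window = subfield_a[17:20]
    return [code for code in window if code not in (" ", "#")]


# Placeholder value: "." stands for positions this sketch does not interpret.
sample = "." * 17 + "dm " + "." * 16

print(target_audience_codes(sample))
```

The codes come back in record order, so a content rule that gives the first position priority can still be applied even though the format itself assigns no meaning to the order.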
Map of “Audience”
Element sets (schema), related by rdfs:subPropertyOf:
• isbdu: “has note on use or audience” (unconstrained version)
• isbd: “has note on use or audience”
• rdau: “Intended audience” (unconstrained version)
• rdaw: “Intended audience”
• dct: “audience”
• m21: “Target audience of …”
• schema: “audience”
• frbrer: “has intended audience”
Value vocabularies (KOS): Broader/narrower/same?
• m21:e “adult”
• pbcore: adult “adult”
• umarc:m “adult, general”
• umarc:k “adult, serious”
• MPAA: NC-17?
• BBFC: 18?
110 (CODED DATA FIELD: CONTINUING RESOURCES) $a (Continuing Resource Coded Data)
Attribute | Character position | Value | Notes | Property URI
Type designator | 0 | c | newspaper | U110__a0
Frequency of issue | 1 | a | daily | U110__a1
Regularity | 2 | a | regular | U110__a2
Property URI = Subfield URI + Character position
resource:123 unimarcb:U110__a0 crtype:c
resource:123 unimarcb:U110__a1 freq:a
resource:123 unimarcb:U110__a2 reg:a
freq:a skos:prefLabel “daily”@en, “giornaliera”@it, “diária”@pt
freq:a skos:notation “a”
Frequency map for Dublin Core, MARC 21, and RDA
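The rule "Property URI = Subfield URI + Character position" can be sketched as a function that turns one coded value into one triple per position. The prefixes crtype, freq and reg are taken from the slide; the function name and the tuple representation are assumptions for illustration.

```python
# Vocabulary prefix for each character position of 110 $a, as shown on the slide.
POSITION_VOCAB = {0: "crtype", 1: "freq", 2: "reg"}


def coded_triples(resource: str, subfield_uri: str, value: str, position_vocab: dict) -> list:
    """Emit one triple per character position: property = subfield URI + position."""
    return [
        (resource, f"{subfield_uri}{pos}", f"{vocab}:{value[pos]}")
        for pos, vocab in sorted(position_vocab.items())
    ]


# "caa" = newspaper, daily, regular.
for triple in coded_triples("resource:123", "unimarcb:U110__a", "caa", POSITION_VOCAB):
    print(triple)
```

Each object is a concept identifier rather than the bare code, so its labels in English, Italian and Portuguese can be looked up in the value vocabulary.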
Future research and plans
Level 0: the finest level of granularity
• Subfield qualified by indicators
• “A defined unit of information within a field. See also Data Element”
• “The smallest unit of information that is explicitly identified”
• Field: “A defined character string, identified by a tag, which contains one or more subfields”
• Coarser level of granularity (Level 1+) with structure of combinations of Level 0 elements
• Indicator qualification is at field level, and redundant for Level 0 elements that are not in scope.
• U21011a: Place of publication … in Publication, distribution, etc. (Current or latest publisher) (Not published …)
• U210_1a: Place of publication … in Publication, distribution, etc. (Not applicable …) (Not published …)
• U21001a: Place of publication … in Publication, distribution, etc. (Intervening publisher) (Not published …)
• U2101_a: Place of publication … in Publication, distribution, etc. (Current or latest publisher) (Produced in multiple copies …)
Publication … u:210
Relationships shown: “is aggregated by”, “is sub-property of”
• Place … u:210__a
• Place … u:210a
• Place … u:2100_a
• Place … u:2101_a
• Place … u:210XXa
• Publication … Statement 1: Place 1, Place 2
• Publication … Statement 2: Place 3, Place 4
Representing UNIMARC authorities in RDF
Representing UNIMARC authorities in RDF: use of parallel vocabularies
Representing UNIMARC authorities in RDF: authorised and variant forms of a name
Mappings
• UNIMARC tags and subfields have corresponding ISBD “elements”
• Now out-of-date after publication of ISBD consolidated edition
• Category of alignment relationship to be determined
• Equivalent or broader/narrower
• To be used as basis for sub-property mappings
• Mappings from UNIMARC to other vocabularies being developed
UNIMARC and ISBD properties
• Element identifier/URI: unimarcb:P205bbb
• Label (English): (has) issue statement
• Equivalent ISBD URI: isbd:P1011
• Label (English): has additional edition statement
• The meaning is the same, but the identifiers and labels are different
• unimarcb:P205bbb same as isbd:P1011 (in RDF)
• Or use isbd:P1011 instead of unimarcb:P205bbb
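The second option, using the ISBD property in place of the UNIMARC one, can be sketched as a lookup applied to each predicate. The compact identifiers follow the slide with extraction spaces removed, which is an assumption; the resource identifier, the literal value and the `use_equivalent` helper are invented for illustration.

```python
# Equivalence stated on the slide: the two properties mean the same thing.
SAME_AS = {"unimarcb:P205bbb": "isbd:P1011"}


def use_equivalent(triples: list, same_as: dict) -> list:
    """Replace each predicate by its declared equivalent; leave the rest unchanged."""
    return [(s, same_as.get(p, p), o) for s, p, o in triples]


# Invented record for illustration only.
record = [("resource:1", "unimarcb:P205bbb", "second issue")]

print(use_equivalent(record, SAME_AS))
```

Substitution is safe only for a true equivalence; broader or narrower alignments need a one-way mapping instead.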
UNIMARC Alignment with ISBD
UNIMARC Property | Label | A | ISBD Property | Label
U200__a | Title proper | = | P1004 | has title proper
 | | <> | P1117 | has title of individual work by same author
 | | | P1137 | has common title of title proper
Alignment is equal, broader, and narrower!
UNIMARC and MARC 21 (BIBFRAME)
• UNIMARC Level 0 approach is based on publication of MARC 21 element sets in the Open Metadata Registry
• BIBFRAME has a coarser granularity, but is extensible
• Sub-properties and sub-classes can be added to refine the semantics
• BF is lossy at current levels of granularity
• UNIMARC separates content (values) from structure (encoding) in most cases
• = Parallel is an exception
• BF model is based on data in legacy records
• Extensive “archaeology” required to trace semantics and syntax.
• DCT audience
• UM Target audience code: UM charPos 1, charPos 2, charPos 3
• M21 Target audience: M21 codedType a, …, codedType c, codedType d, M21 codedType t
Granularity
• Intellectual value of UNIMARC is preserved by a finest-grained semantic representation
• Data can always be dumbed-down to the level of coarseness required by applications
• Processed with shared open maps
• Including schema.org and dct!
• And BIBFRAME too …
• Data should be published without loss
• For semantically rich applications
• Universal Bibliographic Control ~ Semantic Web
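Dumbing-down with a shared map can be sketched as walking a chain of sub-property links until the coarser target property is reached. The ladder below is hypothetical: the identifier umarc:targetAudienceCode and the exact links are assumptions made for illustration, not published mappings.

```python
# Hypothetical map: each key is declared a sub-property of its value.
SUB_PROPERTY_OF = {
    "umarc:100__a17": "umarc:targetAudienceCode",
    "umarc:100__a18": "umarc:targetAudienceCode",
    "umarc:100__a19": "umarc:targetAudienceCode",
    "umarc:targetAudienceCode": "dct:audience",
}


def dumb_down(triples: list, target: str, ladder: dict) -> list:
    """Re-express triples with `target` as predicate when their property leads up to it."""
    result = []
    for s, p, o in triples:
        current, seen = p, set()
        # Climb the ladder one rung at a time, guarding against cycles.
        while current != target and current in ladder and current not in seen:
            seen.add(current)
            current = ladder[current]
        if current == target:
            result.append((s, target, o))
    return result


fine_grained = [
    ("resource:123", "umarc:100__a17", "umarc:m"),
    ("resource:123", "umarc:2001_a", "Cataloguing is fun!"),
]

print(dumb_down(fine_grained, "dct:audience", SUB_PROPERTY_OF))
```

The fine-grained triples are left untouched, so the same data can be re-expressed at any level of coarseness an application needs, while the reverse direction is not recoverable from coarse data.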
Thank you!