Metadata Aggregation Webinar. Dimitris Gavrilis, Digital Curation Unit – IMIS, Athena Research Center
About metadata aggregation. What is metadata aggregation? Metadata aggregation is a process in which metadata records from different sources and in different formats are aggregated into one common format. Where is it used? In most projects and processes that involve generating or collecting data: science, culture, education, …
Required knowledge - Key concepts. This presentation is an introductory course; more advanced sessions on specialized topics will follow. Key concepts that would help the audience: digital repository, digital record, metadata schema, XML, RDF.
A typical case. [Diagram] Repository 1 (Record 1, Format 1), Repository 2 (Record 2, Format 2), …, Repository N (Record N, Format N) → Aggregator → Repository X (Record 1, Format X; Record 2, Format X; Record 3, Format X)
Extensible Markup Language. In computing, Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The W3C's XML 1.0 Specification and several other related specifications, all of them free open standards, define XML. Source: Wikipedia
Metadata schemas. Metadata schemes (also called schema) are sets of metadata elements designed for a specific purpose, such as describing a particular type of information resource. The definition or meaning of the elements themselves is known as the semantics of the scheme. Source: NISO
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:title>Θεατρική Παράσταση με θέμα την επανάσταση του 1821</dc:title>
  <dc:creator>Kyperounda Community</dc:creator>
  <dc:creator>Κοινότητα Κυπερούντας</dc:creator>
  <dc:subject>Κυπερούντα (Λεμεσός--Κύπρος)--Ιστορία</dc:subject>
  <dc:subject>Kyperounda (Limassol--Cyprus)--History</dc:subject>
  <dc:subject>Theater--Cyprus--Kyperounda--History</dc:subject>
  <dc:subject>Θέατρο--Κύπρος--Κυπερούντα--Ιστορία</dc:subject>
  <dc:description>Θεατρική Παράσταση με θέμα την επανάσταση του 1821. Ο δάσκαλος που τους διδαξε το θέατρο ήταν ο Ιωάννης Αβραμίδης. Όρθιοι: Αβραμίδης Χρ. Ιω. Παπαβασιλείου, Γ. Χριστοφόρου, Θεοδόσης Ι. Παπαβασιλείου, Ανδρονίκη Σταυρή, Ανδρούλα Στ. Χειμωνίδη (Πάμπη); , Ιω. Γρηγορίου, Χριστ. Σάββα Χρυσοστόμου (Ππολής), Ανδρέας Σ. Ππαλάζη (Ζαφτιές) Καθήμενοι: Γιώργος Παπαστυλιανού, παιδί δασκάλου, Κ. Χ’’Φαίδωνος, Νίκος Χριστ. Μόρνης, Α. Νεάρχου.</dc:description>
  <dc:publisher>Library of Cyprus University of Technology</dc:publisher>
  <dc:publisher>Digital Heritage Research Lab of Cyprus University of Technology</dc:publisher>
  <dc:contributor>Kyperounda Community</dc:contributor>
  <dc:contributor>Κοινότητα Κυπερούντας</dc:contributor>
  <dc:date>Circa 1962</dc:date>
  <dc:date>Γύρω στο 1962</dc:date>
  <dc:type>Image</dc:type>
  <dc:format>JPG</dc:format>
  <dc:identifier>KCPD (4)</dc:identifier>
  <dc:identifier>https://apsida.cut.ac.cy/items/show/16997</dc:identifier>
  <dc:identifier>https://apsida.cut.ac.cy/files/original/d6d61d885ae13020cd07139d6b219020.jpg</dc:identifier>
  <dc:source>Kyperounda Community</dc:source>
  <dc:source>Κοινότητα Κυπερούντας</dc:source>
  <dc:language>EL, EN</dc:language>
  <dc:coverage>34.938019, 32.976517</dc:coverage>
  <dc:rights>Απαγορεύεται η δημοσίευση ή αναπαραγωγή, ηλεκτρονική ή άλλη χωρίς τη γραπτή συγκατάθεση του δημιουργού.</dc:rights>
</oai_dc:dc>
<rdf:RDF xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:edm="http://www.europeana.eu/schemas/edm/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:ore="http://www.openarchives.org/ore/terms/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <edm:ProvidedCHO rdf:about="http://more.locloud.eu/object/CUT/19941305">
    <dc:contributor xml:lang="">Kyperounda Community</dc:contributor>
    <dc:contributor xml:lang="">Κοινότητα Κυπερούντας</dc:contributor>
    <dc:creator xml:lang="">Kyperounda Community</dc:creator>
    <dc:creator xml:lang="">Κοινότητα Κυπερούντας</dc:creator>
    <dc:date>Circa 1962</dc:date>
    <dc:date>Γύρω στο 1962</dc:date>
    <dc:description xml:lang="">Θεατρική Παράσταση με θέμα την επανάσταση του 1821. Ο δάσκαλος που τους διδαξε το θέατρο ήταν ο Ιωάννης Αβραμίδης. Όρθιοι: Αβραμίδης Χρ. Ιω. Παπαβασιλείου, Γ. Χριστοφόρου, Θεοδόσης Ι. Παπαβασιλείου, Ανδρονίκη Σταυρή, Ανδρούλα Στ. Χειμωνίδη (Πάμπη); , Ιω. Γρηγορίου, Χριστ. Σάββα Χρυσοστόμου (Ππολής), Ανδρέας Σ. Ππαλάζη (Ζαφτιές) Καθήμενοι: Γιώργος Παπαστυλιανού, παιδί δασκάλου, Κ. Χ’’Φαίδωνος, Νίκος Χριστ. Μόρνης, Α. Νεάρχου.</dc:description>
    <dc:identifier>KCPD (4)</dc:identifier>
    <dc:publisher xml:lang="">Library of Cyprus University of Technology</dc:publisher>
    <dc:publisher xml:lang="">Digital Heritage Research Lab of Cyprus University of Technology</dc:publisher>
    <dc:source>Kyperounda Community</dc:source>
    <dc:source>Κοινότητα Κυπερούντας</dc:source>
    <dc:subject>Κυπερούντα (Λεμεσός--Κύπρος)--Ιστορία</dc:subject>
    <dc:subject>Kyperounda (Limassol--Cyprus)--History</dc:subject>
    <dc:subject>Theater--Cyprus--Kyperounda--History</dc:subject>
    <dc:subject>Θέατρο--Κύπρος--Κυπερούντα--Ιστορία</dc:subject>
    <dc:title>Θεατρική Παράσταση με θέμα την επανάσταση του 1821</dc:title>
    <dcterms:spatial rdf:resource="http://more.locloud.eu/object/CUT/19941305/SP.2"/>
    <edm:type>IMAGE</edm:type>
  </edm:ProvidedCHO>
  <edm:WebResource rdf:about="https://apsida.cut.ac.cy/files/original/d6d61d885ae13020cd07139d6b219020.jpg">
    <dc:format>JPG</dc:format>
    <dc:rights xml:lang="">Απαγορεύεται η δημοσίευση ή αναπαραγωγή, ηλεκτρονική ή άλλη χωρίς τη γραπτή συγκατάθεση του δημιουργού.</dc:rights>
    <edm:rights rdf:resource="http://www.europeana.eu/rights/rr-f/"/>
  </edm:WebResource>
  <edm:Place rdf:about="http://more.locloud.eu/object/CUT/19941305/SP.2">
    <skos:prefLabel>34.938019, 32.976517</skos:prefLabel>
  </edm:Place>
  <ore:Aggregation rdf:about="http://more.locloud.eu/object/CUT/19941305#aggregation">
    <edm:aggregatedCHO rdf:resource="http://more.locloud.eu/object/CUT/19941305"/>
    <edm:dataProvider>Library of Cyprus University of Technology</edm:dataProvider>
    <edm:isShownAt rdf:resource="https://apsida.cut.ac.cy/items/show/16997"/>
    <edm:isShownBy rdf:resource="https://apsida.cut.ac.cy/files/original/d6d61d885ae13020cd07139d6b219020.jpg"/>
    <edm:object rdf:resource="https://apsida.cut.ac.cy/files/original/d6d61d885ae13020cd07139d6b219020.jpg"/>
    <edm:provider>LoCloud</edm:provider>
    <edm:rights rdf:resource="http://www.europeana.eu/rights/rr-f/"/>
  </ore:Aggregation>
</rdf:RDF>
How can I convert from one schema to another?
XSLT (Extensible Stylesheet Language Transformations) is a language for transforming XML documents into other XML documents, or other formats such as HTML for web pages, plain text or XSL Formatting Objects, which may subsequently be converted to other formats, such as PDF, PostScript and PNG. Source: Wikipedia
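To make the idea of a schema-to-schema mapping concrete, here is a minimal sketch of a Dublin Core to EDM conversion. It is written in Python with the standard library's ElementTree rather than in XSLT, so it illustrates the mapping logic only and is not the stylesheet used in any real aggregator. The function name dc_to_edm and the cho_uri parameter are hypothetical, and the mapping is deliberately simplified: it copies every dc element into edm:ProvidedCHO and derives edm:type from dc:type, whereas a production mapping would also build the edm:WebResource, edm:Place and ore:Aggregation parts.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
EDM = "http://www.europeana.eu/schemas/edm/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Register prefixes so that serialized output uses dc:, edm: and rdf:.
for prefix, uri in (("dc", DC), ("edm", EDM), ("rdf", RDF)):
    ET.register_namespace(prefix, uri)


def dc_to_edm(oai_dc_xml, cho_uri):
    """Map an oai_dc record to a (simplified) EDM ProvidedCHO.

    cho_uri is the identifier assigned to the object by the aggregator;
    how it is minted is outside the scope of this sketch.
    """
    source = ET.fromstring(oai_dc_xml)
    root = ET.Element(f"{{{RDF}}}RDF")
    cho = ET.SubElement(root, f"{{{EDM}}}ProvidedCHO", {f"{{{RDF}}}about": cho_uri})
    for child in source:
        text = (child.text or "").strip()
        if child.tag == f"{{{DC}}}type":
            # dc:type "Image" becomes edm:type "IMAGE" (upper-cased value).
            ET.SubElement(cho, f"{{{EDM}}}type").text = text.upper()
        elif child.tag.startswith(f"{{{DC}}}"):
            # All other Dublin Core elements are copied unchanged.
            ET.SubElement(cho, child.tag).text = text
    return root
```

Calling ET.tostring(dc_to_edm(record_xml, uri), encoding="unicode") yields the transformed record as a string. Elements that have no direct counterpart in the target schema are exactly where information can be lost, which is why a real mapping needs more rules than this.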
Aggregation frameworks & workflows. Metadata aggregation is all about scale and complexity: large amounts of data, too many formats, too many mapping variations, time and resource constraints.
Metadata Aggregators. In real-world problems, systems that can handle the above complexity are employed; they are usually referred to as metadata aggregators. Let’s see a real case: MORe (Metadata & Object Repository), http://more.dcu.gr/ (slide dated 3/5/2021)
MORe architecture. [Diagram] Layers and services: Input sources, Input service mgmt, Core services layer mgmt, Validation service (validation microservices), Enrichment service mgmt (enrichment microservices), Publish services (Publish serv. mgmt), Data access layer. Component labels: OAI-PMH, File-Upload, Archive, Omeka, MINT mapping tool, Schematron rules, Schema, Structure, Thesauri collections, Vocabulary matching, Background links, Geo normalization, Geo coding, Historic place names, Language identification, Linking, Elastic Search, RDF Store, Wikimedia.
MORe workflow. [Diagram] Step labels: Harvest, Ingest, Validate, Reject, Transform, Enrich, Index, Publish, Delete. System labels: OAI-PMH, Archive, Omeka, MINT, RDF Store, Elastic Search, Wikimedia.
Anatomy of a record. [Diagram] Item: Administrative metadata, Technical metadata, Preservation metadata, Native metadata, Enriched native metadata, Target metadata, Enriched target metadata. Legend: versionable datastream.
Object during an aggregation process. [Diagram] Harvest → NATIVE SCHEMA → Transformation (XSLT) → COMMON SCHEMA → Enrich (Plan, XSLT) → ENRICHED SCHEMA; Enrichment #2
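The record anatomy and the processing stages above can be pictured as an item that holds several named datastreams, each keeping every version written to it. The following is a minimal sketch of that idea, not MORe's actual data model; the class names Item and Datastream, the store method and the datastream names used in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Datastream:
    """A named metadata stream that keeps all versions ever stored."""
    name: str
    versions: List[str] = field(default_factory=list)

    def add_version(self, content: str) -> None:
        # Append instead of overwrite, so older versions stay retrievable.
        self.versions.append(content)

    @property
    def latest(self) -> Optional[str]:
        return self.versions[-1] if self.versions else None


@dataclass
class Item:
    """An aggregated object with one datastream per metadata flavour."""
    identifier: str
    datastreams: Dict[str, Datastream] = field(default_factory=dict)

    def store(self, name: str, content: str) -> None:
        # Create the datastream on first use, then add a new version to it.
        self.datastreams.setdefault(name, Datastream(name)).add_version(content)
```

In this sketch a harvest would call item.store("native", xml), a transformation item.store("target", xml), and an enrichment run item.store("enriched_target", xml). Because nothing is overwritten, the original harvested form remains available after every later step.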
What about quality? Missing information; loss of information due to the mapping process; conceptually wrong mappings.
Quality metrics. 1. Completeness (objective): indicates the percentage of completion of the elements of a schema. Can be split into quanta: completeness of the mandatory element set and completeness of the recommended element set. 2. Accuracy (objective & subjective): measures the accuracy of the provided information. Example: coordinates in spatial elements, SKOS URIs in thematic elements, etc.
Quality metrics. 3. Consistency (objective & subjective): indicates whether metadata values are consistent with the acceptable types of the metadata elements as described by the schema. 4. Appropriateness (subjective): indicates whether the values provided are appropriate for the targeted use. 5. Auditability (objective): indicates whether the record can be tracked back to its original form.
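Of the metrics above, completeness is the easiest to compute mechanically. A minimal sketch follows, assuming completeness is measured as the fraction of a given element set that appears in the record with a non-empty value. The MANDATORY and RECOMMENDED sets are hypothetical examples chosen for illustration; they are not the rules of any particular schema or aggregator.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical element sets, for illustration only.
MANDATORY = {DC + "title", DC + "identifier", DC + "rights"}
RECOMMENDED = {DC + "creator", DC + "date", DC + "language"}


def filled_elements(record_xml):
    """Return the tags of all elements that carry non-whitespace text."""
    root = ET.fromstring(record_xml)
    return {el.tag for el in root.iter() if (el.text or "").strip()}


def completeness(record_xml, element_set):
    """Fraction (0.0 to 1.0) of element_set present with a value."""
    if not element_set:
        return 1.0  # nothing is required, so nothing is missing
    filled = filled_elements(record_xml)
    return len(element_set & filled) / len(element_set)
```

Computing completeness(record, MANDATORY) and completeness(record, RECOMMENDED) separately gives the two quanta mentioned above. Comparing the scores of the same record before and after enrichment shows whether enrichment actually filled gaps.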
How do I improve quality?
Metadata Enrichment. Thematic: thesauri collections, vocabulary matching, background links. Spatial: geo normalization, geo coding, reverse geo-coding, historic place names. Other: language identification. External sources shown: SKOS thesauri, DBpedia, Wikipedia, GeoNames. Source: http://more.dcu.gr/
Enrichment (micro)-services. Source: http://more.dcu.gr/
Enrichment plans. Enrichment micro-services are used within enrichment workflows, called enrichment plans. Each enrichment plan applies to a specific schema. Each enrichment plan executes enrichment microservices in a specific order. Example plan: language identification, vocabulary matching, geo-normalization, geo-coding.
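An enrichment plan as described here is essentially an ordered list of functions bound to one schema. The sketch below shows that structure under simplifying assumptions: records are plain dictionaries rather than XML, the class EnrichmentPlan and both microservices are hypothetical, the vocabulary is a one-entry placeholder under example.org, and language identification is a toy heuristic that only recognizes Greek script.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Microservice = Callable[[Record], Record]

# Hypothetical vocabulary: lower-cased label -> concept URI (placeholder).
VOCABULARY = {"theater": "http://example.org/vocab/theater"}


def identify_language(record: Record) -> Record:
    """Toy heuristic: 'el' if the title has Greek letters, else 'und'."""
    title = str(record.get("title", ""))
    has_greek = any(
        "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff" for ch in title
    )
    return {**record, "title_lang": "el" if has_greek else "und"}


def match_vocabulary(record: Record) -> Record:
    """Attach concept URIs for subjects found in the vocabulary."""
    subjects = record.get("subjects", [])
    uris = [VOCABULARY[s.lower()] for s in subjects if s.lower() in VOCABULARY]
    return {**record, "subject_uris": uris}


@dataclass
class EnrichmentPlan:
    schema: str                # the single schema this plan applies to
    steps: List[Microservice]  # microservices, in execution order

    def run(self, record: Record, record_schema: str) -> Record:
        if record_schema != self.schema:
            raise ValueError(f"plan is for {self.schema}, got {record_schema}")
        for step in self.steps:
            record = step(record)  # each step sees the previous step's output
        return record
```

A plan would be built as EnrichmentPlan(schema="EDM", steps=[identify_language, match_vocabulary]) and applied with plan.run(record, record_schema="EDM"). Keeping the order explicit matters because later services may depend on what earlier ones added.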
Let’s see an example. Example taken from the LoCloud FP7 project, http://www.locloud.eu/ Goal: aggregate cultural-heritage-related content from small and medium providers, enrich it, and publish it to Europeana.
Original record
Transformed record. Missing language attributes. Place label is a concatenated string of coordinates.
Enriched record. Enrichment plan: language identification, vocabulary matching, geo-normalization, geo-coding.
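The place label problem can be handled by splitting the concatenated string back into numeric coordinates. This is a minimal sketch of that single step, assuming the label holds decimal degrees in latitude, longitude order (which matches the sample value for a location in Cyprus); the function name split_place_label is hypothetical and this is not the geo-normalization service itself.

```python
import re
from typing import Optional, Tuple

# Two decimal numbers separated by a comma, with optional whitespace.
COORD_PAIR = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$")


def split_place_label(label: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) if the label is a coordinate pair.

    Assumes latitude comes first. Returns None for anything that is not
    a valid pair, e.g. a real place name.
    """
    match = COORD_PAIR.match(label)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None  # numbers are out of range for geographic coordinates
    return lat, lon
```

The returned values could then be written to separate latitude and longitude properties, leaving the label free to hold a human-readable place name obtained by reverse geo-coding.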
Quality improvement. Completeness has been improved. Accuracy has been improved.
Questions