Exploring the boundaries of MARC 21: creating a metadata schema for the CERN Open Data Portal
Patricia Herterich
CERN GS-SIS, Humboldt-Universität zu Berlin
@pherterich, patricia.herterich@cern.ch
ORCID: 0000-0002-4542-9906
ELAG 2015, Stockholm
CERN and High-Energy Physics
Copyright: CERN
- LHC: 27 km, 4 detectors
- 10,000 scientists & engineers from 100 countries
- Higgs boson discovery in July 2012
Research Data in HEP
Data sharing in HEP
- Data policies available for the 4 LHC experiments: http://opendata.cern.ch/collection/data-policies
- Data supplementary to publications available through the repository HEPData
- More and more supplementary data integrated into the main information system for HEP, INSPIRE
INSPIRE MARC 21
- Most metadata derived from and linked to the publication
The CERN Open Data Portal http://opendata.cern.ch/
The CERN Open Data Portal
- Public access point to data (including software and documentation) produced at CERN
- Launched in November 2014
- Access to 27 TB of CMS data + educational data from all 4 LHC experiments
- DOIs are minted for datasets
- Based on Invenio, a digital library software developed at CERN
The metadata challenge
The metadata challenge
- Metadata input for the portal is through MARCXML
- MARC 21 had to be extended to host the necessary metadata
  - Broad interpretation of some fields
  - Creation of new customised fields
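As a rough illustration of what "extending MARC 21" can look like on the input side, the sketch below parses a small MARCXML record with Python's standard library. It assumes a made-up record: the title in field 245 and the identifier in field 024 follow common MARC 21 usage, while tag 985 and its content are purely hypothetical stand-ins for a customised local field (MARC 21 leaves the 9XX range for local definitions). None of the tags or values are taken from the portal's actual schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by the MARCXML "slim" schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample record. Tag 985 is a HYPOTHETICAL local field,
# not a field documented for the CERN Open Data Portal.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Example dataset title</subfield>
  </datafield>
  <datafield tag="024" ind1="7" ind2=" ">
    <subfield code="a">10.0000/EXAMPLE.DOI</subfield>
    <subfield code="2">DOI</subfield>
  </datafield>
  <datafield tag="985" ind1=" " ind2=" ">
    <subfield code="a">7 TeV</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` inside datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [element.text for element in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE)

print("title:", subfields(record, "245", "a"))
print("identifier:", subfields(record, "024", "a"))
print("custom field 985:", subfields(record, "985", "a"))
```

A standard MARC reader would typically ignore or pass through a local field such as 985, so any consumer of the records has to know the local conventions to interpret it.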
Current implementation https://github.com/cernopendata/opendata.cern.ch/tree/production/invenio_opendata/testsuite/data
Going beyond MARC 21 and extending the scope of research data management in HEP
The CERN Analysis Preservation system
- A closed system to preserve data and associated objects and information to allow reproducibility of an analysis
- Discovery tool for data and analyses
- Integration of structured analysis preservation info into publication approval workflows
Current prototype
More metadata challenges…
- Even more complex metadata than for the CERN Open Data Portal, thus MARC 21 is not an option
- Invenio uses JSON internally
- JSON is used by most of the collaborations' databases
- Chance to create a standard data model for the HEP community to facilitate data and information exchange
- Can be extended to JSON-LD
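To make the "can be extended to JSON-LD" point concrete, here is a minimal sketch of how a plain JSON record could be turned into JSON-LD by attaching a `@context` that maps its keys to vocabulary terms. The record keys, the values, the helper `to_json_ld` and the `https://example.org/hep-vocab#` vocabulary are all hypothetical; only the Dublin Core term for `title` is a real, existing IRI. This is not the data model of the CERN Analysis Preservation system.

```python
import json

# Hypothetical analysis record as it might sit in a JSON-based database.
record = {
    "title": "Example analysis",
    "collaboration": "CMS",
    "final_state": ["muon", "muon"],
}

# A JSON-LD context maps plain keys to IRIs. The example.org vocabulary
# is a placeholder for an ontology that would still have to be defined.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "collaboration": "https://example.org/hep-vocab#collaboration",
    "final_state": "https://example.org/hep-vocab#finalState",
}


def to_json_ld(plain_record, context, identifier):
    """Wrap a plain JSON record with @context and @id, leaving its keys unchanged."""
    document = {"@context": context, "@id": identifier}
    document.update(plain_record)
    return document


doc = to_json_ld(record, CONTEXT, "https://example.org/analysis/1")
print(json.dumps(doc, indent=2))
```

Because the original keys stay untouched, existing JSON consumers can keep reading the record, while JSON-LD-aware tools can resolve the keys to shared terms.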
https://drive.google.com/file/d/0B9fGRYX4RNaSWktUTjZ0WlpYZzQ/view?usp=sharing
An ontology for HEP data analysis?
- Kick-off workshop in May 2015
- Collaboration with DASPOS (Data and Software Preservation for Open Science, Notre Dame University, Indiana) and Data Semantics Lab (Wright State University, Ohio)
- Work will continue throughout the year to complete the modelling and formalise the ontologies for implementation
Example: Detector Final State
Next steps
- Model the mind map as graphs & formalise ontologies
- Have an improved prototype of the CERN Analysis Preservation system ready by the end of the summer
Acknowledgements
- CERN IT: J. Cowton, P. Fokianos, J. Kunčar, T. Smith, T. Šimko
- CERN SIS: S. Dallmeier-Tiessen, L. Rueda, S. Mele
- ALICE: M. Gheata, C. Grigoras
- ATLAS: K. Cranmer, L. Heinrich, D. Rousseau, F. Socher
- CMS: A. Calderon, A. Huffman, K. Lassila-Perini, T. McCauley, A. Rao, A. Rodriguez Marrero
- LHCb: S. Amerio, B. Couturier, A. Trisovic
- CERN CernVM: J. Blomer
- CERN EOS: L. Mascetti
- DASPOS: M. Hildreth, C. Vardeman
- DPHEP: F. Berghaus, J. Shiers
- All the participants of the VoCamp at Notre Dame University in May 2015
Work sponsored by the Wolfgang Gentner Programme of the Federal Ministry of Education and Research