Managing and publishing data. Jonathan Rans, Digital Curation Centre, Edinburgh. J.Rans@ed.ac.uk Twitter: @JNRans. Introduction to Open Science and developing RDM services, Riga, Latvia
What will we cover? 1. Introduction and definitions 2. How to manage data 3. What to publish and where to put it
Definitions
Definition of research data. Pilot focuses on research data specifically. ‘Research data’ refers to information, in particular facts or numbers, collected to be examined and considered as a basis for reasoning, discussion or calculation. In a research context, examples of data include statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images. The focus is on research data that is available in digital form. (Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020, v2.1, 15 February 2016, p. 3)
So, what might this include? http://www.aoml.noaa.gov/phod/dac/array_growth.html http://www.sbirc.ed.ac.uk/documents/lbc_protocol.pdf http://www.aoml.noaa.gov/phod/graphics/dacdata/globpop.gif
What is research data management? “an explicit process covering the creation and stewardship of research materials to enable their use for as long as they retain value.” Data management is part of good research practice. Lifecycle stages: Plan • Create • Use • Appraise • Deposit and Publish • Discover and Reuse
How to manage and share data
The Research Data Lifecycle
Stages: Plan • Create • Use • Appraise • Deposit and Publish • Discover and Reuse
Activities: • Data management planning • Data creation • Annotating / documenting data • Analysis, use, versioning • Storage and backup • Publishing papers and data • Preparing for deposit • Archiving and sharing • Licensing • Citing…
What is metadata? Data about data • Citation • Discovery • Reuse
What is the difference? Metadata is: • Standardised • Structured • Machine and human readable (Diagram labels: Documentation, Metadata)
What is the minimum required? DataCite metadata used by OpenAIRE. Citation/disambiguation: • Identifier, e.g. DOI • Creator • Title • Publisher • Publication Year. Licensing/access conditions
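As an illustration of the minimum fields listed above, the following is a minimal sketch of a completeness check on a metadata record. The field names, the `missing_fields` helper and the example records are hypothetical and chosen for this sketch; they are not the DataCite schema's own element names or any official validator.

```python
# Minimum citation/disambiguation fields named on the slide.
# The snake_case keys are an assumption made for this sketch.
REQUIRED_FIELDS = ("identifier", "creator", "title", "publisher", "publication_year")


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical records, for illustration only.
complete = {
    "identifier": "10.1234/example-dataset",
    "creator": "Doe, Jane",
    "title": "Example survey results",
    "publisher": "Example University",
    "publication_year": 2016,
}
incomplete = {"creator": "Doe, Jane", "title": "Untitled"}

print(missing_fields(complete))
print(missing_fields(incomplete))
```

A check like this could run before deposit, so that gaps in the citation fields are caught while the researcher is still available to fill them.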
What are persistent identifiers? They are an alphanumeric code identifying a resource, organisation or individual. They must be: • Unique • Persistent. Ideally they should be actionable too
How do persistent identifiers work? Taken from the DCC guide: How to Cite Datasets and Link to Publications http://www.dcc.ac.uk/resources/how-guides/cite-datasets
Disciplinary metadata standards to use. Use relevant standards for interoperability: www.dcc.ac.uk/resources/metadata-standards
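To make "actionable" concrete: a DOI becomes actionable when it is written as a link through the public resolver at https://doi.org/. The sketch below only builds that link and makes no network request. The `doi_to_url` helper and the example DOI are hypothetical, and the validity check is a deliberately loose assumption rather than the full DOI syntax.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Turn a DOI string into an actionable resolver URL."""
    doi = doi.strip()
    # Strip common prefixes so the bare DOI remains.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Loose sanity check (an assumption): DOIs start with "10." and
    # contain a "/" separating prefix and suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError("not a DOI: %r" % doi)
    # Percent-encode characters that are unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, for illustration only.
print(doi_to_url("doi:10.1234/example-dataset"))
```

The identifier itself stays the same if the dataset moves; only the location registered with the resolver changes, which is what makes the link persistent.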
Some formats are better for the long term. It’s preferable to opt for formats that are: • Uncompressed • Non-proprietary • Open, documented • Standard representation (ASCII, Unicode). Data centres may have preferred formats for deposit, e.g.
Type: Recommended / Non-preferred
Tabular data: CSV, TSV, SPSS portable / Excel
Text: Plain text, HTML, RTF, PDF/A only if layout matters / Word
Media: Container: MP4, Ogg; Codec: Theora, Dirac, FLAC / Quicktime, H264
Images: TIFF, JPEG 2000, PNG / GIF, JPG
Structured data: XML, RDF / RDBMS
Further examples: http://www.data-archive.ac.uk/create-manage/formats-table
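The format recommendations above can be turned into a quick pre-deposit check. This is a minimal sketch: the mapping from file extensions to the slide's format names is an assumption made for illustration, it covers only some of the listed formats, and a real data centre's preferred list should take precedence.

```python
from pathlib import Path

# Extension sets are assumptions mapping the slide's format names
# to common file extensions; they are not an official list.
RECOMMENDED = {".csv", ".tsv", ".txt", ".html", ".rtf",
               ".tif", ".tiff", ".jp2", ".png", ".xml", ".rdf",
               ".mp4", ".ogg", ".flac"}
NON_PREFERRED = {".xls", ".xlsx", ".doc", ".docx",
                 ".mov", ".gif", ".jpg", ".jpeg"}


def classify(filename):
    """Classify a file by extension against the slide's recommendations."""
    suffix = Path(filename).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in NON_PREFERRED:
        return "non-preferred"
    return "unknown"


# Hypothetical file names, for illustration only.
for name in ("results.csv", "survey.xlsx", "notes.md"):
    print(name, "->", classify(name))
```

An extension is only a hint about the format; files flagged "unknown" or "non-preferred" are prompts for a conversation about conversion, not automatic rejections.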
Licensing research data openly. This DCC guide outlines the pros and cons of each approach and gives practical advice on how to implement your licence: www.dcc.ac.uk/resources/how-guides/license-research-data. Horizon 2020 Open Access guidelines point to: [two licence logos, not reproduced]. Creative Commons limitations: • NC (Non-Commercial): what counts as commercial? • ND (No Derivatives): severely restricts use. Licences carrying these clauses are not open licences
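The point about NC and ND clauses can be expressed as a simple check on a Creative Commons licence code. This is a sketch under stated assumptions: the `restrictive_clauses` helper is hypothetical, it only looks for the two clauses named on the slide, and an empty result does not by itself prove that a licence is open.

```python
import re

# The two Creative Commons clauses the slide flags as not open.
RESTRICTIVE = {"NC": "Non-Commercial", "ND": "No Derivatives"}


def restrictive_clauses(licence):
    """List the NC/ND clauses present in a licence code such as 'CC BY-NC 4.0'."""
    # Split on spaces, hyphens and underscores so that 'CC-BY-NC-4.0'
    # and 'CC BY-NC 4.0' are treated alike.
    tokens = re.split(r"[\s\-_]+", licence.upper())
    return [RESTRICTIVE[t] for t in tokens if t in RESTRICTIVE]


for code in ("CC BY 4.0", "CC BY-NC 4.0", "CC-BY-NC-ND-4.0"):
    print(code, "->", restrictive_clauses(code))
```

Matching whole tokens rather than substrings avoids false hits on codes that merely contain the letters "NC" or "ND" inside a longer word.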
EUDAT licensing tool. Answer questions to determine which licence(s) are appropriate to use: http://ufal.github.io/lindat-license-selector
Options for closed data • Institutional data archive/vault • Safe havens (e.g. secure patient data) • 3rd-party data archiving • Cloud storage • Institutional servers (the ‘do nothing’ option)
Options for open data • Domain repository • General repository (Figshare, Zenodo, Dryad) • Institutional repository • Journal supplementary material • Departmental web page
Finding external repositories • General directories: Re3data.org • Domain-specific directories, e.g. life sciences: Biosharing.org • Data journal recommendations: Edinburgh research data blog, Sources of dataset peer review • Funding body recommendations, e.g. Wellcome Trust: Data repositories and database sources
A conversation with the researcher • There may be an accepted repository used by peers or required by funders • Multidisciplinary studies may not have an obvious home • Data types and volumes will affect the decision
Data repositories • Does your publisher or funder suggest a repository? • Are there data centres or community databases for your discipline? • Does your university offer support for long-term preservation?
Zenodo • OpenAIRE-CERN joint effort • Multidisciplinary repository • Multiple data types: publications, long tail of research data • Citable data (DOI) • Links funding, publications, data & software
http://databib.org http://service.re3data.org/search www.zenodo.org
Any Questions? Jonathan Rans J.Rans@ed.ac.uk @JNRans. Image credits: Research Dictionary: http://www.saquiresearch.com; Himalayas: www.cntraveller.com