INTRODUCTION TO PERSISTENT IDENTIFIERS Helis Academy Christine Staiger

























INTRODUCTION TO PERSISTENT IDENTIFIERS Helis Academy, Christine Staiger, DTL, Nov 4th 2019

AGENDA DAY 3
9:00 – 9:30 am     Arrival and Coffee
9:30 – 11:00 am    An introduction to Persistent identifiers
11:00 – 11:30 am   Coffee
11:30 – 12:30 pm   Data Sharing, publishing and archiving
12:30 – 1:30 pm    Lunch
1:30 – 3:00 pm     Data Sharing, publishing and archiving
3:00 – 4:00 pm     Evaluation and Wrap-up
https://bit.ly/2Nry4Uv

WHY PERSISTENT IDENTIFIERS?
§ Managing increasing numbers of data objects and their relations to each other
§ Sharing and combining data from different resources
§ Linking data from different resources
Data needs to be globally identifiable and addressable to facilitate safe reuse of data
§ Challenges
  - Data locations change over time
  - Data is not per se persistent and might disappear from a resource

FAIR AND PIDS
§ Findable: Rich metadata describing the data contains global identifiers that identify and address the data
§ Accessible: Some PID systems allow vital information for addressing the data to be stored in the PID itself
§ Interoperable: Some PID systems allow technical information on the data object to be stored in the PID itself
§ Reusable: PIDs support the technical reusability of data

THE DATA OBJECT
[Figure: a data object and its parts]
§ Metadata: descriptive, sensitive, referring, operational, access rights, provenance, publication
§ Bitstreams (e.g. files)
§ PID: 11904/428b37b0-e42a-11e9-8c3a38f9d326da8b

DATA, WHERE IS THE PROBLEM?
[Figure: a data object shown together with a repository and a PID service]
§ Metadata: descriptive, sensitive, referring, operational, access rights, provenance, publication
§ Bitstreams (e.g. files)
§ PID: 11904/428b37b0-e42a-11e9-8c3a38f9d326da8b
§ Repository
§ PID service

PIDS ARE STATIC!
§ A PID always points to the data, no matter where the data lies or moves over time.
§ Data can lie on storage or in services (applications)
§ Data:
  - Files
  - Folders
  - Webpages
  - Sometimes even real world objects
[Figure: PID 1 to PID 4 pointing to Data 1 to Data 4 in the "world of data infrastructure"]

SIMPLE EXAMPLE OF DATA SHARING
Publish data online, data is accessed by others
§ Publish online: published at http://www.test.com/test.html
§ Used by another researcher: other users may cite, access, re-use this URL
§ Move to another location: the resource is relocated to http://www.example.com/
§ Other users are not informed -> 404
C Staiger, S Bendoukha, EUDAT Summer School 2017, Data discoverability and PIDs

SIMPLE EXAMPLE OF DATA SHARING
Publish data online, data is accessed by others, this time through a PID resolution service
§ Publish online: Register PID
§ Move to another location: Update PID
§ Used by another researcher: Get PID details, Resolve PID
[Figure: the three steps connected to a PID system]
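The register / update / resolve cycle can be sketched with a toy in-memory model. This is only an illustration of the idea, not the interface of any real PID system: the class `PidService`, its method names and the PID `11304/example-suffix` are all hypothetical.

```python
class PidService:
    """Toy in-memory stand-in for a PID system (hypothetical, not a real API)."""

    def __init__(self):
        self._records = {}  # maps PID -> current location (URL)

    def register(self, pid, url):
        """Create a new PID that points to url."""
        if pid in self._records:
            raise ValueError(f"PID already registered: {pid}")
        self._records[pid] = url

    def update(self, pid, new_url):
        """Point an existing PID to a new location."""
        if pid not in self._records:
            raise KeyError(f"Unknown PID: {pid}")
        self._records[pid] = new_url

    def resolve(self, pid):
        """Return the current location behind a PID."""
        return self._records[pid]


service = PidService()
pid = "11304/example-suffix"  # hypothetical PID, for illustration only

# Publish online and register the PID
service.register(pid, "http://www.test.com/test.html")
before_move = service.resolve(pid)

# Move the data and update the PID; the PID itself does not change
service.update(pid, "http://www.example.com/")
after_move = service.resolve(pid)

print(before_move)
print(after_move)
```

A researcher who cited the PID rather than the first URL still reaches the data after the move, because only the record behind the PID changed.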

STRUCTURE OF A PID
11304/3265434c-4b34-11e4-81ac-dcbd1b51435e
Prefix:
- Denotes the owner of the PID
- One prefix, thousands of PIDs
- Unique in the world
Suffix:
- Specific to the thing that it identifies
- Prefix and suffix together are unique in the world
Resolver:
- Maps the PID to the target
- Web-browser compliant; HTTP redirect
http://hdl.handle.net/11304/3265434c-4b34-11e4-81ac-dcbd1b51435e
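As a minimal sketch of this structure, the PID from the slide can be split into prefix and suffix and combined with the resolver into an actionable URL. The helper names `parse_pid` and `actionable_url` are made up for this example; the sketch assumes the prefix is everything before the first slash.

```python
from typing import NamedTuple

RESOLVER = "http://hdl.handle.net/"  # resolver shown on the slide


class Pid(NamedTuple):
    prefix: str  # denotes the owner of the PID
    suffix: str  # specific to the identified thing


def parse_pid(pid: str) -> Pid:
    """Split a PID at the first slash into prefix and suffix."""
    prefix, sep, suffix = pid.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"Not of the form prefix/suffix: {pid!r}")
    return Pid(prefix, suffix)


def actionable_url(pid: str, resolver: str = RESOLVER) -> str:
    """Build resolver + PID, the form a web browser can follow."""
    parsed = parse_pid(pid)
    return f"{resolver.rstrip('/')}/{parsed.prefix}/{parsed.suffix}"


example = "11304/3265434c-4b34-11e4-81ac-dcbd1b51435e"
print(parse_pid(example))
print(actionable_url(example))
```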

PID USE CASES

DATA REPOSITORIES
§ PIDs point to the landing page of the digital repository showing metadata
§ “Real” data can be downloaded from this page with another link
§ E.g. B2SHARE, FigShare, Zenodo, …
§ Example PID (actionable PID = resolver + PID)
  http://hdl.handle.net/11304/3265434c-4b34-11e4-81ac-dcbd1b51435e
  resolves to landing page
  https://b2share.eudat.eu/records/feafb12e810c489b9e878949c6c35345

The persistent identifiers for the collection

The persistent identifier for files

ENABLING COMPUTE WORKFLOWS

ENABLING COMPUTE WORKFLOWS
Zhang, Abeln, Bijlard, Staiger: https://dx.doi.org/10.12688/f1000research.12168.1

IDENTIFYING (PIECES OF) CODE
§ Execute a program hidden behind a PID
§ A way to refer to workflows (reproducibility)

PID SYSTEMS

HANDS-ON: PID RESOLUTION
§ Go to: http://hdl.handle.net/21.T12995/PID-training
§ Exercise: Warming up!
  - Resolve several PIDs with their intended resolver
  - What happens if you resolve a PID with a foreign resolver?
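For the warm-up it can help to look up the intended resolver programmatically. The sketch below is a rough heuristic and not an official classification: it assumes that DOIs can be recognised by a prefix starting with "10.", that any other prefix/suffix string is a Handle, and that URN:NBN identifiers start with "urn:nbn:". The function names are hypothetical, and URN:NBN resolvers are left out because they are run by national services.

```python
HANDLE_RESOLVER = "http://hdl.handle.net/"
DOI_RESOLVER = "https://doi.org/"


def pid_kind(pid: str) -> str:
    """Guess the PID system from the shape of the identifier (heuristic)."""
    lowered = pid.lower()
    if lowered.startswith("urn:nbn:"):
        return "urn:nbn"
    if lowered.startswith("10.") and "/" in pid:
        return "doi"
    if "/" in pid:
        return "handle"
    raise ValueError(f"Unrecognised identifier: {pid!r}")


def intended_url(pid: str):
    """Return the actionable URL at the intended resolver, or None if unknown."""
    kind = pid_kind(pid)
    if kind == "doi":
        return DOI_RESOLVER + pid
    if kind == "handle":
        return HANDLE_RESOLVER + pid
    return None  # URN:NBN: resolver depends on the national service


for example in ("21.T12995/PID-training", "10.12688/f1000research.12168.1"):
    print(pid_kind(example), intended_url(example))
```

Pasting the printed URLs into a browser, and then swapping the resolvers by hand, is one way to work through the exercise.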

PID ISSUING AUTHORITIES
[Figure: PID issuing authorities: DONA Foundation, HANDLE, DOI, DataCite, CrossRef, EPIC, GWDG, SURFsara, grnet, CSC]

PID ISSUING AUTHORITIES
§ URN:NBN
  - Policies: PID and data are persistent
  - Wants to be independent of data transfer protocols
  - Currently compliant with HTTP
  - Minimum metadata: Dublin Core
§ DOI
  - Policies: PID is persistent, data is not persistent
  - Based on the Handle system
  - DataCite and CrossRef as prefix issuing authorities
  - Minimum metadata: Dublin Core
  - PIDs point to a landing page
  - Tailored towards data citation

PID ISSUING AUTHORITIES
§ ePIC (European PID consortium)
  - Policies: PID is persistent, data is not persistent
  - PIDs can point to anything
  - Based on the Handle system
  - Tailored towards data identification and resolving
§ DONA Foundation (www.dona.net)
  - Maintains the global handle registry
  - Partners:
    - CNRI (developer of the Handle system)
    - GWDG (main partner in ePIC)
    - International DOI Foundation (IDF)

THE HANDLE SYSTEM
§ Metadata: You can create your own keyword-value pairs and store them with the PID
§ PIDs make it possible to distinguish between data users and data managers
§ Data users get a PID and can directly access the data and the metadata
§ Pipelines can programmatically access the metadata and start specific applications
§ Requires serious thought about data organisation and about developing the code that puts data policies into practice, including code maintenance
For bigger research groups or consortia working in a distributed data environment
For repositories that need a host for their PIDs
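A pipeline reading keyword-value pairs from a PID record might look roughly like the sketch below. The sample record is invented, and its JSON layout is an assumption modelled on what Handle resolvers commonly return as JSON. The keys `FORMAT` and `CHECKSUM`, the application names and the helper functions are all hypothetical.

```python
import json

# Invented sample record; the layout is an assumption, not taken from the slides.
SAMPLE_RECORD = json.loads("""
{
  "handle": "0.TEST/example-suffix",
  "values": [
    {"index": 1, "type": "URL",
     "data": {"format": "string", "value": "http://www.example.com/data.csv"}},
    {"index": 2, "type": "FORMAT",
     "data": {"format": "string", "value": "text/csv"}},
    {"index": 3, "type": "CHECKSUM",
     "data": {"format": "string", "value": "not-a-real-checksum"}}
  ]
}
""")

# Hypothetical mapping from a metadata value to the application to start
APPLICATIONS = {"text/csv": "table-viewer", "image/tiff": "image-pipeline"}


def record_to_dict(record):
    """Flatten the list of typed values into keyword -> value pairs."""
    return {entry["type"]: entry["data"]["value"] for entry in record["values"]}


def choose_application(record):
    """Pick an application from the FORMAT keyword, with a safe fallback."""
    metadata = record_to_dict(record)
    return APPLICATIONS.get(metadata.get("FORMAT"), "download-only")


metadata = record_to_dict(SAMPLE_RECORD)
print(metadata["URL"])
print(choose_application(SAMPLE_RECORD))
```

Which keywords to store, and who may change them, is exactly the data-organisation question the slide warns about.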

DEMO: STEP-BY-STEP MINTING PIDS
§ Register data with a Handle
§ GET the details of a Handle
§ Modify a Handle record
§ Link two files on PID level
§ Reverse look-up
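The five demo steps can be rehearsed offline with a small in-memory model before touching a real Handle server. Everything here is a hypothetical stand-in, not the demo's actual code or any Handle client library: the class `HandleStore`, the prefix `0.TEST`, the keyword `DERIVED_FROM` and the example URLs are all invented.

```python
import uuid


class HandleStore:
    """In-memory stand-in for a Handle server (hypothetical, for practice only)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._records = {}  # PID -> dict of keyword-value pairs

    def register(self, url, **extra):
        """Mint a new PID with a random suffix and store its record."""
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self._records[pid] = {"URL": url, **extra}
        return pid

    def get(self, pid):
        """Return a copy of the record behind a PID."""
        return dict(self._records[pid])

    def modify(self, pid, key, value):
        """Add or overwrite one keyword-value pair."""
        self._records[pid][key] = value

    def link(self, pid, key, other_pid):
        """Store another PID as a value, linking two objects on PID level."""
        if other_pid not in self._records:
            raise KeyError(f"Unknown PID: {other_pid}")
        self.modify(pid, key, other_pid)

    def reverse_lookup(self, url):
        """Find all PIDs whose URL entry equals url."""
        return [pid for pid, rec in self._records.items() if rec.get("URL") == url]


store = HandleStore("0.TEST")  # made-up prefix
raw = store.register("http://www.example.com/raw.csv")            # register
clean = store.register("http://www.example.com/clean.csv")
print(store.get(raw))                                             # GET details
store.modify(raw, "URL", "http://www.example.com/archive/raw.csv")  # modify
store.link(clean, "DERIVED_FROM", raw)                            # link two files
found = store.reverse_lookup("http://www.example.com/archive/raw.csv")
print(found == [raw])
```

The reverse look-up scans all records, which is fine for a toy but is one reason real services offer it as a dedicated feature.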