Persistent Identifiers (PIDs) & Digital Objects (DOs)
Christine Staiger & Robert Verkerk, SURFsara
Persistent Identifiers (PIDs)
• Pointers to data resources
  • Digital resources: data, metadata, documents
  • Real-world objects: species, patient, cell line
• Globally unique
• Exist indefinitely
• Used to identify and retrieve resources
• Examples: ISBNs, BSNs, DOIs, EPIC PIDs, URIs
Digital Object (DO)
[Diagram: a digital object combines PID, data and metadata]
• Synchronise PID, data and metadata during creation, maintenance and deletion of a digital object!
PIDs are static
[Diagram: PID 1 to PID 4 point to Data 1 to Data 4 in the world of data infrastructure (hardware)]
Workflow 1: Change storage environment
[Diagram: PID 1 at storage site A, PID 2 at storage site B]
Use Case 1: Digital repositories
• PIDs point to the landing page of the digital repository showing metadata
• “Real” data can be downloaded from this page with another link
• E.g. B2SHARE, 3TU.Datacentrum & DANS repositories
PID http://hdl.handle.net/11304/3265434c-4b34-11e4-81ac-dcbd1b51435e resolves to https://b2share.eudat.eu/record/139
Use Case 2: Enabling data flows
• PIDs point to data directly
• If needed, create another field specifying the data type to choose the application
• Use data in a workflow via its PID, NOT via its actual location!
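A minimal sketch of this use case, assuming a PID record that holds the current location plus one extra type field. The record layout, the field names `URL` and `TYPE`, the application names and the example URLs are all hypothetical and not taken from any PID service.

```python
# Hypothetical PID records: each holds the current location and a data type.
PID_RECORDS = {
    "123/456": {"URL": "https://site-a.example.org/obs.csv", "TYPE": "text/csv"},
}

# Hypothetical mapping from data type to the application that handles it.
APPLICATIONS = {
    "text/csv": "csv-reader",
    "application/x-netcdf": "netcdf-reader",
}


def plan_step(pid):
    """Look up where the data lives and which application should open it."""
    record = PID_RECORDS[pid]
    return record["URL"], APPLICATIONS[record["TYPE"]]


# The workflow only ever stores the PID.
print(plan_step("123/456"))

# When the data moves, only the PID record changes; the workflow is untouched.
PID_RECORDS["123/456"]["URL"] = "https://site-b.example.org/obs.csv"
print(plan_step("123/456"))
```

Because the workflow refers to `123/456` rather than to a site-specific URL, a change of storage environment does not require editing the workflow itself.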
Resolving PIDs
E.g. Handle system: client gets a request to resolve hdl:123/456
1. Client sends a request to the Global Registry to resolve 0.NA/123 (the prefix handle for 123/456)
2. Global Registry responds with the service information for 123
4. Server responds with the handle data
[Diagram: client, Global Registry, and a Local Handle Service (e.g. SURFsara) with a primary site, secondary sites A and B, and servers #1 to #3]
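The two-stage lookup on this slide can be sketched with in-memory tables. This is only an illustration of the idea: the dictionaries, the service name and the URL value are hypothetical, and the real Handle System uses its own protocol between client, global registry and local service.

```python
# Hypothetical stand-in for the global registry:
# prefix handle -> service information (which local service is responsible).
GLOBAL_REGISTRY = {
    "0.NA/123": {"service": "local-service-123"},
}

# Hypothetical stand-in for the local handle services:
# service name -> {handle -> handle data}.
LOCAL_SERVICES = {
    "local-service-123": {
        "123/456": {"URL": "https://example.org/data/456"},
    },
}


def split_handle(handle):
    """Split 'prefix/suffix' at the first slash."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    return prefix, suffix


def resolve(handle):
    """Resolve a handle: ask the global registry first, then the local service."""
    prefix, _ = split_handle(handle)
    # Ask the global registry which service is responsible for this prefix.
    service_info = GLOBAL_REGISTRY["0.NA/" + prefix]
    # Ask that local service for the handle data.
    local_service = LOCAL_SERVICES[service_info["service"]]
    return local_service[handle]


print(resolve("123/456")["URL"])
```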
Example: Relationships between DOs
PID: prefix1/suffix1
  Metadata: key1: …, key2: prefix2/suffix2, key3: prefix3/suffix3
PID: prefix2/suffix2
  Metadata: key1: …, key2: prefix1/suffix1
PID: prefix3/suffix3
  Metadata: key1: …, key2: prefix1/suffix1
• Part of/has part relationships
• Model cohort-patient relationship
• Model patient-samples relationship
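One way to picture such relationships is a small table of digital objects whose metadata refers to other PIDs. The sketch below is an assumption-laden illustration: the key names `has_part` and `part_of`, the `type` values and the consistency check are hypothetical and not prescribed by any PID system.

```python
# Hypothetical digital objects keyed by PID; all metadata keys are made up.
DIGITAL_OBJECTS = {
    "prefix1/suffix1": {
        "type": "cohort",
        "has_part": ["prefix2/suffix2", "prefix3/suffix3"],
    },
    "prefix2/suffix2": {"type": "patient", "part_of": "prefix1/suffix1"},
    "prefix3/suffix3": {"type": "patient", "part_of": "prefix1/suffix1"},
}


def inconsistent_links(objects):
    """Return (parent, child) pairs where has_part and part_of disagree."""
    problems = []
    for pid, metadata in objects.items():
        # Every listed part must point back to this object.
        for child in metadata.get("has_part", []):
            if objects.get(child, {}).get("part_of") != pid:
                problems.append((pid, child))
        # Every part_of target must list this object as one of its parts.
        parent = metadata.get("part_of")
        if parent is not None:
            if pid not in objects.get(parent, {}).get("has_part", []):
                problems.append((parent, pid))
    return problems


print(inconsistent_links(DIGITAL_OBJECTS))
```

The same pattern would apply one level down, with a patient object listing its samples.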
Guidelines: Characteristics of PIDs
• What should be identifiable by a PID?
• Define what is data and what is metadata
• Granularity of PIDs
• How much information should a PID contain?
  • Location
  • Checksums
  • Other system-specific information
  • Do not put content information about the data here!
• Don’t mix PIDs with other IDs, e.g. database IDs
• Opacity: no assumptions about data context in the PID
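The opacity guideline can be illustrated by minting suffixes that carry no meaning, while location and checksum live in the record stored with the PID. This is a sketch under assumptions: the prefix `12345`, the field names `URL` and `CHECKSUM`, and the use of a random UUID as suffix are illustrative choices, not requirements of any PID service.

```python
import uuid


def mint_pid(prefix):
    """Return 'prefix/suffix' where the suffix is an opaque random UUID."""
    return f"{prefix}/{uuid.uuid4()}"


def new_record(prefix, location, checksum):
    """Create a PID and the record kept with it.

    Location and checksum go into the record, never into the identifier,
    so the PID stays valid when either of them changes.
    """
    record = {"URL": location, "CHECKSUM": checksum}
    return mint_pid(prefix), record


# Hypothetical prefix, location and checksum value.
pid, record = new_record("12345", "https://example.org/data/file.dat", "d41d8cd9")
print(pid)
print(record)
```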
Guidelines: Referable data
• How persistent is the data? What and how much in a DO may change?
• When should a new DO be instantiated?
• Versioning via PIDs?
• Define PID management processes:
  1. Connecting data, metadata and PID
  2. Handling changes in data and metadata
  3. Handling changes in storage environment
  4. Deleting data, metadata, or PIDs
• Which problem should be addressed with PIDs?
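The four management processes can be sketched as operations on a toy in-memory registry. Everything here is hypothetical: the class, its method names and the record layout are invented for illustration, and keeping a deleted PID as a tombstone is one possible policy, not something the guidelines mandate.

```python
class DigitalObjectRegistry:
    """Toy store that keeps PID, data location and metadata in sync."""

    def __init__(self):
        self._records = {}

    def _live(self, pid):
        """Return the record for pid, refusing objects already deleted."""
        record = self._records[pid]
        if record["deleted"]:
            raise KeyError(f"PID {pid} refers to a deleted object")
        return record

    def create(self, pid, location, metadata):
        """Process 1: connect data, metadata and PID."""
        if pid in self._records:
            raise ValueError(f"PID {pid} already exists")
        self._records[pid] = {
            "location": location,
            "metadata": dict(metadata),
            "deleted": False,
        }

    def change_metadata(self, pid, **changes):
        """Process 2: handle changes in metadata."""
        self._live(pid)["metadata"].update(changes)

    def change_storage(self, pid, new_location):
        """Process 3: handle a change of storage environment."""
        self._live(pid)["location"] = new_location

    def delete(self, pid):
        """Process 4: drop the data but keep the PID as a tombstone (assumed policy)."""
        record = self._live(pid)
        record["location"] = None
        record["deleted"] = True

    def lookup(self, pid):
        """Return the stored record, including tombstones."""
        return self._records[pid]


registry = DigitalObjectRegistry()
registry.create("123/456", "site-a:/data/file", {"title": "example"})
registry.change_storage("123/456", "site-b:/data/file")
print(registry.lookup("123/456")["location"])
```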
The handle system
• Offers a resolution service for PIDs
• Gives a lot of freedom for implementation, e.g. PID information types
• Software architecture designed for high availability and scalability
• Basis for several PID providers
• Costs: $50 for registering a prefix with handle + $50/year maintenance
• EPIC PIDs and DOIs build their services upon the handle system. Thus, such a PID is a handle
PID systems
DOIs
• Data registry service
• Library-specific metadata standard incorporated in the PID entry (author info, Dublin Core, …), ensuring interoperability between registered data objects
• Costs: $0.06 to $1 per PID, depending on service (CrossRef), plus annual fee
EPIC PIDs
• Data registry service
• Create own metadata for PIDs for data interoperability
• Only costs for the handle service
• With one prefix one can create as many PIDs as wanted
Example: Python epicclient …
B2SAFE: iRODS and PIDs @ KNMI
[Diagram labels: PID, seismic system, iRODS, dCache, NFS share, HPSS, NFS mount, KNMI, DMF]
OS: /data/orfeus/data/continuous/...
iRODS: /ORFEUS/eudat/data/continuous/…
iRODS: /vzSARA1/eudat/knmi/…
Dataflow: KNMI to SURFsara
B2SAFE is implemented as a two-step process:
1. Register a file in iRODS
  • ireg a file in iRODS @ KNMI
  • Create a handle/PID @ KNMI
2. Replicate a file in iRODS to another node
  • Replicate the registered file to SURFsara
  • Create a handle/PID @ SURFsara
  • Update the handle/PID @ KNMI
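The bookkeeping behind the two steps can be sketched without iRODS or a handle server. This is a simulation only: the `handles` dictionary stands in for the handle servers at both sites, the field names `URL` and `REPLICAS` and the file paths are hypothetical, and the assumption that updating the KNMI handle means recording the replica's PID is an interpretation of the slide, not confirmed B2SAFE behaviour. The prefixes 11230 and 11112 are the ones shown in the example handles.

```python
import uuid

# Hypothetical stand-in for the handle servers at both sites.
handles = {}


def create_handle(prefix, location):
    """Create a handle with an opaque suffix pointing at the given location."""
    pid = f"{prefix}/{uuid.uuid4()}"
    handles[pid] = {"URL": location, "REPLICAS": []}
    return pid


def register(prefix, path):
    """Step 1: register a file at the source site and create its handle."""
    return create_handle(prefix, path)


def replicate(source_pid, target_prefix, target_path):
    """Step 2: replicate, create a handle at the target, update the source handle."""
    replica_pid = create_handle(target_prefix, target_path)
    # Assumed meaning of "update the handle @ KNMI": record the replica's PID.
    handles[source_pid]["REPLICAS"].append(replica_pid)
    return replica_pid


# Hypothetical paths; the real iRODS paths are site specific.
knmi_pid = register("11230", "/hypothetical/knmi/file.mseed")
sara_pid = replicate(knmi_pid, "11112", "/hypothetical/surfsara/file.mseed")
print(handles[knmi_pid])
```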
Example handle
Domain / prefix / unique identifier
Handle/PID @ KNMI: http://hdl.handle.net/11230/7bc49fd6-2836-11e4-955ad89d6771dd88?noredirect
Handle/PID @ SURFsara: http://hdl.handle.net/11112/387ed2e4-5371-11e4-92a8a0369f0b5f26?noredirect
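The domain / prefix / unique identifier structure can be taken apart with Python's standard `urllib.parse`. A small sketch: the function name and the returned dictionary layout are hypothetical, and the example suffix is a made-up placeholder rather than a real handle.

```python
from urllib.parse import urlsplit


def parse_handle_url(url):
    """Split a handle URL into domain, prefix and unique identifier (suffix)."""
    parts = urlsplit(url)
    prefix, sep, suffix = parts.path.lstrip("/").partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"no prefix/suffix found in {url!r}")
    return {
        "domain": parts.netloc,
        "prefix": prefix,
        "suffix": suffix,
        # True when the URL carries the noredirect query flag.
        "noredirect": "noredirect" in parts.query.split("&"),
    }


# Hypothetical suffix; only the prefix 11230 comes from the slide.
print(parse_handle_url("http://hdl.handle.net/11230/example-suffix?noredirect"))
```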
Installation
• EPIC client, e.g. Python or Perl client
• Handle server and an EPIC API server
• iRODS and B2SAFE for ingesting data (optional)
SURFsara provides
• Handle server
• EPIC API
How to obtain a handle prefix
• The production prefix has to be purchased from CNRI.
• Costs: $50/year plus a one-time $50 for the request
• More information on how to obtain a handle prefix: http://handle.net/service_agreement.html
• More information on how to make use of SURFsara’s PID service: http://eudat.eu/User+Documentation+-+PIDs+in+EUDAT.html